[jira] [Updated] (HBASE-19930) fix ImmutableMemStoreLAB#forceCopyOfBigCellInto
[ https://issues.apache.org/jira/browse/HBASE-19930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gali Sheffi updated HBASE-19930: Attachment: HBASE-19930-V06.patch > fix ImmutableMemStoreLAB#forceCopyOfBigCellInto > --- > > Key: HBASE-19930 > URL: https://issues.apache.org/jira/browse/HBASE-19930 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0-beta-1 >Reporter: Gali Sheffi >Assignee: Gali Sheffi >Priority: Major > Attachments: HBASE-19930-V01.patch, HBASE-19930-V02.patch, > HBASE-19930-V03.patch, HBASE-19930-V04.patch, HBASE-19930-V05.patch, > HBASE-19930-V06.patch > > > This issue is about fixing ImmutableMemStoreLAB#forceCopyOfBigCellInto. > Following a comment in HBASE-19133 regarding a bug in > ImmutableMemStoreLAB#forceCopyOfBigCellInto (assuming this method is never > called for an ImmutableMemStoreLAB, and just throwing an > IllegalStateException whenever called), the forceCopyOfBigCellInto method now > performs the copy of big cells on the first MSLABImpl in its mslabs > linked-list. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
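The behavior described above can be illustrated with a minimal sketch. It is not the HBase implementation and not the attached patch: ChunkLab and ImmutableLabSketch are hypothetical stand-ins for MSLABImpl and ImmutableMemStoreLAB, and a cell is modeled as a plain byte array. The only point shown is that the immutable wrapper delegates the forced copy to the first LAB in its linked list instead of throwing IllegalStateException.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

/** Hypothetical stand-in for a single MSLAB that copies cell bytes into memory it owns. */
class ChunkLab {
    private int copiedCells = 0;

    /** Always copies the cell, even if it is larger than a regular chunk. */
    byte[] forceCopyOfBigCellInto(byte[] cell) {
        copiedCells++;
        return Arrays.copyOf(cell, cell.length);
    }

    int getCopiedCells() {
        return copiedCells;
    }
}

/** Hypothetical stand-in for an immutable LAB that wraps a linked list of LABs. */
class ImmutableLabSketch {
    private final LinkedList<ChunkLab> labs;

    ImmutableLabSketch(List<ChunkLab> labs) {
        if (labs.isEmpty()) {
            throw new IllegalArgumentException("at least one LAB is required");
        }
        this.labs = new LinkedList<>(labs); // defensive copy of the caller's list
    }

    /** Delegates the forced copy to the first LAB in the list rather than throwing. */
    byte[] forceCopyOfBigCellInto(byte[] cell) {
        return labs.getFirst().forceCopyOfBigCellInto(cell);
    }
}
```

In this sketch the wrapper never allocates by itself; it only forwards the call, which keeps the copy logic in one place.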
[jira] [Commented] (HBASE-19116) Currently the tail of hfiles with CellComparator* classname makes it so hbase1 can't open hbase2 written hfiles; fix
[ https://issues.apache.org/jira/browse/HBASE-19116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363587#comment-16363587 ] Hadoop QA commented on HBASE-19116: ---
| (x) *-1 overall* |
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 9s | Docker mode activated. |
|| || || || Prechecks ||
| 0 | findbugs | 0m 0s | Findbugs executables are not available. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || branch-2 Compile Tests ||
| +1 | mvninstall | 3m 30s | branch-2 passed |
| +1 | compile | 0m 41s | branch-2 passed |
| +1 | checkstyle | 1m 4s | branch-2 passed |
| +1 | shadedjars | 4m 59s | branch has no errors when building our shaded downstream artifacts. |
| +1 | javadoc | 0m 28s | branch-2 passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 3m 24s | the patch passed |
| +1 | compile | 0m 42s | the patch passed |
| +1 | javac | 0m 42s | the patch passed |
| -1 | checkstyle | 1m 5s | hbase-server: The patch generated 1 new + 17 unchanged - 3 fixed = 18 total (was 20) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 4m 4s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 14m 24s | Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. |
| +1 | javadoc | 0m 26s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 108m 17s | hbase-server in the patch passed. |
| +1 | asflicense | 0m 24s | The patch does not generate ASF License warnings. |
| | | 138m 45s | |
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:9f2f2db |
| JIRA Issue | HBASE-19116 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12910505/HBASE-19116.branch-2.004.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 5f273f20e126 3.13.0-133-generic #182-Ubuntu SMP Tue Sep 19 15:49:21 UTC 2017 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | branch-2 / 1f3c131371 |
| maven | version: Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) |
| Default Java | 1.8.0_151 |
| checkstyle | https://builds.apache.org/job/PreCommit-HBASE-Build/11517/artifact/patchprocess/diff-checkstyle-hbase-server.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/11517/testReport/ |
| Max. process+thread count | 5012 (vs. ulimit of 1) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/11517/console |
| Powered by | Apache Yetus 0.7.0 http://yetus.apache.org |
This
[jira] [Updated] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used
[ https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Soldatov updated HBASE-19863: Attachment: HBASE-19863-branch-2.patch > java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter > is used > - > > Key: HBASE-19863 > URL: https://issues.apache.org/jira/browse/HBASE-19863 > Project: HBase > Issue Type: Bug > Components: Filters >Affects Versions: 1.4.1 >Reporter: Sergey Soldatov >Assignee: Sergey Soldatov >Priority: Major > Attachments: HBASE-19863-branch-2.patch, HBASE-19863-branch1.patch, > HBASE-19863-test.patch > > > Under some circumstances scan with SingleColumnValueFilter may fail with an > exception > {noformat} > java.lang.IllegalStateException: isDelete failed: deleteBuffer=C3, > qualifier=C2, timestamp=1516433595543, comparison result: 1 > at > org.apache.hadoop.hbase.regionserver.ScanDeleteTracker.isDeleted(ScanDeleteTracker.java:149) > at > org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:386) > at > org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:545) > at > org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5876) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6027) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5814) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2552) > at > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187) > at > 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167) > {noformat} > Conditions: > table T with a single column family 0 that uses ROWCOL bloom filter > (important) and column qualifiers C1,C2,C3,C4,C5. > When we fill the table, for every row we put a deleted cell for C3. > The table has a single region with two HStores: > A: start row: 0, stop row: 99 > B: start row: 10, stop row: 99 > B has newer versions of rows 10-99. Store files have several blocks each > (important). > Store A is the result of major compaction, so it doesn't have any deleted > cells (important). > So, we are running a scan like: > {noformat} > scan 'T', { COLUMNS => ['0:C3','0:C5'], FILTER => "SingleColumnValueFilter > ('0','C5',=,'binary:whatever')"} > {noformat} > How the scan performs: > First, we iterate A for rows 0 and 1 without any problems. > Next, we start to iterate A for row 10, so we read the first cell and set the hfs > scanner to A: > 10:0/C1/0/Put/x, but find that we have a newer version of the cell in B: > 10:0/C1/1/Put/x, > so we make B our current store scanner. Since we are looking for > particular columns > C3 and C5, we perform the optimization StoreScanner.seekOrSkipToNextColumn > which > would run reseek for all store scanners. > For store A the following magic would happen in requestSeek: > 1. The bloom filter check passesGeneralBloomFilter would set haveToSeek to > false because row 10 doesn't have the C3 qualifier in store A. > 2. Since we don't have to seek, we just create a fake row > 10:0/C3/OLDEST_TIMESTAMP/Maximum, an optimization that is quite important for > us and is commented with: > {noformat} > // Multi-column Bloom filter optimization. > // Create a fake key/value, so that this scanner only bubbles up to the > top > // of the KeyValueHeap in StoreScanner after we scanned this row/column in > // all other store files. The query matcher will then just skip this fake > // key/value and the store scanner will progress to the next column.
This > // is obviously not a "real real" seek, but unlike the fake KV earlier in > // this method, we want this to be propagated to ScanQueryMatcher. > {noformat} > > For store B we would set it to a fake 10:0/C3/createFirstOnRowColTS()/Maximum > to skip C3 entirely. > After that we start searching for qualifier C5 using seekOrSkipToNextColumn, > which first runs trySkipToNextColumn: > {noformat} > protected boolean trySkipToNextColumn(Cell cell) throws IOException { > Cell nextCell = null; > do { > Cell nextIndexedKey = getNextIndexedKey(); > if (nextIndexedKey != null && nextIndexedKey != > KeyValueScanner.NO_NEXT_INDEXED_KEY >
[jira] [Updated] (HBASE-19998) Flakey TestVisibilityLabelsWithDefaultVisLabelService
[ https://issues.apache.org/jira/browse/HBASE-19998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-19998: -- Attachment: hbase-19988.master.001.patch > Flakey TestVisibilityLabelsWithDefaultVisLabelService > - > > Key: HBASE-19998 > URL: https://issues.apache.org/jira/browse/HBASE-19998 > Project: HBase > Issue Type: Bug > Components: flakey, test >Reporter: stack >Assignee: stack >Priority: Major > Fix For: 2.0.0-beta-2 > > Attachments: hbase-19988.master.001.patch > > > This is a good one. It's a timeout and though it has lots of test methods, the > problem is one of them gets stuck. The test method kills a RegionServer then > starts a new one. Usually all works out fine but the odd time there is this > unexplained MOVE that gets interjected just as ServerCrashProcedure starts > up. hbase:meta gets stuck (perhaps this is what is being referred to here: > https://issues.apache.org/jira/browse/HBASE-19929?focusedCommentId=16356906&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16356906). > It is trying to run the MOVE by first unassigning from the server that has > just crashed. It never succeeds. Need to fix this. Need to figure out where these > Move operations are coming from too. Let me add some debug. > See here how we are well into ServerCrashProcedure... and then two MOVEs > cut-in... 
for hbase:meta and for namespace: > {code} > > 2018-02-14 02:35:19,806 DEBUG [PEWorker-6] > procedure.ServerCrashProcedure(192): pid=10, > state=RUNNABLE:SERVER_CRASH_PROCESS_META; ServerCrashProcedure > server=asf903.gq1.ygridcore.net,59608,1518575711969, splitWal=true, > meta=true; Processing hbase:meta that was on > asf903.gq1.ygridcore.net,59608,1518575711969 > 2018-02-14 02:35:19,807 INFO [PEWorker-6] > procedure2.ProcedureExecutor(1498): Initialized subprocedures=[{pid=12, > ppid=10, state=RUNNABLE:RECOVER_META_SPLIT_LOGS; RecoverMetaProcedure > failedMetaServer=asf903.gq1.ygridcore.net,59608,1518575711969, splitWal=true}] > 2018-02-14 02:35:19,811 DEBUG [Thread-214] procedure2.ProcedureExecutor(868): > Stored pid=11, state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure > hri=hbase:meta,,1.1588230740, > source=asf903.gq1.ygridcore.net,59608,1518575711969, destination= > 2018-02-14 02:35:19,813 INFO [PEWorker-8] > procedure.MasterProcedureScheduler(813): pid=11, > state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure > hri=hbase:meta,,1.1588230740, > source=asf903.gq1.ygridcore.net,59608,1518575711969, destination= hbase:meta > hbase:meta,,1.1588230740 > 2018-02-14 02:35:19,814 INFO [PEWorker-8] > procedure2.ProcedureExecutor(1498): Initialized subprocedures=[{pid=14, > ppid=11, state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure > table=hbase:meta, region=1588230740, > server=asf903.gq1.ygridcore.net,59608,1518575711969}] > 2018-02-14 02:35:19,831 DEBUG [Thread-214] procedure2.ProcedureExecutor(868): > Stored pid=13, state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure > hri=hbase:namespace,,1518575716296.e52a160b3f3a57ab50d710eba62d9b15., > source=asf903.gq1.ygridcore.net,59608,1518575711969, destination= > 2018-02-14 02:35:19,833 INFO [PEWorker-10] > procedure.MasterProcedureScheduler(813): pid=13, > state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure > hri=hbase:namespace,,1518575716296.e52a160b3f3a57ab50d710eba62d9b15., > 
source=asf903.gq1.ygridcore.net,59608,1518575711969, destination= > hbase:namespace > hbase:namespace,,1518575716296.e52a160b3f3a57ab50d710eba62d9b15. > 2018-02-14 02:35:19,837 INFO [PEWorker-10] > procedure2.ProcedureExecutor(1498): Initialized subprocedures=[{pid=15, > ppid=13, state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure > table=hbase:namespace, region=e52a160b3f3a57ab50d710eba62d9b15, > server=asf903.gq1.ygridcore.net,59608,1518575711969}] > > {code} > Here is the failure of the unassign: > {code} > 2018-02-14 02:35:19,944 WARN [PEWorker-8] > assignment.RegionTransitionProcedure(187): Remote call failed pid=14, > ppid=11, state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure > table=hbase:meta, region=1588230740, > server=asf903.gq1.ygridcore.net,59608,1518575711969; rit=CLOSING, > location=asf903.gq1.ygridcore.net,59608,1518575711969; exception=pid=14, > ppid=11, state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure > table=hbase:meta, region=1588230740, > server=asf903.gq1.ygridcore.net,59608,1518575711969 to > asf903.gq1.ygridcore.net,59608,1518575711969 > 2018-02-14 02:35:19,945 WARN [PEWorker-8] assignment.UnassignProcedure(245): > Expiring server pid=14, ppid=11, state=RUNNABLE:REGION_TRANSITION_DISPATCH; > UnassignProcedure table=hbase:meta, region=1588230740, >
[jira] [Commented] (HBASE-19998) Flakey TestVisibilityLabelsWithDefaultVisLabelService
[ https://issues.apache.org/jira/browse/HBASE-19998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363550#comment-16363550 ] stack commented on HBASE-19998: --- .001 is some debug I pushed to master and branch-2. > Flakey TestVisibilityLabelsWithDefaultVisLabelService > - > > Key: HBASE-19998 > URL: https://issues.apache.org/jira/browse/HBASE-19998 > Project: HBase > Issue Type: Bug > Components: flakey, test >Reporter: stack >Assignee: stack >Priority: Major > Fix For: 2.0.0-beta-2 > > Attachments: hbase-19988.master.001.patch > > > This is a good one. It's a timeout and though it has lots of test methods, the > problem is one of them gets stuck. The test method kills a RegionServer then > starts a new one. Usually all works out fine but the odd time there is this > unexplained MOVE that gets interjected just as ServerCrashProcedure starts > up. hbase:meta gets stuck (perhaps this is what is being referred to here: > https://issues.apache.org/jira/browse/HBASE-19929?focusedCommentId=16356906&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16356906). > It is trying to run the MOVE by first unassigning from the server that has > just crashed. It never succeeds. Need to fix this. Need to figure out where these > Move operations are coming from too. Let me add some debug. > See here how we are well into ServerCrashProcedure... and then two MOVEs > cut-in... 
for hbase:meta and for namespace: > {code} > > 2018-02-14 02:35:19,806 DEBUG [PEWorker-6] > procedure.ServerCrashProcedure(192): pid=10, > state=RUNNABLE:SERVER_CRASH_PROCESS_META; ServerCrashProcedure > server=asf903.gq1.ygridcore.net,59608,1518575711969, splitWal=true, > meta=true; Processing hbase:meta that was on > asf903.gq1.ygridcore.net,59608,1518575711969 > 2018-02-14 02:35:19,807 INFO [PEWorker-6] > procedure2.ProcedureExecutor(1498): Initialized subprocedures=[{pid=12, > ppid=10, state=RUNNABLE:RECOVER_META_SPLIT_LOGS; RecoverMetaProcedure > failedMetaServer=asf903.gq1.ygridcore.net,59608,1518575711969, splitWal=true}] > 2018-02-14 02:35:19,811 DEBUG [Thread-214] procedure2.ProcedureExecutor(868): > Stored pid=11, state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure > hri=hbase:meta,,1.1588230740, > source=asf903.gq1.ygridcore.net,59608,1518575711969, destination= > 2018-02-14 02:35:19,813 INFO [PEWorker-8] > procedure.MasterProcedureScheduler(813): pid=11, > state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure > hri=hbase:meta,,1.1588230740, > source=asf903.gq1.ygridcore.net,59608,1518575711969, destination= hbase:meta > hbase:meta,,1.1588230740 > 2018-02-14 02:35:19,814 INFO [PEWorker-8] > procedure2.ProcedureExecutor(1498): Initialized subprocedures=[{pid=14, > ppid=11, state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure > table=hbase:meta, region=1588230740, > server=asf903.gq1.ygridcore.net,59608,1518575711969}] > 2018-02-14 02:35:19,831 DEBUG [Thread-214] procedure2.ProcedureExecutor(868): > Stored pid=13, state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure > hri=hbase:namespace,,1518575716296.e52a160b3f3a57ab50d710eba62d9b15., > source=asf903.gq1.ygridcore.net,59608,1518575711969, destination= > 2018-02-14 02:35:19,833 INFO [PEWorker-10] > procedure.MasterProcedureScheduler(813): pid=13, > state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure > hri=hbase:namespace,,1518575716296.e52a160b3f3a57ab50d710eba62d9b15., > 
source=asf903.gq1.ygridcore.net,59608,1518575711969, destination= > hbase:namespace > hbase:namespace,,1518575716296.e52a160b3f3a57ab50d710eba62d9b15. > 2018-02-14 02:35:19,837 INFO [PEWorker-10] > procedure2.ProcedureExecutor(1498): Initialized subprocedures=[{pid=15, > ppid=13, state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure > table=hbase:namespace, region=e52a160b3f3a57ab50d710eba62d9b15, > server=asf903.gq1.ygridcore.net,59608,1518575711969}] > > {code} > Here is the failure of the unassign: > {code} > 2018-02-14 02:35:19,944 WARN [PEWorker-8] > assignment.RegionTransitionProcedure(187): Remote call failed pid=14, > ppid=11, state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure > table=hbase:meta, region=1588230740, > server=asf903.gq1.ygridcore.net,59608,1518575711969; rit=CLOSING, > location=asf903.gq1.ygridcore.net,59608,1518575711969; exception=pid=14, > ppid=11, state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure > table=hbase:meta, region=1588230740, > server=asf903.gq1.ygridcore.net,59608,1518575711969 to > asf903.gq1.ygridcore.net,59608,1518575711969 > 2018-02-14 02:35:19,945 WARN [PEWorker-8] assignment.UnassignProcedure(245): > Expiring server pid=14, ppid=11, state=RUNNABLE:REGION_TRANSITION_DISPATCH; > UnassignProcedure
[jira] [Created] (HBASE-19998) Flakey TestVisibilityLabelsWithDefaultVisLabelService
stack created HBASE-19998: - Summary: Flakey TestVisibilityLabelsWithDefaultVisLabelService Key: HBASE-19998 URL: https://issues.apache.org/jira/browse/HBASE-19998 Project: HBase Issue Type: Bug Components: flakey, test Reporter: stack Assignee: stack Fix For: 2.0.0-beta-2 This is a good one. It's a timeout and though it has lots of test methods, the problem is one of them gets stuck. The test method kills a RegionServer then starts a new one. Usually all works out fine but the odd time there is this unexplained MOVE that gets interjected just as ServerCrashProcedure starts up. hbase:meta gets stuck (perhaps this is what is being referred to here: https://issues.apache.org/jira/browse/HBASE-19929?focusedCommentId=16356906&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16356906). It is trying to run the MOVE by first unassigning from the server that has just crashed. It never succeeds. Need to fix this. Need to figure out where these Move operations are coming from too. Let me add some debug. See here how we are well into ServerCrashProcedure... and then two MOVEs cut-in... 
for hbase:meta and for namespace: {code} 2018-02-14 02:35:19,806 DEBUG [PEWorker-6] procedure.ServerCrashProcedure(192): pid=10, state=RUNNABLE:SERVER_CRASH_PROCESS_META; ServerCrashProcedure server=asf903.gq1.ygridcore.net,59608,1518575711969, splitWal=true, meta=true; Processing hbase:meta that was on asf903.gq1.ygridcore.net,59608,1518575711969 2018-02-14 02:35:19,807 INFO [PEWorker-6] procedure2.ProcedureExecutor(1498): Initialized subprocedures=[{pid=12, ppid=10, state=RUNNABLE:RECOVER_META_SPLIT_LOGS; RecoverMetaProcedure failedMetaServer=asf903.gq1.ygridcore.net,59608,1518575711969, splitWal=true}] 2018-02-14 02:35:19,811 DEBUG [Thread-214] procedure2.ProcedureExecutor(868): Stored pid=11, state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure hri=hbase:meta,,1.1588230740, source=asf903.gq1.ygridcore.net,59608,1518575711969, destination= 2018-02-14 02:35:19,813 INFO [PEWorker-8] procedure.MasterProcedureScheduler(813): pid=11, state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure hri=hbase:meta,,1.1588230740, source=asf903.gq1.ygridcore.net,59608,1518575711969, destination= hbase:meta hbase:meta,,1.1588230740 2018-02-14 02:35:19,814 INFO [PEWorker-8] procedure2.ProcedureExecutor(1498): Initialized subprocedures=[{pid=14, ppid=11, state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure table=hbase:meta, region=1588230740, server=asf903.gq1.ygridcore.net,59608,1518575711969}] 2018-02-14 02:35:19,831 DEBUG [Thread-214] procedure2.ProcedureExecutor(868): Stored pid=13, state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure hri=hbase:namespace,,1518575716296.e52a160b3f3a57ab50d710eba62d9b15., source=asf903.gq1.ygridcore.net,59608,1518575711969, destination= 2018-02-14 02:35:19,833 INFO [PEWorker-10] procedure.MasterProcedureScheduler(813): pid=13, state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure hri=hbase:namespace,,1518575716296.e52a160b3f3a57ab50d710eba62d9b15., source=asf903.gq1.ygridcore.net,59608,1518575711969, destination= 
hbase:namespace hbase:namespace,,1518575716296.e52a160b3f3a57ab50d710eba62d9b15. 2018-02-14 02:35:19,837 INFO [PEWorker-10] procedure2.ProcedureExecutor(1498): Initialized subprocedures=[{pid=15, ppid=13, state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure table=hbase:namespace, region=e52a160b3f3a57ab50d710eba62d9b15, server=asf903.gq1.ygridcore.net,59608,1518575711969}] {code} Here is the failure of the unassign: {code} 2018-02-14 02:35:19,944 WARN [PEWorker-8] assignment.RegionTransitionProcedure(187): Remote call failed pid=14, ppid=11, state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure table=hbase:meta, region=1588230740, server=asf903.gq1.ygridcore.net,59608,1518575711969; rit=CLOSING, location=asf903.gq1.ygridcore.net,59608,1518575711969; exception=pid=14, ppid=11, state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure table=hbase:meta, region=1588230740, server=asf903.gq1.ygridcore.net,59608,1518575711969 to asf903.gq1.ygridcore.net,59608,1518575711969 2018-02-14 02:35:19,945 WARN [PEWorker-8] assignment.UnassignProcedure(245): Expiring server pid=14, ppid=11, state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure table=hbase:meta, region=1588230740, server=asf903.gq1.ygridcore.net,59608,1518575711969; rit=CLOSING, location=asf903.gq1.ygridcore.net,59608,1518575711969, exception=org.apache.hadoop.hbase.master.assignment.FailedRemoteDispatchException: pid=14, ppid=11, state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure table=hbase:meta, region=1588230740, server=asf903.gq1.ygridcore.net,59608,1518575711969 to asf903.gq1.ygridcore.net,59608,1518575711969 2018-02-14 02:35:19,945 WARN [PEWorker-8]
[jira] [Commented] (HBASE-19979) ReplicationSyncUp tool may leak Zookeeper connection
[ https://issues.apache.org/jira/browse/HBASE-19979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363542#comment-16363542 ] Pankaj Kumar commented on HBASE-19979: -- Thanks Stack..!! > ReplicationSyncUp tool may leak Zookeeper connection > > > Key: HBASE-19979 > URL: https://issues.apache.org/jira/browse/HBASE-19979 > Project: HBase > Issue Type: Bug > Components: Replication >Reporter: Pankaj Kumar >Assignee: Pankaj Kumar >Priority: Major > Fix For: 1.3.2, 1.5.0, 2.0.0-beta-2, 1.4.2 > > Attachments: HBASE-19979-branch-1.001.patch, HBASE-19979.patch > > > ReplicationSyncUp tool may leak Zookeeper connection in the following code > snippet, > {code} > try { > int numberOfOldSource = 1; // default wait once > while (numberOfOldSource > 0) { > Thread.sleep(SLEEP_TIME); > numberOfOldSource = manager.getOldSources().size(); > } > } catch (InterruptedException e) { > System.err.println("didn't wait long enough:" + e); > return (-1); > } > manager.join(); > zkw.close(); > {code} > ZooKeeperWatcher will not be closed in case of InterruptedException. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
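One way to close this kind of leak is to move the cleanup into a finally block so that it also runs on the interrupt path. The sketch below is not the committed patch: FakeWatcher and waitForOldSources are hypothetical stand-ins for ZooKeeperWatcher and the wait loop quoted above, and the old-source count is supplied by the caller so the control flow can be exercised in isolation.

```java
import java.util.function.IntSupplier;

/** Hypothetical stand-in for ZooKeeperWatcher that records whether close() ran. */
class FakeWatcher implements AutoCloseable {
    boolean closed = false;

    @Override
    public void close() {
        closed = true;
    }
}

class SyncUpSketch {
    static final long SLEEP_TIME_MS = 1;

    /**
     * Polls until oldSources reports zero and returns 0, or returns -1 if
     * the thread is interrupted. The watcher is closed on both paths.
     */
    static int waitForOldSources(FakeWatcher zkw, IntSupplier oldSources) {
        try {
            int numberOfOldSource = 1; // default wait once
            while (numberOfOldSource > 0) {
                Thread.sleep(SLEEP_TIME_MS);
                numberOfOldSource = oldSources.getAsInt();
            }
            return 0;
        } catch (InterruptedException e) {
            System.err.println("didn't wait long enough:" + e);
            Thread.currentThread().interrupt(); // keep the interrupt status for callers
            return -1;
        } finally {
            zkw.close(); // runs on normal return and on the interrupt path
        }
    }
}
```

Since the stand-in implements AutoCloseable, a try-with-resources statement would give the same guarantee; the explicit finally is used here only because it stays closest to the shape of the original snippet.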
[jira] [Commented] (HBASE-19979) ReplicationSyncUp tool may leak Zookeeper connection
[ https://issues.apache.org/jira/browse/HBASE-19979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363525#comment-16363525 ] Hudson commented on HBASE-19979: SUCCESS: Integrated in Jenkins build HBase-1.3-IT #351 (See [https://builds.apache.org/job/HBase-1.3-IT/351/]) HBASE-19979 ReplicationSyncUp tool may leak Zookeeper connection (stack: rev 0507413fe61d5a17229817e2d56d7603d037bde8) * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSyncUp.java > ReplicationSyncUp tool may leak Zookeeper connection > > > Key: HBASE-19979 > URL: https://issues.apache.org/jira/browse/HBASE-19979 > Project: HBase > Issue Type: Bug > Components: Replication >Reporter: Pankaj Kumar >Assignee: Pankaj Kumar >Priority: Major > Fix For: 1.3.2, 1.5.0, 2.0.0-beta-2, 1.4.2 > > Attachments: HBASE-19979-branch-1.001.patch, HBASE-19979.patch > > > ReplicationSyncUp tool may leak Zookeeper connection in the following code > snippet, > {code} > try { > int numberOfOldSource = 1; // default wait once > while (numberOfOldSource > 0) { > Thread.sleep(SLEEP_TIME); > numberOfOldSource = manager.getOldSources().size(); > } > } catch (InterruptedException e) { > System.err.println("didn't wait long enough:" + e); > return (-1); > } > manager.join(); > zkw.close(); > {code} > ZooKeeperWatcher will not be closed in case of InterruptedException. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19979) ReplicationSyncUp tool may leak Zookeeper connection
[ https://issues.apache.org/jira/browse/HBASE-19979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363514#comment-16363514 ] stack commented on HBASE-19979: --- Pushed to branch-1.3 too. > ReplicationSyncUp tool may leak Zookeeper connection > > > Key: HBASE-19979 > URL: https://issues.apache.org/jira/browse/HBASE-19979 > Project: HBase > Issue Type: Bug > Components: Replication >Reporter: Pankaj Kumar >Assignee: Pankaj Kumar >Priority: Major > Fix For: 1.3.2, 1.5.0, 2.0.0-beta-2, 1.4.2 > > Attachments: HBASE-19979-branch-1.001.patch, HBASE-19979.patch > > > ReplicationSyncUp tool may leak Zookeeper connection in the following code > snippet, > {code} > try { > int numberOfOldSource = 1; // default wait once > while (numberOfOldSource > 0) { > Thread.sleep(SLEEP_TIME); > numberOfOldSource = manager.getOldSources().size(); > } > } catch (InterruptedException e) { > System.err.println("didn't wait long enough:" + e); > return (-1); > } > manager.join(); > zkw.close(); > {code} > ZooKeeperWatcher will not be closed in case of InterruptedException. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-19979) ReplicationSyncUp tool may leak Zookeeper connection
[ https://issues.apache.org/jira/browse/HBASE-19979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-19979: -- Fix Version/s: 1.3.2 > ReplicationSyncUp tool may leak Zookeeper connection > > > Key: HBASE-19979 > URL: https://issues.apache.org/jira/browse/HBASE-19979 > Project: HBase > Issue Type: Bug > Components: Replication >Reporter: Pankaj Kumar >Assignee: Pankaj Kumar >Priority: Major > Fix For: 1.3.2, 1.5.0, 2.0.0-beta-2, 1.4.2 > > Attachments: HBASE-19979-branch-1.001.patch, HBASE-19979.patch > > > ReplicationSyncUp tool may leak Zookeeper connection in the following code > snippet, > {code} > try { > int numberOfOldSource = 1; // default wait once > while (numberOfOldSource > 0) { > Thread.sleep(SLEEP_TIME); > numberOfOldSource = manager.getOldSources().size(); > } > } catch (InterruptedException e) { > System.err.println("didn't wait long enough:" + e); > return (-1); > } > manager.join(); > zkw.close(); > {code} > ZooKeeperWatcher will not be closed in case of InterruptedException. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19116) Currently the tail of hfiles with CellComparator* classname makes it so hbase1 can't open hbase2 written hfiles; fix
[ https://issues.apache.org/jira/browse/HBASE-19116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363502#comment-16363502 ] stack commented on HBASE-19116: --- .004 forgot to update test. > Currently the tail of hfiles with CellComparator* classname makes it so > hbase1 can't open hbase2 written hfiles; fix > > > Key: HBASE-19116 > URL: https://issues.apache.org/jira/browse/HBASE-19116 > Project: HBase > Issue Type: Sub-task > Components: HFile, migration >Reporter: stack >Assignee: stack >Priority: Critical > Fix For: 2.0.0-beta-2 > > Attachments: HBASE-19116.branch-2.001.patch, > HBASE-19116.branch-2.002.patch, HBASE-19116.branch-2.003.patch, > HBASE-19116.branch-2.004.patch > > > See tail of HBASE-19052 for discussion which concludes we should try and make > it so operators do not have to go to latest hbase version before they > upgrade, at least if we can avoid it. > The necessary change of our default comparator from KV to Cell naming has > hfiles with tails that have the classname CellComparator in them in place of > KeyValueComparator. If an hbase1 tries to open them, it will fail not having > a CellComparator in its classpath (We have name of comparator in tail because > different files require different comparators... perhaps we write an alias > instead of a class one day... TODO). HBASE-16189 and HBASE-19052 are about > trying to carry knowledge of hbase2 back to hbase1, a brittle approach making > it so operators will have to upgrade to the latest branch-1 before they can > go to hbase2. > This issue is about undoing our writing of an incompatible (to hbase1) tail, > not unless we really have to (and it sounds like we could do without writing > an incompatible tail) to see if we can avoid requiring operators go to > lastest branch-1 (we may end up needing this but lets a have a really good > reason for it if we do). > Oh, let this filing be an answer to our [~anoop.hbase]'s old high-level > question over in HBASE-16189: > bq. 
...means when rolling upgrade done to 2.0, first users have to upgrade to > some 1.x versions which is having this fix and then to 2.0.. What do you guys > think Whether we should avoid this kind of indirection? cc Enis Soztutar, > Stack, Ted Yu, Matteo Bertozzi > Yeah, lets try to avoid this if we can... -- This message was sent by Atlassian JIRA (v7.6.3#76005)
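One way to read the "write an alias instead of a class" idea from the description above: translate the comparator name at the moment the hfile tail is written, so an hbase1 reader only ever sees a name it can resolve. The sketch below is hypothetical; the class `TrailerComparatorNames`, the method `nameToWrite`, and the two short names (taken from the issue text, not fully qualified class names) are placeholders and not the actual patch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: picks the comparator name to record in an hfile tail. */
final class TrailerComparatorNames {
  // Placeholder names copied from the issue text; the real class names may differ.
  private static final Map<String, String> HBASE2_TO_HBASE1 = new HashMap<>();
  static {
    HBASE2_TO_HBASE1.put("CellComparator", "KeyValueComparator");
  }

  /** Returns the hbase1-readable name if one is known, otherwise the name unchanged. */
  static String nameToWrite(String comparatorName) {
    return HBASE2_TO_HBASE1.getOrDefault(comparatorName, comparatorName);
  }

  public static void main(String[] args) {
    System.out.println(nameToWrite("CellComparator"));
    System.out.println(nameToWrite("SomeCustomComparator"));
  }
}
```

Comparators with no hbase1 equivalent pass through untouched, so files that genuinely need a new comparator would still carry the new name.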
[jira] [Commented] (HBASE-19979) ReplicationSyncUp tool may leak Zookeeper connection
[ https://issues.apache.org/jira/browse/HBASE-19979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363497#comment-16363497 ]

Pankaj Kumar commented on HBASE-19979:
--
Thanks everyone for reviewing and committing this fix. Can we have this fix in branch-1.3.x as well?

> ReplicationSyncUp tool may leak Zookeeper connection
>
> Key: HBASE-19979
> URL: https://issues.apache.org/jira/browse/HBASE-19979
> Project: HBase
> Issue Type: Bug
> Components: Replication
> Reporter: Pankaj Kumar
> Assignee: Pankaj Kumar
> Priority: Major
> Fix For: 1.5.0, 2.0.0-beta-2, 1.4.2
>
> Attachments: HBASE-19979-branch-1.001.patch, HBASE-19979.patch
>
> ReplicationSyncUp tool may leak Zookeeper connection in the following code
> snippet,
> {code}
> try {
>   int numberOfOldSource = 1; // default wait once
>   while (numberOfOldSource > 0) {
>     Thread.sleep(SLEEP_TIME);
>     numberOfOldSource = manager.getOldSources().size();
>   }
> } catch (InterruptedException e) {
>   System.err.println("didn't wait long enough:" + e);
>   return (-1);
> }
> manager.join();
> zkw.close();
> {code}
> ZooKeeperWatcher will not be closed in case of InterruptedException.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
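The leak described above comes from the early return skipping zkw.close(). A minimal sketch of one possible shape for a fix, using a finally block so the close runs on both paths; this is not the committed patch. The class `SyncUpSketch` and its parameters are hypothetical stand-ins for the manager and the ZooKeeperWatcher, so that the example runs without HBase on the classpath.

```java
import java.util.function.IntSupplier;

/** Hypothetical stand-in for the wait loop in ReplicationSyncUp. */
final class SyncUpSketch {
  static final long SLEEP_TIME_MS = 10; // assumed value, for illustration only

  /**
   * Waits until no old sources remain, then joins.
   * The watcher is closed in finally, so an interrupt no longer leaks it.
   */
  static int waitThenClose(IntSupplier oldSourceCount, Runnable join, AutoCloseable zkw)
      throws Exception {
    try {
      int numberOfOldSource = 1; // default: wait once
      while (numberOfOldSource > 0) {
        Thread.sleep(SLEEP_TIME_MS);
        numberOfOldSource = oldSourceCount.getAsInt();
      }
      join.run();
      return 0;
    } catch (InterruptedException e) {
      System.err.println("didn't wait long enough:" + e);
      Thread.currentThread().interrupt(); // keep the interrupt status for callers
      return -1;
    } finally {
      zkw.close(); // runs on the normal path and on the interrupted path
    }
  }

  public static void main(String[] args) throws Exception {
    int rc = waitThenClose(() -> 0, () -> { }, () -> System.out.println("watcher closed"));
    System.out.println("exit code " + rc);
  }
}
```

If the watcher type implements Closeable, try-with-resources would give the same guarantee with less code; the explicit finally is used here only because it stays closest to the quoted snippet.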
[jira] [Updated] (HBASE-19116) Currently the tail of hfiles with CellComparator* classname makes it so hbase1 can't open hbase2 written hfiles; fix
[ https://issues.apache.org/jira/browse/HBASE-19116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-19116: -- Attachment: HBASE-19116.branch-2.004.patch > Currently the tail of hfiles with CellComparator* classname makes it so > hbase1 can't open hbase2 written hfiles; fix > > > Key: HBASE-19116 > URL: https://issues.apache.org/jira/browse/HBASE-19116 > Project: HBase > Issue Type: Sub-task > Components: HFile, migration >Reporter: stack >Assignee: stack >Priority: Critical > Fix For: 2.0.0-beta-2 > > Attachments: HBASE-19116.branch-2.001.patch, > HBASE-19116.branch-2.002.patch, HBASE-19116.branch-2.003.patch, > HBASE-19116.branch-2.004.patch > > > See tail of HBASE-19052 for discussion which concludes we should try and make > it so operators do not have to go to latest hbase version before they > upgrade, at least if we can avoid it. > The necessary change of our default comparator from KV to Cell naming has > hfiles with tails that have the classname CellComparator in them in place of > KeyValueComparator. If an hbase1 tries to open them, it will fail not having > a CellComparator in its classpath (We have name of comparator in tail because > different files require different comparators... perhaps we write an alias > instead of a class one day... TODO). HBASE-16189 and HBASE-19052 are about > trying to carry knowledge of hbase2 back to hbase1, a brittle approach making > it so operators will have to upgrade to the latest branch-1 before they can > go to hbase2. > This issue is about undoing our writing of an incompatible (to hbase1) tail, > not unless we really have to (and it sounds like we could do without writing > an incompatible tail) to see if we can avoid requiring operators go to > lastest branch-1 (we may end up needing this but lets a have a really good > reason for it if we do). > Oh, let this filing be an answer to our [~anoop.hbase]'s old high-level > question over in HBASE-16189: > bq. 
...means when rolling upgrade done to 2.0, first users have to upgrade to > some 1.x versions which is having this fix and then to 2.0.. What do you guys > think Whether we should avoid this kind of indirection? cc Enis Soztutar, > Stack, Ted Yu, Matteo Bertozzi > Yeah, lets try to avoid this if we can... -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19965) Fix flaky TestAsyncRegionAdminApi
[ https://issues.apache.org/jira/browse/HBASE-19965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363494#comment-16363494 ] stack commented on HBASE-19965: --- Pushed second addendum that breaks a TestAsyncTableAdminAPI3 out of TestAsyncTableAdminAPI. > Fix flaky TestAsyncRegionAdminApi > - > > Key: HBASE-19965 > URL: https://issues.apache.org/jira/browse/HBASE-19965 > Project: HBase > Issue Type: Sub-task >Reporter: Guanghao Zhang >Assignee: stack >Priority: Critical > Fix For: 2.0.0-beta-2 > > Attachments: HBASE-19965.branch-2.001.patch > > > See > [https://builds.apache.org/job/HBase%20Nightly/job/branch-2/284/testReport/junit/org.apache.hadoop.hbase.client/TestAsyncRegionAdminApi/testMergeRegions_0_/] > > java.lang.AssertionError: expected:<2> but was:<3> at > org.apache.hadoop.hbase.client.TestAsyncRegionAdminApi.testMergeRegions(TestAsyncRegionAdminApi.java:359) > > Merge regions not work. The table still have 3 regions after the > MergeRegionsProcedure finished. > The master start balance region 9e2773ba1efba79a2defa276e9a26ed4. But because > the MergeRegionsProcedure pid=138 start work first, so the balance need wait > for the lock. But after merge regions finished, the MoveRegionProcedure > pid=139 start work and assign 9e2773ba1efba79a2defa276e9a26ed4 to a new > region server. This is not right. The MoveRegionProcedure should skip to > assign a region which was marked as offline. Or we should clear the merged > regions' procedure when MergeRegionsProcedure finished. 
> > Logs: > 2018-02-08 16:24:44,608 INFO [master/cd4730e3eae2:0.Chore.1] > master.HMaster(1454): balance > hri=testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4., > source=cd4730e3eae2,39077,1518106776411, > destination=cd4730e3eae2,40578,1518106776318 > 2018-02-08 16:24:44,608 DEBUG > [RpcServer.default.FPBQ.Fifo.handler=4,queue=0,port=37885] > procedure2.ProcedureExecutor(868): Stored pid=138, > state=RUNNABLE:MERGE_TABLE_REGIONS_PREPARE; MergeTableRegionsProcedure > table=testMergeRegions, regions=[9e2773ba1efba79a2defa276e9a26ed4, > 8f8fd5cd032313e1aadb83e31e1b7479], forcibly=false > .. > 2018-02-08 16:24:50,111 INFO [PEWorker-13] > procedure2.ProcedureExecutor(1249): Finished pid=138, state=SUCCESS; > MergeTableRegionsProcedure table=testMergeRegions, > regions=[9e2773ba1efba79a2defa276e9a26ed4, 8f8fd5cd032313e1aadb83e31e1b7479], > forcibly=false in 5.5710sec > 2018-02-08 16:24:50,113 INFO [PEWorker-13] > procedure.MasterProcedureScheduler(813): pid=139, > state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure > hri=testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4., > source=cd4730e3eae2,39077,1518106776411, > destination=cd4730e3eae2,40578,1518106776318 testMergeRegions > testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19116) Currently the tail of hfiles with CellComparator* classname makes it so hbase1 can't open hbase2 written hfiles; fix
[ https://issues.apache.org/jira/browse/HBASE-19116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363491#comment-16363491 ] Hadoop QA commented on HBASE-19116: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 58s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} | | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} branch-2 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 38s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 3s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 5m 0s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s{color} | {color:green} branch-2 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 3s{color} | {color:red} hbase-server: The patch generated 1 new + 17 unchanged - 3 fixed = 18 total (was 20) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 3m 55s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 14m 36s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 18m 43s{color} | {color:red} hbase-server in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 11s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 50m 56s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hbase.io.hfile.TestFixedFileTrailer | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:9f2f2db | | JIRA Issue | HBASE-19116 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12910500/HBASE-19116.branch-2.003.patch | | Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux ac4a963f1653 3.13.0-133-generic #182-Ubuntu SMP Tue Sep 19 15:49:21 UTC 2017 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | branch-2 / 4594f7156d | | maven | version: Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) | | Default Java | 1.8.0_151 | | checkstyle | https://builds.apache.org/job/PreCommit-HBASE-Build/11516/artifact/patchprocess/diff-checkstyle-hbase-server.txt | | unit | https://builds.apache.org/job/PreCommit-HBASE-Build/11516/artifact/patchprocess/patch-unit-hbase-server.txt | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/11516/testReport/ | | Max. process+thread count | 664 (vs. ulimit of
[jira] [Commented] (HBASE-18294) Reduce global heap pressure: flush based on heap occupancy
[ https://issues.apache.org/jira/browse/HBASE-18294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363489#comment-16363489 ] stack commented on HBASE-18294: --- bq. Globally the decision should be with ||. We have barrier for off heap and on heap memory and when any of the barrier is about to be crossed, it will result in forced flushes. That sounds good. I did not get that from reading the release note (Yeah, add names of configs to toggle to Release Note). I'm good w/ this. You [~anoop.hbase] ? > Reduce global heap pressure: flush based on heap occupancy > -- > > Key: HBASE-18294 > URL: https://issues.apache.org/jira/browse/HBASE-18294 > Project: HBase > Issue Type: Improvement >Affects Versions: 3.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel >Priority: Major > Fix For: 2.0.0-beta-2 > > Attachments: HBASE-18294.01.patch, HBASE-18294.01.patch, > HBASE-18294.01.patch, HBASE-18294.01.patch, HBASE-18294.01.patch, > HBASE-18294.01.patch, HBASE-18294.01.patch, HBASE-18294.02.patch, > HBASE-18294.03.patch, HBASE-18294.04.patch, HBASE-18294.05.patch, > HBASE-18294.06.patch, HBASE-18294.07.patch, HBASE-18294.07.patch, > HBASE-18294.08.patch, HBASE-18294.09.patch, HBASE-18294.10.patch, > HBASE-18294.11.patch, HBASE-18294.11.patch, HBASE-18294.12.patch, > HBASE-18294.13.patch, HBASE-18294.15.patch, HBASE-18294.16.patch, > HBASE-18294.master.01.patch, HBASE-18294.master.01.patch > > > A region is flushed if its memory component exceed a threshold (default size > is 128MB). > A flush policy decides whether to flush a store by comparing the size of the > store to another threshold (that can be configured with > hbase.hregion.percolumnfamilyflush.size.lower.bound). > Currently the implementation (in both cases) compares the data size > (key-value only) to the threshold where it should compare the heap size > (which includes index size, and metadata). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
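To make the distinction in the description above concrete, here is a rough sketch of the two checks: a per-region test that compares heap size (data plus index and metadata overhead) against the threshold, and the global test that fires when either the on-heap or the off-heap barrier is reached. All names are hypothetical and the arithmetic is a simplification; only the 128MB default and the "||" rule come from the thread.

```java
/** Hypothetical illustration of flush decisions based on heap occupancy. */
final class FlushDecisionSketch {
  static final long REGION_FLUSH_THRESHOLD = 128L * 1024 * 1024; // 128MB default

  /** Old behaviour per the description: only key-value data size is compared. */
  static boolean shouldFlushByDataSize(long dataSize) {
    return dataSize > REGION_FLUSH_THRESHOLD;
  }

  /** Proposed behaviour: heap size, which also counts index and metadata overhead. */
  static boolean shouldFlushByHeapSize(long dataSize, long indexAndMetadataSize) {
    return dataSize + indexAndMetadataSize > REGION_FLUSH_THRESHOLD;
  }

  /** Global decision: forced flush when either barrier is reached. */
  static boolean globalForcedFlush(long onHeap, long onHeapLimit, long offHeap, long offHeapLimit) {
    return onHeap >= onHeapLimit || offHeap >= offHeapLimit;
  }

  public static void main(String[] args) {
    long mb = 1024L * 1024;
    // 120MB of data plus 16MB of overhead: under the threshold by data size, over it by heap size.
    System.out.println(shouldFlushByDataSize(120 * mb));
    System.out.println(shouldFlushByHeapSize(120 * mb, 16 * mb));
  }
}
```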
[jira] [Commented] (HBASE-19996) Some nonce procs might not be cleaned up (follow up HBASE-19756)
[ https://issues.apache.org/jira/browse/HBASE-19996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363488#comment-16363488 ] Hadoop QA commented on HBASE-19996: --- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} branch-1.4 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 32s{color} | {color:green} branch-1.4 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s{color} | {color:green} branch-1.4 passed with JDK v1.8.0_162 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s{color} | {color:green} branch-1.4 passed with JDK v1.7.0_171 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 30s{color} | {color:green} branch-1.4 passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 5s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 41s{color} | {color:green} branch-1.4 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s{color} | {color:green} branch-1.4 passed with JDK v1.8.0_162 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 47s{color} | {color:green} branch-1.4 passed with JDK v1.7.0_171 {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s{color} | {color:green} the patch passed with JDK v1.8.0_162 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s{color} | {color:green} the patch passed with JDK v1.7.0_171 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 2m 36s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 8m 40s{color} | {color:green} Patch does not cause any errors with Hadoop 2.4.1 2.5.2 2.6.5 2.7.4. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s{color} | {color:green} the patch passed with JDK v1.8.0_162 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 47s{color} | {color:green} the patch passed with JDK v1.7.0_171 {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 51s{color} | {color:green} hbase-procedure in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 98m 18s{color} | {color:green} hbase-server in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 34s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}133m
[jira] [Commented] (HBASE-19116) Currently the tail of hfiles with CellComparator* classname makes it so hbase1 can't open hbase2 written hfiles; fix
[ https://issues.apache.org/jira/browse/HBASE-19116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363477#comment-16363477 ] stack commented on HBASE-19116: --- .003 addresses comments by Anoop up on rb. > Currently the tail of hfiles with CellComparator* classname makes it so > hbase1 can't open hbase2 written hfiles; fix > > > Key: HBASE-19116 > URL: https://issues.apache.org/jira/browse/HBASE-19116 > Project: HBase > Issue Type: Sub-task > Components: HFile, migration >Reporter: stack >Assignee: stack >Priority: Critical > Fix For: 2.0.0-beta-2 > > Attachments: HBASE-19116.branch-2.001.patch, > HBASE-19116.branch-2.002.patch, HBASE-19116.branch-2.003.patch > > > See tail of HBASE-19052 for discussion which concludes we should try and make > it so operators do not have to go to latest hbase version before they > upgrade, at least if we can avoid it. > The necessary change of our default comparator from KV to Cell naming has > hfiles with tails that have the classname CellComparator in them in place of > KeyValueComparator. If an hbase1 tries to open them, it will fail not having > a CellComparator in its classpath (We have name of comparator in tail because > different files require different comparators... perhaps we write an alias > instead of a class one day... TODO). HBASE-16189 and HBASE-19052 are about > trying to carry knowledge of hbase2 back to hbase1, a brittle approach making > it so operators will have to upgrade to the latest branch-1 before they can > go to hbase2. > This issue is about undoing our writing of an incompatible (to hbase1) tail, > not unless we really have to (and it sounds like we could do without writing > an incompatible tail) to see if we can avoid requiring operators go to > lastest branch-1 (we may end up needing this but lets a have a really good > reason for it if we do). > Oh, let this filing be an answer to our [~anoop.hbase]'s old high-level > question over in HBASE-16189: > bq. 
...means when rolling upgrade done to 2.0, first users have to upgrade to > some 1.x versions which is having this fix and then to 2.0.. What do you guys > think Whether we should avoid this kind of indirection? cc Enis Soztutar, > Stack, Ted Yu, Matteo Bertozzi > Yeah, lets try to avoid this if we can... -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19965) Fix flaky TestAsyncRegionAdminApi
[ https://issues.apache.org/jira/browse/HBASE-19965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363464#comment-16363464 ] stack commented on HBASE-19965: --- Here is for the test that timed out, build 314: --- Test set: org.apache.hadoop.hbase.client.TestAsyncTableAdminApi --- Tests run: 30, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 574.271 s <<< FAILURE! - in org.apache.hadoop.hbase.client.TestAsyncTableAdminApi org.apache.hadoop.hbase.client.TestAsyncTableAdminApi Time elapsed: 8.443 s <<< ERROR! org.junit.runners.model.TestTimedOutException: test timed out after 600 seconds org.apache.hadoop.hbase.client.TestAsyncTableAdminApi Time elapsed: 8.473 s <<< ERROR! java.lang.Exception: Appears to be stuck in thread DataXceiver for client DFSClient_NONMAPREDUCE_1381247601_23 at /127.0.0.1:40966 [Receiving block BP-1735548202-172.17.0.2-1518565636532:blk_1073741829_1005] Parameterized there are about 29 tests. None takes a particularly long time: {code} 1 2018-02-13 23:47:40,557 INFO [Time-limited test] hbase.ResourceChecker(148): before: client.TestAsyncTableAdminApi#testCreateTableWithEmptyRowInTheSplitKeys[0] Thread=302, OpenFileDescriptor=1612, MaxFileDescriptor=1048576, SystemLoadAverage=2007, ProcessCount=17, AvailableMemoryMB=21888 2 2018-02-13 23:47:40,659 INFO [Time-limited test] hbase.ResourceChecker(148): before: client.TestAsyncTableAdminApi#testDeleteTable[0] Thread=305, OpenFileDescriptor=1614, MaxFileDescriptor=1048576, SystemLoadAverage=1982, ProcessCount=17, AvailableMemoryMB=21890 3 2018-02-13 23:47:51,503 INFO [Time-limited test] hbase.ResourceChecker(148): before: client.TestAsyncTableAdminApi#testDisableAndEnableTables[0] Thread=320, OpenFileDescriptor=1616, MaxFileDescriptor=1048576, SystemLoadAverage=1827, ProcessCount=17, AvailableMemoryMB=21940 4 2018-02-13 23:48:32,308 INFO [Time-limited test] hbase.ResourceChecker(148): before: client.TestAsyncTableAdminApi#testCreateTable[0] Thread=344, 
OpenFileDescriptor=1589, MaxFileDescriptor=1048576, SystemLoadAverage=1798, ProcessCount=17, AvailableMemoryMB=22132 5 2018-02-13 23:48:41,898 INFO [Time-limited test] hbase.ResourceChecker(148): before: client.TestAsyncTableAdminApi#testCreateTableWithRegions[0] Thread=344, OpenFileDescriptor=1586, MaxFileDescriptor=1048576, SystemLoadAverage=1867, ProcessCount=17, AvailableMemoryMB=21816 6 2018-02-13 23:49:23,348 INFO [Time-limited test] hbase.ResourceChecker(148): before: client.TestAsyncTableAdminApi#testIsTableEnabledAndDisabled[0] Thread=435, OpenFileDescriptor=1571, MaxFileDescriptor=1048576, SystemLoadAverage=1732, ProcessCount=17, AvailableMemoryMB=21124 7 2018-02-13 23:49:36,012 INFO [Time-limited test] hbase.ResourceChecker(148): before: client.TestAsyncTableAdminApi#testListTables[0] Thread=441, OpenFileDescriptor=1577, MaxFileDescriptor=1048576, SystemLoadAverage=1809, ProcessCount=17, AvailableMemoryMB=20713 8 2018-02-13 23:50:08,285 INFO [Time-limited test] hbase.ResourceChecker(148): before: client.TestAsyncTableAdminApi#testTruncateTablePreservingSplits[0] Thread=375, OpenFileDescriptor=1567, MaxFileDescriptor=1048576, SystemLoadAverage=1902, ProcessCount=17, AvailableMemoryMB=20604 9 2018-02-13 23:50:26,969 INFO [Time-limited test] hbase.ResourceChecker(148): before: client.TestAsyncTableAdminApi#testCreateTableNumberOfRegions[0] Thread=373, OpenFileDescriptor=1590, MaxFileDescriptor=1048576, SystemLoadAverage=1889, ProcessCount=17, AvailableMemoryMB=20979 10 2018-02-13 23:51:18,514 INFO [Time-limited test] hbase.ResourceChecker(148): before: client.TestAsyncTableAdminApi#testDisableAndEnableTable[0] Thread=439, OpenFileDescriptor=1569, MaxFileDescriptor=1048576, SystemLoadAverage=1876, ProcessCount=17, AvailableMemoryMB=20712 11 2018-02-13 23:51:40,787 INFO [Time-limited test] hbase.ResourceChecker(148): before: client.TestAsyncTableAdminApi#testEnableTableRetainAssignment[0] Thread=430, OpenFileDescriptor=1575, MaxFileDescriptor=1048576, 
SystemLoadAverage=1754, ProcessCount=17, AvailableMemoryMB=20766 12 2018-02-13 23:51:59,198 INFO [Time-limited test] hbase.ResourceChecker(148): before: client.TestAsyncTableAdminApi#testTruncateTable[0] Thread=473, OpenFileDescriptor=1578, MaxFileDescriptor=1048576, SystemLoadAverage=1826, ProcessCount=17, AvailableMemoryMB=21183 13 2018-02-13 23:52:17,821 INFO [Time-limited test] hbase.ResourceChecker(148): before: client.TestAsyncTableAdminApi#testGetTableDescriptor[0] Thread=446, OpenFileDescriptor=1576, MaxFileDescriptor=1048576, SystemLoadAverage=1827, ProcessCount=17, AvailableMemoryMB=21732 14 2018-02-13 23:52:27,280 INFO [Time-limited test] hbase.ResourceChecker(148): before:
[jira] [Updated] (HBASE-19997) [rolling upgrade] 1.x => 2.x
[ https://issues.apache.org/jira/browse/HBASE-19997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-19997:
--
Fix Version/s: (was: 2.0.0) 2.0.0-beta-2

> [rolling upgrade] 1.x => 2.x
>
> Key: HBASE-19997
> URL: https://issues.apache.org/jira/browse/HBASE-19997
> Project: HBase
> Issue Type: Umbrella
> Reporter: stack
> Priority: Major
> Fix For: 2.0.0-beta-2
>
> An umbrella issue of issues needed so folks can do a rolling upgrade from
> hbase-1.x to hbase-2.x.
> (Recent) Notables:
> * hbase-1.x can't read hbase-2.x WALs -- hbase-1.x doesn't know the
> AsyncProtobufLogWriter class used writing the WAL -- see
> https://issues.apache.org/jira/browse/HBASE-19166?focusedCommentId=16362897&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16362897
> for exception.
> ** Might be ok... means WAL split fails on an hbase1 RS... must wait till an
> hbase-2.x RS picks up the WAL for it to be split.
> * hbase-1 can't open regions from tables created by hbase-2; it can't find
> the Table descriptor. See
> https://issues.apache.org/jira/browse/HBASE-19116?focusedCommentId=16363276&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16363276
> ** This might be ok if the tables we are doing rolling upgrade over were
> written with hbase-1.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-19997) [rolling upgrade] 1.x => 2.x
stack created HBASE-19997:
-
Summary: [rolling upgrade] 1.x => 2.x
Key: HBASE-19997
URL: https://issues.apache.org/jira/browse/HBASE-19997
Project: HBase
Issue Type: Umbrella
Reporter: stack
Fix For: 2.0.0

An umbrella issue of issues needed so folks can do a rolling upgrade from hbase-1.x to hbase-2.x.

(Recent) Notables:
* hbase-1.x can't read hbase-2.x WALs -- hbase-1.x doesn't know the AsyncProtobufLogWriter class used writing the WAL -- see https://issues.apache.org/jira/browse/HBASE-19166?focusedCommentId=16362897&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16362897 for exception.
** Might be ok... means WAL split fails on an hbase1 RS... must wait till an hbase-2.x RS picks up the WAL for it to be split.
* hbase-1 can't open regions from tables created by hbase-2; it can't find the Table descriptor. See https://issues.apache.org/jira/browse/HBASE-19116?focusedCommentId=16363276&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16363276
** This might be ok if the tables we are doing rolling upgrade over were written with hbase-1.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19166) Add translation for handling hbase.regionserver.wal.WALEdit
[ https://issues.apache.org/jira/browse/HBASE-19166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363458#comment-16363458 ] Anoop Sam John commented on HBASE-19166: Get you.. Would be sweet if the fail reassign do not happen. May be some other changes will be there which wont allow the split to be done by a 1.x RS even if we solve this write/reader name issue? > Add translation for handling hbase.regionserver.wal.WALEdit > --- > > Key: HBASE-19166 > URL: https://issues.apache.org/jira/browse/HBASE-19166 > Project: HBase > Issue Type: Bug > Components: wal >Reporter: Ted Yu >Assignee: stack >Priority: Blocker > Fix For: 2.0.0 > > > For hlog generated by 1.x, using WALPlayer from hbase2 would result in: > {code} > 2017-11-02 21:22:40,907 INFO [main] mapreduce.Job: Task Id : > attempt_1509641483571_0003_m_00_0, Status : FAILED > Error: java.lang.ClassCastException: > org.apache.hadoop.hbase.regionserver.wal.WALEdit cannot be cast to > org.apache.hadoop.hbase.wal.WALEdit > at > org.apache.hadoop.hbase.mapreduce.WALPlayer$WALCellMapper.map(WALPlayer.java:143) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793) > {code} > HBASE-16479 relocated WALEdit. > Chatting with Enis, he mentioned adding translation for handling > hbase.regionserver.wal.WALEdit > This way, WAL from 1.x can be recognized by hbase-2 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
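The "translation" suggested in the description above could be as small as rewriting the old class name before it is resolved. A minimal sketch, assuming the reader sees the value class as a string before loading it; the helper `WalClassNameTranslator` is hypothetical, and only the two class names come from the stack trace in the issue.

```java
/** Hypothetical helper: maps the relocated WALEdit class name to its hbase-2 location. */
final class WalClassNameTranslator {
  static final String OLD_WAL_EDIT = "org.apache.hadoop.hbase.regionserver.wal.WALEdit";
  static final String NEW_WAL_EDIT = "org.apache.hadoop.hbase.wal.WALEdit";

  /** Returns the hbase-2 name for the old WALEdit class, and any other name unchanged. */
  static String translate(String className) {
    return OLD_WAL_EDIT.equals(className) ? NEW_WAL_EDIT : className;
  }

  public static void main(String[] args) {
    System.out.println(translate(OLD_WAL_EDIT));
  }
}
```

Whether a single hook like this is enough depends on where the class name is read in the WAL reading path, which the thread does not spell out.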
[jira] [Updated] (HBASE-19116) Currently the tail of hfiles with CellComparator* classname makes it so hbase1 can't open hbase2 written hfiles; fix
[ https://issues.apache.org/jira/browse/HBASE-19116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-19116: -- Attachment: HBASE-19116.branch-2.003.patch > Currently the tail of hfiles with CellComparator* classname makes it so > hbase1 can't open hbase2 written hfiles; fix > > > Key: HBASE-19116 > URL: https://issues.apache.org/jira/browse/HBASE-19116 > Project: HBase > Issue Type: Sub-task > Components: HFile, migration >Reporter: stack >Assignee: stack >Priority: Critical > Fix For: 2.0.0-beta-2 > > Attachments: HBASE-19116.branch-2.001.patch, > HBASE-19116.branch-2.002.patch, HBASE-19116.branch-2.003.patch > > > See tail of HBASE-19052 for discussion which concludes we should try and make > it so operators do not have to go to latest hbase version before they > upgrade, at least if we can avoid it. > The necessary change of our default comparator from KV to Cell naming has > hfiles with tails that have the classname CellComparator in them in place of > KeyValueComparator. If an hbase1 tries to open them, it will fail not having > a CellComparator in its classpath (We have name of comparator in tail because > different files require different comparators... perhaps we write an alias > instead of a class one day... TODO). HBASE-16189 and HBASE-19052 are about > trying to carry knowledge of hbase2 back to hbase1, a brittle approach making > it so operators will have to upgrade to the latest branch-1 before they can > go to hbase2. > This issue is about undoing our writing of an incompatible (to hbase1) tail, > not unless we really have to (and it sounds like we could do without writing > an incompatible tail) to see if we can avoid requiring operators go to > lastest branch-1 (we may end up needing this but lets a have a really good > reason for it if we do). > Oh, let this filing be an answer to our [~anoop.hbase]'s old high-level > question over in HBASE-16189: > bq. 
...means when rolling upgrade done to 2.0, first users have to upgrade to > some 1.x versions which is having this fix and then to 2.0.. What do you guys > think Whether we should avoid this kind of indirection? cc Enis Soztutar, > Stack, Ted Yu, Matteo Bertozzi > Yeah, lets try to avoid this if we can... -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19166) Add translation for handling hbase.regionserver.wal.WALEdit
[ https://issues.apache.org/jira/browse/HBASE-19166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363456#comment-16363456 ] stack commented on HBASE-19166: --- bq. When the cluster is a mix of HBase 1 and 2 RSs (upgrade in progress) and one 2.0 RS crashed and the WAL split is been done by a 1.x server? Am I missing any? You are not missing anything. My thought is Master will put up the WAL for splitting, the hbase1 RS will grab it and try to split, fail because it is hbase2... and this will go on until a hbase2 RS grabs the WAL. Meantime, we'll be adding more RS. I think that will work. We need to spend time on it. > Add translation for handling hbase.regionserver.wal.WALEdit > --- > > Key: HBASE-19166 > URL: https://issues.apache.org/jira/browse/HBASE-19166 > Project: HBase > Issue Type: Bug > Components: wal >Reporter: Ted Yu >Assignee: stack >Priority: Blocker > Fix For: 2.0.0 > > > For hlog generated by 1.x, using WALPlayer from hbase2 would result in: > {code} > 2017-11-02 21:22:40,907 INFO [main] mapreduce.Job: Task Id : > attempt_1509641483571_0003_m_00_0, Status : FAILED > Error: java.lang.ClassCastException: > org.apache.hadoop.hbase.regionserver.wal.WALEdit cannot be cast to > org.apache.hadoop.hbase.wal.WALEdit > at > org.apache.hadoop.hbase.mapreduce.WALPlayer$WALCellMapper.map(WALPlayer.java:143) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793) > {code} > HBASE-16479 relocated WALEdit. > Chatting with Enis, he mentioned adding translation for handling > hbase.regionserver.wal.WALEdit > This way, WAL from 1.x can be recognized by hbase-2 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19166) Add translation for handling hbase.regionserver.wal.WALEdit
[ https://issues.apache.org/jira/browse/HBASE-19166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363452#comment-16363452 ] Anoop Sam John commented on HBASE-19166: bq.On an hbase1 splitting hbase2 logs and failing as per the above, that might be ok; That should be an issue no? When the cluster is a mix of HBase 1 and 2 RSs (upgrade in progress) and one 2.0 RS crashed and the WAL split is been done by a 1.x server? > Add translation for handling hbase.regionserver.wal.WALEdit > --- > > Key: HBASE-19166 > URL: https://issues.apache.org/jira/browse/HBASE-19166 > Project: HBase > Issue Type: Bug > Components: wal >Reporter: Ted Yu >Assignee: stack >Priority: Blocker > Fix For: 2.0.0 > > > For hlog generated by 1.x, using WALPlayer from hbase2 would result in: > {code} > 2017-11-02 21:22:40,907 INFO [main] mapreduce.Job: Task Id : > attempt_1509641483571_0003_m_00_0, Status : FAILED > Error: java.lang.ClassCastException: > org.apache.hadoop.hbase.regionserver.wal.WALEdit cannot be cast to > org.apache.hadoop.hbase.wal.WALEdit > at > org.apache.hadoop.hbase.mapreduce.WALPlayer$WALCellMapper.map(WALPlayer.java:143) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793) > {code} > HBASE-16479 relocated WALEdit. > Chatting with Enis, he mentioned adding translation for handling > hbase.regionserver.wal.WALEdit > This way, WAL from 1.x can be recognized by hbase-2 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (HBASE-19166) Add translation for handling hbase.regionserver.wal.WALEdit
[ https://issues.apache.org/jira/browse/HBASE-19166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363452#comment-16363452 ] Anoop Sam John edited comment on HBASE-19166 at 2/14/18 3:56 AM: - bq.On an hbase1 splitting hbase2 logs and failing as per the above, that might be ok; That should be an issue no? When the cluster is a mix of HBase 1 and 2 RSs (upgrade in progress) and one 2.0 RS crashed and the WAL split is been done by a 1.x server? Am I missing any? was (Author: anoop.hbase): bq.On an hbase1 splitting hbase2 logs and failing as per the above, that might be ok; That should be an issue no? When the cluster is a mix of HBase 1 and 2 RSs (upgrade in progress) and one 2.0 RS crashed and the WAL split is been done by a 1.x server? > Add translation for handling hbase.regionserver.wal.WALEdit > --- > > Key: HBASE-19166 > URL: https://issues.apache.org/jira/browse/HBASE-19166 > Project: HBase > Issue Type: Bug > Components: wal >Reporter: Ted Yu >Assignee: stack >Priority: Blocker > Fix For: 2.0.0 > > > For hlog generated by 1.x, using WALPlayer from hbase2 would result in: > {code} > 2017-11-02 21:22:40,907 INFO [main] mapreduce.Job: Task Id : > attempt_1509641483571_0003_m_00_0, Status : FAILED > Error: java.lang.ClassCastException: > org.apache.hadoop.hbase.regionserver.wal.WALEdit cannot be cast to > org.apache.hadoop.hbase.wal.WALEdit > at > org.apache.hadoop.hbase.mapreduce.WALPlayer$WALCellMapper.map(WALPlayer.java:143) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793) > {code} > HBASE-16479 relocated WALEdit. > Chatting with Enis, he mentioned adding translation for handling > hbase.regionserver.wal.WALEdit > This way, WAL from 1.x can be recognized by hbase-2 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
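A minimal sketch of what "adding translation" for a relocated class name could look like, assuming a simple lookup from the old name to the new one. The class `LegacyClassNameTranslator` and its `translate` method are hypothetical and are not taken from any HBase patch; only the two WALEdit class names come from the stack trace in the issue description.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not HBase API): maps a relocated class name to its current name. */
final class LegacyClassNameTranslator {
  // Old (1.x) name -> new (2.x) name; both names appear in the ClassCastException above.
  private static final Map<String, String> RELOCATED = new HashMap<>();

  static {
    RELOCATED.put(
        "org.apache.hadoop.hbase.regionserver.wal.WALEdit",
        "org.apache.hadoop.hbase.wal.WALEdit");
  }

  private LegacyClassNameTranslator() {}

  /** Returns the current name for a relocated class, or the input unchanged if it is unknown. */
  static String translate(String className) {
    String current = RELOCATED.get(className);
    return current != null ? current : className;
  }

  public static void main(String[] args) {
    System.out.println(translate("org.apache.hadoop.hbase.regionserver.wal.WALEdit"));
  }
}
```

A reader of old WAL data would call `translate` on the recorded class name before resolving it, so files written by 1.x keep loading after the relocation. Where exactly such a hook would live in hbase2 is left open here.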
[jira] [Commented] (HBASE-18294) Reduce global heap pressure: flush based on heap occupancy
[ https://issues.apache.org/jira/browse/HBASE-18294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363444#comment-16363444 ] Anoop Sam John commented on HBASE-18294: bq.If the offheap is 100x the onheap in size, and the threshold is set to offheap (100x) + onheap (1x) – i.e. 101x – then what happens when the onheap occupancy exceeds 1x? This is about the per region flush decision boss. Correct me if wrong [~eshcar]. Globally the decision should be with ||. We have barriers for off-heap and on-heap memory, and when either barrier is about to be crossed, it will result in forced flushes. > Reduce global heap pressure: flush based on heap occupancy > -- > > Key: HBASE-18294 > URL: https://issues.apache.org/jira/browse/HBASE-18294 > Project: HBase > Issue Type: Improvement >Affects Versions: 3.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel >Priority: Major > Fix For: 2.0.0-beta-2 > > Attachments: HBASE-18294.01.patch, HBASE-18294.01.patch, > HBASE-18294.01.patch, HBASE-18294.01.patch, HBASE-18294.01.patch, > HBASE-18294.01.patch, HBASE-18294.01.patch, HBASE-18294.02.patch, > HBASE-18294.03.patch, HBASE-18294.04.patch, HBASE-18294.05.patch, > HBASE-18294.06.patch, HBASE-18294.07.patch, HBASE-18294.07.patch, > HBASE-18294.08.patch, HBASE-18294.09.patch, HBASE-18294.10.patch, > HBASE-18294.11.patch, HBASE-18294.11.patch, HBASE-18294.12.patch, > HBASE-18294.13.patch, HBASE-18294.15.patch, HBASE-18294.16.patch, > HBASE-18294.master.01.patch, HBASE-18294.master.01.patch > > > A region is flushed if its memory component exceeds a threshold (default size > is 128MB). > A flush policy decides whether to flush a store by comparing the size of the > store to another threshold (that can be configured with > hbase.hregion.percolumnfamilyflush.size.lower.bound). > Currently the implementation (in both cases) compares the data size > (key-value only) to the threshold where it should compare the heap size > (which includes index size, and metadata).
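The two decisions discussed above can be sketched as follows. This is an illustration only, under stated assumptions: the class `FlushDecisionSketch`, its method names, and the plain `long` parameters are hypothetical and do not mirror the HBase flush code. The global check uses `||` as the comment suggests, and the per-region check compares heap occupancy (data plus index and metadata overhead) rather than data size alone.

```java
/** Hypothetical sketch (not HBase API) of the per-region and global flush checks. */
final class FlushDecisionSketch {
  private FlushDecisionSketch() {}

  /** Global check: force flushes as soon as either barrier is reached. */
  static boolean shouldForceFlush(
      long onHeapUsed, long onHeapLimit, long offHeapUsed, long offHeapLimit) {
    return onHeapUsed >= onHeapLimit || offHeapUsed >= offHeapLimit;
  }

  /** Per-region check: compare heap occupancy, not key-value data size alone. */
  static boolean shouldFlushRegion(long dataSize, long indexAndMetadataSize, long threshold) {
    return dataSize + indexAndMetadataSize > threshold;
  }

  public static void main(String[] args) {
    // On-heap limit 1 unit, off-heap limit 100 units: on-heap usage of 2 must trigger
    // a forced flush even though the combined usage is far below 101.
    System.out.println(shouldForceFlush(2, 1, 0, 100));
  }
}
```

With a single summed threshold of 101 units the on-heap overflow in `main` would go unnoticed, which is the case raised in the quoted question.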
[jira] [Commented] (HBASE-19852) HBase Thrift 1 server SPNEGO Improvements
[ https://issues.apache.org/jira/browse/HBASE-19852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363441#comment-16363441 ] Kevin Risden commented on HBASE-19852: -- Thanks for the pointers [~carp84]. I've made some good progress on tests for this. I should have a patch up soon. > HBase Thrift 1 server SPNEGO Improvements > - > > Key: HBASE-19852 > URL: https://issues.apache.org/jira/browse/HBASE-19852 > Project: HBase > Issue Type: Improvement > Components: Thrift >Reporter: Kevin Risden >Assignee: Kevin Risden >Priority: Major > Attachments: HBASE-19852.master.001.patch > > > HBase Thrift1 server has some issues when trying to use SPNEGO. > From mailing list: > http://mail-archives.apache.org/mod_mbox/hbase-user/201801.mbox/%3CCAJU9nmh5YtZ%2BmAQSLo91yKm8pRVzAPNLBU9vdVMCcxHRtRqgoA%40mail.gmail.com%3E > {quote}While setting up the HBase Thrift server with HTTP, there were a > significant amount of 401 errors where the HBase Thrift wasn't able to > handle the incoming Kerberos request. Documentation online is sparse when > it comes to setting up the principal/keytab for HTTP Kerberos. > I noticed that the HBase Thrift HTTP implementation was missing SPNEGO > principal/keytab like other Thrift based servers (HiveServer2). It looks > like HiveServer2 Thrift implementation and HBase Thrift v1 implementation > were very close to the same at one point. I made the following changes to > HBase Thrift v1 server implementation to make it work: > * add SPNEGO principal/keytab if in HTTP mode > * return 401 immediately if no authorization header instead of waiting for > try/catch down in program flow{quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19116) Currently the tail of hfiles with CellComparator* classname makes it so hbase1 can't open hbase2 written hfiles; fix
[ https://issues.apache.org/jira/browse/HBASE-19116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363432#comment-16363432 ] Hadoop QA commented on HBASE-19116: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 28s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 1s{color} | {color:blue} Findbugs executables are not available. {color} | | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} branch-2 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 43s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 6s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 5m 6s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s{color} | {color:green} branch-2 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 6s{color} | {color:red} hbase-server: The patch generated 1 new + 17 unchanged - 3 fixed = 18 total (was 20) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 11s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 17m 14s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green}126m 34s{color} | {color:green} hbase-server in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}160m 32s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:9f2f2db | | JIRA Issue | HBASE-19116 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12910477/HBASE-19116.branch-2.002.patch | | Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux 2be1222e5016 3.13.0-137-generic #186-Ubuntu SMP Mon Dec 4 19:09:19 UTC 2017 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | branch-2 / 4594f7156d | | maven | version: Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) | | Default Java | 1.8.0_151 | | checkstyle | https://builds.apache.org/job/PreCommit-HBASE-Build/11513/artifact/patchprocess/diff-checkstyle-hbase-server.txt | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/11513/testReport/ | | Max. process+thread count | 5026 (vs. ulimit of 1) | | modules | C: hbase-server U: hbase-server | | Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/11513/console | | Powered by | Apache Yetus 0.7.0 http://yetus.apache.org | This
[jira] [Updated] (HBASE-19996) Some nonce procs might not be cleaned up (follow up HBASE-19756)
[ https://issues.apache.org/jira/browse/HBASE-19996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-19996: --- Attachment: HBASE-19996.branch-1.4.001.patch > Some nonce procs might not be cleaned up (follow up HBASE-19756) > > > Key: HBASE-19996 > URL: https://issues.apache.org/jira/browse/HBASE-19996 > Project: HBase > Issue Type: Bug >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Major > Fix For: 2.0.0, 1.3.2, 1.5.0, 1.4.2 > > Attachments: HBASE-19996.branch-1.4.001.patch, > HBASE-19996.branch-1.4.001.patch > > > Follow up to HBASE-19756 which dealt with NPEs during proc cleanup. > Unfortunately, the patch for branch-1 might not remove some valid procs too. > The branch-2 patch doesn't have this problem. This fixes the branch-1 bug and > also adds another test to branch-2. Thanks to [~toffer] for flagging this > internally. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19988) HRegion#lockRowsAndBuildMiniBatch() is too chatty when interrupted while waiting for a row lock
[ https://issues.apache.org/jira/browse/HBASE-19988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363378#comment-16363378 ] Appy commented on HBASE-19988: -- Sorry, I don't have time to dig in and come up with a better understanding of handling InterruptedException when processing requests. In this case, since IE was already being converted to IIOE, that means any other operation would have been handling it like IOException, which means cancel the operation. Going by that logic, and status quo bias (that it's already IIOE), I think it might be fine to do this. However, I think it'll be better to handle it as part of IOException by doing {code} if (isAtomic() || ioe instanceof InterruptedIOException) { throw ioe; } {code} because it'll log a good warning. Maybe move TimeoutIOException there too. Currently the comment says "// We will retry when other exceptions, but we should stop if we timeout ." It should be updated with reasons why we break out for each type. Let's not leave things in more disarray for future onlookers (why these two? why not others? etc etc). They shouldn't have to spend the time we already did, else our effort is wasted. > HRegion#lockRowsAndBuildMiniBatch() is too chatty when interrupted while > waiting for a row lock > --- > > Key: HBASE-19988 > URL: https://issues.apache.org/jira/browse/HBASE-19988 > Project: HBase > Issue Type: Improvement > Components: amv2 >Affects Versions: 2.0.0-beta-1 >Reporter: Umesh Agashe >Assignee: Umesh Agashe >Priority: Minor > Fix For: 2.0.0-beta-2 > > Attachments: hbase-19988.master.001.patch, > hbase-19988.master.001.patch > > > See HBASE-19970, TestHRegionWithInMemoryFlush created 4.2g log file. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
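The suggestion above can be sketched as a small retry loop that keeps retrying on ordinary IOExceptions but breaks out for atomic operations and for interrupts. Everything here is hypothetical: `RetryBreakOutSketch`, `IoAction`, `shouldRethrow`, and the `atomic` flag are stand-ins for illustration and are not the HRegion code under review.

```java
import java.io.IOException;
import java.io.InterruptedIOException;

/** Hypothetical sketch (not HRegion code): which IOExceptions end a retry loop early. */
final class RetryBreakOutSketch {
  /** Stand-in for the operation being retried. */
  interface IoAction {
    void run() throws IOException;
  }

  private RetryBreakOutSketch() {}

  /**
   * Atomic operations cannot be partially retried, and an interrupt means the
   * operation is being cancelled, so both cases rethrow instead of retrying.
   */
  static boolean shouldRethrow(boolean atomic, IOException ioe) {
    return atomic || ioe instanceof InterruptedIOException;
  }

  /** Runs op up to maxAttempts times and returns the number of attempts used. */
  static int runWithRetries(IoAction op, boolean atomic, int maxAttempts) throws IOException {
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        op.run();
        return attempt;
      } catch (IOException ioe) {
        if (shouldRethrow(atomic, ioe)) {
          throw ioe; // break out: retrying would be wrong for these cases
        }
        last = ioe; // any other IOException: remember it and retry
      }
    }
    throw last != null ? last : new IOException("maxAttempts must be positive");
  }
}
```

Keeping the break-out rule in one named predicate gives a single place to document why each exception type stops the loop, which is the point the comment makes about future readers.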
[jira] [Commented] (HBASE-17472) Correct the semantic of permission grant
[ https://issues.apache.org/jira/browse/HBASE-17472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363368#comment-16363368 ] Appy commented on HBASE-17472: -- Sorry for ultra late review. Seeing the final patch which was committed to branch-1.4, the value of flag is always false, and the one which was committed to master, the flag is always true for production code (there are a few false in only test code, but that shouldn't count). Going by that high level picture, it feels like we didn't need to make any change in branch-1.4 since adding a param always setting it to false is a no-op. And for master, only the change to AccessControlLists#addUserPermission would have been sufficient. We didn't need any new param or updating anything else. What am i missing? > Correct the semantic of permission grant > - > > Key: HBASE-17472 > URL: https://issues.apache.org/jira/browse/HBASE-17472 > Project: HBase > Issue Type: Improvement > Components: Admin >Affects Versions: 2.0.0, 1.4.0 >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Major > Fix For: 2.0.0, 1.4.0 > > Attachments: HBASE-17472.branch-1.3.v6.patch, > HBASE-17472.branch-1.v6.patch, HBASE-17472.branch-1.v7.patch, > HBASE-17472.master.v6.patch, HBASE-17472.master.v6.patch, > HBASE-17472.master.v7.patch, HBASE-17472.v1.patch, HBASE-17472.v2.patch, > HBASE-17472.v3.patch, HBASE-17472.v4.patch, HBASE-17472.v5.patch > > > Currently, HBase grant operation has following semantic: > {code} > hbase(main):019:0> grant 'hbase_tst', 'RW', 'ycsb' > 0 row(s) in 0.0960 seconds > hbase(main):020:0> user_permission 'ycsb' > User > Namespace,Table,Family,Qualifier:Permission > > > > hbase_tst default,ycsb,,: > [Permission:actions=READ,WRITE] > > > 1 row(s) in 0.0550 seconds > hbase(main):021:0> grant 'hbase_tst', 'CA', 'ycsb' > 0 row(s) in 0.0820 seconds > hbase(main):022:0> user_permission 'ycsb' > User > Namespace,Table,Family,Qualifier:Permission > > > hbase_tst default,ycsb,,: > [Permission: 
actions=CREATE,ADMIN] > > > 1 row(s) in 0.0490 seconds > {code} > A later grant replaces previously granted permissions, which confuses > most HBase administrators. > It seems more reasonable for HBase to merge multiple granted permissions. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
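The difference between the two grant semantics described in the issue can be sketched with plain sets. This is only an illustration: `GrantMergeSketch` and its local `Action` enum are hypothetical stand-ins and do not use the HBase access-control classes.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical sketch (not HBase API) contrasting replace and merge grant semantics. */
final class GrantMergeSketch {
  /** Local stand-in for the permission actions shown in the shell output. */
  enum Action { READ, WRITE, EXEC, CREATE, ADMIN }

  private GrantMergeSketch() {}

  /** Replace semantics: the new grant overwrites whatever was granted before. */
  static Set<Action> replace(Set<Action> existing, Set<Action> granted) {
    EnumSet<Action> result = EnumSet.noneOf(Action.class);
    result.addAll(granted);
    return result;
  }

  /** Merge semantics: the new grant is added to the existing permissions. */
  static Set<Action> merge(Set<Action> existing, Set<Action> granted) {
    EnumSet<Action> result = EnumSet.noneOf(Action.class);
    result.addAll(existing);
    result.addAll(granted);
    return result;
  }

  public static void main(String[] args) {
    Set<Action> rw = EnumSet.of(Action.READ, Action.WRITE);
    Set<Action> ca = EnumSet.of(Action.CREATE, Action.ADMIN);
    System.out.println("replace: " + replace(rw, ca));
    System.out.println("merge:   " + merge(rw, ca));
  }
}
```

In the shell session above, granting 'RW' and then 'CA' corresponds to `replace`; the issue argues for the behaviour of `merge`.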
[jira] [Resolved] (HBASE-19965) Fix flaky TestAsyncRegionAdminApi
[ https://issues.apache.org/jira/browse/HBASE-19965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-19965. --- Resolution: Fixed This fell off the flakies list. The change in TestAsyncTableAdminApi is not enough... https://builds.apache.org/job/HBase%20Nightly/job/branch-2/314/testReport/junit/org.apache.hadoop.hbase.client/TestAsyncTableAdminApi/org_apache_hadoop_hbase_client_TestAsyncTableAdminApi/ Let me move some more over to TestAsyncTableAdminApi2 or make a TestAsyncTableAdminApi3. > Fix flaky TestAsyncRegionAdminApi > - > > Key: HBASE-19965 > URL: https://issues.apache.org/jira/browse/HBASE-19965 > Project: HBase > Issue Type: Sub-task >Reporter: Guanghao Zhang >Assignee: stack >Priority: Critical > Fix For: 2.0.0-beta-2 > > Attachments: HBASE-19965.branch-2.001.patch > > > See > [https://builds.apache.org/job/HBase%20Nightly/job/branch-2/284/testReport/junit/org.apache.hadoop.hbase.client/TestAsyncRegionAdminApi/testMergeRegions_0_/] > > java.lang.AssertionError: expected:<2> but was:<3> at > org.apache.hadoop.hbase.client.TestAsyncRegionAdminApi.testMergeRegions(TestAsyncRegionAdminApi.java:359) > > Merging regions does not work. The table still has 3 regions after the > MergeRegionsProcedure finished. > The master starts balancing region 9e2773ba1efba79a2defa276e9a26ed4. But because > the MergeRegionsProcedure pid=138 started work first, the balance needs to wait > for the lock. But after the merge finished, the MoveRegionProcedure > pid=139 started work and assigned 9e2773ba1efba79a2defa276e9a26ed4 to a new > region server. This is not right. The MoveRegionProcedure should skip > assigning a region which was marked as offline. Or we should clear the merged > regions' procedures when the MergeRegionsProcedure finishes.
> > Logs: > 2018-02-08 16:24:44,608 INFO [master/cd4730e3eae2:0.Chore.1] > master.HMaster(1454): balance > hri=testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4., > source=cd4730e3eae2,39077,1518106776411, > destination=cd4730e3eae2,40578,1518106776318 > 2018-02-08 16:24:44,608 DEBUG > [RpcServer.default.FPBQ.Fifo.handler=4,queue=0,port=37885] > procedure2.ProcedureExecutor(868): Stored pid=138, > state=RUNNABLE:MERGE_TABLE_REGIONS_PREPARE; MergeTableRegionsProcedure > table=testMergeRegions, regions=[9e2773ba1efba79a2defa276e9a26ed4, > 8f8fd5cd032313e1aadb83e31e1b7479], forcibly=false > .. > 2018-02-08 16:24:50,111 INFO [PEWorker-13] > procedure2.ProcedureExecutor(1249): Finished pid=138, state=SUCCESS; > MergeTableRegionsProcedure table=testMergeRegions, > regions=[9e2773ba1efba79a2defa276e9a26ed4, 8f8fd5cd032313e1aadb83e31e1b7479], > forcibly=false in 5.5710sec > 2018-02-08 16:24:50,113 INFO [PEWorker-13] > procedure.MasterProcedureScheduler(813): pid=139, > state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure > hri=testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4., > source=cd4730e3eae2,39077,1518106776411, > destination=cd4730e3eae2,40578,1518106776318 testMergeRegions > testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19996) Some nonce procs might not be cleaned up (follow up HBASE-19756)
[ https://issues.apache.org/jira/browse/HBASE-19996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363356#comment-16363356 ] Hadoop QA commented on HBASE-19996: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 4m 44s{color} | {color:red} Docker failed to build yetus/hbase:74e3133. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HBASE-19996 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12910485/HBASE-19996.branch-1.4.001.patch | | Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/11514/console | | Powered by | Apache Yetus 0.7.0 http://yetus.apache.org | This message was automatically generated. > Some nonce procs might not be cleaned up (follow up HBASE-19756) > > > Key: HBASE-19996 > URL: https://issues.apache.org/jira/browse/HBASE-19996 > Project: HBase > Issue Type: Bug >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Major > Fix For: 2.0.0, 1.3.2, 1.5.0, 1.4.2 > > Attachments: HBASE-19996.branch-1.4.001.patch > > > Follow up to HBASE-19756 which dealt with NPEs during proc cleanup. > Unfortunately, the patch for branch-1 might not remove some valid procs too. > The branch-2 patch doesn't have this problem. This fixes the branch-1 bug and > also adds another test to branch-2. Thanks to [~toffer] for flagging this > internally. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-19996) Some nonce procs might not be cleaned up (follow up HBASE-19756)
[ https://issues.apache.org/jira/browse/HBASE-19996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HBASE-19996: - Description: Follow up to HBASE-19756 which dealt with NPEs during proc cleanup. Unfortunately, the patch for branch-1 might not remove some valid procs too. The branch-2 patch doesn't have this problem. This fixes the branch-1 bug and also adds another test to branch-2. Thanks to [~toffer] for flagging this internally. (was: Follow up to HBASE-19756 which dealt with NPEs during proc cleanup. Unfortunately, the patch for branch-1 might not remove some valid procs too. The branch-2 patch doesn't have this problem. This fixes the branch-1 bug and also adds another test to branch-2.) > Some nonce procs might not be cleaned up (follow up HBASE-19756) > > > Key: HBASE-19996 > URL: https://issues.apache.org/jira/browse/HBASE-19996 > Project: HBase > Issue Type: Bug >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Major > Fix For: 2.0.0, 1.3.2, 1.5.0, 1.4.2 > > Attachments: HBASE-19996.branch-1.4.001.patch > > > Follow up to HBASE-19756 which dealt with NPEs during proc cleanup. > Unfortunately, the patch for branch-1 might not remove some valid procs too. > The branch-2 patch doesn't have this problem. This fixes the branch-1 bug and > also adds another test to branch-2. Thanks to [~toffer] for flagging this > internally. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-19996) Some nonce procs might not be cleaned up (follow up HBASE-19756)
[ https://issues.apache.org/jira/browse/HBASE-19996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HBASE-19996: - Status: Patch Available (was: Open) > Some nonce procs might not be cleaned up (follow up HBASE-19756) > > > Key: HBASE-19996 > URL: https://issues.apache.org/jira/browse/HBASE-19996 > Project: HBase > Issue Type: Bug >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Major > Fix For: 2.0.0, 1.3.2, 1.5.0, 1.4.2 > > Attachments: HBASE-19996.branch-1.4.001.patch > > > Follow up to HBASE-19756 which dealt with NPEs during proc cleanup. > Unfortunately, the patch for branch-1 might not remove some valid procs too. > The branch-2 patch doesn't have this problem. This fixes the branch-1 bug and > also adds another test to branch-2. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-19996) Some nonce procs might not be cleaned up (follow up HBASE-19756)
[ https://issues.apache.org/jira/browse/HBASE-19996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HBASE-19996: - Attachment: HBASE-19996.branch-1.4.001.patch > Some nonce procs might not be cleaned up (follow up HBASE-19756) > > > Key: HBASE-19996 > URL: https://issues.apache.org/jira/browse/HBASE-19996 > Project: HBase > Issue Type: Bug >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Major > Fix For: 2.0.0, 1.3.2, 1.5.0, 1.4.2 > > Attachments: HBASE-19996.branch-1.4.001.patch > > > Follow up to HBASE-19756 which dealt with NPEs during proc cleanup. > Unfortunately, the patch for branch-1 might not remove some valid procs too. > The branch-2 patch doesn't have this problem. This fixes the branch-1 bug and > also adds another test to branch-2. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19756) Master NPE during completed failed proc eviction
[ https://issues.apache.org/jira/browse/HBASE-19756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363345#comment-16363345 ] Thiruvel Thirumoolan commented on HBASE-19756: -- [~apurtell]/[~yuzhih...@gmail.com] - The master patch here is fine. I wanted to rework the branch-1 patch, but fell sick and the patch got committed in the meantime. Raised HBASE-19996 as a follow-up to fix the problem with the branch-1 patch. > Master NPE during completed failed proc eviction > > > Key: HBASE-19756 > URL: https://issues.apache.org/jira/browse/HBASE-19756 > Project: HBase > Issue Type: Bug >Affects Versions: 1.4.0, 1.3.1 >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Major > Fix For: 2.0.0, 3.0.0, 1.3.2, 1.4.1, 1.5.0 > > Attachments: HBASE-19756.branch-1.4.001.patch, > HBASE-19756.branch-1.4.002.patch, HBASE-19756.branch-1.4.003.patch, > HBASE-19756.master.001.patch > > > When a procedure like Create table fails due to, say, AccessDeniedException, > then a rollback procedure is created.
When the rollback is being cleaned up, > it results in an NPE because those nonce procs aren't persisted > Stack trace when this happens: > {noformat} > java.lang.NullPointerException > at > org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:385) > at > org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.updateStoreTracker(WALProcedureStore.java:547) > at > org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.pushData(WALProcedureStore.java:504) > at > org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.delete(WALProcedureStore.java:453) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$CompletedProcedureCleaner.periodicExecute(ProcedureExecutor.java:184) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.timeoutLoop(ProcedureExecutor.java:995) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$500(ProcedureExecutor.java:78) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$3.run(ProcedureExecutor.java:507) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-19996) Some nonce procs might not be cleaned up (follow up HBASE-19756)
[ https://issues.apache.org/jira/browse/HBASE-19996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HBASE-19996: - Fix Version/s: 1.4.2 1.5.0 1.3.2 2.0.0 > Some nonce procs might not be cleaned up (follow up HBASE-19756) > > > Key: HBASE-19996 > URL: https://issues.apache.org/jira/browse/HBASE-19996 > Project: HBase > Issue Type: Bug >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan >Priority: Major > Fix For: 2.0.0, 1.3.2, 1.5.0, 1.4.2 > > > Follow up to HBASE-19756 which dealt with NPEs during proc cleanup. > Unfortunately, the patch for branch-1 might not remove some valid procs too. > The branch-2 patch doesn't have this problem. This fixes the branch-1 bug and > also adds another test to branch-2. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-19996) Some nonce procs might not be cleaned up (follow up HBASE-19756)
Thiruvel Thirumoolan created HBASE-19996: Summary: Some nonce procs might not be cleaned up (follow up HBASE-19756) Key: HBASE-19996 URL: https://issues.apache.org/jira/browse/HBASE-19996 Project: HBase Issue Type: Bug Reporter: Thiruvel Thirumoolan Assignee: Thiruvel Thirumoolan Follow up to HBASE-19756 which dealt with NPEs during proc cleanup. Unfortunately, the patch for branch-1 might not remove some valid procs too. The branch-2 patch doesn't have this problem. This fixes the branch-1 bug and also adds another test to branch-2. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19988) HRegion#lockRowsAndBuildMiniBatch() is too chatty when interrupted while waiting for a row lock
[ https://issues.apache.org/jira/browse/HBASE-19988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363338#comment-16363338 ] stack commented on HBASE-19988: --- bq. Been reading around code for last 30 min, i honestly have no idea how are we supposed to interpret InterruptedException. IE handling is erratic. Some code lines are non-interruptible (HDFS, client retries...). Generally, if you have an IE and don't know what to do w/ it, do clean up, set interrupt on thread and rethrow. A good project would be going through the codebase throwing IEs to see what happens. > HRegion#lockRowsAndBuildMiniBatch() is too chatty when interrupted while > waiting for a row lock > --- > > Key: HBASE-19988 > URL: https://issues.apache.org/jira/browse/HBASE-19988 > Project: HBase > Issue Type: Improvement > Components: amv2 >Affects Versions: 2.0.0-beta-1 >Reporter: Umesh Agashe >Assignee: Umesh Agashe >Priority: Minor > Fix For: 2.0.0-beta-2 > > Attachments: hbase-19988.master.001.patch, > hbase-19988.master.001.patch > > > See HBASE-19970, TestHRegionWithInMemoryFlush created 4.2g log file. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
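The rule of thumb in the comment above (clean up, set the interrupt on the thread, rethrow) can be sketched roughly as follows. This is a minimal illustration only, not code from HBase or from the attached patch; the class `InterruptHandlingSketch` and the helper `acquire` are hypothetical names, and a plain `ReentrantLock` stands in for a row lock.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

final class InterruptHandlingSketch {

  /**
   * Hypothetical helper (not an HBase method): waits for a lock and, if
   * interrupted, follows "clean up, set interrupt on thread, rethrow".
   */
  static boolean acquire(Lock rowLock, long timeoutMs) throws InterruptedException {
    try {
      return rowLock.tryLock(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException ie) {
      // Clean up: nothing was acquired here, so there is nothing to release.
      // Restore the interrupt flag, which is cleared when the exception is thrown.
      Thread.currentThread().interrupt();
      // Rethrow so the caller can stop instead of retrying and logging in a loop.
      throw ie;
    }
  }

  public static void main(String[] args) {
    Thread.currentThread().interrupt(); // simulate an interrupt arriving before the wait
    try {
      acquire(new ReentrantLock(), 10L);
    } catch (InterruptedException ie) {
      System.out.println("interrupted, flag restored: "
          + Thread.currentThread().isInterrupted());
    }
  }
}
```

Restoring the flag before rethrowing means that code further up the stack which only checks `Thread.isInterrupted()` still sees the interrupt, even if some intermediate caller catches the exception.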
[jira] [Commented] (HBASE-19988) HRegion#lockRowsAndBuildMiniBatch() is too chatty when interrupted while waiting for a row lock
[ https://issues.apache.org/jira/browse/HBASE-19988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363330#comment-16363330 ] Umesh Agashe commented on HBASE-19988: -- That surefire is able to interrupt tests suggests that InterruptedException is not ignored always or everywhere. > HRegion#lockRowsAndBuildMiniBatch() is too chatty when interrupted while > waiting for a row lock > --- > > Key: HBASE-19988 > URL: https://issues.apache.org/jira/browse/HBASE-19988 > Project: HBase > Issue Type: Improvement > Components: amv2 >Affects Versions: 2.0.0-beta-1 >Reporter: Umesh Agashe >Assignee: Umesh Agashe >Priority: Minor > Fix For: 2.0.0-beta-2 > > Attachments: hbase-19988.master.001.patch, > hbase-19988.master.001.patch > > > See HBASE-19970, TestHRegionWithInMemoryFlush created 4.2g log file. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-18294) Reduce global heap pressure: flush based on heap occupancy
[ https://issues.apache.org/jira/browse/HBASE-18294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363328#comment-16363328 ] stack commented on HBASE-18294: ---
On the release note, name the configs to set?
On the patch, couldn't the memstoreSize change... between leaving the synchronized block and going in here to do the check?
  checkNegativeMemStoreDataSize(size, -memStoreSize.getDataSize());
Copy the data size to a local variable inside the sync block? Or nvm... I see that we are passing in the passed-in param, not the data member content. In that case, it's confusing to have a param with the same name as a data member.
We are doing this...
  public long getMemStoreDataSize() {
    return memStoreSize.getDataSize();
  }
... w/o a synchronize. Should there be one? ... Hmm... No, it should be ok. It is a volatile read. Ignore.
Interesting, so looking for best region to flush, we'll do data size...
  (regionToFlush != null && regionToFlush.getMemStoreDataSize() > 0) ||
  (bestRegionReplica != null && bestRegionReplica.getMemStoreDataSize() > 0));
The data size accounting is just a nice-to-have in the scheme of things? (A vestige held over from the back and forth here).
This is right?
  long getMemStoreSize() {
  - return region.getMemStoreSize();
  + return region.getMemStoreDataSize();
... i.e. returning data size when we ask for memstoresize? (We also have a getMemStoreDataSize ...)
Did a pass. Looks good to me.
> Reduce global heap pressure: flush based on heap occupancy > -- > > Key: HBASE-18294 > URL: https://issues.apache.org/jira/browse/HBASE-18294 > Project: HBase > Issue Type: Improvement >Affects Versions: 3.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel >Priority: Major > Fix For: 2.0.0-beta-2 > > Attachments: HBASE-18294.01.patch, HBASE-18294.01.patch, > HBASE-18294.01.patch, HBASE-18294.01.patch, HBASE-18294.01.patch, > HBASE-18294.01.patch, HBASE-18294.01.patch, HBASE-18294.02.patch, > HBASE-18294.03.patch, HBASE-18294.04.patch, HBASE-18294.05.patch, > HBASE-18294.06.patch, HBASE-18294.07.patch, HBASE-18294.07.patch, > HBASE-18294.08.patch, HBASE-18294.09.patch, HBASE-18294.10.patch, > HBASE-18294.11.patch, HBASE-18294.11.patch, HBASE-18294.12.patch, > HBASE-18294.13.patch, HBASE-18294.15.patch, HBASE-18294.16.patch, > HBASE-18294.master.01.patch, HBASE-18294.master.01.patch > > > A region is flushed if its memory component exceed a threshold (default size > is 128MB). > A flush policy decides whether to flush a store by comparing the size of the > store to another threshold (that can be configured with > hbase.hregion.percolumnfamilyflush.size.lower.bound). > Currently the implementation (in both cases) compares the data size > (key-value only) to the threshold where it should compare the heap size > (which includes index size, and metadata). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19988) HRegion#lockRowsAndBuildMiniBatch() is too chatty when interrupted while waiting for a row lock
[ https://issues.apache.org/jira/browse/HBASE-19988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363326#comment-16363326 ] Appy commented on HBASE-19988: -- Been reading around code for last 30 min, i honestly have no idea how are we supposed to interpret InterruptedException. > HRegion#lockRowsAndBuildMiniBatch() is too chatty when interrupted while > waiting for a row lock > --- > > Key: HBASE-19988 > URL: https://issues.apache.org/jira/browse/HBASE-19988 > Project: HBase > Issue Type: Improvement > Components: amv2 >Affects Versions: 2.0.0-beta-1 >Reporter: Umesh Agashe >Assignee: Umesh Agashe >Priority: Minor > Fix For: 2.0.0-beta-2 > > Attachments: hbase-19988.master.001.patch, > hbase-19988.master.001.patch > > > See HBASE-19970, TestHRegionWithInMemoryFlush created 4.2g log file. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19981) Boolean#getBoolean is used to parse value
[ https://issues.apache.org/jira/browse/HBASE-19981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363291#comment-16363291 ] Hudson commented on HBASE-19981: SUCCESS: Integrated in Jenkins build HBase-1.3-IT #349 (See [https://builds.apache.org/job/HBase-1.3-IT/349/]) HBASE-19981 Boolean#getBoolean is used to parse value (tedyu: rev e6dda8ea6db4e50e3bc3e93a72dc06f433a75b58) * (edit) hbase-client/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java > Boolean#getBoolean is used to parse value > - > > Key: HBASE-19981 > URL: https://issues.apache.org/jira/browse/HBASE-19981 > Project: HBase > Issue Type: Bug >Reporter: Ted Yu >Assignee: Janos Gub >Priority: Major > Fix For: 1.3.2, 1.2.7, 1.4.2 > > Attachments: HBASE-19981.branch-1.001.patch > > > In HColumnDescriptor of branch-1: > {code} > value.set(Bytes.toBytes( > Boolean.getBoolean(Bytes.toString(value.get())) > {code} > According to > https://docs.oracle.com/javase/7/docs/api/java/lang/Boolean.html#getBoolean(java.lang.String): > {code} > Returns true if and only if the system property named by the argument exists > and is equal to the string "true" > {code} > This was not the intention of the quoted code. > This was discovered by Fortify. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19981) Boolean#getBoolean is used to parse value
[ https://issues.apache.org/jira/browse/HBASE-19981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363285#comment-16363285 ] Hudson commented on HBASE-19981: SUCCESS: Integrated in Jenkins build HBase-1.2-IT #1069 (See [https://builds.apache.org/job/HBase-1.2-IT/1069/]) HBASE-19981 Boolean#getBoolean is used to parse value (tedyu: rev 0f3bf54899e4d8927f76f9e9515e774590ad56eb) * (edit) hbase-client/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java > Boolean#getBoolean is used to parse value > - > > Key: HBASE-19981 > URL: https://issues.apache.org/jira/browse/HBASE-19981 > Project: HBase > Issue Type: Bug >Reporter: Ted Yu >Assignee: Janos Gub >Priority: Major > Fix For: 1.3.2, 1.2.7, 1.4.2 > > Attachments: HBASE-19981.branch-1.001.patch > > > In HColumnDescriptor of branch-1: > {code} > value.set(Bytes.toBytes( > Boolean.getBoolean(Bytes.toString(value.get())) > {code} > According to > https://docs.oracle.com/javase/7/docs/api/java/lang/Boolean.html#getBoolean(java.lang.String): > {code} > Returns true if and only if the system property named by the argument exists > and is equal to the string "true" > {code} > This was not the intention of the quoted code. > This was discovered by Fortify. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-19981) Boolean#getBoolean is used to parse value
[ https://issues.apache.org/jira/browse/HBASE-19981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-19981: --- Fix Version/s: 1.2.7 1.3.2 > Boolean#getBoolean is used to parse value > - > > Key: HBASE-19981 > URL: https://issues.apache.org/jira/browse/HBASE-19981 > Project: HBase > Issue Type: Bug >Reporter: Ted Yu >Assignee: Janos Gub >Priority: Major > Fix For: 1.3.2, 1.2.7, 1.4.2 > > Attachments: HBASE-19981.branch-1.001.patch > > > In HColumnDescriptor of branch-1: > {code} > value.set(Bytes.toBytes( > Boolean.getBoolean(Bytes.toString(value.get())) > {code} > According to > https://docs.oracle.com/javase/7/docs/api/java/lang/Boolean.html#getBoolean(java.lang.String): > {code} > Returns true if and only if the system property named by the argument exists > and is equal to the string "true" > {code} > This was not the intention of the quoted code. > This was discovered by Fortify. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
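To make the difference described in HBASE-19981 concrete, here is a small sketch contrasting the two JDK calls. `Boolean.parseBoolean` is presumably what the quoted code intended, but the actual patch is not shown in this thread, so treat that as an assumption. The class name and the property name `sketch.flag` are made up for the illustration.

```java
final class BooleanParsingSketch {

  /** Mirrors the quoted call: looks up a system property NAMED by the argument. */
  static boolean viaGetBoolean(String value) {
    return Boolean.getBoolean(value);
  }

  /** Parses the argument itself; true only for "true", ignoring case. */
  static boolean viaParseBoolean(String value) {
    return Boolean.parseBoolean(value);
  }

  public static void main(String[] args) {
    // Parsing the string directly.
    System.out.println(viaParseBoolean("TRUE"));

    // getBoolean only returns true if a system property with that name is "true".
    System.setProperty("sketch.flag", "true"); // hypothetical property name
    System.out.println(viaGetBoolean("sketch.flag"));
    System.out.println(viaGetBoolean("sketch.unset.flag"));
  }
}
```

With `getBoolean`, a stored column-family value of "true" would only read back as true if the JVM happened to have a system property literally named `true`, which is why the static analyzer flagged it.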
[jira] [Updated] (HBASE-19116) Currently the tail of hfiles with CellComparator* classname makes it so hbase1 can't open hbase2 written hfiles; fix
[ https://issues.apache.org/jira/browse/HBASE-19116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-19116: -- Attachment: HBASE-19116.branch-2.002.patch > Currently the tail of hfiles with CellComparator* classname makes it so > hbase1 can't open hbase2 written hfiles; fix > > > Key: HBASE-19116 > URL: https://issues.apache.org/jira/browse/HBASE-19116 > Project: HBase > Issue Type: Sub-task > Components: HFile, migration >Reporter: stack >Assignee: stack >Priority: Critical > Fix For: 2.0.0-beta-2 > > Attachments: HBASE-19116.branch-2.001.patch, > HBASE-19116.branch-2.002.patch > > > See tail of HBASE-19052 for discussion which concludes we should try and make > it so operators do not have to go to latest hbase version before they > upgrade, at least if we can avoid it. > The necessary change of our default comparator from KV to Cell naming has > hfiles with tails that have the classname CellComparator in them in place of > KeyValueComparator. If an hbase1 tries to open them, it will fail not having > a CellComparator in its classpath (We have name of comparator in tail because > different files require different comparators... perhaps we write an alias > instead of a class one day... TODO). HBASE-16189 and HBASE-19052 are about > trying to carry knowledge of hbase2 back to hbase1, a brittle approach making > it so operators will have to upgrade to the latest branch-1 before they can > go to hbase2. > This issue is about undoing our writing of an incompatible (to hbase1) tail, > not unless we really have to (and it sounds like we could do without writing > an incompatible tail) to see if we can avoid requiring operators go to > latest branch-1 (we may end up needing this but let's have a really good > reason for it if we do). > Oh, let this filing be an answer to our [~anoop.hbase]'s old high-level > question over in HBASE-16189: > bq.
...means when rolling upgrade done to 2.0, first users have to upgrade to > some 1.x versions which is having this fix and then to 2.0.. What do you guys > think Whether we should avoid this kind of indirection? cc Enis Soztutar, > Stack, Ted Yu, Matteo Bertozzi > Yeah, lets try to avoid this if we can... -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19116) Currently the tail of hfiles with CellComparator* classname makes it so hbase1 can't open hbase2 written hfiles; fix
[ https://issues.apache.org/jira/browse/HBASE-19116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363280#comment-16363280 ] stack commented on HBASE-19116: --- .002 Checkstyle fixes. Review please. > Currently the tail of hfiles with CellComparator* classname makes it so > hbase1 can't open hbase2 written hfiles; fix > > > Key: HBASE-19116 > URL: https://issues.apache.org/jira/browse/HBASE-19116 > Project: HBase > Issue Type: Sub-task > Components: HFile, migration >Reporter: stack >Assignee: stack >Priority: Critical > Fix For: 2.0.0-beta-2 > > Attachments: HBASE-19116.branch-2.001.patch, > HBASE-19116.branch-2.002.patch > > > See tail of HBASE-19052 for discussion which concludes we should try and make > it so operators do not have to go to latest hbase version before they > upgrade, at least if we can avoid it. > The necessary change of our default comparator from KV to Cell naming has > hfiles with tails that have the classname CellComparator in them in place of > KeyValueComparator. If an hbase1 tries to open them, it will fail not having > a CellComparator in its classpath (We have name of comparator in tail because > different files require different comparators... perhaps we write an alias > instead of a class one day... TODO). HBASE-16189 and HBASE-19052 are about > trying to carry knowledge of hbase2 back to hbase1, a brittle approach making > it so operators will have to upgrade to the latest branch-1 before they can > go to hbase2. > This issue is about undoing our writing of an incompatible (to hbase1) tail, > not unless we really have to (and it sounds like we could do without writing > an incompatible tail) to see if we can avoid requiring operators go to > latest branch-1 (we may end up needing this but let's have a really good > reason for it if we do). > Oh, let this filing be an answer to our [~anoop.hbase]'s old high-level > question over in HBASE-16189: > bq.
...means when rolling upgrade done to 2.0, first users have to upgrade to > some 1.x versions which is having this fix and then to 2.0.. What do you guys > think Whether we should avoid this kind of indirection? cc Enis Soztutar, > Stack, Ted Yu, Matteo Bertozzi > Yeah, lets try to avoid this if we can... -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19116) Currently the tail of hfiles with CellComparator* classname makes it so hbase1 can't open hbase2 written hfiles; fix
[ https://issues.apache.org/jira/browse/HBASE-19116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363276#comment-16363276 ] stack commented on HBASE-19116: --- Here is reading a hbase2 hfile with an hbase1 reader: stack@ve0524:~$ ./hbase/bin/hbase --config ~/conf_hbase/ org.apache.hadoop.hbase.io.hfile.HFile --printmeta -f /hbase/archive/data/default/IntegrationTestBigLinkedList/25eb09e8ddb00ea240407061e776a289/big/8e54a03ba0c14e458c57290f0b25373d 2018-02-13 16:02:54,864 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 2018-02-13 16:02:55,284 INFO [main] hfile.CacheConfig: Created cacheConfig: CacheConfig:disabled Block index size as per heapsize: 53152 reader=/hbase/archive/data/default/IntegrationTestBigLinkedList/25eb09e8ddb00ea240407061e776a289/big/8e54a03ba0c14e458c57290f0b25373d, compression=none, cacheConf=CacheConfig:disabled, firstKey=\xC7\x1Cr((?\x0E$\x1F\xAF\x966%1/big:big/1518565482300/Put, lastKey=\xD5Vr_\x13\x9C\x10]\xAE\x19\xDE_9\x1A] Currently the tail of hfiles with CellComparator* classname makes it so > hbase1 can't open hbase2 written hfiles; fix > > > Key: HBASE-19116 > URL: https://issues.apache.org/jira/browse/HBASE-19116 > Project: HBase > Issue Type: Sub-task > Components: HFile, migration >Reporter: stack >Assignee: stack >Priority: Critical > Fix For: 2.0.0-beta-2 > > Attachments: HBASE-19116.branch-2.001.patch > > > See tail of HBASE-19052 for discussion which concludes we should try and make > it so operators do not have to go to latest hbase version before they > upgrade, at least if we can avoid it. > The necessary change of our default comparator from KV to Cell naming has > hfiles with tails that have the classname CellComparator in them in place of > KeyValueComparator. 
If an hbase1 tries to open them, it will fail not having > a CellComparator in its classpath (We have name of comparator in tail because > different files require different comparators... perhaps we write an alias > instead of a class one day... TODO). HBASE-16189 and HBASE-19052 are about > trying to carry knowledge of hbase2 back to hbase1, a brittle approach making > it so operators will have to upgrade to the latest branch-1 before they can > go to hbase2. > This issue is about undoing our writing of an incompatible (to hbase1) tail, > not unless we really have to (and it sounds like we could do without writing > an incompatible tail) to see if we can avoid requiring operators go to > latest branch-1 (we may end up needing this but let's have a really good > reason for it if we do). > Oh, let this filing be an answer to our [~anoop.hbase]'s old high-level > question over in HBASE-16189: > bq. ...means when rolling upgrade done to 2.0, first users have to upgrade to > some 1.x versions which is having this fix and then to 2.0.. What do you guys > think Whether we should avoid this kind of indirection? cc Enis Soztutar, > Stack, Ted Yu, Matteo Bertozzi > Yeah, lets try to avoid this if we can... -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19116) Currently the tail of hfiles with CellComparator* classname makes it so hbase1 can't open hbase2 written hfiles; fix
[ https://issues.apache.org/jira/browse/HBASE-19116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363277#comment-16363277 ] stack commented on HBASE-19116: --- Need a review here please. > Currently the tail of hfiles with CellComparator* classname makes it so > hbase1 can't open hbase2 written hfiles; fix > > > Key: HBASE-19116 > URL: https://issues.apache.org/jira/browse/HBASE-19116 > Project: HBase > Issue Type: Sub-task > Components: HFile, migration >Reporter: stack >Assignee: stack >Priority: Critical > Fix For: 2.0.0-beta-2 > > Attachments: HBASE-19116.branch-2.001.patch > > > See tail of HBASE-19052 for discussion which concludes we should try and make > it so operators do not have to go to latest hbase version before they > upgrade, at least if we can avoid it. > The necessary change of our default comparator from KV to Cell naming has > hfiles with tails that have the classname CellComparator in them in place of > KeyValueComparator. If an hbase1 tries to open them, it will fail not having > a CellComparator in its classpath (We have name of comparator in tail because > different files require different comparators... perhaps we write an alias > instead of a class one day... TODO). HBASE-16189 and HBASE-19052 are about > trying to carry knowledge of hbase2 back to hbase1, a brittle approach making > it so operators will have to upgrade to the latest branch-1 before they can > go to hbase2. > This issue is about undoing our writing of an incompatible (to hbase1) tail, > not unless we really have to (and it sounds like we could do without writing > an incompatible tail) to see if we can avoid requiring operators go to > latest branch-1 (we may end up needing this but let's have a really good > reason for it if we do). > Oh, let this filing be an answer to our [~anoop.hbase]'s old high-level > question over in HBASE-16189: > bq.
...means when rolling upgrade done to 2.0, first users have to upgrade to > some 1.x versions which is having this fix and then to 2.0.. What do you guys > think Whether we should avoid this kind of indirection? cc Enis Soztutar, > Stack, Ted Yu, Matteo Bertozzi > Yeah, lets try to avoid this if we can... -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-18294) Reduce global heap pressure: flush based on heap occupancy
[ https://issues.apache.org/jira/browse/HBASE-18294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363247#comment-16363247 ] stack commented on HBASE-18294: --- Nice release note [~eshcar]. bq. (2) A region is flushed when its on-heap+off-heap size exceeds the region flush threshold, If the offheap is 100x the onheap in size, and the threshold is set to offheap (100x) + onheap (1x) -- i.e. 101x -- then what happens when the onheap occupancy exceeds 1x? Left feedback on RB. Thanks. > Reduce global heap pressure: flush based on heap occupancy > -- > > Key: HBASE-18294 > URL: https://issues.apache.org/jira/browse/HBASE-18294 > Project: HBase > Issue Type: Improvement >Affects Versions: 3.0.0 >Reporter: Eshcar Hillel >Assignee: Eshcar Hillel >Priority: Major > Fix For: 2.0.0-beta-2 > > Attachments: HBASE-18294.01.patch, HBASE-18294.01.patch, > HBASE-18294.01.patch, HBASE-18294.01.patch, HBASE-18294.01.patch, > HBASE-18294.01.patch, HBASE-18294.01.patch, HBASE-18294.02.patch, > HBASE-18294.03.patch, HBASE-18294.04.patch, HBASE-18294.05.patch, > HBASE-18294.06.patch, HBASE-18294.07.patch, HBASE-18294.07.patch, > HBASE-18294.08.patch, HBASE-18294.09.patch, HBASE-18294.10.patch, > HBASE-18294.11.patch, HBASE-18294.11.patch, HBASE-18294.12.patch, > HBASE-18294.13.patch, HBASE-18294.15.patch, HBASE-18294.16.patch, > HBASE-18294.master.01.patch, HBASE-18294.master.01.patch > > > A region is flushed if its memory component exceed a threshold (default size > is 128MB). > A flush policy decides whether to flush a store by comparing the size of the > store to another threshold (that can be configured with > hbase.hregion.percolumnfamilyflush.size.lower.bound). > Currently the implementation (in both cases) compares the data size > (key-value only) to the threshold where it should compare the heap size > (which includes index size, and metadata). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19972) Should rethrow the RetriesExhaustedWithDetailsException when failed to apply the batch in ReplicationSink
[ https://issues.apache.org/jira/browse/HBASE-19972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363245#comment-16363245 ] Andrew Purtell commented on HBASE-19972: [~Apache9] Sure, we can do a 1.4 release this month instead of waiting until next month. Will start on it today, expect a vote by/for next week. > Should rethrow the RetriesExhaustedWithDetailsException when failed to apply > the batch in ReplicationSink > -- > > Key: HBASE-19972 > URL: https://issues.apache.org/jira/browse/HBASE-19972 > Project: HBase > Issue Type: Bug > Components: Replication >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Critical > Fix For: 1.5.0, 2.0.0-beta-2, 1.4.2 > > Attachments: HBASE-19972-branch-1.4.patch, HBASE-19972.v1.patch, > HBASE-19972.v1.patch > > > As [~Apache9] said in HBASE-12091. > In ReplicationSink#batch, we swallow the RetriesExhaustedWithDetailsException unless a cause is a TableNotFoundException; we should actually rethrow the exception.
> {code:java}
> try {
>   Connection connection = getConnection();
>   table = connection.getTable(tableName);
>   for (List rows : allRows) {
>     table.batch(rows);
>   }
> } catch (RetriesExhaustedWithDetailsException rewde) {
>   for (Throwable ex : rewde.getCauses()) {
>     if (ex instanceof TableNotFoundException) {
>       throw new TableNotFoundException("'"+tableName+"'");
>     }
>   }
> }
> {code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
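The change the HBASE-19972 description asks for amounts to one extra statement at the end of the catch block. The sketch below shows that shape in a self-contained form; it is not the attached patch. `TableMissingException`, `RetriesExhaustedException` and `BatchCall` are hypothetical stand-ins for `TableNotFoundException`, `RetriesExhaustedWithDetailsException` and the `table.batch(rows)` loop, so that the example compiles without HBase on the classpath.

```java
import java.io.IOException;
import java.util.List;

final class ReplicationSinkRethrowSketch {

  /** Stand-in for TableNotFoundException (hypothetical, for illustration). */
  static class TableMissingException extends IOException {
    TableMissingException(String msg) {
      super(msg);
    }
  }

  /** Stand-in for RetriesExhaustedWithDetailsException (hypothetical). */
  static class RetriesExhaustedException extends IOException {
    private final List<Throwable> causes;

    RetriesExhaustedException(List<Throwable> causes) {
      super("retries exhausted");
      this.causes = causes;
    }

    List<Throwable> getCauses() {
      return causes;
    }
  }

  /** Stand-in for the loop that applies each batch of rows to the table. */
  interface BatchCall {
    void run() throws IOException;
  }

  static void batch(String tableName, BatchCall call) throws IOException {
    try {
      call.run();
    } catch (RetriesExhaustedException rewde) {
      for (Throwable ex : rewde.getCauses()) {
        if (ex instanceof TableMissingException) {
          throw new TableMissingException("'" + tableName + "'");
        }
      }
      // The step missing from the quoted code: without this, a failed batch
      // looks like success to the caller and the edits are silently dropped.
      throw rewde;
    }
  }
}
```

Rethrowing the original exception, rather than wrapping it, keeps the per-row causes available to whoever decides whether to retry the replication batch.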
[jira] [Commented] (HBASE-18282) ReplicationLogCleaner can delete WALs not yet replicated in case of a KeeperException
[ https://issues.apache.org/jira/browse/HBASE-18282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363234#comment-16363234 ] Andrew Purtell commented on HBASE-18282: Hi [~benlau], yes, please, and thank you in advance. > ReplicationLogCleaner can delete WALs not yet replicated in case of a > KeeperException > - > > Key: HBASE-18282 > URL: https://issues.apache.org/jira/browse/HBASE-18282 > Project: HBase > Issue Type: Bug > Components: Replication >Affects Versions: 1.3.1, 1.2.6, 1.1.11, 2.0.0-alpha-1 >Reporter: Ashu Pachauri >Priority: Critical > > ReplicationStateZKBase#getListOfReplicators does not rethrow a > KeeperException and returns null in such a case. ReplicationLogCleaner just > assumes that there are no replicators and deletes everything.
> ReplicationStateZKBase:
> {code:java}
> public List getListOfReplicators() {
>   List result = null;
>   try {
>     result = ZKUtil.listChildrenNoWatch(this.zookeeper, this.queuesZNode);
>   } catch (KeeperException e) {
>     this.abortable.abort("Failed to get list of replicators", e);
>   }
>   return result;
> }
> {code}
> ReplicationLogCleaner:
> {code:java}
> private Set loadWALsFromQueues() throws KeeperException {
>   for (int retry = 0; ; retry++) {
>     int v0 = replicationQueues.getQueuesZNodeCversion();
>     List rss = replicationQueues.getListOfReplicators();
>     if (rss == null) {
>       LOG.debug("Didn't find any region server that replicates, won't prevent any deletions.");
>       return ImmutableSet.of();
>     }
>     ...
> {code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (HBASE-18282) ReplicationLogCleaner can delete WALs not yet replicated in case of a KeeperException
[ https://issues.apache.org/jira/browse/HBASE-18282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell reassigned HBASE-18282: -- Assignee: (was: Ashu Pachauri) > ReplicationLogCleaner can delete WALs not yet replicated in case of a > KeeperException > - > > Key: HBASE-18282 > URL: https://issues.apache.org/jira/browse/HBASE-18282 > Project: HBase > Issue Type: Bug > Components: Replication >Affects Versions: 1.3.1, 1.2.6, 1.1.11, 2.0.0-alpha-1 >Reporter: Ashu Pachauri >Priority: Critical > > ReplicationStateZKBase#getListOfReplicators does not rethrow a > KeeperException and returns null in such a case. ReplicationLogCleaner just > assumes that there are no replicators and deletes everything.
> ReplicationStateZKBase:
> {code:java}
> public List getListOfReplicators() {
>   List result = null;
>   try {
>     result = ZKUtil.listChildrenNoWatch(this.zookeeper, this.queuesZNode);
>   } catch (KeeperException e) {
>     this.abortable.abort("Failed to get list of replicators", e);
>   }
>   return result;
> }
> {code}
> ReplicationLogCleaner:
> {code:java}
> private Set loadWALsFromQueues() throws KeeperException {
>   for (int retry = 0; ; retry++) {
>     int v0 = replicationQueues.getQueuesZNodeCversion();
>     List rss = replicationQueues.getListOfReplicators();
>     if (rss == null) {
>       LOG.debug("Didn't find any region server that replicates, won't prevent any deletions.");
>       return ImmutableSet.of();
>     }
>     ...
> {code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
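The hazard in HBASE-18282 is that a lookup failure and "nothing is queued" both arrive at the cleaner as the same value. One possible shape of a fix, sketched here under assumptions and not taken from any patch on the issue, is to let the failure propagate as an exception and have the cleaner keep every file when it cannot read the queues. `QueueLookupException`, `QueueStore` and `isDeletable` are hypothetical names; the exception stands in for `KeeperException`.

```java
import java.util.Set;

final class WalCleanerSketch {

  /** Stand-in for KeeperException (hypothetical, for illustration). */
  static class QueueLookupException extends Exception {
    QueueLookupException(String msg) {
      super(msg);
    }
  }

  /** Hypothetical view of the replication queues kept in ZooKeeper. */
  interface QueueStore {
    /** WALs still referenced by some queue; throws instead of returning null on failure. */
    Set<String> walsStillQueued() throws QueueLookupException;
  }

  /** A WAL may be deleted only if the queues were read AND do not reference it. */
  static boolean isDeletable(QueueStore store, String walName) {
    Set<String> queued;
    try {
      queued = store.walsStillQueued();
    } catch (QueueLookupException e) {
      // Unknown state: err on the side of keeping the file.
      return false;
    }
    return !queued.contains(walName);
  }
}
```

The design choice is simply that "I could not find out" must never be treated as "there is nothing to protect"; keeping a WAL one cleaner cycle longer is cheap, while deleting an unreplicated one loses data.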
[jira] [Commented] (HBASE-19988) HRegion#lockRowsAndBuildMiniBatch() is too chatty when interrupted while waiting for a row lock
[ https://issues.apache.org/jira/browse/HBASE-19988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363228#comment-16363228 ] Hadoop QA commented on HBASE-19988: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} | | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 16s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 7s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 5m 46s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 38s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 18m 41s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}105m 31s{color} | {color:red} hbase-server in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}142m 56s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hbase.master.procedure.TestMasterFailoverWithProcedures | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 | | JIRA Issue | HBASE-19988 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12910445/hbase-19988.master.001.patch | | Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux d6a29d121b63 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / 39e191e559 | | maven | version: Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) | | Default Java | 1.8.0_151 | | unit | https://builds.apache.org/job/PreCommit-HBASE-Build/11512/artifact/patchprocess/patch-unit-hbase-server.txt | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/11512/testReport/ | | Max. process+thread count | 5346 (vs. ulimit of 1) | | modules | C: hbase-server U: hbase-server | |
[jira] [Commented] (HBASE-19995) Current Jetty 9 version in HBase master branch can memory leak under high traffic
[ https://issues.apache.org/jira/browse/HBASE-19995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363226#comment-16363226 ] Ted Yu commented on HBASE-19995: Updating to 9.3.22.v20171030 is good. > Current Jetty 9 version in HBase master branch can memory leak under high > traffic > - > > Key: HBASE-19995 > URL: https://issues.apache.org/jira/browse/HBASE-19995 > Project: HBase > Issue Type: Bug > Components: REST >Affects Versions: 2.0 >Reporter: Ben Lau >Priority: Major > > There is a memory-leak in Jetty 9 that manifests whenever you hit the call > queue limit in HBase REST. The memory-leak leaks both on-heap and off-heap > objects permanently. It happens because whenever the call queue for Jetty > server overflows, the task that is rejected runs a 'reject' method if it is a > Rejectable to do any cleanup. This clean up is necessary to for example close > the connection, deallocate any buffers, etc. Unfortunately, in Jetty 9, they > implemented the 'reject' / cleanup method of the SelectChannelEndpoint as a > non-blocking call that is not guaranteed to run. This was later fixed in > Jetty 9.4 and later backported however the version of Jetty 9 pulled in HBase > for REST comes before this fix. See > [https://github.com/eclipse/jetty.project/issues/1804] and > [https://github.com/apache/hbase/blob/master/pom.xml#L1416.] > If we want to stay on 9.3.X we could update to > [9.3.22.v20171030|https://mvnrepository.com/artifact/org.eclipse.jetty/jetty-server/9.3.22.v20171030] > which is the latest version of 9.3. Thoughts? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-19995) Current Jetty 9 version in HBase master branch can memory leak under high traffic
Ben Lau created HBASE-19995: --- Summary: Current Jetty 9 version in HBase master branch can memory leak under high traffic Key: HBASE-19995 URL: https://issues.apache.org/jira/browse/HBASE-19995 Project: HBase Issue Type: Bug Components: REST Affects Versions: 2.0 Reporter: Ben Lau There is a memory leak in Jetty 9 that manifests whenever you hit the call queue limit in HBase REST. It permanently leaks both on-heap and off-heap objects. It happens because whenever the call queue for the Jetty server overflows, the rejected task runs a 'reject' method, if it is a Rejectable, to do any cleanup. This cleanup is necessary to, for example, close the connection and deallocate any buffers. Unfortunately, in Jetty 9, they implemented the 'reject' / cleanup method of the SelectChannelEndpoint as a non-blocking call that is not guaranteed to run. This was later fixed in Jetty 9.4 and then backported; however, the version of Jetty 9 that HBase pulls in for REST predates the fix. See [https://github.com/eclipse/jetty.project/issues/1804] and [https://github.com/apache/hbase/blob/master/pom.xml#L1416]. If we want to stay on 9.3.X we could update to [9.3.22.v20171030|https://mvnrepository.com/artifact/org.eclipse.jetty/jetty-server/9.3.22.v20171030], which is the latest version of 9.3. Thoughts? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
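To make the reject/cleanup pattern described above concrete, here is a minimal sketch using only java.util.concurrent. It is not Jetty's code: RejectableTask and RejectCleanupSketch are hypothetical names, and the sketch only illustrates a handler that runs the rejected task's cleanup synchronously, so the cleanup is guaranteed to happen when the queue overflows.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical task type: a Runnable that knows how to clean up if it is never run. */
interface RejectableTask extends Runnable {
    void reject();
}

final class RejectCleanupSketch {
    /** Overflows a tiny pool once and returns how many rejected tasks were cleaned up. */
    static int runDemo() throws InterruptedException {
        AtomicInteger cleanedUp = new AtomicInteger();
        CountDownLatch release = new CountDownLatch(1);

        // One worker and a one-slot queue, so the third submission overflows.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(1),
                (task, executor) -> {
                    // Run the cleanup in the submitting thread so it cannot be skipped.
                    if (task instanceof RejectableTask) {
                        ((RejectableTask) task).reject();
                    }
                });

        RejectableTask task = new RejectableTask() {
            @Override
            public void run() {
                try {
                    release.await(); // keep the worker busy until the demo is done
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }

            @Override
            public void reject() {
                cleanedUp.incrementAndGet(); // stands in for closing a connection or freeing buffers
            }
        };

        pool.execute(task); // occupies the only worker
        pool.execute(task); // fills the one-slot queue
        pool.execute(task); // rejected: the handler calls reject() synchronously

        release.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return cleanedUp.get();
    }
}
```

If the handler instead scheduled the cleanup as a best-effort, non-blocking call, nothing would guarantee that the counter is ever incremented, which is the shape of the leak the report describes.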
[jira] [Commented] (HBASE-19992) Hole in namespace table assign
[ https://issues.apache.org/jira/browse/HBASE-19992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363212#comment-16363212 ] stack commented on HBASE-19992: --- This might be my fault going between hbase1 and hbase2 with different codebases. Leaving open for now. I thought it was the migration of hbase1 table state from zk setting table as enabled and so 'existing' but something else happened such that hbase:meta had no hbase:namespace mention. Leaving open for now in case I see this again in testing. > Hole in namespace table assign > -- > > Key: HBASE-19992 > URL: https://issues.apache.org/jira/browse/HBASE-19992 > Project: HBase > Issue Type: Bug >Reporter: stack >Assignee: stack >Priority: Major > > If the assign fails before it comes up in a Master initialization, the table > will have been created and may even be marked ENABLED successfully, but on > restart, we don't assign the table. > Manifest is: > {code} > 2018-02-13 11:45:24,504 ERROR [master/ve0524:16000] master.HMaster: Failed to > become active master > java.lang.IllegalStateException: Expected the service > ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has FAILED > at > org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:345) > at > org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:291) > at > org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1052) > at > org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:916) > at > org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2026) > at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:555) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.io.IOException: Timedout 30ms waiting for namespace table > to be assigned and enabled: ENABLED > at > 
org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:107) > at > org.apache.hadoop.hbase.master.ClusterSchemaServiceImpl.doStart(ClusterSchemaServiceImpl.java:62) > at > org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.startAsync(AbstractService.java:226) > at > org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1050) > ... 4 more > 2018-02-13 11:45:24,506 ERROR [master/ve0524:16000] master.HMaster: Master > server abort: loaded coprocessors are: > [org.apache.hadoop.hbase.security.access.AccessController] > 2018-02-13 11:45:24,506 ERROR [master/ve0524:16000] master.HMaster: * > ABORTING master ve0524.halxg.cloudera.com,16000,1518550812400: Unhandled > exception. Starting shutdown. * > java.lang.IllegalStateException: Expected the service > ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has FAILED > at > org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:345) > at > org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:291) > at > org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1052) > at > org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:916) > at > org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2026) > at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:555) > at java.lang.Thread.run(Thread.java:748) > > > Caused by: > java.io.IOException: Timedout 30ms waiting for namespace table to be > assigned and enabled: ENABLED > at > org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:107) > at > org.apache.hadoop.hbase.master.ClusterSchemaServiceImpl.doStart(ClusterSchemaServiceImpl.java:62) > at > org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.startAsync(AbstractService.java:226) 
> > > at > org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1050) > ... 4 more > {code} > Last thing in log before Master crash was: > 2018-02-13 11:34:17,084 INFO [master/ve0524:16000] hbase.MetaTableAccessor: > Updated table hbase:namespace state to ENABLED in META > There is no one doing an assign subsequent to initial create table. -- This
[jira] [Commented] (HBASE-18282) ReplicationLogCleaner can delete WALs not yet replicated in case of a KeeperException
[ https://issues.apache.org/jira/browse/HBASE-18282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363168#comment-16363168 ]

Ben Lau commented on HBASE-18282:
-

Hi guys, this ticket has been open for a while. Do you mind if we submit an internal patch + test we have for this?

> ReplicationLogCleaner can delete WALs not yet replicated in case of a KeeperException
> -
>
> Key: HBASE-18282
> URL: https://issues.apache.org/jira/browse/HBASE-18282
> Project: HBase
> Issue Type: Bug
> Components: Replication
> Affects Versions: 1.3.1, 1.2.6, 1.1.11, 2.0.0-alpha-1
> Reporter: Ashu Pachauri
> Assignee: Ashu Pachauri
> Priority: Critical
>
> ReplicationStateZKBase#getListOfReplicators does not rethrow a KeeperException and returns null in such a case. ReplicationLogCleaner just assumes that there are no replicators and deletes everything.
> ReplicationStateZKBase:
> {code:java}
> public List<String> getListOfReplicators() {
>   List<String> result = null;
>   try {
>     result = ZKUtil.listChildrenNoWatch(this.zookeeper, this.queuesZNode);
>   } catch (KeeperException e) {
>     this.abortable.abort("Failed to get list of replicators", e);
>   }
>   return result;
> }
> {code}
> ReplicationLogCleaner:
> {code:java}
> private Set<String> loadWALsFromQueues() throws KeeperException {
>   for (int retry = 0; ; retry++) {
>     int v0 = replicationQueues.getQueuesZNodeCversion();
>     List<String> rss = replicationQueues.getListOfReplicators();
>     if (rss == null) {
>       LOG.debug("Didn't find any region server that replicates, won't prevent any deletions.");
>       return ImmutableSet.of();
>     }
>     ...
> {code}

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
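The hazard in the issue description above is that a lookup failure and "no replicators" both collapse into the same null result. A minimal sketch of the safer shape follows. It is not the patch mentioned in the comment: QueueStorage and WalCleanerSketch are hypothetical names, and a plain IOException stands in for KeeperException so the example stays self-contained.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for the ZooKeeper-backed replication queue storage. */
interface QueueStorage {
    /** WAL names still referenced by replication queues; throws if the store cannot be read. */
    Set<String> walsInQueues() throws IOException;
}

final class WalCleanerSketch {
    /** Returns the candidate WALs that are safe to delete. */
    static Set<String> deletableWals(QueueStorage storage, Set<String> candidates) {
        Set<String> queued;
        try {
            queued = storage.walsInQueues();
        } catch (IOException e) {
            // The queue state is unknown, so nothing is provably replicated: delete nothing.
            return Collections.emptySet();
        }
        Set<String> deletable = new HashSet<>(candidates);
        deletable.removeAll(queued); // keep every WAL a queue still references
        return deletable;
    }
}
```

The design point is that the failure travels as an exception rather than as a null or empty result, so the cleaner can tell "nothing is queued" apart from "could not find out".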
[jira] [Updated] (HBASE-19991) lots of hbase-rest test failures against hadoop 3
[ https://issues.apache.org/jira/browse/HBASE-19991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Drob updated HBASE-19991: -- Attachment: HBASE-19991.WIP.patch > lots of hbase-rest test failures against hadoop 3 > - > > Key: HBASE-19991 > URL: https://issues.apache.org/jira/browse/HBASE-19991 > Project: HBase > Issue Type: Bug > Components: REST, test >Reporter: Mike Drob >Assignee: Mike Drob >Priority: Major > Fix For: 2.0.0 > > Attachments: HBASE-19991.WIP.patch > > > mvn clean test -pl hbase-rest -Dhadoop.profile=3.0 > [ERROR] Tests run: 106, Failures: 95, Errors: 8, Skipped: 1 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19991) lots of hbase-rest test failures against hadoop 3
[ https://issues.apache.org/jira/browse/HBASE-19991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363164#comment-16363164 ] Mike Drob commented on HBASE-19991: --- This is failing due to loading jersey-1 classes via hadoop in the hadoop-3 configuration. This patch is my WIP, but I don't see anything jersey-1 left in dependency:tree report. > lots of hbase-rest test failures against hadoop 3 > - > > Key: HBASE-19991 > URL: https://issues.apache.org/jira/browse/HBASE-19991 > Project: HBase > Issue Type: Bug > Components: REST, test >Reporter: Mike Drob >Assignee: Mike Drob >Priority: Major > Fix For: 2.0.0 > > > mvn clean test -pl hbase-rest -Dhadoop.profile=3.0 > [ERROR] Tests run: 106, Failures: 95, Errors: 8, Skipped: 1 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-15911) NPE in AssignmentManager.onRegionTransition after Master restart
[ https://issues.apache.org/jira/browse/HBASE-15911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363161#comment-16363161 ] Ben Lau commented on HBASE-15911: - [~pankaj2461] [~mantonov] We recently ran into this and had to fix this as it was preventing our master from starting up. We would like to submit a suggested fix and test case if you guys do not have a patch yet. > NPE in AssignmentManager.onRegionTransition after Master restart > > > Key: HBASE-15911 > URL: https://issues.apache.org/jira/browse/HBASE-15911 > Project: HBase > Issue Type: Bug > Components: master, Region Assignment >Affects Versions: 1.3.0 >Reporter: Mikhail Antonov >Assignee: Mikhail Antonov >Priority: Major > > 16/05/27 17:49:18 ERROR ipc.RpcServer: Unexpected throwable object > java.lang.NullPointerException > at > org.apache.hadoop.hbase.master.AssignmentManager.onRegionTransition(AssignmentManager.java:4364) > at > org.apache.hadoop.hbase.master.MasterRpcServices.reportRegionStateTransition(MasterRpcServices.java:1421) > at > org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:8623) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2239) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:116) > at > org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:137) > at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:112) > at java.lang.Thread.run(Thread.java:745) > I'm pretty sure I've seen it before and more than once, but never got to dig > in. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19993) Publish tests jar for hbase-zookeeper in bin tarball
[ https://issues.apache.org/jira/browse/HBASE-19993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363158#comment-16363158 ] Appy commented on HBASE-19993: -- Ping [~Apache9], [~stack] > Publish tests jar for hbase-zookeeper in bin tarball > > > Key: HBASE-19993 > URL: https://issues.apache.org/jira/browse/HBASE-19993 > Project: HBase > Issue Type: Bug >Reporter: Appy >Assignee: Appy >Priority: Major > > Since {{HBTU extends HBZKTU}} (such short forms! i know!), we need to publish > hbase-zookeeper's tests jar too. Many IT tests use HBTU. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (HBASE-19994) Create a new class for RPC throttling exception, make it retryable.
[ https://issues.apache.org/jira/browse/HBASE-19994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

huaxiang sun reassigned HBASE-19994:

Assignee: huaxiang sun

> Create a new class for RPC throttling exception, make it retryable.
> 
>
> Key: HBASE-19994
> URL: https://issues.apache.org/jira/browse/HBASE-19994
> Project: HBase
> Issue Type: Improvement
> Reporter: huaxiang sun
> Assignee: huaxiang sun
> Priority: Minor
>
> Based on a discussion on the dev mailing list.
>
> {code:java}
> Thanks Andrew.
> +1 for the second option, I will create a jira for this change.
> Huaxiang
>
> On Feb 9, 2018, at 1:09 PM, Andrew Purtell wrote:
> We have
> public class ThrottlingException extends QuotaExceededException
> public class QuotaExceededException extends DoNotRetryIOException
> Let the storage quota limits throw QuotaExceededException directly (based on DNRIOE). That seems fine.
> However, ThrottlingException is thrown as a result of a temporal quota, so it is inappropriate for this to inherit from DNRIOE; it should inherit from IOException instead, so the client is allowed to retry until successful or until the retry policy is exhausted.
> We are in a bit of a pickle because we've released with this inheritance hierarchy, so to change it we will need a new minor, or we will want to deprecate ThrottlingException and use a new exception class instead, one which does not inherit from DNRIOE.
>
> On Feb 7, 2018, at 9:25 AM, Huaxiang Sun wrote:
> Hi Mike,
> You are right. For rpc throttling, it is definitely retryable. For storage quota, I think it should fail fast (non-retryable).
> We probably need to separate these two types of exceptions. I will do some more research and follow up.
> Thanks,
> Huaxiang
>
> On Feb 7, 2018, at 9:16 AM, Mike Drob wrote:
> I think, philosophically, there can be two kinds of QEE -
> For throttling, we can retry. The quota is a temporal quota - you have done too many operations this minute, please try again next minute and everything will work.
> For storage, we shouldn't retry. The quota is a fixed quota - you have exceeded your allotted disk space, please do not try again until you have remedied the situation.
> Our current usage conflates the two; sometimes it is correct, sometimes not.
>
> On Wed, Feb 7, 2018 at 11:00 AM, Huaxiang Sun wrote:
> Hi Stack,
> I ran into a case where a mapreduce job in hive cannot finish because it runs into a QEE.
> I need to look into the hive mr task to see whether the QEE is handled incorrectly in hbase code or in hive code.
> I am thinking that if QEE is a retryable exception, then it should be taken care of by the hbase code.
> I will check more and report back.
> Thanks,
> Huaxiang
>
> On Feb 7, 2018, at 8:23 AM, Stack wrote:
> QEE being a DNRIOE seems right on the face of it.
> But if throttling, a DNRIOE is inappropriate. Where are you seeing a QEE in a throttling scenario, Huaxiang?
> Thanks,
> S
> {code}

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19993) Publish tests jar for hbase-zookeeper in bin tarball
[ https://issues.apache.org/jira/browse/HBASE-19993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363155#comment-16363155 ] Appy commented on HBASE-19993: -- eh? it's there in beta-1 bin tarball. How? Even though we are not copying it explicitly like other tests jar ([https://github.com/apache/hbase/blob/branch-2/hbase-assembly/src/main/assembly/components.xml#L110]) > Publish tests jar for hbase-zookeeper in bin tarball > > > Key: HBASE-19993 > URL: https://issues.apache.org/jira/browse/HBASE-19993 > Project: HBase > Issue Type: Bug >Reporter: Appy >Assignee: Appy >Priority: Major > > Since {{HBTU extends HBZKTU}} (such short forms! i know!), we need to publish > hbase-zookeeper's tests jar too. Many IT tests use HBTU. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-19994) Create a new class for RPC throttling exception, make it retryable.
huaxiang sun created HBASE-19994:

Summary: Create a new class for RPC throttling exception, make it retryable.
Key: HBASE-19994
URL: https://issues.apache.org/jira/browse/HBASE-19994
Project: HBase
Issue Type: Improvement
Reporter: huaxiang sun

Based on a discussion on the dev mailing list.

{code:java}
Thanks Andrew.
+1 for the second option, I will create a jira for this change.
Huaxiang

On Feb 9, 2018, at 1:09 PM, Andrew Purtell wrote:
We have
public class ThrottlingException extends QuotaExceededException
public class QuotaExceededException extends DoNotRetryIOException
Let the storage quota limits throw QuotaExceededException directly (based on DNRIOE). That seems fine.
However, ThrottlingException is thrown as a result of a temporal quota, so it is inappropriate for this to inherit from DNRIOE; it should inherit from IOException instead, so the client is allowed to retry until successful or until the retry policy is exhausted.
We are in a bit of a pickle because we've released with this inheritance hierarchy, so to change it we will need a new minor, or we will want to deprecate ThrottlingException and use a new exception class instead, one which does not inherit from DNRIOE.

On Feb 7, 2018, at 9:25 AM, Huaxiang Sun wrote:
Hi Mike,
You are right. For rpc throttling, it is definitely retryable. For storage quota, I think it should fail fast (non-retryable).
We probably need to separate these two types of exceptions. I will do some more research and follow up.
Thanks,
Huaxiang

On Feb 7, 2018, at 9:16 AM, Mike Drob wrote:
I think, philosophically, there can be two kinds of QEE -
For throttling, we can retry. The quota is a temporal quota - you have done too many operations this minute, please try again next minute and everything will work.
For storage, we shouldn't retry. The quota is a fixed quota - you have exceeded your allotted disk space, please do not try again until you have remedied the situation.
Our current usage conflates the two; sometimes it is correct, sometimes not.

On Wed, Feb 7, 2018 at 11:00 AM, Huaxiang Sun wrote:
Hi Stack,
I ran into a case where a mapreduce job in hive cannot finish because it runs into a QEE.
I need to look into the hive mr task to see whether the QEE is handled incorrectly in hbase code or in hive code.
I am thinking that if QEE is a retryable exception, then it should be taken care of by the hbase code.
I will check more and report back.
Thanks,
Huaxiang

On Feb 7, 2018, at 8:23 AM, Stack wrote:
QEE being a DNRIOE seems right on the face of it.
But if throttling, a DNRIOE is inappropriate. Where are you seeing a QEE in a throttling scenario, Huaxiang?
Thanks,
S
{code}

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
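The "second option" in the thread above (a new exception class that does not inherit from DNRIOE) can be sketched as follows. The classes below are local stand-ins declared only so the example compiles on its own; they are not the HBase classes. RpcThrottlingSketchException is a hypothetical name for the new class, and RetryPolicySketch.isRetryable is an assumed, simplified view of how a client might decide whether to retry.

```java
import java.io.IOException;

/** Local stand-in for HBase's DoNotRetryIOException (DNRIOE). */
class DoNotRetryIOException extends IOException {
    DoNotRetryIOException(String message) {
        super(message);
    }
}

/** Storage quota violations stay non-retryable, as proposed in the thread. */
class QuotaExceededException extends DoNotRetryIOException {
    QuotaExceededException(String message) {
        super(message);
    }
}

/** Hypothetical new class: a temporal (throttling) quota hit that the client may retry. */
class RpcThrottlingSketchException extends IOException {
    RpcThrottlingSketchException(String message) {
        super(message);
    }
}

final class RetryPolicySketch {
    /** A client-side check: anything that is not a DNRIOE is eligible for retry. */
    static boolean isRetryable(IOException e) {
        return !(e instanceof DoNotRetryIOException);
    }
}
```

Under this layout, a throttled call is retried until it succeeds or the retry policy is exhausted, while a storage quota violation fails immediately.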
[jira] [Commented] (HBASE-19924) hbase rpc throttling does not work for multi() with request count rater.
[ https://issues.apache.org/jira/browse/HBASE-19924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363146#comment-16363146 ]

huaxiang sun commented on HBASE-19924:
--

I tested the fix and it worked as expected. The client code needs to be updated a bit to handle ThrottlingException so the client will retry. Expect a new patch, thanks.

> hbase rpc throttling does not work for multi() with request count rater.
> 
>
> Key: HBASE-19924
> URL: https://issues.apache.org/jira/browse/HBASE-19924
> Project: HBase
> Issue Type: Bug
> Components: rpc
> Affects Versions: 1.2.6, 2.0
> Reporter: huaxiang sun
> Assignee: huaxiang sun
> Priority: Major
> Attachments: HBASE-19924-master-v001.patch
>
>
> Basically, rpc throttling does not work with a request-count-based rater for multi(). In the following code, numWrites/numReads are lost when it calls the limiter's checkQuota(): only the estimated sizes are passed on.
> {code:java}
> @Override
> public void checkQuota(int numWrites, int numReads, int numScans) throws ThrottlingException {
>   writeConsumed = estimateConsume(OperationType.MUTATE, numWrites, 100);
>   readConsumed = estimateConsume(OperationType.GET, numReads, 100);
>   readConsumed += estimateConsume(OperationType.SCAN, numScans, 1000);
>   writeAvailable = Long.MAX_VALUE;
>   readAvailable = Long.MAX_VALUE;
>   for (final QuotaLimiter limiter : limiters) {
>     if (limiter.isBypass()) continue;
>     limiter.checkQuota(writeConsumed, readConsumed);
>     readAvailable = Math.min(readAvailable, limiter.getReadAvailable());
>     writeAvailable = Math.min(writeAvailable, limiter.getWriteAvailable());
>   }
>   for (final QuotaLimiter limiter : limiters) {
>     limiter.grabQuota(writeConsumed, readConsumed);
>   }
> }{code}

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
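As a rough illustration of why dropping the request counts matters, here is a minimal sketch of a limiter that tracks a request-count budget and a size budget separately. It is not the attached patch and not the HBase QuotaLimiter API: CountAndSizeLimiterSketch and ThrottledSketchException are hypothetical names, and the only point is that the caller has to pass the count through for a count-based limit to be enforced.

```java
/** Hypothetical checked exception standing in for a throttling failure. */
class ThrottledSketchException extends Exception {
    ThrottledSketchException(String message) {
        super(message);
    }
}

/** Hypothetical limiter with separate budgets for request count and estimated bytes. */
final class CountAndSizeLimiterSketch {
    private long requestsAvailable;
    private long bytesAvailable;

    CountAndSizeLimiterSketch(long requestsAvailable, long bytesAvailable) {
        this.requestsAvailable = requestsAvailable;
        this.bytesAvailable = bytesAvailable;
    }

    /** Fails if either budget would be exceeded; does not consume anything. */
    void checkQuota(long requests, long estimatedBytes) throws ThrottledSketchException {
        if (requests > requestsAvailable) {
            throw new ThrottledSketchException("request count limit exceeded");
        }
        if (estimatedBytes > bytesAvailable) {
            throw new ThrottledSketchException("size limit exceeded");
        }
    }

    /** Consumes from both budgets after a successful check. */
    void grabQuota(long requests, long estimatedBytes) {
        requestsAvailable -= requests;
        bytesAvailable -= estimatedBytes;
    }
}
```

With a budget of 5 requests, a multi() carrying 10 small mutations is rejected by the count check even though its estimated size is well under the size budget. A limiter that only ever sees the estimated size has no way to make that decision.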
[jira] [Created] (HBASE-19993) Publish tests jar for hbase-zookeeper in bin tarball
Appy created HBASE-19993: Summary: Publish tests jar for hbase-zookeeper in bin tarball Key: HBASE-19993 URL: https://issues.apache.org/jira/browse/HBASE-19993 Project: HBase Issue Type: Bug Reporter: Appy Assignee: Appy Since {{HBTU extends HBZKTU}} (such short forms! i know!), we need to publish hbase-zookeeper's tests jar too. Many IT tests use HBTU. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-19992) Hole in namespace table assign
[ https://issues.apache.org/jira/browse/HBASE-19992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-19992: -- Description: If the assign fails before it comes up in a Master initialization, the table will have been created and may even be marked ENABLED successfully, but on restart, we don't assign the table. Manifest is: {code} 2018-02-13 11:45:24,504 ERROR [master/ve0524:16000] master.HMaster: Failed to become active master java.lang.IllegalStateException: Expected the service ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has FAILED at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:345) at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:291) at org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1052) at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:916) at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2026) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:555) at java.lang.Thread.run(Thread.java:748) Caused by: java.io.IOException: Timedout 30ms waiting for namespace table to be assigned and enabled: ENABLED at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:107) at org.apache.hadoop.hbase.master.ClusterSchemaServiceImpl.doStart(ClusterSchemaServiceImpl.java:62) at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.startAsync(AbstractService.java:226) at org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1050) ... 
4 more 2018-02-13 11:45:24,506 ERROR [master/ve0524:16000] master.HMaster: Master server abort: loaded coprocessors are: [org.apache.hadoop.hbase.security.access.AccessController] 2018-02-13 11:45:24,506 ERROR [master/ve0524:16000] master.HMaster: * ABORTING master ve0524.halxg.cloudera.com,16000,1518550812400: Unhandled exception. Starting shutdown. * java.lang.IllegalStateException: Expected the service ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has FAILED at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:345) at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:291) at org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1052) at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:916) at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2026) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:555) at java.lang.Thread.run(Thread.java:748) Caused by: java.io.IOException: Timedout 30ms waiting for namespace table to be assigned and enabled: ENABLED at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:107) at org.apache.hadoop.hbase.master.ClusterSchemaServiceImpl.doStart(ClusterSchemaServiceImpl.java:62) at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.startAsync(AbstractService.java:226) at org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1050) ... 4 more {code} Last thing in log before Master crash was: 2018-02-13 11:34:17,084 INFO [master/ve0524:16000] hbase.MetaTableAccessor: Updated table hbase:namespace state to ENABLED in META There is no one doing an assign subsequent to initial create table. 
was: If the assign fails before it comes up in a Master initialization, the table will have been created and may even be marked ENABLED successfully, but on restart, we don't assign the table. Manifest is: {code} 2018-02-13 11:45:24,504 ERROR [master/ve0524:16000] master.HMaster: Failed to become active master java.lang.IllegalStateException: Expected the service ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has FAILED at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:345) at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:291) at org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1052) at
[jira] [Created] (HBASE-19992) Hole in namespace table assign
stack created HBASE-19992: - Summary: Hole in namespace table assign Key: HBASE-19992 URL: https://issues.apache.org/jira/browse/HBASE-19992 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack If the assign fails before it comes up in a Master initialization, the table will have been created and may even be marked ENABLED successfully, but on restart, we don't assign the table. Manifest is: {code} 2018-02-13 11:45:24,504 ERROR [master/ve0524:16000] master.HMaster: Failed to become active master java.lang.IllegalStateException: Expected the service ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has FAILED at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:345) at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:291) at org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1052) at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:916) at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2026) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:555) at java.lang.Thread.run(Thread.java:748) Caused by: java.io.IOException: Timedout 30ms waiting for namespace table to be assigned and enabled: ENABLED at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:107) at org.apache.hadoop.hbase.master.ClusterSchemaServiceImpl.doStart(ClusterSchemaServiceImpl.java:62) at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.startAsync(AbstractService.java:226) at org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1050) ... 
4 more 2018-02-13 11:45:24,506 ERROR [master/ve0524:16000] master.HMaster: Master server abort: loaded coprocessors are: [org.apache.hadoop.hbase.security.access.AccessController] 2018-02-13 11:45:24,506 ERROR [master/ve0524:16000] master.HMaster: * ABORTING master ve0524.halxg.cloudera.com,16000,1518550812400: Unhandled exception. Starting shutdown. * java.lang.IllegalStateException: Expected the service ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has FAILED at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:345) at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:291) at org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1052) at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:916) at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2026) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:555) at java.lang.Thread.run(Thread.java:748) Caused by: java.io.IOException: Timedout 30ms waiting for namespace table to be assigned and enabled: ENABLED at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:107) at org.apache.hadoop.hbase.master.ClusterSchemaServiceImpl.doStart(ClusterSchemaServiceImpl.java:62) at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.startAsync(AbstractService.java:226) at org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1050) ... 4 more {code} Last thing in log before Master crash was: 2018-02-13 11:34:17,084 INFO [master/ve0524:16000] hbase.MetaTableAccessor: Updated table hbase:namespace state to ENABLED in META There is no one doing an assign. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-19989) READY_TO_MERGE and READY_TO_SPLIT do not update region state correctly
[ https://issues.apache.org/jira/browse/HBASE-19989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Lau updated HBASE-19989: Attachment: (was: HBASE-19989.patch) > READY_TO_MERGE and READY_TO_SPLIT do not update region state correctly > -- > > Key: HBASE-19989 > URL: https://issues.apache.org/jira/browse/HBASE-19989 > Project: HBase > Issue Type: Bug >Affects Versions: 1.3.1, 1.4.1 >Reporter: Ben Lau >Assignee: Ben Lau >Priority: Major > Attachments: HBASE-19989.patch > > > Region state transitions do not work correctly for READY_TO_MERGE/SPLIT. > [~thiruvel] and I noticed this is due to break statements being in the wrong > place in AssignmentManager. This allows a race condition for example in > which one of the regions being merged could be moved concurrently, resulting > in the merge transaction failing and then double assignment and/or dataloss. > This bug appears to only affect branch-1 (for example 1.3 and 1.4) and not > branch-2 as the relevant code in AM has since been rewritten. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19989) READY_TO_MERGE and READY_TO_SPLIT do not update region state correctly
[ https://issues.apache.org/jira/browse/HBASE-19989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363081#comment-16363081 ] Ben Lau commented on HBASE-19989: - Hi Ted, thanks for the feedback, I'm not sure a comment will be helpful since it comes down to 'if the break is here the code below doesn't run, so the break is not here' but I have added a comment anyway and re-added the ZKLess split/merge tests that were removed in branch-1. Let me know your thoughts, thanks. > READY_TO_MERGE and READY_TO_SPLIT do not update region state correctly > -- > > Key: HBASE-19989 > URL: https://issues.apache.org/jira/browse/HBASE-19989 > Project: HBase > Issue Type: Bug >Affects Versions: 1.3.1, 1.4.1 >Reporter: Ben Lau >Assignee: Ben Lau >Priority: Major > Attachments: HBASE-19989.patch, HBASE-19989.patch > > > Region state transitions do not work correctly for READY_TO_MERGE/SPLIT. > [~thiruvel] and I noticed this is due to break statements being in the wrong > place in AssignmentManager. This allows a race condition for example in > which one of the regions being merged could be moved concurrently, resulting > in the merge transaction failing and then double assignment and/or dataloss. > This bug appears to only affect branch-1 (for example 1.3 and 1.4) and not > branch-2 as the relevant code in AM has since been rewritten. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
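Editor's illustration of the "break in the wrong place" point above. This is a minimal sketch, not the AssignmentManager code: the class, enum, and step names are hypothetical, and it assumes the shared region-state update lives in a later case that READY_TO_SPLIT is meant to fall through to.

```java
import java.util.ArrayList;
import java.util.List;

class BreakPlacementSketch {
  // Hypothetical transition codes; the real AssignmentManager handles more.
  enum Code { READY_TO_SPLIT, SPLIT }

  // Buggy shape: the break ends the READY_TO_SPLIT case before the
  // shared state update in the next case can run.
  static List<String> buggy(Code code) {
    List<String> steps = new ArrayList<>();
    switch (code) {
      case READY_TO_SPLIT:
        steps.add("precheck");
        break; // misplaced: the region state is never updated
      case SPLIT:
        steps.add("updateRegionState");
        break;
    }
    return steps;
  }

  // Fixed shape: no break, so READY_TO_SPLIT falls through on purpose.
  static List<String> fixed(Code code) {
    List<String> steps = new ArrayList<>();
    switch (code) {
      case READY_TO_SPLIT:
        steps.add("precheck");
        // No break here: fall through so the state update below runs.
      case SPLIT:
        steps.add("updateRegionState");
        break;
    }
    return steps;
  }

  public static void main(String[] args) {
    System.out.println("buggy: " + buggy(Code.READY_TO_SPLIT));
    System.out.println("fixed: " + fixed(Code.READY_TO_SPLIT));
  }
}
```

If the state is not updated, nothing marks the region as in transition, which is the window in which a concurrent move could slip in. A comment at the fall-through, as requested in review, keeps the missing break from looking like an oversight.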
[jira] [Updated] (HBASE-19989) READY_TO_MERGE and READY_TO_SPLIT do not update region state correctly
[ https://issues.apache.org/jira/browse/HBASE-19989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Lau updated HBASE-19989: Attachment: HBASE-19989.patch > READY_TO_MERGE and READY_TO_SPLIT do not update region state correctly > -- > > Key: HBASE-19989 > URL: https://issues.apache.org/jira/browse/HBASE-19989 > Project: HBase > Issue Type: Bug >Affects Versions: 1.3.1, 1.4.1 >Reporter: Ben Lau >Assignee: Ben Lau >Priority: Major > Attachments: HBASE-19989.patch, HBASE-19989.patch > > > Region state transitions do not work correctly for READY_TO_MERGE/SPLIT. > [~thiruvel] and I noticed this is due to break statements being in the wrong > place in AssignmentManager. This allows a race condition for example in > which one of the regions being merged could be moved concurrently, resulting > in the merge transaction failing and then double assignment and/or dataloss. > This bug appears to only affect branch-1 (for example 1.3 and 1.4) and not > branch-2 as the relevant code in AM has since been rewritten. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19988) HRegion#lockRowsAndBuildMiniBatch() is too chatty when interrupted while waiting for a row lock
[ https://issues.apache.org/jira/browse/HBASE-19988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363052#comment-16363052 ] stack commented on HBASE-19988: --- Retry > HRegion#lockRowsAndBuildMiniBatch() is too chatty when interrupted while > waiting for a row lock > --- > > Key: HBASE-19988 > URL: https://issues.apache.org/jira/browse/HBASE-19988 > Project: HBase > Issue Type: Improvement > Components: amv2 >Affects Versions: 2.0.0-beta-1 >Reporter: Umesh Agashe >Assignee: Umesh Agashe >Priority: Minor > Fix For: 2.0.0-beta-2 > > Attachments: hbase-19988.master.001.patch, > hbase-19988.master.001.patch > > > See HBASE-19970, TestHRegionWithInMemoryFlush created 4.2g log file. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-19988) HRegion#lockRowsAndBuildMiniBatch() is too chatty when interrupted while waiting for a row lock
[ https://issues.apache.org/jira/browse/HBASE-19988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-19988: -- Attachment: hbase-19988.master.001.patch > HRegion#lockRowsAndBuildMiniBatch() is too chatty when interrupted while > waiting for a row lock > --- > > Key: HBASE-19988 > URL: https://issues.apache.org/jira/browse/HBASE-19988 > Project: HBase > Issue Type: Improvement > Components: amv2 >Affects Versions: 2.0.0-beta-1 >Reporter: Umesh Agashe >Assignee: Umesh Agashe >Priority: Minor > Fix For: 2.0.0-beta-2 > > Attachments: hbase-19988.master.001.patch, > hbase-19988.master.001.patch > > > See HBASE-19970, TestHRegionWithInMemoryFlush created 4.2g log file. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19988) HRegion#lockRowsAndBuildMiniBatch() is too chatty when interrupted while waiting for a row lock
[ https://issues.apache.org/jira/browse/HBASE-19988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363051#comment-16363051 ] Umesh Agashe commented on HBASE-19988: -- Thanks [~stack]! Lets wait for what [~appy] has to say on this. > HRegion#lockRowsAndBuildMiniBatch() is too chatty when interrupted while > waiting for a row lock > --- > > Key: HBASE-19988 > URL: https://issues.apache.org/jira/browse/HBASE-19988 > Project: HBase > Issue Type: Improvement > Components: amv2 >Affects Versions: 2.0.0-beta-1 >Reporter: Umesh Agashe >Assignee: Umesh Agashe >Priority: Minor > Fix For: 2.0.0-beta-2 > > Attachments: hbase-19988.master.001.patch, > hbase-19988.master.001.patch > > > See HBASE-19970, TestHRegionWithInMemoryFlush created 4.2g log file. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19988) HRegion#lockRowsAndBuildMiniBatch() is too chatty when interrupted while waiting for a row lock
[ https://issues.apache.org/jira/browse/HBASE-19988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363048#comment-16363048 ] stack commented on HBASE-19988: --- Thanks for explanation. +1 on patch then. > HRegion#lockRowsAndBuildMiniBatch() is too chatty when interrupted while > waiting for a row lock > --- > > Key: HBASE-19988 > URL: https://issues.apache.org/jira/browse/HBASE-19988 > Project: HBase > Issue Type: Improvement > Components: amv2 >Affects Versions: 2.0.0-beta-1 >Reporter: Umesh Agashe >Assignee: Umesh Agashe >Priority: Minor > Fix For: 2.0.0-beta-2 > > Attachments: hbase-19988.master.001.patch > > > See HBASE-19970, TestHRegionWithInMemoryFlush created 4.2g log file. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19876) The exception happening in converting pb mutation to hbase.mutation messes up the CellScanner
[ https://issues.apache.org/jira/browse/HBASE-19876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363039#comment-16363039 ] Hudson commented on HBASE-19876: FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #4579 (See [https://builds.apache.org/job/HBase-Trunk_matrix/4579/]) HBASE-19876 The exception happening in converting pb mutation to (chia7712: rev 2f48fdbb26ff555485b4aa3393d835b7dd8797a0) * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestMalformedCellFromClient.java * (edit) hbase-client/src/main/java/org/apache/hadoop/hbase/shaded/protobuf/RequestConverter.java > The exception happening in converting pb mutation to hbase.mutation messes up > the CellScanner > - > > Key: HBASE-19876 > URL: https://issues.apache.org/jira/browse/HBASE-19876 > Project: HBase > Issue Type: Bug >Reporter: Chia-Ping Tsai >Assignee: Chia-Ping Tsai >Priority: Critical > Fix For: 1.3.2, 1.5.0, 1.2.7, 2.0.0-beta-2, 1.4.2 > > Attachments: HBASE-19876.branch-1.2.v0.patch, > HBASE-19876.master.001.patch, HBASE-19876.v0.patch, HBASE-19876.v1.patch, > HBASE-19876.v2.patch, HBASE-19876.v3.patch, HBASE-19876.v3.patch, > HBASE-19876.v3.patch, HBASE-19876.v3.patch, HBASE-19876.v4.patch, > HBASE-19876.v5.patch, HBASE-19876.v6.patch > > > {code:java} > 2018-01-27 22:51:43,794 INFO [hconnection-0x3291b443-shared-pool11-t6] > client.AsyncRequestFutureImpl(778): id=5, table=testQuotaStatusFromMaster3, > attempt=6/16 failed=20ops, last > exception=org.apache.hadoop.hbase.client.WrongRowIOException: > org.apache.hadoop.hbase.client.WrongRowIOException: The row in xxx doesn't > match the original one aaa > at org.apache.hadoop.hbase.client.Mutation.add(Mutation.java:776) > at org.apache.hadoop.hbase.client.Put.add(Put.java:282) > at > org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.toPut(ProtobufUtil.java:642) > at > 
org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:952) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:896) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2591) > at > org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41560) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:404) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304){code} > I noticed this bug when testing the table space quota. > While the RegionServer is converting a pb mutation to an hbase.mutation, a quota exception or > cell exception may be thrown. > {code} > for > (ClientProtos.Action action: mutations) { > MutationProto m = action.getMutation(); > Mutation mutation; > if (m.getMutateType() == MutationType.PUT) { > mutation = ProtobufUtil.toPut(m, cells); > batchContainsPuts = true; > } else { > mutation = ProtobufUtil.toDelete(m, cells); > batchContainsDelete = true; > } > mutationActionMap.put(mutation, action); > mArray[i++] = mutation; > checkCellSizeLimit(region, mutation); > // Check if a space quota disallows this mutation > spaceQuotaEnforcement.getPolicyEnforcement(region).check(mutation); > quota.addMutation(mutation); > } > {code} > The RegionServer catches the exception but does not make the CellScanner skip the > failed cells. 
> {code:java} > } catch (IOException ie) { > if (atomic) { > throw ie; > } > for (Action mutation : mutations) { > builder.addResultOrException(getResultOrException(ie, > mutation.getIndex())); > } > } > {code} > The bug causes a WrongRowIOException for the remaining mutations, because they > then refer to the wrong cells. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
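Editor's illustration of the scanner misalignment described in HBASE-19876. This is a minimal sketch using a plain java.util.Iterator in place of the CellScanner. The Header class, the "row:value" cell strings, and the skipOnFailure flag are hypothetical and not HBase API, and the committed patch may solve the problem differently.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class CellSkipSketch {
  // Hypothetical stand-in for a mutation header: a row plus the number
  // of cells that follow it in the shared cell stream.
  static final class Header {
    final String row;
    final int cellCount;
    Header(String row, int cellCount) { this.row = row; this.cellCount = cellCount; }
  }

  // Consumes this mutation's cells; returns false if a cell has the wrong row.
  // With skipOnFailure, the rest of the failed mutation's cells are drained
  // so the next mutation starts at its own first cell.
  static boolean convert(Header h, Iterator<String> cells, boolean skipOnFailure) {
    int consumed = 0;
    while (consumed < h.cellCount) {
      String cell = cells.next();
      consumed++;
      if (!cell.startsWith(h.row + ":")) {
        while (skipOnFailure && consumed < h.cellCount && cells.hasNext()) {
          cells.next();
          consumed++;
        }
        return false;
      }
    }
    return true;
  }

  static int countFailures(List<Header> headers, List<String> cells, boolean skipOnFailure) {
    Iterator<String> it = cells.iterator();
    int failures = 0;
    for (Header h : headers) {
      if (!convert(h, it, skipOnFailure)) failures++;
    }
    return failures;
  }

  public static void main(String[] args) {
    List<Header> headers = Arrays.asList(new Header("a", 3), new Header("b", 2), new Header("c", 1));
    List<String> cells = Arrays.asList("a:1", "x:bad", "a:3", "b:1", "b:2", "c:1");
    System.out.println("failures without skip: " + countFailures(headers, cells, false));
    System.out.println("failures with skip: " + countFailures(headers, cells, true));
  }
}
```

Without the skip, the one bad cell in the first mutation leaves the iterator mid-mutation, so every later mutation reads cells that belong to a neighbour and fails too. With the skip, only the genuinely bad mutation fails.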
[jira] [Commented] (HBASE-19844) Shell should support flush by regionserver
[ https://issues.apache.org/jira/browse/HBASE-19844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363041#comment-16363041 ] Hudson commented on HBASE-19844: FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #4579 (See [https://builds.apache.org/job/HBase-Trunk_matrix/4579/]) HBASE-19844 Shell should support to flush by regionserver (tedyu: rev 8e8e1e5a1bbb240a6f4a71bc8b0271d31da633b3) * (edit) hbase-shell/src/main/ruby/shell/commands/flush.rb * (edit) hbase-shell/src/test/ruby/hbase/admin_test.rb * (edit) hbase-shell/src/main/ruby/hbase/admin.rb > Shell should support flush by regionserver > -- > > Key: HBASE-19844 > URL: https://issues.apache.org/jira/browse/HBASE-19844 > Project: HBase > Issue Type: New Feature > Components: shell >Reporter: Chia-Ping Tsai >Assignee: Reid Chan >Priority: Minor > Fix For: 2.0.0-beta-2 > > Attachments: HBASE-19844.master.001.patch, > HBASE-19844.master.002.patch, HBASE-19844.master.003.patch, > HBASE-19844.master.004.patch > > > HBASE-4224 add a method to admin that can do the flush by regionserver. As > with other Admin methods, we should enable shell to use the flush method. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19970) Remove unused functions from TableAuthManager
[ https://issues.apache.org/jira/browse/HBASE-19970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363040#comment-16363040 ] Hudson commented on HBASE-19970: FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #4579 (See [https://builds.apache.org/job/HBase-Trunk_matrix/4579/]) Revert "HBASE-19970 Remove unused functions from TableAuthManager." (stack: rev ba402b1e7b446144d4d20f90cb71e6aa19ecce3c) * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/TableAuthManager.java * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestZKPermissionWatcher.java * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/AccessController.java * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestTablePermissions.java * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/AccessControlLists.java > Remove unused functions from TableAuthManager > - > > Key: HBASE-19970 > URL: https://issues.apache.org/jira/browse/HBASE-19970 > Project: HBase > Issue Type: Task >Reporter: Appy >Assignee: Appy >Priority: Minor > Fix For: 1.5.0, 2.0.0-beta-2 > > Attachments: HBASE-19970.master.001.patch > > > Functions deleted in TableAuthManager: > - setTableUserPermissions > - setTableGroupPermissions > - setNamespaceUserPermissions > - setNamespaceGroupPermissions > - writeTableToZooKeeper > - writeNamespaceToZooKeeper > To make sure it was not a bug, and that relevant functionality moved to some > alternate code path, tried to find out why and when these functions went out > of use. But just couldn't figure out...until i reached the patch which added > them. Looks like they were dead functions to start with :) > Jira which added them: HBASE-8409. Commit id: > ac10b3c13d6b66e12d0c9601204b01dfa525ed19 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19979) ReplicationSyncUp tool may leak Zookeeper connection
[ https://issues.apache.org/jira/browse/HBASE-19979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363042#comment-16363042 ] Hudson commented on HBASE-19979: FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #4579 (See [https://builds.apache.org/job/HBase-Trunk_matrix/4579/]) HBASE-19979 ReplicationSyncUp tool may leak Zookeeper connection (stack: rev 39e191e5598529c68007c96e69acdd923a294d33) * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSyncUp.java > ReplicationSyncUp tool may leak Zookeeper connection > > > Key: HBASE-19979 > URL: https://issues.apache.org/jira/browse/HBASE-19979 > Project: HBase > Issue Type: Bug > Components: Replication >Reporter: Pankaj Kumar >Assignee: Pankaj Kumar >Priority: Major > Fix For: 1.5.0, 2.0.0-beta-2, 1.4.2 > > Attachments: HBASE-19979-branch-1.001.patch, HBASE-19979.patch > > > ReplicationSyncUp tool may leak Zookeeper connection in the following code > snippet, > {code} > try { > int numberOfOldSource = 1; // default wait once > while (numberOfOldSource > 0) { > Thread.sleep(SLEEP_TIME); > numberOfOldSource = manager.getOldSources().size(); > } > } catch (InterruptedException e) { > System.err.println("didn't wait long enough:" + e); > return (-1); > } > manager.join(); > zkw.close(); > {code} > ZooKeeperWatcher will not be closed in case of InterruptedException. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
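Editor's illustration of the leak described in HBASE-19979. This is a minimal sketch of closing the watcher on every exit path with try/finally. The Watcher class is a hypothetical stand-in for ZooKeeperWatcher, the interrupt flag exists only to exercise both paths, and manager.join() is omitted. The attached patches may take a different approach.

```java
class CloseOnInterruptSketch {
  // Hypothetical stand-in for the ZooKeeper watcher; it only records close().
  static final class Watcher implements AutoCloseable {
    boolean closed;
    @Override
    public void close() { closed = true; }
  }

  // Returns 0 on a normal wait and -1 when interrupted. The finally block
  // closes the watcher on both paths, including the early return.
  static int waitThenClose(Watcher zkw, boolean simulateInterrupt) {
    try {
      if (simulateInterrupt) {
        // Setting the flag makes the sleep below throw immediately.
        Thread.currentThread().interrupt();
      }
      Thread.sleep(1);
      return 0;
    } catch (InterruptedException e) {
      System.err.println("didn't wait long enough:" + e);
      return -1;
    } finally {
      zkw.close();
    }
  }

  public static void main(String[] args) {
    Watcher zkw = new Watcher();
    int rc = waitThenClose(zkw, true);
    System.out.println("rc=" + rc + " closed=" + zkw.closed);
  }
}
```

A try-with-resources statement would give the same guarantee if the watcher is created in the same method.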
[jira] [Commented] (HBASE-19116) Currently the tail of hfiles with CellComparator* classname makes it so hbase1 can't open hbase2 written hfiles; fix
[ https://issues.apache.org/jira/browse/HBASE-19116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363010#comment-16363010 ] Hadoop QA commented on HBASE-19116: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} | | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} branch-2 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 37s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 9s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 5m 8s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} branch-2 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 7s{color} | {color:red} hbase-server: The patch generated 4 new + 18 unchanged - 2 fixed = 22 total (was 20) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 10s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 14m 41s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green}103m 11s{color} | {color:green} hbase-server in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}134m 21s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:9f2f2db | | JIRA Issue | HBASE-19116 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12910426/HBASE-19116.branch-2.001.patch | | Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux be4b69301fc8 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | branch-2 / 4594f7156d | | maven | version: Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) | | Default Java | 1.8.0_151 | | checkstyle | https://builds.apache.org/job/PreCommit-HBASE-Build/11511/artifact/patchprocess/diff-checkstyle-hbase-server.txt | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/11511/testReport/ | | Max. process+thread count | 4974 (vs. ulimit of 1) | | modules | C: hbase-server U: hbase-server | | Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/11511/console | | Powered by | Apache Yetus 0.7.0 http://yetus.apache.org | This
[jira] [Commented] (HBASE-19988) HRegion#lockRowsAndBuildMiniBatch() is too chatty when interrupted while waiting for a row lock
[ https://issues.apache.org/jira/browse/HBASE-19988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16362989#comment-16362989 ] Umesh Agashe commented on HBASE-19988: -- It was logging following exception... several times! {code:java} 2018-02-10 04:24:25,503 WARN [PutThread] regionserver.HRegion(5636): Thread interrupted waiting for lock on row: row0 2018-02-10 04:24:25,503 WARN [PutThread] regionserver.HRegion$BatchOperation(3173): Failed getting lock, row=row0 java.io.InterruptedIOException at org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:5637) at org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.lockRowsAndBuildMiniBatch(HRegion.java:3168) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:3837) at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3810) at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3741) at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3732) at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3746) at org.apache.hadoop.hbase.regionserver.HRegion.doBatchMutate(HRegion.java:4074) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:2925) at org.apache.hadoop.hbase.regionserver.TestHRegion$PutThread.run(TestHRegion.java:3891) Caused by: java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326) at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.tryLock(ReentrantReadWriteLock.java:871) at org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:5621) ... 9 more{code} There is a loop in the write batch path: {code:java} while (!batchOp.isDone()) { doMiniBatchMutate(batchOp); }{code} This loop essentially, tries to acquire locks on as many rows in a batch as possible and creates a mini-batch of those rows to write. 
On the next iteration, locks are acquired starting from the last row (the row for which the previous iteration failed to acquire a lock), and so on until the entire batch is written. The operation was aborted only on a timeout exception. All other exceptions were logged and ignored, and the loop resumed creating and writing mini-batches for the input batch. In this particular case, getRowLockInternal() failed with an InterruptedIOException caused by surefire (possibly due to a test timeout). The exception was ignored and the write proceeded with the rows locked so far. This caused continuous calls to doMiniBatchMutate() in a loop, filling up the logs. > HRegion#lockRowsAndBuildMiniBatch() is too chatty when interrupted while > waiting for a row lock > --- > > Key: HBASE-19988 > URL: https://issues.apache.org/jira/browse/HBASE-19988 > Project: HBase > Issue Type: Improvement > Components: amv2 >Affects Versions: 2.0.0-beta-1 >Reporter: Umesh Agashe >Assignee: Umesh Agashe >Priority: Minor > Fix For: 2.0.0-beta-2 > > Attachments: hbase-19988.master.001.patch > > > See HBASE-19970, TestHRegionWithInMemoryFlush created 4.2g log file. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
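Editor's illustration of the retry loop described in HBASE-19988. This is a minimal sketch, not the HRegion code: rows are plain integers, lockAndWrite is a hypothetical stand-in for one mini-batch pass, and maxAttempts exists only so the sketch terminates (the loop described in the comment has no such bound). Aborting on the interrupt is one plausible remedy and is assumed here rather than taken from the patch.

```java
import java.io.InterruptedIOException;

class MiniBatchLoopSketch {
  // One pass: lock rows from 'from' until the interrupted row, write them,
  // and return the index of the next row to process. Throws if not even the
  // first row could be locked.
  static int lockAndWrite(int from, int rows, int interruptedRow) throws InterruptedIOException {
    for (int r = from; r < rows; r++) {
      if (r == interruptedRow) {
        if (r == from) {
          throw new InterruptedIOException("interrupted waiting for lock on row " + r);
        }
        return r; // mini-batch covers rows [from, r)
      }
    }
    return rows; // whole remainder written
  }

  // Returns how many passes the loop made before it stopped.
  static int run(int rows, int interruptedRow, boolean abortOnInterrupt, int maxAttempts) {
    int next = 0;
    int attempts = 0;
    while (next < rows && attempts < maxAttempts) {
      attempts++;
      try {
        next = lockAndWrite(next, rows, interruptedRow);
      } catch (InterruptedIOException e) {
        if (abortOnInterrupt) {
          break; // stop instead of retrying the same row forever
        }
        // Old behaviour: the real code logged here and retried from the same row.
      }
    }
    return attempts;
  }

  public static void main(String[] args) {
    System.out.println("ignore interrupt: " + run(3, 1, false, 1000) + " passes");
    System.out.println("abort on interrupt: " + run(3, 1, true, 1000) + " passes");
  }
}
```

With the interrupt ignored, the loop makes no progress past the interrupted row and runs until the artificial bound, with one log entry per pass in the real code. With the abort, it stops on the first pass that cannot lock any row.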
[jira] [Assigned] (HBASE-19767) Master web UI shows negative values for Remaining KVs
[ https://issues.apache.org/jira/browse/HBASE-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Umesh Agashe reassigned HBASE-19767: Assignee: Umesh Agashe > Master web UI shows negative values for Remaining KVs > - > > Key: HBASE-19767 > URL: https://issues.apache.org/jira/browse/HBASE-19767 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0-alpha-4 >Reporter: Jean-Marc Spaggiari >Assignee: Umesh Agashe >Priority: Major > Fix For: 2.0.0-beta-2 > > Attachments: Screen Shot 2018-01-12 at 12.18.41 PM.png > > > In the Master Web UI, under the compaction tab, the Remaining KVs sometimes > shows negative values. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19767) Master web UI shows negative values for Remaining KVs
[ https://issues.apache.org/jira/browse/HBASE-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16362932#comment-16362932 ] Umesh Agashe commented on HBASE-19767: -- [~stack], I will pick this up. > Master web UI shows negative values for Remaining KVs > - > > Key: HBASE-19767 > URL: https://issues.apache.org/jira/browse/HBASE-19767 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0-alpha-4 >Reporter: Jean-Marc Spaggiari >Priority: Major > Fix For: 2.0.0-beta-2 > > Attachments: Screen Shot 2018-01-12 at 12.18.41 PM.png > > > In the Master Web UI, under the compaction tab, the Remaining KVs sometimes > shows negative values. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19166) Add translation for handling hbase.regionserver.wal.WALEdit
[ https://issues.apache.org/jira/browse/HBASE-19166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16362929#comment-16362929 ] stack commented on HBASE-19166: --- Regards the description, a WALPlayer from hbase1 trying to read a hbase2 WAL, just use an hbase2 WALPlayer to do the job. On an hbase1 splitting hbase2 logs and failing as per the above, that might be ok; it just means we need to add more RegionServers to the cluster of hbase2-type that can split the logs. Need to plan rolling upgrade. That'll tell us if we need this facility or not. Meantime moving out of beta-2. > Add translation for handling hbase.regionserver.wal.WALEdit > --- > > Key: HBASE-19166 > URL: https://issues.apache.org/jira/browse/HBASE-19166 > Project: HBase > Issue Type: Bug > Components: wal >Reporter: Ted Yu >Assignee: stack >Priority: Blocker > Fix For: 2.0.0 > > > For hlog generated by 1.x, using WALPlayer from hbase2 would result in: > {code} > 2017-11-02 21:22:40,907 INFO [main] mapreduce.Job: Task Id : > attempt_1509641483571_0003_m_00_0, Status : FAILED > Error: java.lang.ClassCastException: > org.apache.hadoop.hbase.regionserver.wal.WALEdit cannot be cast to > org.apache.hadoop.hbase.wal.WALEdit > at > org.apache.hadoop.hbase.mapreduce.WALPlayer$WALCellMapper.map(WALPlayer.java:143) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793) > {code} > HBASE-16479 relocated WALEdit. > Chatting with Enis, he mentioned adding translation for handling > hbase.regionserver.wal.WALEdit > This way, WAL from 1.x can be recognized by hbase-2 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-19166) Add translation for handling hbase.regionserver.wal.WALEdit
[ https://issues.apache.org/jira/browse/HBASE-19166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-19166: -- Fix Version/s: (was: 2.0.0-beta-2) 2.0.0 > Add translation for handling hbase.regionserver.wal.WALEdit > --- > > Key: HBASE-19166 > URL: https://issues.apache.org/jira/browse/HBASE-19166 > Project: HBase > Issue Type: Bug > Components: wal >Reporter: Ted Yu >Assignee: stack >Priority: Blocker > Fix For: 2.0.0 > > > For hlog generated by 1.x, using WALPlayer from hbase2 would result in: > {code} > 2017-11-02 21:22:40,907 INFO [main] mapreduce.Job: Task Id : > attempt_1509641483571_0003_m_00_0, Status : FAILED > Error: java.lang.ClassCastException: > org.apache.hadoop.hbase.regionserver.wal.WALEdit cannot be cast to > org.apache.hadoop.hbase.wal.WALEdit > at > org.apache.hadoop.hbase.mapreduce.WALPlayer$WALCellMapper.map(WALPlayer.java:143) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793) > {code} > HBASE-16479 relocated WALEdit. > Chatting with Enis, he mentioned adding translation for handling > hbase.regionserver.wal.WALEdit > This way, WAL from 1.x can be recognized by hbase-2 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19166) Add translation for handling hbase.regionserver.wal.WALEdit
[ https://issues.apache.org/jira/browse/HBASE-19166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16362897#comment-16362897 ] stack commented on HBASE-19166: --- hbase1 complaint is now: {code} 2018-02-13 10:43:57,589 DEBUG [RS_LOG_REPLAY_OPS-ve0530:16020-0] wal.WALSplitter: Finishing writing output logs and closing down. 2018-02-13 10:43:57,589 INFO [RS_LOG_REPLAY_OPS-ve0530:16020-0] wal.WALSplitter: Processed 0 edits across 0 regions; edits skipped=0; log file=hdfs://ve0524.halxg.cloudera.com:8020/hbase/WALs/ve0534.halxg.cloudera.com,16020,1518546984742-splitting/ve0534.halxg.cloudera.com%2C16020%2C1518546984742.meta.1518546993545.meta, length=23982, corrupted=false, progress failed=false 2018-02-13 10:43:57,590 WARN [RS_LOG_REPLAY_OPS-ve0530:16020-0] regionserver.SplitLogWorker: log splitting of WALs/ve0534.halxg.cloudera.com,16020,1518546984742-splitting/ve0534.halxg.cloudera.com%2C16020%2C1518546984742.meta.1518546993545.meta failed, returning error java.io.IOException: Got unknown writer class: AsyncProtobufLogWriter at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.initInternal(ProtobufLogReader.java:220) at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.initReader(ProtobufLogReader.java:169) at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.init(ReaderBase.java:66) at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.init(ProtobufLogReader.java:164) at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:303) at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:267) at org.apache.hadoop.hbase.wal.WALSplitter.getReader(WALSplitter.java:853) at org.apache.hadoop.hbase.wal.WALSplitter.getReader(WALSplitter.java:777) at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:298) at 
org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:236) at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:104) at org.apache.hadoop.hbase.regionserver.handler.WALSplitterHandler.process(WALSplitterHandler.java:72) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:129) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) {code} > Add translation for handling hbase.regionserver.wal.WALEdit > --- > > Key: HBASE-19166 > URL: https://issues.apache.org/jira/browse/HBASE-19166 > Project: HBase > Issue Type: Bug > Components: wal >Reporter: Ted Yu >Assignee: stack >Priority: Blocker > Fix For: 2.0.0-beta-2 > > > For hlog generated by 1.x, using WALPlayer from hbase2 would result in: > {code} > 2017-11-02 21:22:40,907 INFO [main] mapreduce.Job: Task Id : > attempt_1509641483571_0003_m_00_0, Status : FAILED > Error: java.lang.ClassCastException: > org.apache.hadoop.hbase.regionserver.wal.WALEdit cannot be cast to > org.apache.hadoop.hbase.wal.WALEdit > at > org.apache.hadoop.hbase.mapreduce.WALPlayer$WALCellMapper.map(WALPlayer.java:143) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793) > {code} > HBASE-16479 relocated WALEdit. > Chatting with Enis, he mentioned adding translation for handling > hbase.regionserver.wal.WALEdit > This way, WAL from 1.x can be recognized by hbase-2 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19989) READY_TO_MERGE and READY_TO_SPLIT do not update region state correctly
[ https://issues.apache.org/jira/browse/HBASE-19989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16362896#comment-16362896 ]

Ted Yu commented on HBASE-19989:

In the next patch, please add a comment in place of the previous break, explaining why the break is absent.

> READY_TO_MERGE and READY_TO_SPLIT do not update region state correctly
> --
>
> Key: HBASE-19989
> URL: https://issues.apache.org/jira/browse/HBASE-19989
> Project: HBase
> Issue Type: Bug
> Affects Versions: 1.3.1, 1.4.1
> Reporter: Ben Lau
> Assignee: Ben Lau
> Priority: Major
> Attachments: HBASE-19989.patch
>
> Region state transitions do not work correctly for READY_TO_MERGE/SPLIT.
> [~thiruvel] and I noticed this is due to break statements being in the wrong place in AssignmentManager. This allows a race condition, for example one in which one of the regions being merged could be moved concurrently, resulting in the merge transaction failing and then double assignment and/or data loss.
> This bug appears to only affect branch-1 (for example 1.3 and 1.4) and not branch-2, as the relevant code in AM has since been rewritten.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
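The review request above amounts to documenting an intentional switch fall-through. A minimal sketch of what such a comment could look like follows. This is not the AssignmentManager code: READY_TO_MERGE is taken from the issue title, while the class, the other enum values, and the action strings are hypothetical and exist only to make the control flow observable.

```java
import java.util.ArrayList;
import java.util.List;

class FallThroughSketch {
  // READY_TO_MERGE comes from the issue title; MERGE_STARTED and OTHER are hypothetical.
  enum Code { READY_TO_MERGE, MERGE_STARTED, OTHER }

  // Returns the actions taken for a code so the fall-through can be observed.
  static List<String> handle(Code code) {
    List<String> actions = new ArrayList<>();
    switch (code) {
      case READY_TO_MERGE:
        actions.add("check merge preconditions");
        // No break here on purpose: once the checks are done we fall through
        // so that the shared code below also updates the region state.
      case MERGE_STARTED:
        actions.add("update region state");
        break;
      default:
        actions.add("ignored");
        break;
    }
    return actions;
  }

  public static void main(String[] args) {
    System.out.println(handle(Code.READY_TO_MERGE));
    System.out.println(handle(Code.MERGE_STARTED));
  }
}
```

With the comment in place, a later reader is less likely to "fix" the missing break and reintroduce the bug.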
[jira] [Commented] (HBASE-19989) READY_TO_MERGE and READY_TO_SPLIT do not update region state correctly
[ https://issues.apache.org/jira/browse/HBASE-19989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16362886#comment-16362886 ]

Ted Yu commented on HBASE-19989:

Thanks for the update. Happy New Year, Francis and Ben.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19989) READY_TO_MERGE and READY_TO_SPLIT do not update region state correctly
[ https://issues.apache.org/jira/browse/HBASE-19989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16362878#comment-16362878 ]

Francis Liu commented on HBASE-19989:
-
[~yuzhih...@gmail.com] This is a bug in zkless assignment. There used to be tests, but they were removed. We'll include the zkless split tests in this patch. We've already been running the tests and this patch in prod. We'll work on adding back the rest of the zkless tests as part of HBASE-14626.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-19116) Currently the tail of hfiles with CellComparator* classname makes it so hbase1 can't open hbase2 written hfiles; fix
[ https://issues.apache.org/jira/browse/HBASE-19116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-19116:
--
Status: Patch Available (was: Open)

> Currently the tail of hfiles with CellComparator* classname makes it so hbase1 can't open hbase2 written hfiles; fix
>
> Key: HBASE-19116
> URL: https://issues.apache.org/jira/browse/HBASE-19116
> Project: HBase
> Issue Type: Sub-task
> Components: HFile, migration
> Reporter: stack
> Assignee: stack
> Priority: Critical
> Fix For: 2.0.0-beta-2
> Attachments: HBASE-19116.branch-2.001.patch
>
> See the tail of HBASE-19052 for the discussion, which concludes we should try to make it so operators do not have to go to the latest hbase version before they upgrade, at least if we can avoid it.
> The necessary change of our default comparator from KV to Cell naming has hfiles with tails that have the classname CellComparator in them in place of KeyValueComparator. If an hbase1 tries to open them, it will fail, not having a CellComparator in its classpath (we have the name of the comparator in the tail because different files require different comparators... perhaps we write an alias instead of a class one day... TODO). HBASE-16189 and HBASE-19052 are about trying to carry knowledge of hbase2 back to hbase1, a brittle approach making it so operators will have to upgrade to the latest branch-1 before they can go to hbase2.
> This issue is about not writing a tail that is incompatible with hbase1 unless we really have to (and it sounds like we could do without writing an incompatible tail), to see if we can avoid requiring operators to go to the latest branch-1 (we may end up needing this, but let's have a really good reason for it if we do).
> Oh, let this filing be an answer to our [~anoop.hbase]'s old high-level question over in HBASE-16189:
> bq. ...means when rolling upgrade done to 2.0, first users have to upgrade to some 1.x versions which is having this fix and then to 2.0.. What do you guys think Whether we should avoid this kind of indirection? cc Enis Soztutar, Stack, Ted Yu, Matteo Bertozzi
> Yeah, let's try to avoid this if we can...

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
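The idea in the HBASE-19116 description, writing a comparator name into the file tail that an older reader can still resolve, can be sketched roughly as below. This is only an illustration under stated assumptions, not the patch attached to the issue: the class `TrailerComparatorNames`, its methods, and both `org.example...` class names are hypothetical placeholders, not the real HBase class names.

```java
import java.util.HashMap;
import java.util.Map;

class TrailerComparatorNames {
  // Hypothetical class names for illustration only; the real ones live in the HBase source.
  static final String LEGACY_NAME = "org.example.hbase1.KeyValueComparator";
  static final String CURRENT_NAME = "org.example.hbase2.CellComparator";

  // Maps a current comparator class name to the name an older reader understands.
  private static final Map<String, String> CURRENT_TO_LEGACY = new HashMap<>();
  static {
    CURRENT_TO_LEGACY.put(CURRENT_NAME, LEGACY_NAME);
  }

  // Name to serialize into the file tail: the legacy name when one exists, else unchanged.
  static String nameToWrite(String comparatorClassName) {
    return CURRENT_TO_LEGACY.getOrDefault(comparatorClassName, comparatorClassName);
  }

  // Name a newer reader should load: a legacy name in the tail maps back to the current class.
  static String nameToLoad(String nameInTail) {
    return LEGACY_NAME.equals(nameInTail) ? CURRENT_NAME : nameInTail;
  }

  public static void main(String[] args) {
    String written = nameToWrite(CURRENT_NAME);
    System.out.println("written to tail: " + written);
    System.out.println("loaded by new reader: " + nameToLoad(written));
  }
}
```

The point of the design is that the compatibility knowledge stays in the newer version, so the older version needs no change before the upgrade.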
[jira] [Updated] (HBASE-19116) Currently the tail of hfiles with CellComparator* classname makes it so hbase1 can't open hbase2 written hfiles; fix
[ https://issues.apache.org/jira/browse/HBASE-19116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-19116:
--
Attachment: HBASE-19116.branch-2.001.patch

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-19991) lots of hbase-rest test failures against hadoop 3
Mike Drob created HBASE-19991: - Summary: lots of hbase-rest test failures against hadoop 3 Key: HBASE-19991 URL: https://issues.apache.org/jira/browse/HBASE-19991 Project: HBase Issue Type: Bug Components: REST, test Reporter: Mike Drob Assignee: Mike Drob Fix For: 2.0.0 mvn clean test -pl hbase-rest -Dhadoop.profile=3.0 [ERROR] Tests run: 106, Failures: 95, Errors: 8, Skipped: 1 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19930) fix ImmutableMemStoreLAB#forceCopyOfBigCellInto
[ https://issues.apache.org/jira/browse/HBASE-19930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16362743#comment-16362743 ]

Hadoop QA commented on HBASE-19930:
---
| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 15s | Docker mode activated. |
|| || || || Prechecks ||
| 0 | findbugs | 0m 0s | Findbugs executables are not available. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || master Compile Tests ||
| +1 | mvninstall | 4m 52s | master passed |
| +1 | compile | 0m 50s | master passed |
| +1 | checkstyle | 1m 8s | master passed |
| +1 | shadedjars | 6m 2s | branch has no errors when building our shaded downstream artifacts. |
| +1 | javadoc | 0m 29s | master passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 4m 25s | the patch passed |
| +1 | compile | 0m 45s | the patch passed |
| +1 | javac | 0m 45s | the patch passed |
| -1 | checkstyle | 1m 5s | hbase-server: The patch generated 1 new + 16 unchanged - 0 fixed = 17 total (was 16) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 5m 1s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 20m 29s | Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. |
| +1 | javadoc | 0m 33s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 106m 36s | hbase-server in the patch passed. |
| +1 | asflicense | 0m 20s | The patch does not generate ASF License warnings. |
| | | 146m 56s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-19930 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12910401/HBASE-19930-V05.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 8ed3c1587fd0 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / ba402b1e7b |
| maven | version: Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) |
| Default Java | 1.8.0_151 |
| checkstyle | https://builds.apache.org/job/PreCommit-HBASE-Build/11510/artifact/patchprocess/diff-checkstyle-hbase-server.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/11510/testReport/ |
| Max. process+thread count | 4974 (vs. ulimit of 1) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/11510/console |
| Powered by | Apache Yetus 0.7.0 http://yetus.apache.org |

This message was
[jira] [Updated] (HBASE-19979) ReplicationSyncUp tool may leak Zookeeper connection
[ https://issues.apache.org/jira/browse/HBASE-19979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-19979:
--
Resolution: Fixed
Hadoop Flags: Reviewed
Status: Resolved (was: Patch Available)

Nice one [~pankaj2461] Good find. Pushed to branch-2 and master. Looks like [~yuzhih...@gmail.com] pushed to branch-1.4 and branch-1 (Again, please use --author param so you can accredit the patch properly Ted Yu).

> ReplicationSyncUp tool may leak Zookeeper connection
>
> Key: HBASE-19979
> URL: https://issues.apache.org/jira/browse/HBASE-19979
> Project: HBase
> Issue Type: Bug
> Components: Replication
> Reporter: Pankaj Kumar
> Assignee: Pankaj Kumar
> Priority: Major
> Fix For: 1.5.0, 2.0.0-beta-2, 1.4.2
> Attachments: HBASE-19979-branch-1.001.patch, HBASE-19979.patch
>
> ReplicationSyncUp tool may leak Zookeeper connection in the following code snippet,
> {code}
> try {
>   int numberOfOldSource = 1; // default wait once
>   while (numberOfOldSource > 0) {
>     Thread.sleep(SLEEP_TIME);
>     numberOfOldSource = manager.getOldSources().size();
>   }
> } catch (InterruptedException e) {
>   System.err.println("didn't wait long enough:" + e);
>   return (-1);
> }
> manager.join();
> zkw.close();
> {code}
> ZooKeeperWatcher will not be closed in case of InterruptedException.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
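The leak described in HBASE-19979 comes from the early return in the catch block, which skips the close call. One common way to avoid that shape of bug is try-with-resources, sketched below. This is not the committed patch: `FakeWatcher` is a hypothetical stand-in for the ZooKeeper watcher that only records whether close() ran, and `waitAndClose` is an illustrative name.

```java
class CloseOnInterruptSketch {
  // Hypothetical stand-in for the ZooKeeper watcher; it only records whether close() ran.
  static class FakeWatcher implements AutoCloseable {
    boolean closed = false;

    @Override
    public void close() {
      closed = true;
    }
  }

  // Mirrors the shape of the quoted snippet: wait, and return -1 if interrupted.
  // try-with-resources closes the watcher on the normal path and on the early return.
  static int waitAndClose(FakeWatcher watcher, long sleepMillis) {
    try (FakeWatcher zkw = watcher) {
      Thread.sleep(sleepMillis);
      return 0;
    } catch (InterruptedException e) {
      System.err.println("didn't wait long enough:" + e);
      return -1;
    }
  }

  public static void main(String[] args) {
    FakeWatcher watcher = new FakeWatcher();
    int rc = waitAndClose(watcher, 10);
    System.out.println("rc=" + rc + " closed=" + watcher.closed);
  }
}
```

A try/finally block with the close call in the finally clause would give the same guarantee where the resource does not implement AutoCloseable.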
[jira] [Updated] (HBASE-19979) ReplicationSyncUp tool may leak Zookeeper connection
[ https://issues.apache.org/jira/browse/HBASE-19979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-19979:
--
Fix Version/s: 1.4.2

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19953) Avoid calling post* hook when procedure fails
[ https://issues.apache.org/jira/browse/HBASE-19953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16362622#comment-16362622 ] Josh Elser commented on HBASE-19953: Let me take a look at that. Thanks for the pointer, sir. > Avoid calling post* hook when procedure fails > - > > Key: HBASE-19953 > URL: https://issues.apache.org/jira/browse/HBASE-19953 > Project: HBase > Issue Type: Bug > Components: master, proc-v2 >Reporter: Ramesh Mani >Assignee: Josh Elser >Priority: Critical > Fix For: 2.0.0-beta-2 > > > Ramesh pointed out a case where I think we're mishandling some post\* > MasterObserver hooks. Specifically, I'm looking at the deleteNamespace. > We synchronously execute the DeleteNamespace procedure. When the user > provides a namespace that isn't empty, the procedure does a rollback (which > is just a no-op), but this doesn't propagate an exception up to the > NonceProcedureRunnable in {{HMaster#deleteNamespace}}. It took Ramesh > pointing it out a bit better to me that the code executes a bit differently > than we actually expect. > I think we need to double-check our post hooks and make sure we aren't > invoking them when the procedure actually failed. cc/ [~Apache9], [~stack]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19953) Avoid calling post* hook when procedure fails
[ https://issues.apache.org/jira/browse/HBASE-19953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16362607#comment-16362607 ]

stack commented on HBASE-19953:
---
Looking at related Master-side Operations, I see them take a latch in the NonceProcedureRunnable implementation. When latch is thrown, they call the post op. See enableTable, createTable, etc. This delete namespace should do similar? Later we should come back and get rid of all these latches (and then we'll have to figure how Observer can monitor Procedure).

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
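The latch pattern stack refers to can be sketched as follows: the caller waits on a latch that the procedure releases, and the post hook runs only if the latch reports success. This is a simplified illustration, not the HMaster code: `ResultLatch`, `runWithPostHook`, and the `namespaceEmpty` flag are hypothetical names chosen for the sketch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

class PostHookSketch {
  // Hypothetical latch that also carries the failure, if any.
  static class ResultLatch {
    private final CountDownLatch done = new CountDownLatch(1);
    private final AtomicReference<Exception> failure = new AtomicReference<>();

    void succeed() {
      done.countDown();
    }

    void fail(Exception e) {
      failure.set(e);
      done.countDown();
    }

    // Blocks until the procedure finishes; rethrows the failure if there was one.
    void await() throws Exception {
      done.await();
      Exception e = failure.get();
      if (e != null) {
        throw e;
      }
    }
  }

  // Runs a pretend procedure on another thread, waits for it, and calls
  // postHook only when it succeeded. Returns whether the hook was called.
  static boolean runWithPostHook(boolean namespaceEmpty, Runnable postHook) {
    ResultLatch latch = new ResultLatch();
    Thread worker = new Thread(() -> {
      if (namespaceEmpty) {
        latch.succeed();
      } else {
        latch.fail(new IllegalStateException("namespace is not empty"));
      }
    });
    worker.start();
    try {
      latch.await();
    } catch (Exception e) {
      return false; // the procedure failed, so the post hook is skipped
    }
    postHook.run();
    return true;
  }

  public static void main(String[] args) {
    System.out.println(runWithPostHook(true, () -> System.out.println("post hook ran")));
    System.out.println(runWithPostHook(false, () -> System.out.println("post hook ran")));
  }
}
```

The key property is that a failure inside the procedure reaches the caller before the post hook is invoked, which is what the issue says was missing for deleteNamespace.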