[jira] [Commented] (HBASE-13895) DATALOSS: Region assigned before WAL replay when abort
[ https://issues.apache.org/jira/browse/HBASE-13895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611580#comment-14611580 ] Hudson commented on HBASE-13895: FAILURE: Integrated in HBase-TRUNK #6624 (See [https://builds.apache.org/job/HBase-TRUNK/6624/]) HBASE-13895 DATALOSS: Region assigned before WAL replay when abort (Enis Soztutar) -- REAPPLY (stack: rev 20e855f2824d3d39c13560fedabbd985f3ae5d13) * hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManagerOnCluster.java * hbase-client/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerAbortedException.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java * hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/WALPlayer.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java * hbase-it/src/test/java/org/apache/hadoop/hbase/test/IntegrationTestLoadAndVerify.java * hbase-client/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerStoppedException.java * hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestWALPlayer.java DATALOSS: Region assigned before WAL replay when abort -- Key: HBASE-13895 URL: https://issues.apache.org/jira/browse/HBASE-13895 Project: HBase Issue Type: Bug Affects Versions: 1.2.0 Reporter: stack Assignee: stack Priority: Critical Fix For: 2.0.0, 1.2.0, 1.1.2, 1.3.0 Attachments: 13895.branch-1.2.txt, 13895.master.patch, hbase-13895_addendum-master.patch, hbase-13895_addendum.patch, hbase-13895_v1-branch-1.1.patch Opening a place holder till finish analysis. I have dataloss running ITBLL at 3B (testing HBASE-13877). Most obvious culprit is the double-assignment that I can see. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13500) Deprecate KVComparator and move to CellComparator
[ https://issues.apache.org/jira/browse/HBASE-13500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611589#comment-14611589 ] Anoop Sam John commented on HBASE-13500: [~ram_krish] we can close this main jira now. Deprecate KVComparator and move to CellComparator - Key: HBASE-13500 URL: https://issues.apache.org/jira/browse/HBASE-13500 Project: HBase Issue Type: Improvement Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 2.0.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13832) Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low
[ https://issues.apache.org/jira/browse/HBASE-13832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611530#comment-14611530 ] Hadoop QA commented on HBASE-13832: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12743215/hbase-13832-v3.patch against master branch at commit f0e29c49a1f5f3773ba03b822805d863c149b443. ATTACHMENT ID: 12743215 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 checkstyle{color}. The applied patch generated 1898 checkstyle errors (more than the master's current 1897 errors). {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn post-site goal to fail. {color:red}-1 core tests{color}. The patch failed these unit tests: org.apache.hadoop.hbase.util.TestProcessBasedCluster org.apache.hadoop.hbase.mapreduce.TestImportExport org.apache.hadoop.hbase.TestRegionRebalancing org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS {color:red}-1 core zombie tests{color}. 
There are 5 zombie test(s): at org.apache.hadoop.hbase.mapreduce.TestTableSnapshotInputFormat.testInitTableSnapshotMapperJobConfig(TestTableSnapshotInputFormat.java:146) at org.apache.hadoop.hbase.mapreduce.TestTableInputFormatScanBase.testScan(TestTableInputFormatScanBase.java:244) at org.apache.hadoop.hbase.mapreduce.TestTableInputFormatScan2.testScanOBBToQPP(TestTableInputFormatScan2.java:57) at org.apache.hadoop.hbase.mapreduce.TestMultithreadedTableMapper.testMultithreadedTableMapper(TestMultithreadedTableMapper.java:133) at org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFiles.testRegionCrossingRowColBloom(TestLoadIncrementalHFiles.java:142) at org.apache.hadoop.hbase.mapreduce.TestImportExport.testExportScannerBatching(TestImportExport.java:271) Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14648//testReport/ Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14648//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14648//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14648//console This message is automatically generated. Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low --- Key: HBASE-13832 URL: https://issues.apache.org/jira/browse/HBASE-13832 Project: HBase Issue Type: Sub-task Components: master, proc-v2 Affects Versions: 2.0.0, 1.1.0, 1.2.0 Reporter: Stephen Yuan Jiang Assignee: Matteo Bertozzi Priority: Critical Fix For: 2.0.0, 1.1.2, 1.3.0, 1.2.1 Attachments: HBASE-13832-v0.patch, HBASE-13832-v1.patch, HBASE-13832-v2.patch, HDFSPipeline.java, hbase-13832-test-hang.patch, hbase-13832-v3.patch When the data node count is 3, we got a failure in WALProcedureStore#syncLoop() during master start. The failure prevents the master from starting.
{noformat} 2015-05-29 13:27:16,625 ERROR [WALProcedureStoreSyncThread] wal.WALProcedureStore: Sync slot failed, abort. java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK], DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-490ece56c772,DISK]],
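The "Failed to replace a bad datanode" error in the log above comes from the HDFS client's replace-datanode-on-failure feature, which cannot find a substitute node when a cluster has about as many datanodes as the replication factor. On small test clusters, one commonly cited mitigation (a configuration sketch only, not the fix HBASE-13832 itself implements; verify the property name and semantics against your HDFS version) is to relax that policy in hdfs-site.xml:

```xml
<!-- Hypothetical hdfs-site.xml fragment for small test clusters only;
     with policy NEVER the client does not try to replace a failed
     datanode in the write pipeline, so it degrades durability. -->
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
  <value>NEVER</value>
</property>
```

On production clusters the DEFAULT policy should normally be left in place.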
[jira] [Commented] (HBASE-13895) DATALOSS: Region assigned before WAL replay when abort
[ https://issues.apache.org/jira/browse/HBASE-13895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611546#comment-14611546 ] stack commented on HBASE-13895: --- Ok. Added the missing patch and the addendum that fixes the failing TestAssignmentManagerOnCluster tests. Agree with the fix for the UT (I love unit tests). For branch-1+, I applied the addendum and checked I got the whole patch this time. On branch-2, I applied the original patch plus a version of the master addendum. I made master the same as the branch-1s. The master addendum makes the logic different. Why, [~enis]? I'll assume the master addendum is intended. I am talking about this hunk in the master addendum patch:
{code}
@@ -891,12 +891,16 @@ public class AssignmentManager {
   LOG.warn("Server " + server + " region CLOSE RPC returned false for " +
     region.getRegionNameAsString());
 } catch (Throwable t) {
+  long sleepTime = 0;
+  Configuration conf = this.server.getConfiguration();
   if (t instanceof RemoteException) {
     t = ((RemoteException)t).unwrapRemoteException();
   }
-  if (t instanceof NotServingRegionException
+  if (t instanceof RegionServerAbortedException
       || t instanceof RegionServerStoppedException
       || t instanceof ServerNotRunningYetException) {
+
+  } else if (t instanceof NotServingRegionException) {
     LOG.debug("Offline " + region.getRegionNameAsString()
       + ", it's not any more on " + server, t);
     regionStates.updateRegionState(region, State.OFFLINE);
{code}
whereas in the original patch we have this (set a sleepTime...)
{code}
@@ -1866,11 +1867,19 @@ public class AssignmentManager extends ZooKeeperListener {
   LOG.warn("Server " + server + " region CLOSE RPC returned false for " +
     region.getRegionNameAsString());
 } catch (Throwable t) {
+  long sleepTime = 0;
+  Configuration conf = this.server.getConfiguration();
   if (t instanceof RemoteException) {
     t = ((RemoteException)t).unwrapRemoteException();
   }
   boolean logRetries = true;
-  if (t instanceof NotServingRegionException
+  if (t instanceof RegionServerAbortedException) {
+    // RS is aborting, we cannot offline the region since the region may need to do WAL
+    // recovery. Until we see the RS expiration, we should retry.
+    sleepTime = 1 + conf.getInt(RpcClient.FAILED_SERVER_EXPIRY_KEY,
+      RpcClient.FAILED_SERVER_EXPIRY_DEFAULT);
+
+  } else if (t instanceof NotServingRegionException
       || t instanceof RegionServerStoppedException
       || t instanceof ServerNotRunningYetException) {
{code}
Thanks for catching my misapply. DATALOSS: Region assigned before WAL replay when abort -- Key: HBASE-13895 URL: https://issues.apache.org/jira/browse/HBASE-13895 Project: HBase Issue Type: Bug Affects Versions: 1.2.0 Reporter: stack Assignee: stack Priority: Critical Fix For: 2.0.0, 1.2.0, 1.1.2, 1.3.0 Attachments: 13895.master.patch, hbase-13895_addendum-master.patch, hbase-13895_addendum.patch, hbase-13895_v1-branch-1.1.patch Opening a place holder till finish analysis. I have dataloss running ITBLL at 3B (testing HBASE-13877). Most obvious culprit is the double-assignment that I can see. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
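In plain terms, the hunk above treats RegionServerAbortedException as retriable with a sleep tied to the failed-server expiry (the region may still need WAL recovery, so it must not be offlined), while the other "server gone" exceptions still offline the region. A minimal, self-contained sketch of that dispatch (hypothetical class names, and an assumed config key and default standing in for RpcClient's constants; this is not the actual AssignmentManager code):

```java
import java.util.Map;

// Toy model of the exception dispatch in the original patch. The nested
// exception classes and the config key/default are stand-ins, not HBase's own.
class CloseRetryPolicy {
  static final String FAILED_SERVER_EXPIRY_KEY = "failed.server.expiry"; // assumed key
  static final int FAILED_SERVER_EXPIRY_DEFAULT = 2000;                  // assumed default, millis

  static class RegionServerAbortedException extends RuntimeException {}
  static class RegionServerStoppedException extends RuntimeException {}
  static class NotServingRegionException extends RuntimeException {}

  /** How long to sleep before retrying the CLOSE RPC, or -1 to offline the region now. */
  static long sleepBeforeRetryMillis(Throwable t, Map<String, Integer> conf) {
    if (t instanceof RegionServerAbortedException) {
      // RS is aborting: the region may still need WAL recovery, so do NOT
      // offline it yet; wait out the failed-server expiry, then retry.
      return 1 + conf.getOrDefault(FAILED_SERVER_EXPIRY_KEY, FAILED_SERVER_EXPIRY_DEFAULT);
    }
    if (t instanceof NotServingRegionException || t instanceof RegionServerStoppedException) {
      return -1; // server/region definitively gone: safe to mark the region OFFLINE
    }
    return 0; // other errors: retry immediately
  }
}
```

The master addendum hunk quoted earlier drops the sleep, which is why the logic on the two branches differed.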
[jira] [Updated] (HBASE-13895) DATALOSS: Region assigned before WAL replay when abort
[ https://issues.apache.org/jira/browse/HBASE-13895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-13895: -- Attachment: 13895.branch-1.2.txt branch-1.2 needed same change in TestAssignmentManager as branch-1. Applied this. Hopefully that is it. DATALOSS: Region assigned before WAL replay when abort -- Key: HBASE-13895 URL: https://issues.apache.org/jira/browse/HBASE-13895 Project: HBase Issue Type: Bug Affects Versions: 1.2.0 Reporter: stack Assignee: stack Priority: Critical Fix For: 2.0.0, 1.2.0, 1.1.2, 1.3.0 Attachments: 13895.branch-1.2.txt, 13895.master.patch, hbase-13895_addendum-master.patch, hbase-13895_addendum.patch, hbase-13895_v1-branch-1.1.patch Opening a place holder till finish analysis. I have dataloss running ITBLL at 3B (testing HBASE-13877). Most obvious culprit is the double-assignment that I can see. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13895) DATALOSS: Region assigned before WAL replay when abort
[ https://issues.apache.org/jira/browse/HBASE-13895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611592#comment-14611592 ] Hudson commented on HBASE-13895: SUCCESS: Integrated in HBase-1.1 #571 (See [https://builds.apache.org/job/HBase-1.1/571/]) HBASE-13895 DATALOSS: Region assigned before WAL replay when abort (Enis Soztutar) -- ADDENDUM (stack: rev a9cecf32a99caed3fccd0b6b00aca6d42d7979d3) * hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManagerOnCluster.java * hbase-client/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerAbortedException.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java DATALOSS: Region assigned before WAL replay when abort -- Key: HBASE-13895 URL: https://issues.apache.org/jira/browse/HBASE-13895 Project: HBase Issue Type: Bug Affects Versions: 1.2.0 Reporter: stack Assignee: stack Priority: Critical Fix For: 2.0.0, 1.2.0, 1.1.2, 1.3.0 Attachments: 13895.branch-1.2.txt, 13895.master.patch, hbase-13895_addendum-master.patch, hbase-13895_addendum.patch, hbase-13895_v1-branch-1.1.patch Opening a place holder till finish analysis. I have dataloss running ITBLL at 3B (testing HBASE-13877). Most obvious culprit is the double-assignment that I can see. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13977) Convert getKey and related APIs to Cell
[ https://issues.apache.org/jira/browse/HBASE-13977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611559#comment-14611559 ] stack commented on HBASE-13977: --- What does this mean: "Gets the current key in the form of a cell"? That there is no value returned? In getKeyAsCell, why bother with a ByteBuffer when all you are doing is passing an array? getKeyAsCell is defined in multiple interfaces? Can we avoid that? Here we are creating a Cell every time: if (getComparator().compare(splitCell, getKeyAsCell()) = 0) { Previously we were passing the current array, with no allocation and no KeyValue creation (if I am reading this right). Do we have to do this? Anything we can do better here? Ditto in the next hunk of changes. Otherwise, I like these changes. Convert getKey and related APIs to Cell --- Key: HBASE-13977 URL: https://issues.apache.org/jira/browse/HBASE-13977 Project: HBase Issue Type: Improvement Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Attachments: HBASE-13977.patch, HBASE-13977_1.patch, HBASE-13977_2.patch, HBASE-13977_3.patch During the course of changes for HBASE-11425 felt that more APIs can be converted to return Cell instead of BB like getKey, getLastKey. We can also rename the getKeyValue to getCell. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
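Stack's allocation concern above can be illustrated with a toy example (hypothetical types; KeyHolder is a stand-in wrapper, not HBase's Cell API): wrapping the current key in a fresh object on every comparison creates garbage on a hot path, whereas comparing the backing arrays directly allocates nothing.

```java
import java.util.Arrays;

// Toy illustration of the per-compare allocation concern raised in the review.
class CompareAllocationDemo {
  // Hypothetical Cell-like wrapper around a key's backing array.
  static final class KeyHolder {
    final byte[] key;
    KeyHolder(byte[] key) { this.key = key; }
  }

  // Allocates a new wrapper object on every call: garbage on a hot path.
  static int compareViaWrapper(byte[] split, byte[] current) {
    KeyHolder holder = new KeyHolder(current); // fresh object per compare
    return Arrays.compare(split, holder.key);
  }

  // Compares the backing arrays directly: no allocation per call.
  static int compareRaw(byte[] split, byte[] current) {
    return Arrays.compare(split, current);
  }
}
```

Both methods return the same ordering; the difference is only the per-call allocation, which is exactly the cost the review asks about.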
[jira] [Commented] (HBASE-13895) DATALOSS: Region assigned before WAL replay when abort
[ https://issues.apache.org/jira/browse/HBASE-13895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611541#comment-14611541 ] Hudson commented on HBASE-13895: FAILURE: Integrated in HBase-TRUNK #6623 (See [https://builds.apache.org/job/HBase-TRUNK/6623/]) HBASE-13895 DATALOSS: Region assigned before WAL replay when abort (Enis (stack: rev fca725a899984f57a6ad48ce9ae9cbc34e8ce752) * hbase-it/src/test/java/org/apache/hadoop/hbase/test/IntegrationTestLoadAndVerify.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java * hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestWALPlayer.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java * hbase-client/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerAbortedException.java * hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/WALPlayer.java * hbase-client/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerStoppedException.java Revert HBASE-13895 DATALOSS: Region assigned before WAL replay when abort (Enis (stack: rev f0e29c49a1f5f3773ba03b822805d863c149b443) * hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestWALPlayer.java * hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/WALPlayer.java * hbase-client/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerStoppedException.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java * hbase-it/src/test/java/org/apache/hadoop/hbase/test/IntegrationTestLoadAndVerify.java * hbase-client/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerAbortedException.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java DATALOSS: Region assigned before WAL replay when abort -- Key: HBASE-13895 URL: https://issues.apache.org/jira/browse/HBASE-13895 Project: HBase Issue Type: Bug Affects Versions: 1.2.0 Reporter: stack Assignee: 
stack Priority: Critical Fix For: 2.0.0, 1.2.0, 1.1.2, 1.3.0 Attachments: 13895.master.patch, hbase-13895_addendum-master.patch, hbase-13895_addendum.patch, hbase-13895_v1-branch-1.1.patch Opening a place holder till finish analysis. I have dataloss running ITBLL at 3B (testing HBASE-13877). Most obvious culprit is the double-assignment that I can see. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14010) TestRegionRebalancing.testRebalanceOnRegionServerNumberChange fails; cluster not balanced
[ https://issues.apache.org/jira/browse/HBASE-14010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-14010: -- Attachment: 14010.txt Passed twice. Try a third time. TestRegionRebalancing.testRebalanceOnRegionServerNumberChange fails; cluster not balanced - Key: HBASE-14010 URL: https://issues.apache.org/jira/browse/HBASE-14010 Project: HBase Issue Type: Bug Components: test Reporter: stack Assignee: stack Attachments: 14010.txt, 14010.txt, 14010.txt java.lang.AssertionError: null at org.apache.hadoop.hbase.TestRegionRebalancing.testRebalanceOnRegionServerNumberChange(TestRegionRebalancing.java:144) from recent build https://builds.apache.org/view/H-L/view/HBase/job/PreCommit-HBASE-Build/14639/testReport/junit/org.apache.hadoop.hbase/TestRegionRebalancing/testRebalanceOnRegionServerNumberChange_0_/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13998) Remove CellComparator#compareRows(byte[] left, int loffset, int llength, byte[] right, int roffset, int rlength)
[ https://issues.apache.org/jira/browse/HBASE-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HBASE-13998: --- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Added the code comments in MetaCache. Thanks for the reviews Ram and Stack. Remove CellComparator#compareRows(byte[] left, int loffset, int llength, byte[] right, int roffset, int rlength) Key: HBASE-13998 URL: https://issues.apache.org/jira/browse/HBASE-13998 Project: HBase Issue Type: Sub-task Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0 Attachments: HBASE-13998.patch A public API in CellComparator which takes old style byte[], offset, length alone is not correct. CellComparator supposed to compare cell(s). At least one side param has to be a cell.. This is the agreement we discussed in HBASE-10800. Still we could not remove the above one method because it was getting used from multiple places. Now most of the usage is removed. This jira aims at removing it fully and replace the usage with other APIs. Note: The CellComparator is added in 2.0 only so removing the public API is not creating any BC issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13998) Remove CellComparator#compareRows(byte[] left, int loffset, int llength, byte[] right, int roffset, int rlength)
[ https://issues.apache.org/jira/browse/HBASE-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611581#comment-14611581 ] Hudson commented on HBASE-13998: FAILURE: Integrated in HBase-TRUNK #6624 (See [https://builds.apache.org/job/HBase-TRUNK/6624/]) HBASE-13998 Remove CellComparator#compareRows(byte[] left, int loffset, int llength, byte[] right, int roffset, int rlength). (anoopsamjohn: rev 62f56944919b436036dcac740d8a21c56289a164) * hbase-client/src/main/java/org/apache/hadoop/hbase/client/MetaCache.java * hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/SyncTable.java * hbase-client/src/test/java/org/apache/hadoop/hbase/client/TestClientNoCluster.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StripeStoreFileManager.java * hbase-common/src/main/java/org/apache/hadoop/hbase/CellComparator.java Remove CellComparator#compareRows(byte[] left, int loffset, int llength, byte[] right, int roffset, int rlength) Key: HBASE-13998 URL: https://issues.apache.org/jira/browse/HBASE-13998 Project: HBase Issue Type: Sub-task Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0 Attachments: HBASE-13998.patch A public API in CellComparator which takes old style byte[], offset, length alone is not correct. CellComparator supposed to compare cell(s). At least one side param has to be a cell.. This is the agreement we discussed in HBASE-10800. Still we could not remove the above one method because it was getting used from multiple places. Now most of the usage is removed. This jira aims at removing it fully and replace the usage with other APIs. Note: The CellComparator is added in 2.0 only so removing the public API is not creating any BC issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
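The API direction agreed in HBASE-10800 and finished here, comparator methods taking at least one cell rather than two raw byte[]/offset/length triples, can be sketched as follows (hypothetical RowCell interface and method shape; not the actual CellComparator signatures):

```java
import java.util.Arrays;

// Sketch of "at least one side must be a cell" for row comparison.
class RowComparatorSketch {
  // Stand-in for a Cell that exposes its row bytes; not HBase's Cell API.
  interface RowCell { byte[] rowArray(); }

  // New style: the left side is a cell, only the right side is a raw slice.
  // The removed old style took byte[]/offset/length for BOTH sides.
  static int compareRows(RowCell left, byte[] right, int roffset, int rlength) {
    byte[] r = Arrays.copyOfRange(right, roffset, roffset + rlength);
    return Arrays.compare(left.rowArray(), r);
  }
}
```

Keeping a cell on one side lets the comparator stay meaningful for non-KeyValue cell implementations, which is the rationale given in the issue description.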
[jira] [Commented] (HBASE-14010) TestRegionRebalancing.testRebalanceOnRegionServerNumberChange fails; cluster not balanced
[ https://issues.apache.org/jira/browse/HBASE-14010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611608#comment-14611608 ] Hadoop QA commented on HBASE-14010: --- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12743230/14010.txt against master branch at commit f0e29c49a1f5f3773ba03b822805d863c149b443. ATTACHMENT ID: 12743230 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14649//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14649//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14649//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14649//console This message is automatically generated. TestRegionRebalancing.testRebalanceOnRegionServerNumberChange fails; cluster not balanced - Key: HBASE-14010 URL: https://issues.apache.org/jira/browse/HBASE-14010 Project: HBase Issue Type: Bug Components: test Reporter: stack Assignee: stack Attachments: 14010.txt, 14010.txt java.lang.AssertionError: null at org.apache.hadoop.hbase.TestRegionRebalancing.testRebalanceOnRegionServerNumberChange(TestRegionRebalancing.java:144) from recent build https://builds.apache.org/view/H-L/view/HBase/job/PreCommit-HBASE-Build/14639/testReport/junit/org.apache.hadoop.hbase/TestRegionRebalancing/testRebalanceOnRegionServerNumberChange_0_/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13895) DATALOSS: Region assigned before WAL replay when abort
[ https://issues.apache.org/jira/browse/HBASE-13895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-13895: -- Attachment: 13895.master.addendum2.txt A fix for master compile failure. So, on master, we did revert, then applied new patch... and then added his addendum. DATALOSS: Region assigned before WAL replay when abort -- Key: HBASE-13895 URL: https://issues.apache.org/jira/browse/HBASE-13895 Project: HBase Issue Type: Bug Affects Versions: 1.2.0 Reporter: stack Assignee: stack Priority: Critical Fix For: 2.0.0, 1.2.0, 1.1.2, 1.3.0 Attachments: 13895.branch-1.2.txt, 13895.master.addendum2.txt, 13895.master.patch, hbase-13895_addendum-master.patch, hbase-13895_addendum.patch, hbase-13895_v1-branch-1.1.patch Opening a place holder till finish analysis. I have dataloss running ITBLL at 3B (testing HBASE-13877). Most obvious culprit is the double-assignment that I can see. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14008) REST - Throw an appropriate error during schema POST
[ https://issues.apache.org/jira/browse/HBASE-14008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611630#comment-14611630 ] Hadoop QA commented on HBASE-14008: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12743241/14008.patch against master branch at commit f0e29c49a1f5f3773ba03b822805d863c149b443. ATTACHMENT ID: 12743241 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:red}-1 core tests{color}. The patch failed these unit tests: {color:red}-1 core zombie tests{color}. 
There are 2 zombie test(s): at org.apache.hadoop.hbase.filter.TestFilterWithScanLimits.testScanWithLimit(TestFilterWithScanLimits.java:71) at org.apache.hadoop.hbase.filter.TestFuzzyRowFilterEndToEnd.testEndToEnd(TestFuzzyRowFilterEndToEnd.java:140) Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14650//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14650//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14650//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14650//console This message is automatically generated. REST - Throw an appropriate error during schema POST Key: HBASE-14008 URL: https://issues.apache.org/jira/browse/HBASE-14008 Project: HBase Issue Type: Bug Components: REST Affects Versions: 0.98.13, 1.1.1 Reporter: Thiruvel Thirumoolan Assignee: Thiruvel Thirumoolan Priority: Minor Labels: REST Fix For: 1.2.0, 1.1.2 Attachments: 14008.patch, HBASE-14008.patch When an update is done on the schema through REST and an error occurs, the actual reason is not thrown back to the client. Right now we get a javax.ws.rs.WebApplicationException instead of the actual error message. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14005) Set permission to .top hfile in LoadIncrementalHFiles
[ https://issues.apache.org/jira/browse/HBASE-14005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611718#comment-14611718 ] Francesco MDE commented on HBASE-14005: --- Apparently this patch was already present in HBASE-8495 but one LOC got lost Set permission to .top hfile in LoadIncrementalHFiles - Key: HBASE-14005 URL: https://issues.apache.org/jira/browse/HBASE-14005 Project: HBase Issue Type: Bug Reporter: Francesco MDE Priority: Trivial Attachments: HBASE-14005.patch Set the same -rwxrwxrwx permission to .top file as .bottom and _tmp -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13970) NPE during compaction in trunk
[ https://issues.apache.org/jira/browse/HBASE-13970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611805#comment-14611805 ] ramkrishna.s.vasudevan commented on HBASE-13970: I have lost my server for some time. I think you can commit it. It may take some more time for me to test. So +1. NPE during compaction in trunk -- Key: HBASE-13970 URL: https://issues.apache.org/jira/browse/HBASE-13970 Project: HBase Issue Type: Bug Affects Versions: 2.0.0, 0.98.13, 1.2.0, 1.1.1 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 2.0.0, 0.98.14, 1.0.2, 1.1.2, 1.3.0, 1.2.1 Attachments: HBASE-13970-v1.patch, HBASE-13970.patch Updated the trunk.. Loaded the table with PE tool. Trigger a flush to ensure all data is flushed out to disk. When the first compaction is triggered we get an NPE and this is very easy to reproduce {code} 015-06-25 21:33:46,041 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received procedure start children changed event: /hbase/flush-table-proc/acquired 2015-06-25 21:33:46,051 INFO [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] regionserver.HRegion: Flushing 1/1 column families, memstore=76.91 MB 2015-06-25 21:33:46,159 ERROR [regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435248183945] regionserver.CompactSplitThread: Compaction failed Request = regionName=TestTable,283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4., storeName=info, fileCount=3, fileSize=343.4 M (114.5 M, 114.5 M, 114.5 M), priority=3, time=7536968291719985 java.lang.NullPointerException at org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController$ActiveCompaction.access$700(PressureAwareCompactionThroughputController.java:79) at org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController.finish(PressureAwareCompactionThroughputController.java:238) at 
org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:306) at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:106) at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:112) at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1202) at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1792) at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:524) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2015-06-25 21:33:46,745 INFO [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] regionserver.DefaultStoreFlusher: Flushed, sequenceid=1534, memsize=76.9 M, hasBloomFilter=true, into tmp file hdfs://stobdtserver3:9010/hbase/data/default/TestTable/028fb0324cd6eb03d5022eb8c147b7c4/.tmp/942ba0831a0047a08987439e34361a0c 2015-06-25 21:33:46,772 INFO [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] regionserver.HStore: Added hdfs://stobdtserver3:9010/hbase/data/default/TestTable/028fb0324cd6eb03d5022eb8c147b7c4/info/942ba0831a0047a08987439e34361a0c, entries=68116, sequenceid=1534, filesize=68.7 M 2015-06-25 21:33:46,773 INFO [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] regionserver.HRegion: Finished memstore flush of ~76.91 MB/80649344, currentsize=0 B/0 for region TestTable,283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4. 
in 723ms, sequenceid=1534, compaction requested=true 2015-06-25 21:33:46,780 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received created event:/hbase/flush-table-proc/reached/TestTable 2015-06-25 21:33:46,790 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received created event:/hbase/flush-table-proc/abort/TestTable 2015-06-25 21:33:46,791 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received procedure abort children changed event: /hbase/flush-table-proc/abort 2015-06-25 21:33:46,803 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received procedure start children changed event: /hbase/flush-table-proc/acquired 2015-06-25 21:33:46,818 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received procedure abort children changed event: /hbase/flush-table-proc/abort {code} Will check this on what is the reason behind it. -- This message was sent by Atlassian JIRA
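The stack trace above shows PressureAwareCompactionThroughputController.finish() dereferencing an active-compaction entry that was never registered. A minimal, self-contained sketch of that failure mode and a defensive guard; all names below are illustrative stand-ins, not the actual HBase classes or the committed fix:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the failure mode in the stack trace: finish() assumes an
// active-compaction entry exists, and NPEs when it does not.
class ThroughputControllerSketch {
    private final Map<String, Long> activeCompactions = new ConcurrentHashMap<>();

    void start(String compactionName) {
        activeCompactions.put(compactionName, System.nanoTime());
    }

    // Returns elapsed nanos, or -1 if this compaction was never registered.
    long finish(String compactionName) {
        Long startTime = activeCompactions.remove(compactionName);
        if (startTime == null) {
            return -1L; // unguarded code would throw NullPointerException here
        }
        return System.nanoTime() - startTime;
    }
}
```

The guard only papers over the symptom; the real question in the JIRA is why finish() runs for a compaction that start() never saw.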
[jira] [Updated] (HBASE-13977) Convert getKey and related APIs to Cell
[ https://issues.apache.org/jira/browse/HBASE-13977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-13977: --- Attachment: HBASE-13977_4.patch Updated patch as per Stack's comments. Convert getKey and related APIs to Cell --- Key: HBASE-13977 URL: https://issues.apache.org/jira/browse/HBASE-13977 Project: HBase Issue Type: Sub-task Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Attachments: HBASE-13977.patch, HBASE-13977_1.patch, HBASE-13977_2.patch, HBASE-13977_3.patch, HBASE-13977_4.patch During the course of changes for HBASE-11425 felt that more APIs can be converted to return Cell instead of BB like getKey, getLastKey. We can also rename the getKeyValue to getCell. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13977) Convert getKey and related APIs to Cell
[ https://issues.apache.org/jira/browse/HBASE-13977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-13977: --- Status: Patch Available (was: Open) Convert getKey and related APIs to Cell --- Key: HBASE-13977 URL: https://issues.apache.org/jira/browse/HBASE-13977 Project: HBase Issue Type: Sub-task Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Attachments: HBASE-13977.patch, HBASE-13977_1.patch, HBASE-13977_2.patch, HBASE-13977_3.patch, HBASE-13977_4.patch During the course of changes for HBASE-11425 felt that more APIs can be converted to return Cell instead of BB like getKey, getLastKey. We can also rename the getKeyValue to getCell. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13977) Convert getKey and related APIs to Cell
[ https://issues.apache.org/jira/browse/HBASE-13977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-13977: --- Status: Open (was: Patch Available) Convert getKey and related APIs to Cell --- Key: HBASE-13977 URL: https://issues.apache.org/jira/browse/HBASE-13977 Project: HBase Issue Type: Sub-task Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Attachments: HBASE-13977.patch, HBASE-13977_1.patch, HBASE-13977_2.patch, HBASE-13977_3.patch, HBASE-13977_4.patch During the course of changes for HBASE-11425 felt that more APIs can be converted to return Cell instead of BB like getKey, getLastKey. We can also rename the getKeyValue to getCell. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13986) HMaster instance always returns false for isAborted() check.
[ https://issues.apache.org/jira/browse/HBASE-13986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Kumar updated HBASE-13986: --- Attachment: HBASE-13986.patch Attaching a patch for review that sets 'abortRequested' to true for the master instance, so that all master flows using this flag (or the isAborted() method) see the abort flag set. HMaster instance always returns false for isAborted() check. Key: HBASE-13986 URL: https://issues.apache.org/jira/browse/HBASE-13986 Project: HBase Issue Type: Bug Reporter: Abhishek Kumar Assignee: Abhishek Kumar Priority: Minor Attachments: HBASE-13986.patch It seems that HMaster never sets the abortRequested flag to true as HRegionServer does in its abort() method. We can see the isAborted() method being used in a few places for the HMaster instance (like in HMasterCommandLine.startMaster) where the code flow is determined by the result of the isAborted() call. We can set this abortRequested flag in HMaster's abort() method as well, like in HRegionServer's abort() method; let me know if this seems ok. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
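The proposed change can be sketched in miniature. The abortRequested field mirrors the HRegionServer field the report refers to; the class itself is a stand-in, not HMaster:

```java
// Minimal sketch of the proposed fix: abort() records the request so that
// isAborted() reflects it, mirroring what HRegionServer.abort() does.
class AbortableServerSketch {
    private volatile boolean abortRequested = false;

    void abort(String why, Throwable cause) {
        abortRequested = true; // the assignment the report says HMaster is missing
        // ... existing shutdown logic would follow ...
    }

    boolean isAborted() {
        return abortRequested;
    }
}
```

Callers such as HMasterCommandLine.startMaster branch on isAborted(), so without the assignment the abort path is invisible to them.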
[jira] [Updated] (HBASE-13986) HMaster instance always returns false for isAborted() check.
[ https://issues.apache.org/jira/browse/HBASE-13986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Kumar updated HBASE-13986: --- Status: Patch Available (was: Open) HMaster instance always returns false for isAborted() check. Key: HBASE-13986 URL: https://issues.apache.org/jira/browse/HBASE-13986 Project: HBase Issue Type: Bug Reporter: Abhishek Kumar Assignee: Abhishek Kumar Priority: Minor Attachments: HBASE-13986.patch It seems that HMaster never sets the abortRequested flag to true as HRegionServer does in its abort() method. We can see the isAborted() method being used in a few places for the HMaster instance (like in HMasterCommandLine.startMaster) where the code flow is determined by the result of the isAborted() call. We can set this abortRequested flag in HMaster's abort() method as well, like in HRegionServer's abort() method; let me know if this seems ok. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14005) Set permission to .top hfile in LoadIncrementalHFiles
[ https://issues.apache.org/jira/browse/HBASE-14005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco MDE updated HBASE-14005: -- Attachment: HBASE-14005.patch Set permission to .top hfile in LoadIncrementalHFiles - Key: HBASE-14005 URL: https://issues.apache.org/jira/browse/HBASE-14005 Project: HBase Issue Type: Bug Reporter: Francesco MDE Priority: Trivial Attachments: HBASE-14005.patch Set the same -rwxrwxrwx permission to .top file as .bottom and _tmp -- This message was sent by Atlassian JIRA (v6.3.4#6332)
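For illustration only: LoadIncrementalHFiles works against Hadoop's FileSystem and FsPermission APIs, not java.nio, but the NIO call below shows the intent of giving the .top reference file the same rwxrwxrwx bits already applied to .bottom and _tmp:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

// Hypothetical stand-in for the Hadoop-side change: after writing the
// .top half-reference file, set it world-readable/writable/executable
// so it matches the permissions of the .bottom and _tmp files.
class SplitFilePermissionSketch {
    static void makeWorldAccessible(Path file) throws IOException {
        Set<PosixFilePermission> perms = PosixFilePermissions.fromString("rwxrwxrwx");
        Files.setPosixFilePermissions(file, perms);
    }
}
```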
[jira] [Updated] (HBASE-13977) Convert getKey and related APIs to Cell
[ https://issues.apache.org/jira/browse/HBASE-13977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-13977: --- Issue Type: Sub-task (was: Improvement) Parent: HBASE-13500 Convert getKey and related APIs to Cell --- Key: HBASE-13977 URL: https://issues.apache.org/jira/browse/HBASE-13977 Project: HBase Issue Type: Sub-task Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Attachments: HBASE-13977.patch, HBASE-13977_1.patch, HBASE-13977_2.patch, HBASE-13977_3.patch During the course of changes for HBASE-11425 felt that more APIs can be converted to return Cell instead of BB like getKey, getLastKey. We can also rename the getKeyValue to getCell. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13500) Deprecate KVComparator and move to CellComparator
[ https://issues.apache.org/jira/browse/HBASE-13500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611796#comment-14611796 ] ramkrishna.s.vasudevan commented on HBASE-13500: Will close this after HBASE-13977 is done. Deprecate KVComparator and move to CellComparator - Key: HBASE-13500 URL: https://issues.apache.org/jira/browse/HBASE-13500 Project: HBase Issue Type: Improvement Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 2.0.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13970) NPE during compaction in trunk
[ https://issues.apache.org/jira/browse/HBASE-13970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611801#comment-14611801 ] Anoop Sam John commented on HBASE-13970: [~Apache9] You have 2 +1s on this Jira.. It is ready for commit. NPE during compaction in trunk -- Key: HBASE-13970 URL: https://issues.apache.org/jira/browse/HBASE-13970 Project: HBase Issue Type: Bug Affects Versions: 2.0.0, 0.98.13, 1.2.0, 1.1.1 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 2.0.0, 0.98.14, 1.0.2, 1.1.2, 1.3.0, 1.2.1 Attachments: HBASE-13970-v1.patch, HBASE-13970.patch Updated the trunk.. Loaded the table with PE tool. Trigger a flush to ensure all data is flushed out to disk. When the first compaction is triggered we get an NPE and this is very easy to reproduce {code} 2015-06-25 21:33:46,041 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received procedure start children changed event: /hbase/flush-table-proc/acquired 2015-06-25 21:33:46,051 INFO [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] regionserver.HRegion: Flushing 1/1 column families, memstore=76.91 MB 2015-06-25 21:33:46,159 ERROR [regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435248183945] regionserver.CompactSplitThread: Compaction failed Request = regionName=TestTable,283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4., storeName=info, fileCount=3, fileSize=343.4 M (114.5 M, 114.5 M, 114.5 M), priority=3, time=7536968291719985 java.lang.NullPointerException at org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController$ActiveCompaction.access$700(PressureAwareCompactionThroughputController.java:79) at org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController.finish(PressureAwareCompactionThroughputController.java:238) at org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:306) at 
org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:106) at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:112) at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1202) at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1792) at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:524) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2015-06-25 21:33:46,745 INFO [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] regionserver.DefaultStoreFlusher: Flushed, sequenceid=1534, memsize=76.9 M, hasBloomFilter=true, into tmp file hdfs://stobdtserver3:9010/hbase/data/default/TestTable/028fb0324cd6eb03d5022eb8c147b7c4/.tmp/942ba0831a0047a08987439e34361a0c 2015-06-25 21:33:46,772 INFO [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] regionserver.HStore: Added hdfs://stobdtserver3:9010/hbase/data/default/TestTable/028fb0324cd6eb03d5022eb8c147b7c4/info/942ba0831a0047a08987439e34361a0c, entries=68116, sequenceid=1534, filesize=68.7 M 2015-06-25 21:33:46,773 INFO [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] regionserver.HRegion: Finished memstore flush of ~76.91 MB/80649344, currentsize=0 B/0 for region TestTable,283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4. 
in 723ms, sequenceid=1534, compaction requested=true 2015-06-25 21:33:46,780 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received created event:/hbase/flush-table-proc/reached/TestTable 2015-06-25 21:33:46,790 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received created event:/hbase/flush-table-proc/abort/TestTable 2015-06-25 21:33:46,791 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received procedure abort children changed event: /hbase/flush-table-proc/abort 2015-06-25 21:33:46,803 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received procedure start children changed event: /hbase/flush-table-proc/acquired 2015-06-25 21:33:46,818 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received procedure abort children changed event: /hbase/flush-table-proc/abort {code} Will check this on what is the reason behind it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13977) Convert getKey and related APIs to Cell
[ https://issues.apache.org/jira/browse/HBASE-13977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611828#comment-14611828 ] ramkrishna.s.vasudevan commented on HBASE-13977: Oh yes. I thought you were asking if we really need to do a copy. Sorry about that. Convert getKey and related APIs to Cell --- Key: HBASE-13977 URL: https://issues.apache.org/jira/browse/HBASE-13977 Project: HBase Issue Type: Sub-task Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Attachments: HBASE-13977.patch, HBASE-13977_1.patch, HBASE-13977_2.patch, HBASE-13977_3.patch, HBASE-13977_4.patch During the course of changes for HBASE-11425 felt that more APIs can be converted to return Cell instead of BB like getKey, getLastKey. We can also rename the getKeyValue to getCell. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14005) Set permission to .top hfile in LoadIncrementalHFiles
[ https://issues.apache.org/jira/browse/HBASE-14005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco MDE updated HBASE-14005: -- Attachment: (was: HBASE-14005.patch) Set permission to .top hfile in LoadIncrementalHFiles - Key: HBASE-14005 URL: https://issues.apache.org/jira/browse/HBASE-14005 Project: HBase Issue Type: Bug Reporter: Francesco MDE Priority: Trivial Set the same -rwxrwxrwx permission to .top file as .bottom and _tmp -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13977) Convert getKey and related APIs to Cell
[ https://issues.apache.org/jira/browse/HBASE-13977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611800#comment-14611800 ] ramkrishna.s.vasudevan commented on HBASE-13977: I went through the changes once again. bq.Gets the current key in the form of a cell. That there is no value returned? Yes, only the key part. Earlier it was a BB view of the key - now it is a Cell. bq.In getKeyAsCell, why bother with a ByteBuffer when all you are doing is passing an array? Ya, you are right. I changed that to just use System.arraycopy in BufferedDataEncoder.getKeyAsCell. bq.getKeyAsCell is defined in multiple Interfaces? Can we avoid that? For the DBE cases I think we cannot do it now because the entire seeker is now the BufferedDataEncoder. So we need some API in the DataBlockEncoder to be used. Maybe another JIRA if it is possible? bq.if (getComparator().compare(splitCell, getKeyAsCell()) <= 0) { Valid point. But previously we were only creating a BB object, whereas now we are creating a cell every time. Still, thinking in terms of BufferedBackedCell it would be better if it were a cell. The current code is trying to do {code} - ByteBuffer bb = getKey(); - if (getComparator().compare(splitCell, bb.array(), bb.arrayOffset(), - bb.limit()) <= 0) { {code} After BufferedBackedCells come in, we cannot have it the above way, as array() and arrayOffset() are not expected to be used. Hence making it a cell would encapsulate this inner detail. I thought of using the instance-level keyOnlyKv in HFileScannerImpl, but since the HalfStoreFileReader is trying to cache the firstKey we cannot use that instance-level variable in HFileScanner to just set the byte[] every time. 
Convert getKey and related APIs to Cell --- Key: HBASE-13977 URL: https://issues.apache.org/jira/browse/HBASE-13977 Project: HBase Issue Type: Sub-task Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Attachments: HBASE-13977.patch, HBASE-13977_1.patch, HBASE-13977_2.patch, HBASE-13977_3.patch During the course of changes for HBASE-11425 felt that more APIs can be converted to return Cell instead of BB like getKey, getLastKey. We can also rename the getKeyValue to getCell. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
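The copy-then-wrap approach discussed above, reduced to a sketch: rather than handing out a ByteBuffer view of the key, copy the key bytes once (the System.arraycopy route mentioned for getKeyAsCell) and wrap them. KeyOnlyCellSketch is a hypothetical stand-in for the KeyValue/Cell the real code would build:

```java
// Hypothetical sketch: a key-only "cell" built by copying the key bytes
// out of the seeker's key buffer, so callers never touch array()/arrayOffset().
final class KeyOnlyCellSketch {
    final byte[] key;

    private KeyOnlyCellSketch(byte[] key) {
        this.key = key;
    }

    // Mirrors the System.arraycopy approach described for getKeyAsCell.
    static KeyOnlyCellSketch fromKeyBuffer(byte[] keyBuffer, int offset, int length) {
        byte[] copy = new byte[length];
        System.arraycopy(keyBuffer, offset, copy, 0, length);
        return new KeyOnlyCellSketch(copy);
    }
}
```

The cost being debated in the thread is exactly this per-call allocation and copy, traded against hiding the backing-buffer details once ByteBuffer-backed cells arrive.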
[jira] [Commented] (HBASE-13977) Convert getKey and related APIs to Cell
[ https://issues.apache.org/jira/browse/HBASE-13977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611806#comment-14611806 ] Anoop Sam John commented on HBASE-13977: bq.In getKeyAsCell, why bother with a ByteBuffer when all you are doing is passing an array? Same thing I also asked in comments Ram. bq.No need to go with BB create now.. Directly make the Cell out of current.keyBuffer? Do we need to clone that (if so also byte[] copy and create KV)? Convert getKey and related APIs to Cell --- Key: HBASE-13977 URL: https://issues.apache.org/jira/browse/HBASE-13977 Project: HBase Issue Type: Sub-task Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Attachments: HBASE-13977.patch, HBASE-13977_1.patch, HBASE-13977_2.patch, HBASE-13977_3.patch, HBASE-13977_4.patch During the course of changes for HBASE-11425 felt that more APIs can be converted to return Cell instead of BB like getKey, getLastKey. We can also rename the getKeyValue to getCell. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14010) TestRegionRebalancing.testRebalanceOnRegionServerNumberChange fails; cluster not balanced
[ https://issues.apache.org/jira/browse/HBASE-14010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611745#comment-14611745 ] Hadoop QA commented on HBASE-14010: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12743251/14010.txt against master branch at commit 272b025b25fed979da0e59ffd41615bbb9e105ea. ATTACHMENT ID: 12743251 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.master.TestDistributedLogSplitting Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14651//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14651//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14651//artifact/patchprocess/checkstyle-aggregate.html Javadoc warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14651//artifact/patchprocess/patchJavadocWarnings.txt Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14651//console This message is automatically generated. TestRegionRebalancing.testRebalanceOnRegionServerNumberChange fails; cluster not balanced - Key: HBASE-14010 URL: https://issues.apache.org/jira/browse/HBASE-14010 Project: HBase Issue Type: Bug Components: test Reporter: stack Assignee: stack Attachments: 14010.txt, 14010.txt, 14010.txt java.lang.AssertionError: null at org.apache.hadoop.hbase.TestRegionRebalancing.testRebalanceOnRegionServerNumberChange(TestRegionRebalancing.java:144) from recent build https://builds.apache.org/view/H-L/view/HBase/job/PreCommit-HBASE-Build/14639/testReport/junit/org.apache.hadoop.hbase/TestRegionRebalancing/testRebalanceOnRegionServerNumberChange_0_/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12224) Facilitate using ByteBuffer backed Cells in the HFileReader
[ https://issues.apache.org/jira/browse/HBASE-12224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-12224: --- Summary: Facilitate using ByteBuffer backed Cells in the HFileReader (was: Facilitate using DBBs in the HFileReaders V2 and V3.) Facilitate using ByteBuffer backed Cells in the HFileReader --- Key: HBASE-12224 URL: https://issues.apache.org/jira/browse/HBASE-12224 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 2.0.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13977) Convert getKey and related APIs to Cell
[ https://issues.apache.org/jira/browse/HBASE-13977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611963#comment-14611963 ] stack commented on HBASE-13977: --- Ok. +1 if hadoopqa passes and good by [~anoop.hbase] Convert getKey and related APIs to Cell --- Key: HBASE-13977 URL: https://issues.apache.org/jira/browse/HBASE-13977 Project: HBase Issue Type: Sub-task Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Attachments: HBASE-13977.patch, HBASE-13977_1.patch, HBASE-13977_2.patch, HBASE-13977_3.patch, HBASE-13977_4.patch During the course of changes for HBASE-11425 felt that more APIs can be converted to return Cell instead of BB like getKey, getLastKey. We can also rename the getKeyValue to getCell. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14011) MultiByteBuffer position based reads does not work correctly
[ https://issues.apache.org/jira/browse/HBASE-14011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611866#comment-14611866 ] Anoop Sam John commented on HBASE-14011: +1 MultiByteBuffer position based reads does not work correctly Key: HBASE-14011 URL: https://issues.apache.org/jira/browse/HBASE-14011 Project: HBase Issue Type: Bug Affects Versions: 2.0.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 2.0.0 Attachments: HBASE-14011.patch The positional based reads in MBBs are having some issues when we try to read the first element from the 2nd BB when the MBB is formed with multiple BBs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14011) MultiByteBuffer position based reads does not work correctly
[ https://issues.apache.org/jira/browse/HBASE-14011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611949#comment-14611949 ] stack commented on HBASE-14011: --- +1 MultiByteBuffer position based reads does not work correctly Key: HBASE-14011 URL: https://issues.apache.org/jira/browse/HBASE-14011 Project: HBase Issue Type: Bug Affects Versions: 2.0.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 2.0.0 Attachments: HBASE-14011.patch The positional based reads in MBBs are having some issues when we try to read the first element from the 2nd BB when the MBB is formed with multiple BBs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14011) MultiByteBuffer position based reads does not work correctly
ramkrishna.s.vasudevan created HBASE-14011: -- Summary: MultiByteBuffer position based reads does not work correctly Key: HBASE-14011 URL: https://issues.apache.org/jira/browse/HBASE-14011 Project: HBase Issue Type: Bug Affects Versions: 2.0.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 2.0.0 The positional based reads in MBBs are having some issues when we try to read the first element from the 2 MBB. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
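The boundary the description points at, the first element of the second buffer, is easy to get wrong when routing an absolute index to the right backing buffer. A self-contained sketch with hypothetical names rather than the real MultiByteBuffer internals:

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of position-based (absolute) reads over several
// ByteBuffers; the reported bug was at the first byte of the second buffer.
class MultiBufferSketch {
    private final ByteBuffer[] items;
    private final int[] itemBeginPos; // cumulative start offset of each buffer

    MultiBufferSketch(ByteBuffer... items) {
        this.items = items;
        this.itemBeginPos = new int[items.length + 1];
        for (int i = 0; i < items.length; i++) {
            itemBeginPos[i + 1] = itemBeginPos[i] + items[i].limit();
        }
    }

    byte get(int index) {
        int i = 0;
        // An off-by-one here (e.g. '>' instead of '>=') would misroute
        // index == items[0].limit(), i.e. the first byte of the second buffer.
        while (i < items.length - 1 && index >= itemBeginPos[i + 1]) {
            i++;
        }
        return items[i].get(index - itemBeginPos[i]); // absolute get, position untouched
    }
}
```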
[jira] [Updated] (HBASE-14011) MultiByteBuffer position based reads does not work correctly
[ https://issues.apache.org/jira/browse/HBASE-14011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-14011: --- Status: Patch Available (was: Open) MultiByteBuffer position based reads does not work correctly Key: HBASE-14011 URL: https://issues.apache.org/jira/browse/HBASE-14011 Project: HBase Issue Type: Bug Affects Versions: 2.0.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 2.0.0 Attachments: HBASE-14011.patch The positional based reads in MBBs are having some issues when we try to read the first element from the 2nd BB when the MBB is formed with multiple BBs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12213) HFileBlock backed by Array of ByteBuffers
[ https://issues.apache.org/jira/browse/HBASE-12213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611862#comment-14611862 ] Anoop Sam John commented on HBASE-12213: Gone through the patch quickly once. Some immediate comments ByteBufferUtils - already added getLong/getInt etc methods.. Named them as toInt/toLong to be consistent with Bytes.java.. So avoid.. Also looks like some methods moved from one place to another.. All these unwanted changes pls avoid. UnsafeAccess - Here also methods are there to read int/long etc and making use of that in BBUtils. Also in compare(BB,BB) also making use of this Unsafe based way.. Pls avoid the compare kind of logic from this class. Will do more closer look at other area.. Thanks HFileBlock backed by Array of ByteBuffers - Key: HBASE-12213 URL: https://issues.apache.org/jira/browse/HBASE-12213 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: Anoop Sam John Assignee: ramkrishna.s.vasudevan Attachments: HBASE-12213_1.patch In L2 cache (offheap) an HFile block might have been cached into multiple chunks of buffers. If HFileBlock need single BB, we will end up in recreation of bigger BB and copying. Instead we can make HFileBlock to serve data from an array of BBs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
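The naming point in the review above (match Bytes.java by exposing toInt/toLong rather than getInt/getLong) can be illustrated with a minimal hypothetical version; this is not the actual ByteBufferUtils code:

```java
// Hypothetical minimal toInt, named to be consistent with Bytes.java's
// toInt/toLong convention: big-endian read of 4 bytes starting at offset.
final class ByteBufferUtilsSketch {
    static int toInt(byte[] bytes, int offset) {
        return ((bytes[offset] & 0xff) << 24)
             | ((bytes[offset + 1] & 0xff) << 16)
             | ((bytes[offset + 2] & 0xff) << 8)
             |  (bytes[offset + 3] & 0xff);
    }
}
```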
[jira] [Updated] (HBASE-14011) MultiByteBuffer position based reads does not work correctly
[ https://issues.apache.org/jira/browse/HBASE-14011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-14011: --- Attachment: HBASE-14011.patch Patch with UT. MultiByteBuffer position based reads does not work correctly Key: HBASE-14011 URL: https://issues.apache.org/jira/browse/HBASE-14011 Project: HBase Issue Type: Bug Affects Versions: 2.0.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 2.0.0 Attachments: HBASE-14011.patch The positional based reads in MBBs are having some issues when we try to read the first element from the 2 MBB. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14011) MultiByteBuffer position based reads does not work correctly
[ https://issues.apache.org/jira/browse/HBASE-14011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-14011: --- Description: The positional based reads in MBBs are having some issues when we try to read the first element from the 2nd BB when the MBB is formed with multiple BBs. (was: The positional based reads in MBBs are having some issues when we try to read the first element from the 2 MBB.) MultiByteBuffer position based reads does not work correctly Key: HBASE-14011 URL: https://issues.apache.org/jira/browse/HBASE-14011 Project: HBase Issue Type: Bug Affects Versions: 2.0.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 2.0.0 Attachments: HBASE-14011.patch The positional based reads in MBBs are having some issues when we try to read the first element from the 2nd BB when the MBB is formed with multiple BBs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13970) NPE during compaction in trunk
[ https://issues.apache.org/jira/browse/HBASE-13970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang updated HBASE-13970: -- Resolution: Fixed Assignee: Duo Zhang (was: ramkrishna.s.vasudevan) Hadoop Flags: Reviewed Fix Version/s: (was: 1.0.2) Status: Resolved (was: Patch Available) Pushed to all branches except branch-1.0 (HBASE-8329 has not been applied to branch-1.0). Thanks [~anoopsamjohn] and [~ram_krish]. NPE during compaction in trunk -- Key: HBASE-13970 URL: https://issues.apache.org/jira/browse/HBASE-13970 Project: HBase Issue Type: Bug Affects Versions: 2.0.0, 0.98.13, 1.2.0, 1.1.1 Reporter: ramkrishna.s.vasudevan Assignee: Duo Zhang Fix For: 2.0.0, 0.98.14, 1.1.2, 1.3.0, 1.2.1 Attachments: HBASE-13970-v1.patch, HBASE-13970.patch Updated the trunk.. Loaded the table with PE tool. Trigger a flush to ensure all data is flushed out to disk. When the first compaction is triggered we get an NPE and this is very easy to reproduce {code} 2015-06-25 21:33:46,041 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received procedure start children changed event: /hbase/flush-table-proc/acquired 2015-06-25 21:33:46,051 INFO [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] regionserver.HRegion: Flushing 1/1 column families, memstore=76.91 MB 2015-06-25 21:33:46,159 ERROR [regionserver/stobdtserver3/10.224.54.70:16040-longCompactions-1435248183945] regionserver.CompactSplitThread: Compaction failed Request = regionName=TestTable,283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4., storeName=info, fileCount=3, fileSize=343.4 M (114.5 M, 114.5 M, 114.5 M), priority=3, time=7536968291719985 java.lang.NullPointerException at org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController$ActiveCompaction.access$700(PressureAwareCompactionThroughputController.java:79) at 
org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController.finish(PressureAwareCompactionThroughputController.java:238) at org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:306) at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:106) at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:112) at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1202) at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1792) at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:524) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2015-06-25 21:33:46,745 INFO [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] regionserver.DefaultStoreFlusher: Flushed, sequenceid=1534, memsize=76.9 M, hasBloomFilter=true, into tmp file hdfs://stobdtserver3:9010/hbase/data/default/TestTable/028fb0324cd6eb03d5022eb8c147b7c4/.tmp/942ba0831a0047a08987439e34361a0c 2015-06-25 21:33:46,772 INFO [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] regionserver.HStore: Added hdfs://stobdtserver3:9010/hbase/data/default/TestTable/028fb0324cd6eb03d5022eb8c147b7c4/info/942ba0831a0047a08987439e34361a0c, entries=68116, sequenceid=1534, filesize=68.7 M 2015-06-25 21:33:46,773 INFO [rs(stobdtserver3,16040,1435248182301)-flush-proc-pool3-thread-1] regionserver.HRegion: Finished memstore flush of ~76.91 MB/80649344, currentsize=0 B/0 for region TestTable,283887,1435248198798.028fb0324cd6eb03d5022eb8c147b7c4. 
in 723ms, sequenceid=1534, compaction requested=true 2015-06-25 21:33:46,780 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received created event:/hbase/flush-table-proc/reached/TestTable 2015-06-25 21:33:46,790 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received created event:/hbase/flush-table-proc/abort/TestTable 2015-06-25 21:33:46,791 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received procedure abort children changed event: /hbase/flush-table-proc/abort 2015-06-25 21:33:46,803 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received procedure start children changed event: /hbase/flush-table-proc/acquired 2015-06-25 21:33:46,818 INFO [main-EventThread] procedure.ZKProcedureMemberRpcs: Received procedure abort children changed event:
[jira] [Updated] (HBASE-12213) HFileBlock backed by Array of ByteBuffers
[ https://issues.apache.org/jira/browse/HBASE-12213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-12213: --- Attachment: HBASE-12213_1.patch Trying for QA. I got a clean run locally. Let me see what the QA bot says. HFileBlock backed by Array of ByteBuffers - Key: HBASE-12213 URL: https://issues.apache.org/jira/browse/HBASE-12213 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: Anoop Sam John Assignee: ramkrishna.s.vasudevan Attachments: HBASE-12213_1.patch In the L2 cache (offheap) an HFile block might have been cached into multiple chunks of buffers. If HFileBlock needs a single BB, we will end up recreating a bigger BB and copying. Instead we can make HFileBlock serve data from an array of BBs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14010) TestRegionRebalancing.testRebalanceOnRegionServerNumberChange fails; cluster not balanced
[ https://issues.apache.org/jira/browse/HBASE-14010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611980#comment-14611980 ] stack commented on HBASE-14010: --- Can I have a +1 here? It seems to help with the TestRegionRebalancing failures. TestRegionRebalancing.testRebalanceOnRegionServerNumberChange fails; cluster not balanced - Key: HBASE-14010 URL: https://issues.apache.org/jira/browse/HBASE-14010 Project: HBase Issue Type: Bug Components: test Reporter: stack Assignee: stack Attachments: 14010.txt, 14010.txt, 14010.txt java.lang.AssertionError: null at org.apache.hadoop.hbase.TestRegionRebalancing.testRebalanceOnRegionServerNumberChange(TestRegionRebalancing.java:144) from recent build https://builds.apache.org/view/H-L/view/HBase/job/PreCommit-HBASE-Build/14639/testReport/junit/org.apache.hadoop.hbase/TestRegionRebalancing/testRebalanceOnRegionServerNumberChange_0_/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12213) HFileBlock backed by Array of ByteBuffers
[ https://issues.apache.org/jira/browse/HBASE-12213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-12213: --- Status: Patch Available (was: Open) Uses MBB in the read path. - Since HBASE-12295 is not there we cannot allow the Bucket Cache to use the actual MBB based Blocks without doing a copy. For now we have copied the BucketCache block and created an MBB out of it. - The HFileReader uses MBB and the relative reads have been replaced with absolute position based reads. The absolute position based reads use the UnsafeAccess APIs, so it is better to make use of them. Tried a small micro benchmark mimicking the logic in blockSeek with and without positional reads using MBB. - There are some TODOs that will get changed after BufferBackedCells come into place. - PrefixTree and blooms need to be handled to work with MBBs that are offheap. That can be done later. HFileBlock backed by Array of ByteBuffers - Key: HBASE-12213 URL: https://issues.apache.org/jira/browse/HBASE-12213 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: Anoop Sam John Assignee: ramkrishna.s.vasudevan Attachments: HBASE-12213_1.patch In the L2 cache (offheap) an HFile block might have been cached into multiple chunks of buffers. If HFileBlock needs a single BB, we will end up recreating a bigger BB and copying. Instead we can make HFileBlock serve data from an array of BBs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
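The absolute-position read idea above can be sketched as follows. This is a hypothetical illustration, not the actual HBase MultiByteBuffer API: the class, field, and method names are invented. It shows why serving a block from an array of ByteBuffers avoids the copy: a read locates the chunk that holds the absolute offset and reads inside it, without recreating one big buffer.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch, not the actual HBase MultiByteBuffer API: absolute
// position based reads over an array of ByteBuffers, so a block cached as
// several chunks can be served without copying into one big buffer.
public class MultiBufferSketch {
    private final ByteBuffer[] items;
    private final int[] starts; // absolute start offset of each chunk

    public MultiBufferSketch(ByteBuffer[] items) {
        this.items = items;
        this.starts = new int[items.length];
        int off = 0;
        for (int i = 0; i < items.length; i++) {
            starts[i] = off;
            off += items[i].remaining();
        }
    }

    // Absolute get(): find the chunk that holds 'pos', then read inside it
    // relative to that chunk's start. No data is moved.
    public byte get(int pos) {
        int i = items.length - 1;
        while (i > 0 && pos < starts[i]) {
            i--;
        }
        ByteBuffer b = items[i];
        return b.get(b.position() + (pos - starts[i]));
    }

    public static void main(String[] args) {
        ByteBuffer c0 = ByteBuffer.wrap(new byte[] { 1, 2, 3 });
        ByteBuffer c1 = ByteBuffer.wrap(new byte[] { 4, 5 });
        MultiBufferSketch mbb = new MultiBufferSketch(new ByteBuffer[] { c0, c1 });
        System.out.println(mbb.get(2)); // last byte of the first chunk
        System.out.println(mbb.get(3)); // first byte of the second chunk
    }
}
```

Absolute reads like this are also stateless, which is why the patch can drop the relative-read position bookkeeping.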
[jira] [Updated] (HBASE-14005) Set permission to .top hfile in LoadIncrementalHFiles
[ https://issues.apache.org/jira/browse/HBASE-14005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco MDE updated HBASE-14005: -- Status: Open (was: Patch Available) Set permission to .top hfile in LoadIncrementalHFiles - Key: HBASE-14005 URL: https://issues.apache.org/jira/browse/HBASE-14005 Project: HBase Issue Type: Bug Reporter: Francesco MDE Priority: Trivial Attachments: HBASE-14005.patch Set the same -rwxrwxrwx permission to .top file as .bottom and _tmp -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14005) Set permission to .top hfile in LoadIncrementalHFiles
[ https://issues.apache.org/jira/browse/HBASE-14005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-14005: --- Resolution: Fixed Status: Resolved (was: Patch Available) Thanks for the patch, Francesco Set permission to .top hfile in LoadIncrementalHFiles - Key: HBASE-14005 URL: https://issues.apache.org/jira/browse/HBASE-14005 Project: HBase Issue Type: Bug Reporter: Francesco MDE Assignee: Francesco MDE Priority: Trivial Fix For: 2.0.0, 0.98.14, 1.0.2, 1.2.0, 1.1.2, 1.3.0 Attachments: HBASE-14005.patch Set the same -rwxrwxrwx permission to .top file as .bottom and _tmp -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14005) Set permission to .top hfile in LoadIncrementalHFiles
[ https://issues.apache.org/jira/browse/HBASE-14005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-14005: --- Assignee: Francesco MDE Hadoop Flags: Reviewed Fix Version/s: 1.3.0 1.1.2 1.2.0 1.0.2 0.98.14 2.0.0 Set permission to .top hfile in LoadIncrementalHFiles - Key: HBASE-14005 URL: https://issues.apache.org/jira/browse/HBASE-14005 Project: HBase Issue Type: Bug Reporter: Francesco MDE Assignee: Francesco MDE Priority: Trivial Fix For: 2.0.0, 0.98.14, 1.0.2, 1.2.0, 1.1.2, 1.3.0 Attachments: HBASE-14005.patch Set the same -rwxrwxrwx permission to .top file as .bottom and _tmp -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14005) Set permission to .top hfile in LoadIncrementalHFiles
[ https://issues.apache.org/jira/browse/HBASE-14005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco MDE updated HBASE-14005: -- Status: Patch Available (was: Open) Set permission to .top hfile in LoadIncrementalHFiles - Key: HBASE-14005 URL: https://issues.apache.org/jira/browse/HBASE-14005 Project: HBase Issue Type: Bug Reporter: Francesco MDE Priority: Trivial Attachments: HBASE-14005.patch Set the same -rwxrwxrwx permission to .top file as .bottom and _tmp -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13832) Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low
[ https://issues.apache.org/jira/browse/HBASE-13832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14612488#comment-14612488 ] Enis Soztutar commented on HBASE-13832: --- bq. to get the same behavior you need to force running to false when you set syncException. so you prevent other procedure to be added. Not sure whether we gain by ensuring that running is set to false before the next execution for syncLoop. The WAL store will abort when the master calls abort. Before this happens, concurrent calls to {{pushData()}} will still get the exception because the exception from sync is not cleared at all. So the semantics are that if {{sync()}} + WAL roll fails, we effectively start rejecting all requests for {{pushData()}}, which is similar to making sure to check isRunning(). Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low --- Key: HBASE-13832 URL: https://issues.apache.org/jira/browse/HBASE-13832 Project: HBase Issue Type: Sub-task Components: master, proc-v2 Affects Versions: 2.0.0, 1.1.0, 1.2.0 Reporter: Stephen Yuan Jiang Assignee: Matteo Bertozzi Priority: Critical Fix For: 2.0.0, 1.1.2, 1.3.0, 1.2.1 Attachments: HBASE-13832-v0.patch, HBASE-13832-v1.patch, HBASE-13832-v2.patch, HDFSPipeline.java, hbase-13832-test-hang.patch, hbase-13832-v3.patch when the data node 3, we got failure in WALProcedureStore#syncLoop() during master start. The failure prevents master to get started. {noformat} 2015-05-29 13:27:16,625 ERROR [WALProcedureStoreSyncThread] wal.WALProcedureStore: Sync slot failed, abort. java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. 
(Nodes: current=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK], DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-490ece56c772,DISK]], original=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK], DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983- 490ece56c772,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration. at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:951) {noformat} One proposal is to implement some similar logic as FSHLog: if IOException is thrown during syncLoop in WALProcedureStore#start(), instead of immediate abort, we could try to roll the log and see whether this resolve the issue; if the new log cannot be created or more exception from rolling the log, we then abort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14008) REST - Throw an appropriate error during schema POST
[ https://issues.apache.org/jira/browse/HBASE-14008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey updated HBASE-14008: Fix Version/s: (was: 1.1.2) (was: 1.2.0) 1.3.0 Status: Open (was: Patch Available) Changing the error that comes back will break operational compatibility, so I'm re-targeting this to avoid patch releases in 1.y. please add a test that you get the expected error. REST - Throw an appropriate error during schema POST Key: HBASE-14008 URL: https://issues.apache.org/jira/browse/HBASE-14008 Project: HBase Issue Type: Bug Components: REST Affects Versions: 1.1.1, 0.98.13 Reporter: Thiruvel Thirumoolan Assignee: Thiruvel Thirumoolan Priority: Minor Labels: REST Fix For: 1.3.0 Attachments: 14008.patch, HBASE-14008.patch When an update is done on the schema through REST and an error occurs, the actual reason is not thrown back to the client. Right now we get a javax.ws.rs.WebApplicationException instead of the actual error message. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13667) Backport HBASE-12975 to 1.0 and 0.98 without changing coprocessors hooks
[ https://issues.apache.org/jira/browse/HBASE-13667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-13667: -- Fix Version/s: (was: 1.0.2) 1.0.3 Backport HBASE-12975 to 1.0 and 0.98 without changing coprocessors hooks Key: HBASE-13667 URL: https://issues.apache.org/jira/browse/HBASE-13667 Project: HBase Issue Type: Bug Reporter: Rajeshbabu Chintaguntla Assignee: Rajeshbabu Chintaguntla Fix For: 0.98.14, 1.0.3 We can backport Split transaction, region merge transaction interfaces to branch 1.0 and 0.98 without changing coprocessor hooks. Then it should be compatible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13857) Slow WAL Append count in ServerMetricsTmpl.jamon is hardcoded to zero
[ https://issues.apache.org/jira/browse/HBASE-13857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-13857: -- Fix Version/s: (was: 1.0.2) 1.0.3 Slow WAL Append count in ServerMetricsTmpl.jamon is hardcoded to zero - Key: HBASE-13857 URL: https://issues.apache.org/jira/browse/HBASE-13857 Project: HBase Issue Type: Bug Components: regionserver, UI Affects Versions: 0.98.0 Reporter: Lars George Labels: beginner Fix For: 2.0.0, 0.98.14, 1.1.2, 1.3.0, 1.2.1, 1.0.3 The template has this: {noformat} tr ... thSlow WAL Append Count/th /tr tr td% 0 %/td /tr {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14015) Allow setting a richer state value when toString a pv2
[ https://issues.apache.org/jira/browse/HBASE-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-14015: -- Fix Version/s: 1.3.0 1.2.0 2.0.0 Status: Patch Available (was: Open) Allow setting a richer state value when toString a pv2 -- Key: HBASE-14015 URL: https://issues.apache.org/jira/browse/HBASE-14015 Project: HBase Issue Type: Improvement Components: proc-v2 Reporter: stack Assignee: stack Priority: Minor Fix For: 2.0.0, 1.2.0, 1.3.0 Attachments: 0001-HBASE-14015-Allow-setting-a-richer-state-value-when-.patch Debugging, my procedure after a crash was loaded out of the store and its state was RUNNING. It would help if I knew in which of the states of a StateMachineProcedure it was going to start RUNNING at. Chatting w/ Matteo, he suggested allowing Procedures customize the String. Here is patch that makes it so StateMachineProcedure will now print out the base state -- RUNNING, FINISHED -- followed by a ':' and then the StateMachineProcedure state: e.g. SimpleStateMachineProcedure state=RUNNABLE:SERVER_CRASH_ASSIGN -- This message was sent by Atlassian JIRA (v6.3.4#6332)
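The formatting idea from the patch above can be sketched in a few lines. This is a hypothetical illustration, not the real Procedure/StateMachineProcedure classes: print the base procedure state, a ':', and then the state-machine state, so a procedure reloaded from the store shows where it will resume.

```java
// Hypothetical sketch, not the real Procedure/StateMachineProcedure classes:
// append the state-machine state after the base state, so a reloaded
// procedure prints e.g. "RUNNABLE:SERVER_CRASH_ASSIGN" instead of "RUNNABLE".
public class ProcStateString {
    enum BaseState { RUNNABLE, FINISHED }

    static String describe(BaseState base, String machineState) {
        StringBuilder sb = new StringBuilder(base.name());
        if (machineState != null) {
            sb.append(':').append(machineState); // richer detail for debugging
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(BaseState.RUNNABLE, "SERVER_CRASH_ASSIGN"));
        System.out.println(describe(BaseState.FINISHED, null));
    }
}
```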
[jira] [Commented] (HBASE-13267) Deprecate or remove isFileDeletable from SnapshotHFileCleaner
[ https://issues.apache.org/jira/browse/HBASE-13267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14612699#comment-14612699 ] Dave Latham commented on HBASE-13267: - To be fair, I'm not suggesting removing it - I left it in there to begin with for the same reason you mentioned in your last comment. I was just providing a way to do it if desired. Deprecate or remove isFileDeletable from SnapshotHFileCleaner - Key: HBASE-13267 URL: https://issues.apache.org/jira/browse/HBASE-13267 Project: HBase Issue Type: Task Reporter: Andrew Purtell Priority: Minor Fix For: 2.0.0, 0.98.14, 1.1.2, 1.3.0, 1.2.1, 1.0.3 The isFileDeletable method in SnapshotHFileCleaner became vestigial after HBASE-12627, lets remove it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13832) Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low
[ https://issues.apache.org/jira/browse/HBASE-13832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14612713#comment-14612713 ] Enis Soztutar commented on HBASE-13832: --- bq. we know that we must die there. why not exit from that loop? We can call {{stop(true)}} directly from the sync loop, that is fine. It was not there in your original patch, that is why I did not change it. bq. with the actual implementation of abort we know that running will be false after a sendAbortProcessSignal() but that may not be the case in the future The store can cause an abort to the whole procedure executor or the master itself. Right now, it does this through the ProcedureStoreListener calls. I'm fine with sending an {{Abortable}} directly to the store. These parts are mainly coming from the initial proc v2 patch. Does the test rely on 1s / 2s timing? It may end up being flaky on slow Jenkins hosts. Other than that +1 for the v4 patch. If you want to do the abort changes, we can do it here or in a follow-up. Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low --- Key: HBASE-13832 URL: https://issues.apache.org/jira/browse/HBASE-13832 Project: HBase Issue Type: Sub-task Components: master, proc-v2 Affects Versions: 2.0.0, 1.1.0, 1.2.0 Reporter: Stephen Yuan Jiang Assignee: Matteo Bertozzi Priority: Critical Fix For: 2.0.0, 1.1.2, 1.3.0, 1.2.1 Attachments: HBASE-13832-v0.patch, HBASE-13832-v1.patch, HBASE-13832-v2.patch, HBASE-13832-v4.patch, HDFSPipeline.java, hbase-13832-test-hang.patch, hbase-13832-v3.patch when the data node 3, we got failure in WALProcedureStore#syncLoop() during master start. The failure prevents master to get started. {noformat} 2015-05-29 13:27:16,625 ERROR [WALProcedureStoreSyncThread] wal.WALProcedureStore: Sync slot failed, abort. java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. 
(Nodes: current=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK], DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-490ece56c772,DISK]], original=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK], DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983- 490ece56c772,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration. at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:951) {noformat} One proposal is to implement some similar logic as FSHLog: if IOException is thrown during syncLoop in WALProcedureStore#start(), instead of immediate abort, we could try to roll the log and see whether this resolve the issue; if the new log cannot be created or more exception from rolling the log, we then abort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14017) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion
[ https://issues.apache.org/jira/browse/HBASE-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-14017: --- Description: [~syuanjiang] found a concurrency issue in the procedure queue delete where we don't have an exclusive lock before deleting the table {noformat} Thread 1: Create table is running - the queue is empty and wlock is false Thread 2: markTableAsDeleted see the queue empty and wlock= false Thread 1: tryWrite() set wlock=true; too late Thread 2: delete the queue Thread 1: never able to release the lock - NPE when trying to get the queue {noformat} was: [~syuanjiang] found a concurrency issue in the procedure queue delete where we don't have an exclusive lock before deleting the table {noformat} Thread 1: Create table is running - tryWrite() acquire the lock, before set wlock=true; Thread 2: markTableAsDeleted see the queue empty and wlock= false Thread 1: set wlock=true; too late Thread 2: delete the queue Thread 1: never able to release the lock {noformat} Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion - Key: HBASE-14017 URL: https://issues.apache.org/jira/browse/HBASE-14017 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 2.0.0, 1.2.0, 1.1.0.1 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.1.2 Attachments: HBASE-14017-v0.patch [~syuanjiang] found a concurrency issue in the procedure queue delete where we don't have an exclusive lock before deleting the table {noformat} Thread 1: Create table is running - the queue is empty and wlock is false Thread 2: markTableAsDeleted see the queue empty and wlock= false Thread 1: tryWrite() set wlock=true; too late Thread 2: delete the queue Thread 1: never able to release the lock - NPE when trying to get the queue {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14017) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion
[ https://issues.apache.org/jira/browse/HBASE-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14612787#comment-14612787 ] Matteo Bertozzi commented on HBASE-14017: --- Not every run queue needs a tryExclusiveLock()/releaseLock() logic, but every one must be able to lock to prevent operations while a delete is in progress. That's the main reason the acquireDeleteLock() is exposed, and there is nothing else like a release; the fact that it is implemented as a tryExclusiveLock() is just a coincidence. Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion - Key: HBASE-14017 URL: https://issues.apache.org/jira/browse/HBASE-14017 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 2.0.0, 1.2.0, 1.1.0.1 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.1.2 Attachments: HBASE-14017-v0.patch [~syuanjiang] found a concurrency issue in the procedure queue delete where we don't have an exclusive lock before deleting the table {noformat} Thread 1: Create table is running - the queue is empty and wlock is false Thread 2: markTableAsDeleted see the queue empty and wlock= false Thread 1: tryWrite() set wlock=true; too late Thread 2: delete the queue Thread 1: never able to release the lock - NPE when trying to get the queue {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
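The fix direction discussed above can be sketched as follows. This is a hypothetical illustration with invented names, not the MasterProcedureQueue code: the delete path competes for the same exclusive lock writers take, so the "wlock=true too late" interleaving from the issue description cannot delete a queue a writer has just locked.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch, not the MasterProcedureQueue code: deletion must win
// the same exclusive lock writers use, closing the race in the description.
public class TableQueueSketch {
    private final AtomicBoolean wlock = new AtomicBoolean(false);
    private volatile boolean deleted = false;
    private int pendingProcs = 0; // procedures queued on this table

    public boolean tryExclusiveLock() { return wlock.compareAndSet(false, true); }
    public void releaseExclusiveLock() { wlock.set(false); }
    public boolean isDeleted() { return deleted; }

    // Safe delete: succeeds only on an empty queue after winning the
    // exclusive lock, so a concurrent writer either already holds the lock
    // (delete retries later) or sees the queue marked deleted.
    public boolean markAsDeleted() {
        if (pendingProcs == 0 && tryExclusiveLock()) {
            deleted = true;
            return true;
        }
        return false; // somebody holds the lock, retry later
    }

    public static void main(String[] args) {
        TableQueueSketch q = new TableQueueSketch();
        q.tryExclusiveLock();                  // a create-table writer locks first
        System.out.println(q.markAsDeleted()); // false: delete has to wait
        q.releaseExclusiveLock();
        System.out.println(q.markAsDeleted()); // true: now safe to delete
    }
}
```

The key property is that the lock acquisition and the emptiness check happen atomically with respect to writers, which is exactly what the unlocked markTableAsDeleted in the description lacked.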
[jira] [Commented] (HBASE-12596) bulkload needs to follow locality
[ https://issues.apache.org/jira/browse/HBASE-12596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14612803#comment-14612803 ] Ted Yu commented on HBASE-12596: --- {code} 212 if (null == tableName || tableName.isEmpty()) { 213 LOG.warn("table name is null, so use default writer"); {code} When would the above happen ? Table name is set in configureIncrementalLoad(), right ? {code} 218 Connection connection = ConnectionFactory.createConnection(conf); 219 RegionLocator locator = connection.getRegionLocator(TableName.valueOf(tableName)); {code} You can use try-with-resources. {code} 231 if (null == loc) { 232 LOG.warn("failed to get region location, so use default writer"); {code} Should the log level be at TRACE ? bulkload needs to follow locality - Key: HBASE-12596 URL: https://issues.apache.org/jira/browse/HBASE-12596 Project: HBase Issue Type: Improvement Components: HFile, regionserver Affects Versions: 0.98.8 Environment: hadoop-2.3.0, hbase-0.98.8, jdk1.7 Reporter: Victor Xu Assignee: Victor Xu Fix For: 0.98.14 Attachments: HBASE-12596-0.98-v1.patch, HBASE-12596-master-v1.patch, HBASE-12596.patch Normally, we have 2 steps to perform a bulkload: 1. use a job to write HFiles to be loaded; 2. Move these HFiles to the right hdfs directory. However, the locality could be lost during the first step. Why not just write the HFiles directly into the right place? We can do this easily because StoreFile.WriterBuilder has the withFavoredNodes method, and we just need to call it in HFileOutputFormat's getNewWriter(). This feature is disabled by default, and we could use 'hbase.bulkload.locality.sensitive.enabled=true' to enable it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
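The try-with-resources suggestion in the review above can be illustrated with plain AutoCloseable stand-ins (the names below are invented so the sketch runs without a cluster; the real HBase Connection and RegionLocator are both closeable and would follow the same pattern): both resources are closed automatically, in reverse order of creation, even if the body throws.

```java
import java.util.ArrayList;
import java.util.List;

// Generic illustration of the try-with-resources pattern suggested for the
// Connection/RegionLocator pair; Resource is a stand-in, not the HBase API.
public class TwrDemo {
    static class Resource implements AutoCloseable {
        final String name;
        final List<String> closedLog;
        Resource(String name, List<String> closedLog) {
            this.name = name;
            this.closedLog = closedLog;
        }
        @Override public void close() { closedLog.add(name); }
    }

    // Returns the order in which the resources were closed.
    static List<String> run() {
        List<String> closedLog = new ArrayList<>();
        // Mirrors: try (Connection conn = ConnectionFactory.createConnection(conf);
        //               RegionLocator locator = conn.getRegionLocator(tableName)) { ... }
        try (Resource connection = new Resource("connection", closedLog);
             Resource locator = new Resource("locator", closedLog)) {
            // use the locator here
        }
        return closedLog;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [locator, connection]
    }
}
```

The reverse close order matters here: the locator is closed before the connection it came from, which is what the manual close in the patch would have to get right by hand.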
[jira] [Commented] (HBASE-12988) [Replication]Parallel apply edits across regions
[ https://issues.apache.org/jira/browse/HBASE-12988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14612667#comment-14612667 ] Lars Hofhansl commented on HBASE-12988: --- Good to commit this way? [Replication]Parallel apply edits across regions Key: HBASE-12988 URL: https://issues.apache.org/jira/browse/HBASE-12988 Project: HBase Issue Type: Improvement Components: Replication Reporter: hongyu bi Assignee: Lars Hofhansl Attachments: 12988-v2.txt, 12988-v3.txt, 12988-v4.txt, 12988.txt, HBASE-12988-0.98.patch, ParallelReplication-v2.txt We can apply edits to the slave cluster in parallel at table level to speed up replication. Update: per the conversation below, it's better to apply edits at row level in parallel. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13832) Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low
[ https://issues.apache.org/jira/browse/HBASE-13832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-13832: Attachment: (was: HBASE-13832-v4.patch) Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low --- Key: HBASE-13832 URL: https://issues.apache.org/jira/browse/HBASE-13832 Project: HBase Issue Type: Sub-task Components: master, proc-v2 Affects Versions: 2.0.0, 1.1.0, 1.2.0 Reporter: Stephen Yuan Jiang Assignee: Matteo Bertozzi Priority: Critical Fix For: 2.0.0, 1.1.2, 1.3.0, 1.2.1 Attachments: HBASE-13832-v0.patch, HBASE-13832-v1.patch, HBASE-13832-v2.patch, HBASE-13832-v4.patch, HDFSPipeline.java, hbase-13832-test-hang.patch, hbase-13832-v3.patch when the data node 3, we got failure in WALProcedureStore#syncLoop() during master start. The failure prevents master to get started. {noformat} 2015-05-29 13:27:16,625 ERROR [WALProcedureStoreSyncThread] wal.WALProcedureStore: Sync slot failed, abort. java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK], DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-490ece56c772,DISK]], original=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK], DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983- 490ece56c772,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration. 
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:951) {noformat} One proposal is to implement some similar logic as FSHLog: if IOException is thrown during syncLoop in WALProcedureStore#start(), instead of immediate abort, we could try to roll the log and see whether this resolve the issue; if the new log cannot be created or more exception from rolling the log, we then abort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13832) Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low
[ https://issues.apache.org/jira/browse/HBASE-13832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-13832: Attachment: HBASE-13832-v4.patch v4 is Enis's patch with a fix and a new test for a case with queued writers that were hanging. Still, that caught exception in the syncLoop(), with the loop still going instead of aborting or at least spinning until !isRunning(), seems strange to me. We know that we must die there; why not exit from that loop? (With the actual implementation of abort we know that running will be false after a sendAbortProcessSignal(), but that may not be the case in the future.) Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low --- Key: HBASE-13832 URL: https://issues.apache.org/jira/browse/HBASE-13832 Project: HBase Issue Type: Sub-task Components: master, proc-v2 Affects Versions: 2.0.0, 1.1.0, 1.2.0 Reporter: Stephen Yuan Jiang Assignee: Matteo Bertozzi Priority: Critical Fix For: 2.0.0, 1.1.2, 1.3.0, 1.2.1 Attachments: HBASE-13832-v0.patch, HBASE-13832-v1.patch, HBASE-13832-v2.patch, HBASE-13832-v4.patch, HDFSPipeline.java, hbase-13832-test-hang.patch, hbase-13832-v3.patch when the data node 3, we got failure in WALProcedureStore#syncLoop() during master start. The failure prevents master to get started. {noformat} 2015-05-29 13:27:16,625 ERROR [WALProcedureStoreSyncThread] wal.WALProcedureStore: Sync slot failed, abort. java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK], DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-490ece56c772,DISK]], original=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK], DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983- 490ece56c772,DISK]]). 
The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration. at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:951) {noformat} One proposal is to implement some similar logic as FSHLog: if IOException is thrown during syncLoop in WALProcedureStore#start(), instead of immediate abort, we could try to roll the log and see whether this resolve the issue; if the new log cannot be created or more exception from rolling the log, we then abort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13832) Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low
[ https://issues.apache.org/jira/browse/HBASE-13832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-13832: Attachment: HBASE-13832-v4.patch Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low --- Key: HBASE-13832 URL: https://issues.apache.org/jira/browse/HBASE-13832 Project: HBase Issue Type: Sub-task Components: master, proc-v2 Affects Versions: 2.0.0, 1.1.0, 1.2.0 Reporter: Stephen Yuan Jiang Assignee: Matteo Bertozzi Priority: Critical Fix For: 2.0.0, 1.1.2, 1.3.0, 1.2.1 Attachments: HBASE-13832-v0.patch, HBASE-13832-v1.patch, HBASE-13832-v2.patch, HBASE-13832-v4.patch, HDFSPipeline.java, hbase-13832-test-hang.patch, hbase-13832-v3.patch when the data node 3, we got failure in WALProcedureStore#syncLoop() during master start. The failure prevents master to get started. {noformat} 2015-05-29 13:27:16,625 ERROR [WALProcedureStoreSyncThread] wal.WALProcedureStore: Sync slot failed, abort. java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK], DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-490ece56c772,DISK]], original=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK], DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983- 490ece56c772,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration. 
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:951) {noformat} One proposal is to implement some similar logic as FSHLog: if IOException is thrown during syncLoop in WALProcedureStore#start(), instead of immediate abort, we could try to roll the log and see whether this resolve the issue; if the new log cannot be created or more exception from rolling the log, we then abort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
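The roll-before-abort proposal restated above can be sketched as follows. This is a hypothetical illustration, not the WALProcedureStore code (the interface and method names are invented): on a sync failure, attempt one log roll and retry the sync, the way FSHLog recovers; abort only if the roll or the retried sync also fails.

```java
import java.io.IOException;

// Hypothetical sketch of the proposal, not the WALProcedureStore code: roll
// the log on a sync failure instead of aborting immediately.
public class SyncLoopSketch {
    public interface Wal {
        void sync() throws IOException;
        boolean rollWriter(); // true if a fresh log could be created
    }

    /** Returns true to keep running, false when the store must abort. */
    public static boolean syncOnce(Wal wal) {
        try {
            wal.sync();
            return true;
        } catch (IOException first) {
            if (!wal.rollWriter()) {
                return false; // could not even create a new log: abort
            }
            try {
                wal.sync();   // retry on the fresh log
                return true;
            } catch (IOException second) {
                return false; // rolling did not help: abort
            }
        }
    }

    public static void main(String[] args) {
        Wal broken = new Wal() {
            public void sync() throws IOException { throw new IOException("bad pipeline"); }
            public boolean rollWriter() { return false; }
        };
        System.out.println(syncOnce(broken)); // false: abort path
    }
}
```

A single roll attempt keeps the policy simple; a bounded retry count would be the obvious extension if transient pipeline failures turn out to be common.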
[jira] [Work started] (HBASE-13867) Add endpoint coprocessor guide to HBase book
[ https://issues.apache.org/jira/browse/HBASE-13867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-13867 started by Gaurav Bhardwaj. --- Add endpoint coprocessor guide to HBase book Key: HBASE-13867 URL: https://issues.apache.org/jira/browse/HBASE-13867 Project: HBase Issue Type: Task Components: Coprocessors, documentation Reporter: Vladimir Rodionov Assignee: Gaurav Bhardwaj Endpoint coprocessors are very poorly documented. The coprocessor section of the HBase book must be updated either with its own endpoint coprocessor HOW-TO guide or, at least, with link(s) to some other guides. There is a good description here: http://www.3pillarglobal.com/insights/hbase-coprocessors -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13267) Deprecate or remove isFileDeletable from SnapshotHFileCleaner
[ https://issues.apache.org/jira/browse/HBASE-13267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14612701#comment-14612701 ] Mikhail Antonov commented on HBASE-13267: - Oh yeah, sorry I misinterpreted your comment a bit I guess. I meant if we want to remove it, we can do it the way you described. Deprecate or remove isFileDeletable from SnapshotHFileCleaner - Key: HBASE-13267 URL: https://issues.apache.org/jira/browse/HBASE-13267 Project: HBase Issue Type: Task Reporter: Andrew Purtell Priority: Minor Fix For: 2.0.0, 0.98.14, 1.1.2, 1.3.0, 1.2.1, 1.0.3 The isFileDeletable method in SnapshotHFileCleaner became vestigial after HBASE-12627, lets remove it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13480) ShortCircuitConnection doesn't short-circuit all calls as expected
[ https://issues.apache.org/jira/browse/HBASE-13480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14612700#comment-14612700 ] Enis Soztutar commented on HBASE-13480: --- I think Josh is right. The ConnectionAdapter being a proxy is the problem. If it had extended the ConnectionImplementation, it would have worked. The net result is that we are not doing short circuit connections at all. ShortCircuitConnection doesn't short-circuit all calls as expected -- Key: HBASE-13480 URL: https://issues.apache.org/jira/browse/HBASE-13480 Project: HBase Issue Type: Bug Components: Client Affects Versions: 1.0.0, 2.0.0, 1.1.0 Reporter: Josh Elser Assignee: Josh Elser Fix For: 2.0.0, 1.1.2, 1.3.0, 1.2.1, 1.0.3 Noticed the following situation in debugging unexpected unit tests failures in HBASE-13351. {{ConnectionUtils#createShortCircuitHConnection(Connection, ServerName, AdminService.BlockingInterface, ClientService.BlockingInterface)}} is intended to avoid the extra RPC by calling the server's instantiation of the protobuf rpc stub directly for the AdminService and ClientService. The problem is that this is insufficient to actually avoid extra remote RPCs as all other calls to the Connection are routed to a real Connection instance. As such, any object created by the real Connection (such as an HTable) will use the real Connection, not the SSC. The end result is that {{MasterRpcService#reportRegionStateTransition(RpcController, ReportRegionStateTransitionRequest)}} will make additional remote RPCs over what it thinks is an SSC through a {{Get}} on {{HTable}} which was constructed using the SSC, but the {{Get}} itself will use the underlying real Connection instead of the SSC. With insufficiently sized thread pools, this has been observed to result in RPC deadlock in the HMaster where an RPC attempts to make another RPC but there are no more threads available to service the second RPC so the first RPC blocks indefinitely. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
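The underlying pitfall, that a delegating wrapper cannot intercept calls made through objects its delegate hands out, can be reproduced with plain JDK dynamic proxies. The `Conn` and `Table` interfaces below are illustrative stand-ins for HBase's Connection and HTable, not the real API:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

// Illustrates why a delegating wrapper does not short-circuit calls made
// through objects created by the delegate: those objects capture a reference
// to the real instance, not the wrapper. Names here are illustrative only.
public class ProxyLeak {
  public interface Conn {
    String call();     // the call we want answered locally
    Table getTable();  // factory method on the connection
  }
  public interface Table { String get(); }

  // A "real" connection whose tables call back into it.
  public static class RealConn implements Conn {
    public String call() { return "remote"; }
    public Table getTable() {
      // The table captures `this` -- the real connection, not any wrapper.
      return () -> this.call();
    }
  }

  // Wrap a connection so that call() is answered locally.
  public static Conn shortCircuit(Conn real) {
    InvocationHandler h = (proxy, method, args) ->
        method.getName().equals("call") ? "local" : method.invoke(real, args);
    return (Conn) Proxy.newProxyInstance(Conn.class.getClassLoader(),
        new Class<?>[] { Conn.class }, h);
  }
}
```

Calls made directly on the wrapper are intercepted, but any object the real connection constructed routes around it, which is exactly the extra-RPC behavior described in the ticket.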
[jira] [Updated] (HBASE-13480) ShortCircuitConnection doesn't short-circuit all calls as expected
[ https://issues.apache.org/jira/browse/HBASE-13480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-13480: -- Fix Version/s: (was: 1.0.2) 1.0.3 ShortCircuitConnection doesn't short-circuit all calls as expected -- Key: HBASE-13480 URL: https://issues.apache.org/jira/browse/HBASE-13480 Project: HBase Issue Type: Bug Components: Client Affects Versions: 1.0.0, 2.0.0, 1.1.0 Reporter: Josh Elser Assignee: Josh Elser Fix For: 2.0.0, 1.1.2, 1.3.0, 1.2.1, 1.0.3 Noticed the following situation in debugging unexpected unit tests failures in HBASE-13351. {{ConnectionUtils#createShortCircuitHConnection(Connection, ServerName, AdminService.BlockingInterface, ClientService.BlockingInterface)}} is intended to avoid the extra RPC by calling the server's instantiation of the protobuf rpc stub directly for the AdminService and ClientService. The problem is that this is insufficient to actually avoid extra remote RPCs as all other calls to the Connection are routed to a real Connection instance. As such, any object created by the real Connection (such as an HTable) will use the real Connection, not the SSC. The end result is that {{MasterRpcService#reportRegionStateTransition(RpcController, ReportRegionStateTransitionRequest)}} will make additional remote RPCs over what it thinks is an SSC through a {{Get}} on {{HTable}} which was constructed using the SSC, but the {{Get}} itself will use the underlying real Connection instead of the SSC. With insufficiently sized thread pools, this has been observed to result in RPC deadlock in the HMaster where an RPC attempts to make another RPC but there are no more threads available to service the second RPC so the first RPC blocks indefinitely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14017) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion
[ https://issues.apache.org/jira/browse/HBASE-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14612737#comment-14612737 ] Stephen Yuan Jiang commented on HBASE-14017: We should move the tryExclusiveLock() up: {code} public synchronized boolean tryExclusiveLock(final TableLockManager lockManager, final TableName tableName, final String purpose) { if (!tryExclusiveLock()) return false; // <== moved up // Take zk-write-lock tableLock = lockManager.writeLock(tableName, purpose); try { tableLock.acquire(); } catch (IOException e) { LOG.error("failed acquire write lock on " + tableName, e); tableLock = null; releaseExclusiveLock(); // <== return false; } return true; } {code} Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion - Key: HBASE-14017 URL: https://issues.apache.org/jira/browse/HBASE-14017 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 2.0.0, 1.2.0, 1.1.0.1 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.1.2 Attachments: HBASE-14017-v0.patch [~syuanjiang] found a concurrency issue in the procedure queue delete where we don't have an exclusive lock before deleting the table {noformat} Thread 1: Create table is running - tryWrite() acquires the lock, before setting wlock=true; Thread 2: markTableAsDeleted sees the queue empty and wlock == false Thread 1: sets wlock=true; too late Thread 2: deletes the queue Thread 1: never able to release the lock {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13702) ImportTsv: Add dry-run functionality and log bad rows
[ https://issues.apache.org/jira/browse/HBASE-13702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apekshit Sharma updated HBASE-13702: Attachment: HBASE-13702-branch-1-v3.patch [~tedyu] fixed the test. One of the tests was failing because an existing test was directly changing the global configuration (util.getConfiguration()) in its test body, which affected any tests that ran later. :-/ ImportTsv: Add dry-run functionality and log bad rows - Key: HBASE-13702 URL: https://issues.apache.org/jira/browse/HBASE-13702 Project: HBase Issue Type: New Feature Reporter: Apekshit Sharma Assignee: Apekshit Sharma Fix For: 2.0.0, 1.3.0 Attachments: HBASE-13702-branch-1-v2.patch, HBASE-13702-branch-1-v3.patch, HBASE-13702-branch-1.patch, HBASE-13702-v2.patch, HBASE-13702-v3.patch, HBASE-13702-v4.patch, HBASE-13702-v5.patch, HBASE-13702.patch ImportTSV job skips bad records by default (keeps a count though). -Dimporttsv.skip.bad.lines=false can be used to fail if a bad row is encountered. Being able to easily determine which rows in an input are corrupted, rather than failing on one row at a time, seems like a good feature to have. Moreover, there should be 'dry-run' functionality in such kinds of tools, which essentially does a quick run of the tool without making any changes but reporting any errors/warnings and success/failure. To identify corrupted rows, simply logging them should be enough. In the worst case, all rows will be logged and the size of the logs will be the same as the input size, which seems fine. However, the user might have to do some work figuring out where the logs are. Is there some link we can show to the user when the tool starts which can help them with that? For the dry run, we can simply use if-else to skip over writing out KVs, and any other mutations, if present. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
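A minimal sketch of the two behaviors proposed in the ticket above: log every bad row instead of stopping at the first, and gate actual writes behind a dry-run flag. Rows and "writes" are plain strings here purely for illustration; the real mapper emits KeyValues rather than collecting strings.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: count/log bad rows either way, but only emit mutations when not
// in dry-run mode. A row is "bad" here when it lacks the tab separator.
public class DryRunImport {
  public static class Result {
    public final List<String> written = new ArrayList<>();
    public final List<String> badRows = new ArrayList<>();
  }

  public static Result run(List<String> rows, boolean dryRun) {
    Result r = new Result();
    for (String row : rows) {
      if (!row.contains("\t")) {   // parse failure: record it and keep going
        r.badRows.add(row);
        continue;
      }
      if (!dryRun) {               // dry run: validate only, write nothing
        r.written.add(row);
      }
    }
    return r;
  }
}
```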
[jira] [Updated] (HBASE-13561) ITBLL.Verify doesn't actually evaluate counters after job completes
[ https://issues.apache.org/jira/browse/HBASE-13561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-13561: -- Fix Version/s: (was: 1.0.2) 1.0.3 ITBLL.Verify doesn't actually evaluate counters after job completes --- Key: HBASE-13561 URL: https://issues.apache.org/jira/browse/HBASE-13561 Project: HBase Issue Type: Bug Components: integration tests Affects Versions: 1.0.0, 2.0.0, 1.1.0, 0.98.12 Reporter: Josh Elser Assignee: Josh Elser Fix For: 2.0.0, 0.98.14, 1.1.2, 1.3.0, 1.2.1, 1.0.3 Was digging through ITBLL and noticed this oddity: The {{Verify}} Tool doesn't actually call {{Verify#verify(long)}} like the {{Loop}} Tool does. Granted, it doesn't know the total number of KVs that were written given the current arguments, but it's not even checking to see if there are things like UNDEFINED records found. It seems to me that {{Verify}} should really be doing *some* checking on the counters like {{Loop}} does and not just leaving it up to the visual inspection of whomever launched the task. Am I missing something? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
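The kind of counter check {{Verify}} could perform after the job completes, mirroring what {{Loop}} does, might look like the sketch below. The counter names REFERENCED/UNREFERENCED/UNDEFINED follow ITBLL's convention, but the method itself is hypothetical and operates on a plain map rather than Hadoop's Counters.

```java
import java.util.Map;

// Sketch of a post-job counter check: a run is clean only if no UNDEFINED or
// UNREFERENCED nodes were found, and (when the expected total is known, i.e.
// expectedReferenced >= 0) the referenced count matches it.
public class VerifyCounters {
  public static boolean verify(Map<String, Long> counters, long expectedReferenced) {
    long referenced = counters.getOrDefault("REFERENCED", 0L);
    long unreferenced = counters.getOrDefault("UNREFERENCED", 0L);
    long undefined = counters.getOrDefault("UNDEFINED", 0L);
    if (undefined != 0 || unreferenced != 0) return false;
    return expectedReferenced < 0 || referenced == expectedReferenced;
  }
}
```

Passing a negative expected count covers the case the comment raises, where Verify does not know the total number of KVs written but can still flag UNDEFINED records.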
[jira] [Updated] (HBASE-13605) RegionStates should not keep its list of dead servers
[ https://issues.apache.org/jira/browse/HBASE-13605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-13605: -- Fix Version/s: (was: 1.0.2) RegionStates should not keep its list of dead servers - Key: HBASE-13605 URL: https://issues.apache.org/jira/browse/HBASE-13605 Project: HBase Issue Type: Bug Components: Region Assignment Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 2.0.0, 1.1.2 Attachments: hbase-13605_v1.patch, hbase-13605_v3-branch-1.1.patch, hbase-13605_v4-branch-1.1.patch, hbase-13605_v4-master.patch As mentioned in https://issues.apache.org/jira/browse/HBASE-9514?focusedCommentId=13769761page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13769761 and HBASE-12844 we should have only 1 source of cluster membership. The list of dead server and RegionStates doing it's own liveliness check (ServerManager.isServerReachable()) has caused an assignment problem again in a test cluster where the region states thinks that the server is dead and SSH will handle the region assignment. However the RS is not dead at all, living happily, and never gets zk expiry or YouAreDeadException or anything. This leaves the list of regions unassigned in OFFLINE state. master assigning the region: {code} 15-04-20 09:02:25,780 DEBUG [AM.ZK.Worker-pool3-t330] master.RegionStates: Onlined 77dddcd50c22e56bfff133c0e1f9165b on os-amb-r6-us-1429512014-hbase4-6.novalocal,16020,1429520535268 {ENCODED = 77dddcd50c {code} Master then disabled the table, and unassigned the region: {code} 2015-04-20 09:02:27,158 WARN [ProcedureExecutorThread-1] zookeeper.ZKTableStateManager: Moving table loadtest_d1 state from DISABLING to DISABLING Starting unassign of loadtest_d1,,1429520544378.77dddcd50c22e56bfff133c0e1f9165b. 
(offlining), current state: {77dddcd50c22e56bfff133c0e1f9165b state=OPEN, ts=1429520545780, server=os-amb-r6-us-1429512014-hbase4-6.novalocal,16020,1429520535268} bleProcedure$BulkDisabler-0] master.AssignmentManager: Sent CLOSE to os-amb-r6-us-1429512014-hbase4-6.novalocal,16020,1429520535268 for region loadtest_d1,,1429520544378.77dddcd50c22e56bfff133c0e1f9165b. 2015-04-20 09:02:27,414 INFO [AM.ZK.Worker-pool3-t316] master.RegionStates: Offlined 77dddcd50c22e56bfff133c0e1f9165b from os-amb-r6-us-1429512014-hbase4-6.novalocal,16020,1429520535268 {code} On table re-enable, AM does not assign the region: {code} 2015-04-20 09:02:30,415 INFO [ProcedureExecutorThread-3] balancer.BaseLoadBalancer: Reassigned 25 regions. 25 retained the pre-restart assignment.· 2015-04-20 09:02:30,415 INFO [ProcedureExecutorThread-3] procedure.EnableTableProcedure: Bulk assigning 25 region(s) across 5 server(s), retainAssignment=true l,16000,1429515659726-GeneralBulkAssigner-4] master.RegionStates: Couldn't reach online server os-amb-r6-us-1429512014-hbase4-6.novalocal,16020,1429520535268 l,16000,1429515659726-GeneralBulkAssigner-4] master.AssignmentManager: Updating the state to OFFLINE to allow to be reassigned by SSH nmentManager: Skip assigning loadtest_d1,,1429520544378.77dddcd50c22e56bfff133c0e1f9165b., it is on a dead but not processed yet server: os-amb-r6-us-1429512014-hbase4-6.novalocal,16020,1429520535268 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14016) Procedure V2: NPE in a delete table follow by create table closely
[ https://issues.apache.org/jira/browse/HBASE-14016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14612715#comment-14612715 ] Stephen Yuan Jiang commented on HBASE-14016: [~mbertozzi] I think we should do something like the following: {code} public boolean tryAcquireTableWrite(final TableName table, final String purpose) { boolean lockAcquired = false; lock.lock(); try { lockAcquired = getRunQueueOrCreate(table).tryWrite(lockManager, table, purpose); } finally { lock.unlock(); } return lockAcquired; } {code} Procedure V2: NPE in a delete table follow by create table closely -- Key: HBASE-14016 URL: https://issues.apache.org/jira/browse/HBASE-14016 Project: HBase Issue Type: Bug Components: proc-v2 Affects Versions: 2.0.0, 1.2.0, 1.1.1, 1.3.0 Reporter: Stephen Yuan Jiang Assignee: Stephen Yuan Jiang In our internal test for HBASE 1.1, we found a race condition where a delete table followed closely by a create table would leak a zk lock due to an NPE in ProcedureFairRunQueues {noformat} Exception in thread ProcedureExecutorThread-0 java.lang.NullPointerException at org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.releaseTableWrite(MasterProcedureQueue.java:279) at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.releaseLock(CreateTableProcedure.java:280) at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.releaseLock(CreateTableProcedure.java:58) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:674) {noformat} Here is the code that causes the race condition: {code} protected boolean markTableAsDeleted(final TableName table) { TableRunQueue queue = getRunQueue(table); if (queue != null) { ... if (queue.isEmpty() && !queue.isLocked()) { fairq.remove(table); ... } public boolean tryWrite(final TableLockManager lockManager, final TableName tableName, final String purpose) { ... tableLock = lockManager.writeLock(tableName, purpose); try { tableLock.acquire(); ... 
wlock = true; ... } {code} The root cause is: wlock is set too late and does not protect the queue from being deleted. - Thread 1: create table is running; queue is empty - tryWrite() acquires the lock (now wlock is still false) - Thread 2: markTableAsDeleted sees the queue empty and wlock == false - Thread 1: sets wlock=true - too late - Thread 2: deletes the queue - Thread 1: never able to release the lock - NPE trying to get the queue -- This message was sent by Atlassian JIRA (v6.3.4#6332)
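One way to see why publishing the in-memory flag first closes this race: if the flag is set under the queue's monitor before any external (zk) work, the deleter can never observe an empty-but-about-to-be-locked queue. A self-contained sketch of that ordering (class and method names are illustrative, not the actual MasterProcedureQueue code):

```java
// Sketch of the fixed ordering: claim the in-memory lock flag (under the
// queue's monitor) BEFORE doing the slow external lock acquisition, so a
// concurrent markAsDeleted() either runs first (and the locker fails) or
// sees wlock == true (and refuses to remove the queue).
public class TableQueue {
  private boolean wlock = false;
  private boolean deleted = false;

  // Step 1: atomically claim the queue; external (zk) work happens after.
  public synchronized boolean tryExclusiveLock() {
    if (deleted || wlock) return false;
    wlock = true;   // published before any zk-write-lock acquisition
    return true;
  }

  public synchronized void releaseExclusiveLock() { wlock = false; }

  // The deleter: only removes a queue nobody holds.
  public synchronized boolean markAsDeleted() {
    if (wlock) return false;
    deleted = true;
    return true;
  }
}
```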
[jira] [Updated] (HBASE-14017) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion
[ https://issues.apache.org/jira/browse/HBASE-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-14017: Attachment: HBASE-14017-v0.patch Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion - Key: HBASE-14017 URL: https://issues.apache.org/jira/browse/HBASE-14017 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 2.0.0, 1.2.0, 1.1.0.1 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.1.2 Attachments: HBASE-14017-v0.patch [~syuanjiang] found a concurrency issue in the procedure queue delete where we don't have an exclusive lock before deleting the table {noformat} Thread 1: Create table is running - tryWrite() acquires the lock, before setting wlock=true; Thread 2: markTableAsDeleted sees the queue empty and wlock == false Thread 1: sets wlock=true; too late Thread 2: deletes the queue Thread 1: never able to release the lock {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14011) MultiByteBuffer position based reads does not work correctly
[ https://issues.apache.org/jira/browse/HBASE-14011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-14011: --- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Pushed to master. Thanks for the reviews. MultiByteBuffer position based reads does not work correctly Key: HBASE-14011 URL: https://issues.apache.org/jira/browse/HBASE-14011 Project: HBase Issue Type: Bug Affects Versions: 2.0.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 2.0.0 Attachments: HBASE-14011.patch The positional based reads in MBBs are having some issues when we try to read the first element from the 2nd BB when the MBB is formed with multiple BBs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12596) bulkload needs to follow locality
[ https://issues.apache.org/jira/browse/HBASE-12596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Victor Xu updated HBASE-12596: -- Description: Normally, we have 2 steps to perform a bulkload: 1. use a job to write HFiles to be loaded; 2. Move these HFiles to the right hdfs directory. However, the locality could be lost during the first step. Why not just write the HFiles directly into the right place? We can do this easily because StoreFile.WriterBuilder has the withFavoredNodes method, and we just need to call it in HFileOutputFormat's getNewWriter(). This feature is disabled by default, and we could use 'hbase.bulkload.locality.sensitive.enabled' to enable it. was:Normally, we have 2 steps to perform a bulkload: 1. use a job to write HFiles to be loaded; 2. Move these HFiles to the right hdfs directory. However, the locality could be lost during the first step. Why not just write the HFiles directly into the right place? We can do this easily because StoreFile.WriterBuilder has the withFavoredNodes method, and we just need to call it in HFileOutputFormat's getNewWriter(). bulkload needs to follow locality - Key: HBASE-12596 URL: https://issues.apache.org/jira/browse/HBASE-12596 Project: HBase Issue Type: Improvement Components: HFile, regionserver Affects Versions: 0.98.8 Environment: hadoop-2.3.0, hbase-0.98.8, jdk1.7 Reporter: Victor Xu Assignee: Victor Xu Attachments: HBASE-12596-0.98-v1.patch, HBASE-12596-master-v1.patch, HBASE-12596.patch Normally, we have 2 steps to perform a bulkload: 1. use a job to write HFiles to be loaded; 2. Move these HFiles to the right hdfs directory. However, the locality could be lost during the first step. Why not just write the HFiles directly into the right place? We can do this easily because StoreFile.WriterBuilder has the withFavoredNodes method, and we just need to call it in HFileOutputFormat's getNewWriter(). 
This feature is disabled by default, and we could use 'hbase.bulkload.locality.sensitive.enabled' to enable it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
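Conceptually, the locality-sensitive writer only needs to map each row key to the region server hosting that key's region and pass that host as a favored node. The sketch below shows just the boundary-lookup step using a TreeMap as a stand-in for the region boundary table; the real patch wires the resulting host into StoreFile.WriterBuilder's withFavoredNodes inside HFileOutputFormat.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of the region lookup behind the locality idea: the region hosting a
// row is the one with the greatest start key <= that row's key, so writing
// the HFile on (or near) that region's server preserves locality.
public class FavoredNodeLookup {
  // Maps each region's start key to the host serving that region.
  private final TreeMap<String, String> regionStartKeyToHost = new TreeMap<>();

  public void addRegion(String startKey, String host) {
    regionStartKeyToHost.put(startKey, host);
  }

  // Returns the host to use as a favored node for this row, or null if the
  // row falls before every known region boundary.
  public String favoredNodeFor(String rowKey) {
    Map.Entry<String, String> e = regionStartKeyToHost.floorEntry(rowKey);
    return e == null ? null : e.getValue();
  }
}
```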
[jira] [Updated] (HBASE-12596) bulkload needs to follow locality
[ https://issues.apache.org/jira/browse/HBASE-12596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Victor Xu updated HBASE-12596: -- Attachment: HBASE-12596-master-v1.patch HBASE-12596-0.98-v1.patch Add patches for both 0.98 and master branches. This feature is disabled by default, and we could use 'hbase.bulkload.locality.sensitive.enabled' to enable it. bulkload needs to follow locality - Key: HBASE-12596 URL: https://issues.apache.org/jira/browse/HBASE-12596 Project: HBase Issue Type: Improvement Components: HFile, regionserver Affects Versions: 0.98.8 Environment: hadoop-2.3.0, hbase-0.98.8, jdk1.7 Reporter: Victor Xu Assignee: Victor Xu Attachments: HBASE-12596-0.98-v1.patch, HBASE-12596-master-v1.patch, HBASE-12596.patch Normally, we have 2 steps to perform a bulkload: 1. use a job to write HFiles to be loaded; 2. Move these HFiles to the right hdfs directory. However, the locality could be lost during the first step. Why not just write the HFiles directly into the right place? We can do this easily because StoreFile.WriterBuilder has the withFavoredNodes method, and we just need to call it in HFileOutputFormat's getNewWriter(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13506) AES-GCM cipher support where available
[ https://issues.apache.org/jira/browse/HBASE-13506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-13506: -- Fix Version/s: (was: 1.0.2) 1.0.3 AES-GCM cipher support where available -- Key: HBASE-13506 URL: https://issues.apache.org/jira/browse/HBASE-13506 Project: HBase Issue Type: Sub-task Components: encryption, security Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 2.0.0, 0.98.14, 1.1.2, 1.3.0, 1.2.1, 1.0.3 The initial encryption drop only had AES-CTR support because authenticated modes such as GCM are only available in Java 7 and up, and our trunk at the time was targeted at Java 6. However we can optionally use AES-GCM cipher support where available. For HBase 1.0 and up, Java 7 is now the minimum so use of AES-GCM can go in directly. It's probably possible to add support in 0.98 too using reflection for cipher object initialization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13511) Derive data keys with HKDF
[ https://issues.apache.org/jira/browse/HBASE-13511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-13511: -- Fix Version/s: (was: 1.0.2) 1.0.3 Derive data keys with HKDF -- Key: HBASE-13511 URL: https://issues.apache.org/jira/browse/HBASE-13511 Project: HBase Issue Type: Sub-task Components: encryption, security Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Minor Fix For: 2.0.0, 0.98.14, 1.1.2, 1.3.0, 1.2.1, 1.0.3 When we are locally managing master key material, when users have supplied their own data key material, derive the actual data keys using HKDF (https://tools.ietf.org/html/rfc5869) DK' = HKDF(S, DK, MK) where S = salt DK = user supplied data key MK = master key DK' = derived data key for the HFile User supplied key material may be weak or an attacker may have some partial knowledge of it. Where we generate random data keys we can still use HKDF as a way to mix more entropy into the secure random generator. DK' = HKDF(R, MK) where R = random key material drawn from the system's secure random generator MK = master key (Salting isn't useful here because salt S and R would be drawn from the same pool, so will not have statistical independence.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
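The DK' = HKDF(S, DK, MK) derivation described above can be sketched with a minimal HKDF-SHA256 (extract-then-expand per RFC 5869). This is an illustrative implementation for clarity, not the code proposed for HBase; the master key is fed in as the `info` parameter to match the ticket's formula.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;

// Minimal HKDF-SHA256 (RFC 5869) sketch: DK' = HKDF(salt, userDataKey, info)
// where the master key is supplied as the expand-phase info. Illustrative only.
public class Hkdf {
  private static byte[] hmac(byte[] key, byte[]... data) {
    try {
      Mac mac = Mac.getInstance("HmacSHA256");
      // RFC 5869: an absent salt defaults to HashLen (32) zero bytes.
      mac.init(new SecretKeySpec(key.length == 0 ? new byte[32] : key, "HmacSHA256"));
      for (byte[] d : data) mac.update(d);
      return mac.doFinal();
    } catch (GeneralSecurityException e) {
      throw new RuntimeException(e);
    }
  }

  // Extract-then-expand; returns `length` bytes of derived key material.
  public static byte[] hkdf(byte[] salt, byte[] ikm, byte[] info, int length) {
    byte[] prk = hmac(salt, ikm);                   // extract: PRK = HMAC(salt, IKM)
    byte[] out = new byte[length];
    byte[] t = new byte[0];                         // T(0) = empty string
    for (int i = 0, pos = 0; pos < length; i++) {   // expand: T(n) = HMAC(PRK, T(n-1)|info|n)
      t = hmac(prk, t, info, new byte[] { (byte) (i + 1) });
      int n = Math.min(t.length, length - pos);
      System.arraycopy(t, 0, out, pos, n);
      pos += n;
    }
    return out;
  }
}
```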
[jira] [Updated] (HBASE-13347) RowCounter using special filter is broken
[ https://issues.apache.org/jira/browse/HBASE-13347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-13347: -- Fix Version/s: (was: 1.0.2) 1.0.3 RowCounter using special filter is broken - Key: HBASE-13347 URL: https://issues.apache.org/jira/browse/HBASE-13347 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 1.0.0 Reporter: Lars George Fix For: 2.0.0, 1.1.2, 1.3.0, 1.2.1, 1.0.3 The {{RowCounter}} in the {{mapreduce}} package is supposed to check if the row count scan has a column selection added to it, and if so, use a different filter that finds the row and counts it. But the {{qualifier.add()}} call is missing in the {{for}} loop. See https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/RowCounter.java#L165 Needs fixing or row count might be wrong when using {{--range}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
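The shape of the bug: the {{for}} loop parses each {{family:qualifier}} argument but never records the qualifier on the scan. A self-contained sketch of the intended parsing (collecting into a list so the behavior is checkable here; the real code adds the columns to a {{Scan}}):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the column-argument parsing RowCounter's loop is meant to do:
// split each "family:qualifier" argument and actually record the qualifier
// (the missing add(...) call is the bug the ticket describes).
public class ColumnArgs {
  public static List<String[]> parse(String... columns) {
    List<String[]> parsed = new ArrayList<>();
    for (String column : columns) {
      String[] parts = column.split(":", 2);
      if (parts.length == 2) {
        parsed.add(new String[] { parts[0], parts[1] });  // family + qualifier
      } else {
        parsed.add(new String[] { parts[0] });            // whole family
      }
    }
    return parsed;
  }
}
```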
[jira] [Updated] (HBASE-13352) Add hbase.import.version to Import usage.
[ https://issues.apache.org/jira/browse/HBASE-13352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-13352: -- Attachment: hbase-13352_v3.patch Rebased the patch. Will commit this version unless there is an objection. It does not exit with non-zero in case of output != input. Just a warning. Add hbase.import.version to Import usage. - Key: HBASE-13352 URL: https://issues.apache.org/jira/browse/HBASE-13352 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Fix For: 2.0.0, 0.98.14, 1.0.2, 1.1.2, 1.3.0, 1.2.1 Attachments: 13352-v2.txt, 13352.txt, hbase-13352_v3.patch We just tried to export some (small amount of) data out of a 0.94 cluster to a 0.98 cluster. We used Export/Import for that. By default we found that the import M/R job correctly reports the number of records seen, but _silently_ does not import anything. After looking at the 0.98 code it's obvious there's an hbase.import.version (-Dhbase.import.version=0.94) to make this work. Two issues: # -Dhbase.import.version=0.94 should be shown with the Import usage # If not given it should not just silently not import anything In this issue I'll just trivially add this option to the Import tool's usage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14015) Allow setting a richer state value when toString a pv2
[ https://issues.apache.org/jira/browse/HBASE-14015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-14015: -- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Pushed to branch-1, branch-1.2 and master ([~busbey] FYI -- helps debugging sir). Allow setting a richer state value when toString a pv2 -- Key: HBASE-14015 URL: https://issues.apache.org/jira/browse/HBASE-14015 Project: HBase Issue Type: Improvement Components: proc-v2 Reporter: stack Assignee: stack Priority: Minor Fix For: 2.0.0, 1.2.0, 1.3.0 Attachments: 0001-HBASE-14015-Allow-setting-a-richer-state-value-when-.patch Debugging, my procedure after a crash was loaded out of the store and its state was RUNNING. It would help if I knew in which of the states of a StateMachineProcedure it was going to start RUNNING at. Chatting w/ Matteo, he suggested allowing Procedures customize the String. Here is patch that makes it so StateMachineProcedure will now print out the base state -- RUNNING, FINISHED -- followed by a ':' and then the StateMachineProcedure state: e.g. SimpleStateMachineProcedure state=RUNNABLE:SERVER_CRASH_ASSIGN -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14016) Procedure V2: NPE in a delete table follow by create table closely
[ https://issues.apache.org/jira/browse/HBASE-14016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-14016: --- Affects Version/s: (was: 2.0.0) Procedure V2: NPE in a delete table follow by create table closely -- Key: HBASE-14016 URL: https://issues.apache.org/jira/browse/HBASE-14016 Project: HBase Issue Type: Bug Components: proc-v2 Affects Versions: 1.2.0, 1.1.1, 1.3.0 Reporter: Stephen Yuan Jiang Assignee: Stephen Yuan Jiang In our internal test for HBASE 1.1, we found a race condition where a delete table followed closely by a create table would leak a zk lock due to an NPE in ProcedureFairRunQueues {noformat} Exception in thread ProcedureExecutorThread-0 java.lang.NullPointerException at org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.releaseTableWrite(MasterProcedureQueue.java:279) at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.releaseLock(CreateTableProcedure.java:280) at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.releaseLock(CreateTableProcedure.java:58) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:674) {noformat} Here is the code that causes the race condition: {code} protected boolean markTableAsDeleted(final TableName table) { TableRunQueue queue = getRunQueue(table); if (queue != null) { ... if (queue.isEmpty() && !queue.isLocked()) { fairq.remove(table); ... } public boolean tryWrite(final TableLockManager lockManager, final TableName tableName, final String purpose) { ... tableLock = lockManager.writeLock(tableName, purpose); try { tableLock.acquire(); ... wlock = true; ... } {code} The root cause is: wlock is set too late and does not protect the queue from being deleted. 
- Thread 1: create table is running; queue is empty - tryWrite() acquires the lock (now wlock is still false) - Thread 2: markTableAsDeleted sees the queue empty and wlock == false - Thread 1: sets wlock=true - too late - Thread 2: deletes the queue - Thread 1: never able to release the lock - NPE trying to get the queue -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14016) Procedure V2: NPE in a delete table follow by create table closely
Stephen Yuan Jiang created HBASE-14016: -- Summary: Procedure V2: NPE in a delete table follow by create table closely Key: HBASE-14016 URL: https://issues.apache.org/jira/browse/HBASE-14016 Project: HBase Issue Type: Bug Components: proc-v2 Affects Versions: 1.1.1, 2.0.0, 1.2.0, 1.3.0 Reporter: Stephen Yuan Jiang Assignee: Stephen Yuan Jiang In our internal test for HBASE 1.1, we found a race condition where a delete table followed closely by a create table would leak a zk lock due to an NPE in ProcedureFairRunQueues {noformat} Exception in thread ProcedureExecutorThread-0 java.lang.NullPointerException at org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.releaseTableWrite(MasterProcedureQueue.java:279) at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.releaseLock(CreateTableProcedure.java:280) at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.releaseLock(CreateTableProcedure.java:58) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:674) {noformat} Here is the code that causes the race condition: {code} protected boolean markTableAsDeleted(final TableName table) { TableRunQueue queue = getRunQueue(table); if (queue != null) { ... if (queue.isEmpty() && !queue.isLocked()) { fairq.remove(table); ... } public boolean tryWrite(final TableLockManager lockManager, final TableName tableName, final String purpose) { ... tableLock = lockManager.writeLock(tableName, purpose); try { tableLock.acquire(); ... wlock = true; ... } {code} The root cause is: wlock is set too late and does not protect the queue from being deleted. - Thread 1: create table is running; queue is empty - tryWrite() acquires the lock (now wlock is still false) - Thread 2: markTableAsDeleted sees the queue empty and wlock == false - Thread 1: sets wlock=true - too late - Thread 2: deletes the queue - Thread 1: never able to release the lock - NPE trying to get the queue -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-14016) Procedure V2: NPE in a delete table follow by create table closely
[ https://issues.apache.org/jira/browse/HBASE-14016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi resolved HBASE-14016. - Resolution: Duplicate sorry closing as duplicate of HBASE-14017 (we don't need a full lock) Procedure V2: NPE in a delete table follow by create table closely -- Key: HBASE-14016 URL: https://issues.apache.org/jira/browse/HBASE-14016 Project: HBase Issue Type: Bug Components: proc-v2 Affects Versions: 2.0.0, 1.2.0, 1.1.1, 1.3.0 Reporter: Stephen Yuan Jiang Assignee: Stephen Yuan Jiang In our internal test for HBASE 1.1, we found a race condition that delete table followed by create table closely would leak zk lock due to NPE in ProcedureFairRunQueues {noformat} Exception in thread ProcedureExecutorThread-0 java.lang.NullPointerException at org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.releaseTableWrite(MasterProcedureQueue.java:279) at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.releaseLock(CreateTableProcedure.java:280) at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.releaseLock(CreateTableProcedure.java:58) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:674) {noformat} Here is the code that cause the race condition: {code} protected boolean markTableAsDeleted(final TableName table) { TableRunQueue queue = getRunQueue(table); if (queue != null) { ... if (queue.isEmpty() !queue.isLocked()) { fairq.remove(table); ... } public boolean tryWrite(final TableLockManager lockManager, final TableName tableName, final String purpose) { ... tableLock = lockManager.writeLock(tableName, purpose); try { tableLock.acquire(); ... wlock = true; ... } {code} The root cause is: wlock is set too late and not protect the queue be deleted. 
- Thread 1: create table is running; the queue is empty - tryWrite() acquires the lock (wlock is still false) - Thread 2: markTableAsDeleted sees the queue empty and wlock == false - Thread 1: sets wlock = true; too late - Thread 2: deletes the queue - Thread 1: never able to release the lock; NPE trying to get the queue -- This message was sent by Atlassian JIRA (v6.3.4#6332)
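The fix direction that closed this race (per HBASE-14017: check and set the lock flag under the same monitor that guards deletion) can be sketched with a minimal model. This is illustrative only; `TableQueue` and the method names below are stand-ins, not the real HBase master classes.

```java
import java.util.concurrent.ConcurrentHashMap;

// Minimal model of the corrected behavior: the exclusive-lock flag is set
// inside the same synchronized section that checks it, so a concurrent
// delete can never observe "queue empty, not locked" after the lock is won.
class TableQueue {
  private boolean exclusiveLock = false;

  // Check and set happen atomically under the queue's monitor.
  synchronized boolean tryExclusiveLock() {
    if (exclusiveLock) return false;
    exclusiveLock = true;  // set BEFORE any other thread can observe the queue
    return true;
  }

  synchronized void releaseExclusiveLock() { exclusiveLock = false; }
  synchronized boolean isLocked() { return exclusiveLock; }
}

public class QueueDeleteRace {
  static final ConcurrentHashMap<String, TableQueue> fairq = new ConcurrentHashMap<>();

  // Delete the queue only while holding its monitor and only if unlocked,
  // closing the window described in the bug report.
  static boolean markTableAsDeleted(String table) {
    TableQueue queue = fairq.get(table);
    if (queue == null) return true;
    synchronized (queue) {
      if (queue.isLocked()) return false;  // creator holds the lock: refuse
      fairq.remove(table);
      return true;
    }
  }

  public static void main(String[] args) {
    fairq.put("t1", new TableQueue());
    TableQueue q = fairq.get("t1");
    boolean locked = q.tryExclusiveLock();       // Thread 1: create table
    boolean deleted = markTableAsDeleted("t1");  // Thread 2: delete attempt
    System.out.println(locked + " " + deleted);  // true false
    q.releaseExclusiveLock();                    // Thread 1 finishes cleanly
    System.out.println(markTableAsDeleted("t1"));
  }
}
```

With the flag set atomically with the acquisition, the delete attempt is refused while the creator holds the lock, and succeeds once released.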
[jira] [Commented] (HBASE-13832) Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low
[ https://issues.apache.org/jira/browse/HBASE-13832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14612722#comment-14612722 ] Matteo Bertozzi commented on HBASE-13832: - Calling stop() directly was not what I was proposing; what I was saying was just exiting from the syncLoop(). Before, with the while (isRunning()), we were spinning after the signal, to make clear that there was no other run of the syncLoop(). In this case we may do another round of the loop and execute work which, in theory, is not what you expect after sending the abort signal. The test does not rely on the 1s/2s timing; it passes even without it, but I was trying to make the problem clearer to someone reading the code. Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low --- Key: HBASE-13832 URL: https://issues.apache.org/jira/browse/HBASE-13832 Project: HBase Issue Type: Sub-task Components: master, proc-v2 Affects Versions: 2.0.0, 1.1.0, 1.2.0 Reporter: Stephen Yuan Jiang Assignee: Matteo Bertozzi Priority: Critical Fix For: 2.0.0, 1.1.2, 1.3.0, 1.2.1 Attachments: HBASE-13832-v0.patch, HBASE-13832-v1.patch, HBASE-13832-v2.patch, HBASE-13832-v4.patch, HDFSPipeline.java, hbase-13832-test-hang.patch, hbase-13832-v3.patch When the data node count is 3, we got a failure in WALProcedureStore#syncLoop() during master start. The failure prevents the master from starting. {noformat} 2015-05-29 13:27:16,625 ERROR [WALProcedureStoreSyncThread] wal.WALProcedureStore: Sync slot failed, abort. java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. 
(Nodes: current=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK], DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-490ece56c772,DISK]], original=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK], DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-490ece56c772,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration. at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:951) {noformat} One proposal is to implement logic similar to FSHLog: if an IOException is thrown during syncLoop in WALProcedureStore#start(), instead of aborting immediately, we could try to roll the log and see whether this resolves the issue; if the new log cannot be created, or rolling the log throws more exceptions, we then abort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
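The roll-before-abort proposal above can be sketched as a small retry loop. This is an assumption-laden sketch of the idea, not the committed patch: `Wal`, `syncOrRoll`, and the retry budget are all illustrative names.

```java
import java.io.IOException;

// Sketch of the proposal: on a sync failure, roll to a fresh log (which
// opens a new HDFS pipeline) before giving up; abort only when the retry
// budget is exhausted or the roll itself fails.
public class SyncWithRoll {
  interface Wal {
    void sync() throws IOException;
    void roll() throws IOException;
  }

  // Returns normally if some sync succeeds; rethrows when out of rolls.
  static void syncOrRoll(Wal wal, int maxRolls) throws IOException {
    for (int attempt = 0; ; attempt++) {
      try {
        wal.sync();
        return;            // sync succeeded, possibly on a rolled log
      } catch (IOException e) {
        if (attempt >= maxRolls) {
          throw e;         // roll budget exhausted: take the abort path
        }
        wal.roll();        // new file => new pipeline; may itself throw
      }
    }
  }

  public static void main(String[] args) throws IOException {
    // Simulated WAL: the first sync fails (bad pipeline), the sync after
    // the roll succeeds.
    final int[] syncs = {0};
    Wal wal = new Wal() {
      public void sync() throws IOException {
        if (syncs[0]++ == 0) throw new IOException("bad datanode pipeline");
      }
      public void roll() { /* pretend a new log file was opened */ }
    };
    syncOrRoll(wal, 2);
    System.out.println("synced after " + syncs[0] + " attempts");
  }
}
```

The key design point is that a pipeline failure is often specific to the current block, so a fresh file gives HDFS a chance to pick a healthy set of datanodes before the master concludes it must abort.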
[jira] [Commented] (HBASE-14017) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion
[ https://issues.apache.org/jira/browse/HBASE-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14612736#comment-14612736 ] Stephen Yuan Jiang commented on HBASE-14017: [~mbertozzi] This would not solve the problem, because the method does not check the result of the inner 'tryExclusiveLock()' call. {code} public synchronized boolean tryExclusiveLock(final TableLockManager lockManager, final TableName tableName, final String purpose) { if (isLocked()) return false; // Take zk-write-lock tableLock = lockManager.writeLock(tableName, purpose); try { tableLock.acquire(); } catch (IOException e) { LOG.error("failed acquire write lock on " + tableName, e); tableLock = null; return false; } tryExclusiveLock(); return true; } {code} Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion - Key: HBASE-14017 URL: https://issues.apache.org/jira/browse/HBASE-14017 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 2.0.0, 1.2.0, 1.1.0.1 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.1.2 Attachments: HBASE-14017-v0.patch [~syuanjiang] found a concurrency issue in the procedure queue delete, where we don't have an exclusive lock before deleting the table {noformat} Thread 1: Create table is running - tryWrite() acquires the lock, before setting wlock=true Thread 2: markTableAsDeleted sees the queue empty and wlock == false Thread 1: sets wlock=true; too late Thread 2: deletes the queue Thread 1: never able to release the lock {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14017) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion
[ https://issues.apache.org/jira/browse/HBASE-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14612766#comment-14612766 ] Matteo Bertozzi commented on HBASE-14017: - It does not make any difference; we are under synchronized. We check isLocked() so that tryLock will always lock successfully. Not the best-looking thing ever, but correct anyway. [~stack] Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion - Key: HBASE-14017 URL: https://issues.apache.org/jira/browse/HBASE-14017 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 2.0.0, 1.2.0, 1.1.0.1 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.1.2 Attachments: HBASE-14017-v0.patch [~syuanjiang] found a concurrency issue in the procedure queue delete, where we don't have an exclusive lock before deleting the table {noformat} Thread 1: Create table is running - tryWrite() acquires the lock, before setting wlock=true Thread 2: markTableAsDeleted sees the queue empty and wlock == false Thread 1: sets wlock=true; too late Thread 2: deletes the queue Thread 1: never able to release the lock {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HBASE-12596) bulkload needs to follow locality
[ https://issues.apache.org/jira/browse/HBASE-12596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Victor Xu reassigned HBASE-12596: - Assignee: Victor Xu bulkload needs to follow locality - Key: HBASE-12596 URL: https://issues.apache.org/jira/browse/HBASE-12596 Project: HBase Issue Type: Improvement Components: HFile, regionserver Affects Versions: 0.98.8 Environment: hadoop-2.3.0, hbase-0.98.8, jdk1.7 Reporter: Victor Xu Assignee: Victor Xu Attachments: HBASE-12596.patch Normally, we have 2 steps to perform a bulkload: 1. use a job to write HFiles to be loaded; 2. Move these HFiles to the right hdfs directory. However, the locality could be lost during the first step. Why not just write the HFiles directly into the right place? We can do this easily because StoreFile.WriterBuilder has the withFavoredNodes method, and we just need to call it in HFileOutputFormat's getNewWriter(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13941) Backport HBASE-13917 (Remove string comparison to identify request priority) to release branches
[ https://issues.apache.org/jira/browse/HBASE-13941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-13941: -- Fix Version/s: (was: 1.0.2) 1.0.3 Backport HBASE-13917 (Remove string comparison to identify request priority) to release branches Key: HBASE-13941 URL: https://issues.apache.org/jira/browse/HBASE-13941 Project: HBase Issue Type: Task Reporter: Andrew Purtell Fix For: 0.98.14, 1.1.2, 1.0.3 Backport HBASE-13917 (Remove string comparison to identify request priority) to release branches. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13988) Add exception handler for lease thread
[ https://issues.apache.org/jira/browse/HBASE-13988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14612656#comment-14612656 ] Enis Soztutar commented on HBASE-13988: --- The patch aborts the RS with the throwable, which will get logged as well, no? {code} uncaughtExceptionHandler = new UncaughtExceptionHandler() { @Override public void uncaughtException(Thread t, Throwable e) { abort("Uncaught exception in service thread " + t.getName(), e); } }; ... public void abort(String reason, Throwable cause) { String msg = "ABORTING region server " + this + ": " + reason; if (cause != null) { LOG.fatal(msg, cause); } else { LOG.fatal(msg); } {code} We were already aborting the RS in case the leases thread dies, so it does not change the semantics. +1. Add exception handler for lease thread -- Key: HBASE-13988 URL: https://issues.apache.org/jira/browse/HBASE-13988 Project: HBase Issue Type: Bug Affects Versions: 2.0.0 Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor Fix For: 2.0.0, 1.0.2, 1.1.2, 0.98.15 Attachments: HBASE-13988-v001.diff In a prod cluster, a region server exited because some important threads were not alive. After excluding other threads from the log, we suspected the lease thread was the root cause. So we need to add an exception handler to the lease thread to debug why it exited in the future. {quote} 2015-06-29,12:46:09,222 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: One or more threads are no longer alive -- stop 2015-06-29,12:46:09,223 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 21600 ... 2015-06-29,12:46:09,330 INFO org.apache.hadoop.hbase.regionserver.LogRoller: LogRoller exiting. 
2015-06-29,12:46:09,330 INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Thread-37 exiting 2015-06-29,12:46:09,330 INFO org.apache.hadoop.hbase.regionserver.HRegionServer$CompactionChecker: regionserver21600.compactionChecker exiting 2015-06-29,12:46:12,403 INFO org.apache.hadoop.hbase.regionserver.HRegionServer$PeriodicMemstoreFlusher: regionserver21600.periodicFlusher exiting {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
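The pattern discussed in the patch can be shown in isolation: register an UncaughtExceptionHandler so a dying service thread leaves a trace (and, in the real server, triggers an abort) instead of vanishing silently. A minimal, self-contained sketch; the thread name and message are illustrative, not the actual patch:

```java
// Demonstrates the handler pattern from the quoted patch: any throwable
// escaping the thread's run() is delivered to the handler before the
// thread dies, so the failure is recorded rather than lost.
public class LeaseThreadHandler {
  public static void main(String[] args) throws InterruptedException {
    final StringBuilder log = new StringBuilder();

    // A stand-in for the leases thread that dies with an exception.
    Thread leases = new Thread(
        () -> { throw new RuntimeException("lease scan failed"); }, "leases");

    // Handler runs on the dying thread; the real patch calls abort(...) here.
    leases.setUncaughtExceptionHandler((t, e) ->
        log.append("Uncaught exception in service thread ").append(t.getName())
           .append(": ").append(e.getMessage()));

    leases.start();
    leases.join();  // join() gives a happens-before edge, so reading log is safe
    System.out.println(log);
  }
}
```

Without the handler, the only symptom would be the generic "One or more threads are no longer alive" stop message seen in the quoted log, with no hint of which thread died or why.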
[jira] [Updated] (HBASE-13452) HRegion warning about memstore size miscalculation is not actionable
[ https://issues.apache.org/jira/browse/HBASE-13452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-13452: -- Fix Version/s: (was: 1.0.2) 1.0.3 HRegion warning about memstore size miscalculation is not actionable Key: HBASE-13452 URL: https://issues.apache.org/jira/browse/HBASE-13452 Project: HBase Issue Type: Bug Affects Versions: 1.0.0 Reporter: Dev Lakhani Assignee: Mikhail Antonov Priority: Critical Fix For: 2.0.0, 1.1.2, 1.2.1, 1.0.3 During normal operation the HRegion class reports a message related to memstore flushing in HRegion.class: if (!canFlush) { addAndGetGlobalMemstoreSize(-memstoreSize.get()); } else if (memstoreSize.get() != 0) { LOG.error("Memstore size is " + memstoreSize.get()); } The log file is filled with lots of: Memstore size is 558744 Memstore size is 4390632 Memstore size is 558744 ... These messages are uninformative, clog up the logs, and offer no root cause or solution. Maybe the message needs to be more informative, changed to WARN, or some further information provided. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13267) Deprecate or remove isFileDeletable from SnapshotHFileCleaner
[ https://issues.apache.org/jira/browse/HBASE-13267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-13267: -- Fix Version/s: (was: 1.0.2) 1.0.3 Deprecate or remove isFileDeletable from SnapshotHFileCleaner - Key: HBASE-13267 URL: https://issues.apache.org/jira/browse/HBASE-13267 Project: HBase Issue Type: Task Reporter: Andrew Purtell Priority: Minor Fix For: 2.0.0, 0.98.14, 1.1.2, 1.3.0, 1.2.1, 1.0.3 The isFileDeletable method in SnapshotHFileCleaner became vestigial after HBASE-12627; let's remove it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13221) HDFS Transparent Encryption breaks WAL writing
[ https://issues.apache.org/jira/browse/HBASE-13221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-13221: -- Fix Version/s: (was: 1.0.2) 1.0.3 HDFS Transparent Encryption breaks WAL writing -- Key: HBASE-13221 URL: https://issues.apache.org/jira/browse/HBASE-13221 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.98.0, 1.0.0 Reporter: Sean Busbey Assignee: Sean Busbey Priority: Critical Fix For: 2.0.0, 0.98.14, 1.1.2, 1.0.3 We need to detect when HDFS Transparent Encryption (Hadoop 2.6.0+) is enabled and fall back to more synchronization in the WAL to prevent catastrophic failure under load. See HADOOP-11708 for more details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13271) Table#put(List&lt;Put&gt;) operation is indeterminate; needs fixing
[ https://issues.apache.org/jira/browse/HBASE-13271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-13271: -- Fix Version/s: (was: 1.0.2) 1.0.3 Table#put(List&lt;Put&gt;) operation is indeterminate; needs fixing -- Key: HBASE-13271 URL: https://issues.apache.org/jira/browse/HBASE-13271 Project: HBase Issue Type: Improvement Components: API Affects Versions: 1.0.0 Reporter: stack Priority: Critical Fix For: 2.0.0, 1.1.2, 1.3.0, 1.2.1, 1.0.3 Another API issue found by [~larsgeorge]: Table.put(List&lt;Put&gt;) is questionable after the API change. {code} [Mar-17 9:21 AM] Lars George: Table.put(List&lt;Put&gt;) is weird since you cannot flush partial lists [Mar-17 9:21 AM] Lars George: Say out of 5 the third is broken, then the put() call returns with a local exception (say empty Put) and then you have 2 that are in the buffer [Mar-17 9:21 AM] Lars George: but how do you force commit them? [Mar-17 9:22 AM] Lars George: In the past you would call flushCache(), but that is gone now [Mar-17 9:22 AM] Lars George: and flush() is not available on a Table [Mar-17 9:22 AM] Lars George: And you cannot access the underlying BufferedMutator either [Mar-17 9:23 AM] Lars George: You can *only* add more Puts if you can, or call close() [Mar-17 9:23 AM] Lars George: that is just weird to explain {code} So, either Table gets flush back, or we deprecate this method, or it flushes immediately and does not return until complete in the implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
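The partial-flush problem Lars describes can be modeled with a tiny buffer abstraction. This is a hypothetical model, not the HBase client API: `BufferedPuts`, its `put`/`flush` methods, and the string "Puts" are stand-ins chosen to show why a buffer without an exposed flush() strands already-validated entries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Model: put(list) buffers each entry, failing fast on the first bad one.
// Entries accepted before the failure sit in the buffer; without an
// explicit flush(), the caller has no way to commit them.
public class BufferedPuts {
  private final List<String> buffer = new ArrayList<>();
  final List<String> committed = new ArrayList<>();

  void put(List<String> puts) {
    for (String p : puts) {
      if (p.isEmpty()) throw new IllegalArgumentException("empty Put");
      buffer.add(p);  // buffered locally, not yet committed
    }
  }

  void flush() {      // the operation the Table interface lost
    committed.addAll(buffer);
    buffer.clear();
  }

  public static void main(String[] args) {
    BufferedPuts table = new BufferedPuts();
    try {
      table.put(Arrays.asList("p1", "p2", "", "p4", "p5"));
    } catch (IllegalArgumentException e) {
      // p1 and p2 are stranded in the buffer; only flush() can commit them
    }
    table.flush();
    System.out.println(table.committed);  // [p1, p2]
  }
}
```

This is exactly the situation in the chat transcript: after the third Put fails locally, two valid Puts remain buffered with no public way to push them out short of close().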
[jira] [Commented] (HBASE-13267) Deprecate or remove isFileDeletable from SnapshotHFileCleaner
[ https://issues.apache.org/jira/browse/HBASE-13267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14612695#comment-14612695 ] Mikhail Antonov commented on HBASE-13267: - I guess there's some action to be taken here.. Should we do as Dave suggested and proceed with removing it? I can take this one. Deprecate or remove isFileDeletable from SnapshotHFileCleaner - Key: HBASE-13267 URL: https://issues.apache.org/jira/browse/HBASE-13267 Project: HBase Issue Type: Task Reporter: Andrew Purtell Priority: Minor Fix For: 2.0.0, 0.98.14, 1.1.2, 1.3.0, 1.2.1, 1.0.3 The isFileDeletable method in SnapshotHFileCleaner became vestigial after HBASE-12627; let's remove it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14016) Procedure V2: NPE in a delete table follow by create table closely
[ https://issues.apache.org/jira/browse/HBASE-14016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-14016: --- Affects Version/s: 2.0.0 Procedure V2: NPE in a delete table follow by create table closely -- Key: HBASE-14016 URL: https://issues.apache.org/jira/browse/HBASE-14016 Project: HBase Issue Type: Bug Components: proc-v2 Affects Versions: 2.0.0, 1.2.0, 1.1.1, 1.3.0 Reporter: Stephen Yuan Jiang Assignee: Stephen Yuan Jiang In our internal test for HBase 1.1, we found a race condition where a delete table followed closely by a create table would leak the zk lock due to an NPE in ProcedureFairRunQueues {noformat} Exception in thread ProcedureExecutorThread-0 java.lang.NullPointerException at org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.releaseTableWrite(MasterProcedureQueue.java:279) at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.releaseLock(CreateTableProcedure.java:280) at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.releaseLock(CreateTableProcedure.java:58) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:674) {noformat} Here is the code that causes the race condition: {code} protected boolean markTableAsDeleted(final TableName table) { TableRunQueue queue = getRunQueue(table); if (queue != null) { ... if (queue.isEmpty() && !queue.isLocked()) { fairq.remove(table); ... } public boolean tryWrite(final TableLockManager lockManager, final TableName tableName, final String purpose) { ... tableLock = lockManager.writeLock(tableName, purpose); try { tableLock.acquire(); ... wlock = true; ... } {code} The root cause: wlock is set too late and does not protect the queue from being deleted. 
- Thread 1: create table is running; the queue is empty - tryWrite() acquires the lock (wlock is still false) - Thread 2: markTableAsDeleted sees the queue empty and wlock == false - Thread 1: sets wlock = true; too late - Thread 2: deletes the queue - Thread 1: never able to release the lock; NPE trying to get the queue -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13977) Convert getKey and related APIs to Cell
[ https://issues.apache.org/jira/browse/HBASE-13977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-13977: --- Status: Open (was: Patch Available) Convert getKey and related APIs to Cell --- Key: HBASE-13977 URL: https://issues.apache.org/jira/browse/HBASE-13977 Project: HBase Issue Type: Sub-task Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Attachments: HBASE-13977.patch, HBASE-13977_1.patch, HBASE-13977_2.patch, HBASE-13977_3.patch, HBASE-13977_4.patch, HBASE-13977_4.patch During the course of changes for HBASE-11425 we felt that more APIs can be converted to return Cell instead of ByteBuffer, like getKey and getLastKey. We can also rename getKeyValue to getCell. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13977) Convert getKey and related APIs to Cell
[ https://issues.apache.org/jira/browse/HBASE-13977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-13977: --- Attachment: HBASE-13977_4.patch Try QA. Convert getKey and related APIs to Cell --- Key: HBASE-13977 URL: https://issues.apache.org/jira/browse/HBASE-13977 Project: HBase Issue Type: Sub-task Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Attachments: HBASE-13977.patch, HBASE-13977_1.patch, HBASE-13977_2.patch, HBASE-13977_3.patch, HBASE-13977_4.patch, HBASE-13977_4.patch During the course of changes for HBASE-11425 we felt that more APIs can be converted to return Cell instead of ByteBuffer, like getKey and getLastKey. We can also rename getKeyValue to getCell. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13977) Convert getKey and related APIs to Cell
[ https://issues.apache.org/jira/browse/HBASE-13977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-13977: --- Status: Patch Available (was: Open) Convert getKey and related APIs to Cell --- Key: HBASE-13977 URL: https://issues.apache.org/jira/browse/HBASE-13977 Project: HBase Issue Type: Sub-task Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Attachments: HBASE-13977.patch, HBASE-13977_1.patch, HBASE-13977_2.patch, HBASE-13977_3.patch, HBASE-13977_4.patch, HBASE-13977_4.patch During the course of changes for HBASE-11425 we felt that more APIs can be converted to return Cell instead of ByteBuffer, like getKey and getLastKey. We can also rename getKeyValue to getCell. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14017) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion
[ https://issues.apache.org/jira/browse/HBASE-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14612780#comment-14612780 ] Stephen Yuan Jiang commented on HBASE-14017: +1 - it should work. One thing: you could just use 'tryExclusiveLock()' (expose it in the interface) instead of creating a new 'acquireDeleteLock()'. Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion - Key: HBASE-14017 URL: https://issues.apache.org/jira/browse/HBASE-14017 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 2.0.0, 1.2.0, 1.1.0.1 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.1.2 Attachments: HBASE-14017-v0.patch [~syuanjiang] found a concurrency issue in the procedure queue delete, where we don't have an exclusive lock before deleting the table {noformat} Thread 1: Create table is running - the queue is empty and wlock is false Thread 2: markTableAsDeleted sees the queue empty and wlock == false Thread 1: tryWrite() sets wlock=true; too late Thread 2: deletes the queue Thread 1: never able to release the lock - NPE when trying to get the queue {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13702) ImportTsv: Add dry-run functionality and log bad rows
[ https://issues.apache.org/jira/browse/HBASE-13702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14612781#comment-14612781 ] Ted Yu commented on HBASE-13702: Waiting for Jenkins to come back so that QA can test the patch. ImportTsv: Add dry-run functionality and log bad rows - Key: HBASE-13702 URL: https://issues.apache.org/jira/browse/HBASE-13702 Project: HBase Issue Type: New Feature Reporter: Apekshit Sharma Assignee: Apekshit Sharma Fix For: 2.0.0, 1.3.0 Attachments: HBASE-13702-branch-1-v2.patch, HBASE-13702-branch-1-v3.patch, HBASE-13702-branch-1.patch, HBASE-13702-v2.patch, HBASE-13702-v3.patch, HBASE-13702-v4.patch, HBASE-13702-v5.patch, HBASE-13702.patch The ImportTSV job skips bad records by default (keeps a count though). -Dimporttsv.skip.bad.lines=false can be used to fail if a bad row is encountered. Being able to easily determine which rows in an input are corrupted, rather than failing on one row at a time, seems like a good feature to have. Moreover, there should be 'dry-run' functionality in such tools, which essentially does a quick run of the tool without making any changes, reporting any errors/warnings and success/failure. To identify corrupted rows, simply logging them should be enough. In the worst case, all rows will be logged and the size of the logs will be the same as the input size, which seems fine. However, the user might have to do some work figuring out where the logs are. Is there some link we can show to the user when the tool starts which can help them with that? For the dry run, we can simply use if-else to skip over writing out KVs, and any other mutations, if present. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12596) bulkload needs to follow locality
[ https://issues.apache.org/jira/browse/HBASE-12596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Victor Xu updated HBASE-12596: -- Description: Normally, we have 2 steps to perform a bulkload: 1. use a job to write HFiles to be loaded; 2. Move these HFiles to the right hdfs directory. However, the locality could be lost during the first step. Why not just write the HFiles directly into the right place? We can do this easily because StoreFile.WriterBuilder has the withFavoredNodes method, and we just need to call it in HFileOutputFormat's getNewWriter(). This feature is disabled by default, and we could use 'hbase.bulkload.locality.sensitive.enabled=true' to enable it. was: Normally, we have 2 steps to perform a bulkload: 1. use a job to write HFiles to be loaded; 2. Move these HFiles to the right hdfs directory. However, the locality could be lost during the first step. Why not just write the HFiles directly into the right place? We can do this easily because StoreFile.WriterBuilder has the withFavoredNodes method, and we just need to call it in HFileOutputFormat's getNewWriter(). This feature is disabled by default, and we could use 'hbase.bulkload.locality.sensitive.enabled' to enable it. bulkload needs to follow locality - Key: HBASE-12596 URL: https://issues.apache.org/jira/browse/HBASE-12596 Project: HBase Issue Type: Improvement Components: HFile, regionserver Affects Versions: 0.98.8 Environment: hadoop-2.3.0, hbase-0.98.8, jdk1.7 Reporter: Victor Xu Assignee: Victor Xu Fix For: 0.98.14 Attachments: HBASE-12596-0.98-v1.patch, HBASE-12596-master-v1.patch, HBASE-12596.patch Normally, we have 2 steps to perform a bulkload: 1. use a job to write HFiles to be loaded; 2. Move these HFiles to the right hdfs directory. However, the locality could be lost during the first step. Why not just write the HFiles directly into the right place? 
We can do this easily because StoreFile.WriterBuilder has the withFavoredNodes method, and we just need to call it in HFileOutputFormat's getNewWriter(). This feature is disabled by default, and we could use 'hbase.bulkload.locality.sensitive.enabled=true' to enable it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
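The locality-sensitive idea above can be sketched without any HBase or HDFS dependencies: given the first row key an HFile will contain, look up which region covers it and hand that region's host to the writer as a favored-node hint. Everything below is illustrative (the map, method, and host names are stand-ins for the real region-location lookup and `withFavoredNodes` call), gated on the config flag mentioned in the description.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch: map an HFile's first row key to the host of the region covering
// it, mimicking how a locality-sensitive writer could choose a favored
// node. A TreeMap of region start keys stands in for region locations.
public class LocalityAwareBulkload {
  // region start key -> hosting region server (illustrative data)
  static final TreeMap<String, String> regionHosts = new TreeMap<>();

  // Returns the favored-node hint, or null when the feature is disabled
  // (the default, per 'hbase.bulkload.locality.sensitive.enabled').
  static String favoredNodeFor(String firstRowKey, boolean localitySensitive) {
    if (!localitySensitive) return null;
    // The covering region is the one with the greatest start key <= row key.
    Map.Entry<String, String> region = regionHosts.floorEntry(firstRowKey);
    return region == null ? null : region.getValue();
  }

  public static void main(String[] args) {
    regionHosts.put("", "rs1.example.com");        // first region: [-inf, row500)
    regionHosts.put("row500", "rs2.example.com");  // second region: [row500, +inf)
    System.out.println(favoredNodeFor("row123", true));
    System.out.println(favoredNodeFor("row777", true));
    System.out.println(favoredNodeFor("row777", false));
  }
}
```

Writing each HFile's blocks on the server that will eventually serve the region means the subsequent move into the region directory keeps the data local, instead of leaving locality to chance as in the two-step flow described above.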