[jira] [Commented] (HBASE-4552) multi-CF bulk load is not atomic across column families
[ https://issues.apache.org/jira/browse/HBASE-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13146066#comment-13146066 ]

Hudson commented on HBASE-4552:
---
Integrated in HBase-0.92 #119 (See [https://builds.apache.org/job/HBase-0.92/119/])
HBASE-4740 [bulk load] the HBASE-4552 API can't tell if errors on region server are recoverable (Jonathan Hsieh)

tedyu :
Files :
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/branches/0.92/src/main/resources/hbase-default.xml
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/mapreduce/TestLoadIncrementalHFilesSplitRecovery.java

multi-CF bulk load is not atomic across column families
---
Key: HBASE-4552
URL: https://issues.apache.org/jira/browse/HBASE-4552
Project: HBase
Issue Type: Bug
Components: regionserver
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: Jonathan Hsieh
Fix For: 0.92.0
Attachments: hbase-4552.consolidated.patch, hbase-4552.consolidated.v2.patch, hbase-4552.consolidated.v3.patch, hbase-4552.consolidated.v4.patch

Currently the bulk load API simply imports one HFile at a time. With multi-column-family support, this is inappropriate, since different CFs show up separately. Instead, the IPC endpoint should take a map of CF -> HFiles, so we can online them all under a single region-wide lock.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
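The proposal in the issue description (one call that carries HFiles for several column families and brings them online under a single region-wide lock) might look roughly like the sketch below. It is an illustration only: `RegionSketch`, `bulkLoad`, and `fileCounts` are hypothetical names, the store is an in-memory map rather than HDFS, and none of this is the code from the attached patches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a region; this is not HBase's HRegion.
class RegionSketch {
    // One lock for the whole region, shared by all column families.
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // Column family name -> HFile names currently visible to readers.
    private final Map<String, List<String>> storeFiles = new HashMap<>();

    RegionSketch(String... families) {
        for (String family : families) {
            storeFiles.put(family, new ArrayList<>());
        }
    }

    // Brings every HFile in the request online while holding the write lock,
    // so readers never observe one family updated and another not.
    void bulkLoad(Map<String, List<String>> familyToHFiles) {
        lock.writeLock().lock();
        try {
            // Validate the whole request before touching any family.
            for (String family : familyToHFiles.keySet()) {
                if (!storeFiles.containsKey(family)) {
                    throw new IllegalArgumentException("Unknown column family: " + family);
                }
            }
            for (Map.Entry<String, List<String>> e : familyToHFiles.entrySet()) {
                storeFiles.get(e.getKey()).addAll(e.getValue());
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Readers take the read lock, so they wait while a bulk load is in progress.
    Map<String, Integer> fileCounts() {
        lock.readLock().lock();
        try {
            Map<String, Integer> counts = new HashMap<>();
            for (Map.Entry<String, List<String>> e : storeFiles.entrySet()) {
                counts.put(e.getKey(), e.getValue().size());
            }
            return counts;
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        RegionSketch region = new RegionSketch("A", "B");
        Map<String, List<String>> request = new HashMap<>();
        request.put("A", Arrays.asList("hfile-a1"));
        request.put("B", Arrays.asList("hfile-b1", "hfile-b2"));
        region.bulkLoad(request);
        System.out.println(region.fileCounts());
    }
}
```

The lock only gives readers a consistent view. It does not by itself undo a load that fails partway through, which is the concern raised in the comments on this issue.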
[jira] [Commented] (HBASE-4552) multi-CF bulk load is not atomic across column families
[ https://issues.apache.org/jira/browse/HBASE-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13146100#comment-13146100 ]

Hudson commented on HBASE-4552:
---
Integrated in HBase-TRUNK #2420 (See [https://builds.apache.org/job/HBase-TRUNK/2420/])
HBASE-4740 [bulk load] the HBASE-4552 API can't tell if errors on region server are recoverable (Jonathan Hsieh)

tedyu :
Files :
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/trunk/src/main/resources/hbase-default.xml
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestLoadIncrementalHFilesSplitRecovery.java
[jira] [Commented] (HBASE-4552) multi-CF bulk load is not atomic across column families
[ https://issues.apache.org/jira/browse/HBASE-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13140212#comment-13140212 ]

ramkrishna.s.vasudevan commented on HBASE-4552:
---
In the test case TestLoadIncrementalHFilesSplitRecovery, new HTable(tableName) is being used.
{code}
+ LOG.info("Creating table " + table);
+ HTableDescriptor htd = new HTableDescriptor(table);
{code}
As part of HBASE-4253 it was found that using new HTableDescriptor(conf, tablename) is the best way. Also check HBASE-4138 (comment 25/Aug/11 09:19) for reference.
This will prevent the failure that happened in
https://builds.apache.org/job/PreCommit-HBASE-Build/99/testReport/org.apache.hadoop.hbase.mapreduce/TestLoadIncrementalHFilesSplitRecovery/testBulkLoadPhaseRecovery/
Correct me if I am wrong.
[jira] [Commented] (HBASE-4552) multi-CF bulk load is not atomic across column families
[ https://issues.apache.org/jira/browse/HBASE-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13140317#comment-13140317 ]

Jonathan Hsieh commented on HBASE-4552:
---
@Ram, on trunk or 0.92 branches, HTableDescriptor(conf, tablename) doesn't seem to be in the API. In patch v4, it seems like all the HTable constructors have been updated to explicitly take the configuration reference. I'm assuming you meant HTable?
[jira] [Commented] (HBASE-4552) multi-CF bulk load is not atomic across column families
[ https://issues.apache.org/jira/browse/HBASE-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13140344#comment-13140344 ]

Ted Yu commented on HBASE-4552:
---
Integrated to 0.92 and TRUNK. Thanks for the patch Jonathan. Thanks for the review Todd.
[jira] [Commented] (HBASE-4552) multi-CF bulk load is not atomic across column families
[ https://issues.apache.org/jira/browse/HBASE-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13140367#comment-13140367 ]

ramkrishna.s.vasudevan commented on HBASE-4552:
---
@Jon
Yes, I was wrong. Sorry for that. I will check the reason for the failure once again. Thanks, Jon.
[jira] [Commented] (HBASE-4552) multi-CF bulk load is not atomic across column families
[ https://issues.apache.org/jira/browse/HBASE-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13140584#comment-13140584 ]

Hudson commented on HBASE-4552:
---
Integrated in HBase-TRUNK #2392 (See [https://builds.apache.org/job/HBase-TRUNK/2392/])
HBASE-4552 multi-CF bulk load is not atomic across column families (Jonathan Hsieh)

tedyu :
Files :
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
* /hbase/trunk/src/main/resources/hbase-default.xml
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestLoadIncrementalHFilesSplitRecovery.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionServerBulkLoad.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java
[jira] [Commented] (HBASE-4552) multi-CF bulk load is not atomic across column families
[ https://issues.apache.org/jira/browse/HBASE-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13140834#comment-13140834 ]

Hudson commented on HBASE-4552:
---
Integrated in HBase-0.92 #90 (See [https://builds.apache.org/job/HBase-0.92/90/])
HBASE-4552 multi-CF bulk load is not atomic across column families (Jonathan Hsieh)

tedyu :
Files :
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
* /hbase/branches/0.92/src/main/resources/hbase-default.xml
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/mapreduce/TestLoadIncrementalHFilesSplitRecovery.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionServerBulkLoad.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java
[jira] [Commented] (HBASE-4552) multi-CF bulk load is not atomic across column families
[ https://issues.apache.org/jira/browse/HBASE-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13139136#comment-13139136 ]

Hadoop QA commented on HBASE-4552:
---
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12501415/hbase-4552.consolidated.v2.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 7 new or modified tests.
-1 javadoc. The javadoc tool appears to have generated -166 warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.coprocessor.TestMasterObserver
org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFilesSplitRecovery
org.apache.hadoop.hbase.master.TestMasterFailover
org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
org.apache.hadoop.hbase.master.TestDistributedLogSplitting

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/95//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/95//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/95//console

This message is automatically generated.
[jira] [Commented] (HBASE-4552) multi-CF bulk load is not atomic across column families
[ https://issues.apache.org/jira/browse/HBASE-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13139398#comment-13139398 ]

Hadoop QA commented on HBASE-4552:
---
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12501459/hbase-4552.consolidated.v3.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 7 new or modified tests.
-1 javadoc. The javadoc tool appears to have generated -166 warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFilesSplitRecovery
org.apache.hadoop.hbase.master.TestMasterFailover
org.apache.hadoop.hbase.master.TestDistributedLogSplitting

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/99//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/99//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/99//console

This message is automatically generated.
[jira] [Commented] (HBASE-4552) multi-CF bulk load is not atomic across column families
[ https://issues.apache.org/jira/browse/HBASE-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13139509#comment-13139509 ]

Hadoop QA commented on HBASE-4552:
---
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12501481/hbase-4552.consolidated.v4.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 9 new or modified tests.
-1 javadoc. The javadoc tool appears to have generated -166 warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.master.TestMasterFailover
org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort
org.apache.hadoop.hbase.master.TestDistributedLogSplitting
org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithRemove

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/101//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/101//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/101//console

This message is automatically generated.
[jira] [Commented] (HBASE-4552) multi-CF bulk load is not atomic across column families
[ https://issues.apache.org/jira/browse/HBASE-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13139533#comment-13139533 ]

Ted Yu commented on HBASE-4552:
---
I verified that except for TestRegionServerCoprocessorExceptionWithXXX tests, the other failures were caused by too many open files.
[jira] [Commented] (HBASE-4552) multi-CF bulk load is not atomic across column families
[ https://issues.apache.org/jira/browse/HBASE-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13138918#comment-13138918 ]

Ted Yu commented on HBASE-4552:
---
After applying the consolidated patch, I got:
{code}
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.0.2:testCompile (default-testCompile) on project hbase: Compilation failure: Compilation failure:
[ERROR] /Users/zhihyu/trunk-hbase/src/test/java/org/apache/hadoop/hbase/mapreduce/TestLoadIncrementalHFilesSplitRecovery.java:[190,34] cannot find symbol
[ERROR] symbol : method getTestDir(java.lang.String)
[ERROR] location: class org.apache.hadoop.hbase.HBaseTestingUtility
[ERROR]
[ERROR] /Users/zhihyu/trunk-hbase/src/test/java/org/apache/hadoop/hbase/mapreduce/TestLoadIncrementalHFilesSplitRecovery.java:[216,34] cannot find symbol
[ERROR] symbol : method getTestDir(java.lang.String)
[ERROR] location: class org.apache.hadoop.hbase.HBaseTestingUtility
[ERROR]
[ERROR] /Users/zhihyu/trunk-hbase/src/test/java/org/apache/hadoop/hbase/mapreduce/TestLoadIncrementalHFilesSplitRecovery.java:[309,36] cannot find symbol
[ERROR] symbol : method getTestDir(java.lang.String)
[ERROR] location: class org.apache.hadoop.hbase.HBaseTestingUtility
[ERROR]
[ERROR] /Users/zhihyu/trunk-hbase/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionServerBulkLoad.java:[123,36] cannot find symbol
[ERROR] symbol : method getTestDir(java.lang.String)
[ERROR] location: class org.apache.hadoop.hbase.HBaseTestingUtility
{code}
For TRUNK, there is no such error.
[jira] [Commented] (HBASE-4552) multi-CF bulk load is not atomic across column families
[ https://issues.apache.org/jira/browse/HBASE-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13138977#comment-13138977 ]

Jonathan Hsieh commented on HBASE-4552:
---
This was due to HBASE-4634, which got committed two days ago. The old getTestDir was a public method and apparently was just removed. This will probably break on trunk as well.

https://github.com/apache/hbase/commit/ed21cd6c4c266f610352d76d3d4b6f5cff492a97#src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java

I think this should be replaced with getDataTestDir calls (that's what the old bulk load test calls to getTestDir were changed to).
[jira] [Commented] (HBASE-4552) multi-CF bulk load is not atomic across column families
[ https://issues.apache.org/jira/browse/HBASE-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13139074#comment-13139074 ]

Hadoop QA commented on HBASE-4552:
---
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12501415/hbase-4552.consolidated.v2.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 7 new or modified tests.
-1 javadoc. The javadoc tool appears to have generated -166 warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.client.TestMultiParallel
org.apache.hadoop.hbase.coprocessor.TestMasterObserver
org.apache.hadoop.hbase.TestRegionRebalancing
org.apache.hadoop.hbase.master.TestDefaultLoadBalancer
org.apache.hadoop.hbase.master.TestDistributedLogSplitting

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/92//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/92//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/92//console

This message is automatically generated.
[jira] [Commented] (HBASE-4552) multi-CF bulk load is not atomic across column families
[ https://issues.apache.org/jira/browse/HBASE-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134138#comment-13134138 ]

Jonathan Hsieh commented on HBASE-4552:
---
Created recovery mechanism jira at HBASE-4652
[jira] [Commented] (HBASE-4552) multi-CF bulk load is not atomic across column families
[ https://issues.apache.org/jira/browse/HBASE-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13133677#comment-13133677 ]

Jonathan Hsieh commented on HBASE-4552:
---
One more piece: a mechanism to atomically roll back if a partial failure is encountered when attempting to bulk load multiple families.

For example, let's say I want to bulk load a region with CFs A, B, C. I issue a call to an RS region to atomically bulk load the HFiles. The RS loads A and B successfully but fails on C (HDFS failure, or the RS goes down, etc.). We should roll back A and B -- if we don't, we would have A and B loaded but not C, and have an atomicity violation.
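The A, B, C scenario above can be sketched as a load loop that remembers what it has brought online and undoes it when a later family fails. This is only a sketch under stated assumptions: `FamilyStore`, `InMemoryStore`, and `loadAllOrNothing` are hypothetical names, and the stores are in-memory lists rather than HBase's Store or HDFS.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RollbackBulkLoadSketch {
    // Hypothetical per-family store; not an HBase interface.
    interface FamilyStore {
        void load(String hfile) throws IOException;
        void unload(String hfile);
    }

    // In-memory store that can simulate a failing load.
    static class InMemoryStore implements FamilyStore {
        final List<String> files = new ArrayList<>();
        final boolean failOnLoad;

        InMemoryStore(boolean failOnLoad) {
            this.failOnLoad = failOnLoad;
        }

        public void load(String hfile) throws IOException {
            if (failOnLoad) {
                throw new IOException("simulated failure loading " + hfile);
            }
            files.add(hfile);
        }

        public void unload(String hfile) {
            files.remove(hfile);
        }
    }

    // Loads one HFile per family; if any load fails, unloads the families
    // that already succeeded (in reverse order) and returns false.
    static boolean loadAllOrNothing(Map<String, ? extends FamilyStore> stores,
                                    Map<String, String> familyToHFile) {
        List<String> loadedFamilies = new ArrayList<>();
        try {
            for (Map.Entry<String, String> e : familyToHFile.entrySet()) {
                stores.get(e.getKey()).load(e.getValue());
                loadedFamilies.add(e.getKey());
            }
            return true;
        } catch (IOException ex) {
            for (int i = loadedFamilies.size() - 1; i >= 0; i--) {
                String family = loadedFamilies.get(i);
                stores.get(family).unload(familyToHFile.get(family));
            }
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, InMemoryStore> stores = new LinkedHashMap<>();
        stores.put("A", new InMemoryStore(false));
        stores.put("B", new InMemoryStore(false));
        stores.put("C", new InMemoryStore(true)); // C fails, as in the example
        Map<String, String> request = new LinkedHashMap<>();
        request.put("A", "hfile-a");
        request.put("B", "hfile-b");
        request.put("C", "hfile-c");
        boolean ok = loadAllOrNothing(stores, request);
        System.out.println("loaded=" + ok + " A=" + stores.get("A").files);
    }
}
```

An in-memory undo list like this covers a failed load call, but not the case where the region server itself goes down partway through the loop, since the list of loaded families is lost with the process.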
[jira] [Commented] (HBASE-4552) multi-CF bulk load is not atomic across column families
[ https://issues.apache.org/jira/browse/HBASE-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133679#comment-13133679 ]
Ted Yu commented on HBASE-4552:
In case of such a fault (e.g. an HDFS failure), would it be easier to record which column families encountered errors and retry loading them after the fault clears?
[jira] [Commented] (HBASE-4552) multi-CF bulk load is not atomic across column families
[ https://issues.apache.org/jira/browse/HBASE-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133696#comment-13133696 ]
Jonathan Hsieh commented on HBASE-4552:
If we have an HDFS failure, we won't be able to record or update information about what failed. This makes me think we need to journal/log the intended atomic actions. Once we have the log, we can act depending on the situation:
* If we complete successfully, we remove/invalidate the log and carry on.
* If we fail (can't write, RS goes down and restarts), we check whether everything is in. If it isn't, we roll back the subset of HFile loads that had happened. If rollback fails, we still have the log, so we can try again later, or maybe we kill the RS?
How about we make this a subtask/follow-on JIRA? The first cut will just detect the situation and log error messages (similar to what currently happens). A follow-on task will discuss and implement a recovery mechanism.
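As a rough illustration of the journaling idea above (an assumption-laden sketch, not a proposed design), the log can be modeled as the set of intended families plus the set already applied. `BulkLoadJournalSketch` and its methods are hypothetical, and the journal here lives in memory; a real one would have to survive an RS restart.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory journal; a real one would need durable storage.
class BulkLoadJournalSketch {
    private final Set<String> intended = new LinkedHashSet<>();
    private final Set<String> applied = new LinkedHashSet<>();
    private boolean open = false;

    // Record the intent before touching any family.
    void begin(List<String> families) {
        intended.clear();
        applied.clear();
        intended.addAll(families);
        open = true;
    }

    // Record that one family's HFile has been loaded.
    void markApplied(String family) {
        applied.add(family);
    }

    // Invalidate the journal after full success or a confirmed rollback.
    void close() {
        open = false;
    }

    boolean isOpen() {
        return open;
    }

    // After a failure or restart: which families must be rolled back?
    // Empty means nothing was pending or everything made it in.
    List<String> familiesToRollBack() {
        if (!open || applied.containsAll(intended)) {
            return new ArrayList<>();
        }
        return new ArrayList<>(applied);
    }
}
```

The journal stays open until `close()` is called, so a failed rollback can be retried later from the same record.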
[jira] [Commented] (HBASE-4552) multi-CF bulk load is not atomic across column families
[ https://issues.apache.org/jira/browse/HBASE-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133701#comment-13133701 ]
Ted Yu commented on HBASE-4552:
It is fine to implement recovery in another JIRA.
[jira] [Commented] (HBASE-4552) multi-CF bulk load is not atomic across column families
[ https://issues.apache.org/jira/browse/HBASE-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133190#comment-13133190 ]
Jonathan Hsieh commented on HBASE-4552:
Plan:
1) A test showing there is an atomicity problem. It likely will not use LoadIncrementalHFiles.
2) A fix for the region server side.
3) A rewrite of LoadIncrementalHFiles so that it groups the proper HFiles into the new bulkLoadHFile calls. This will likely have two parallel steps: the first gathers enough info to group HFiles, and the second attempts the bulk load.
[jira] [Commented] (HBASE-4552) multi-CF bulk load is not atomic across column families
[ https://issues.apache.org/jira/browse/HBASE-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13129013#comment-13129013 ]
Todd Lipcon commented on HBASE-4552:
The trick is making sure it's atomic inside the region server, not just that the client sends all of the files for a given region in one RPC. If there are any concurrent scanners, they should see either all of the new data or none of it on a given row. So we need some region-wide coordination. I think we probably have to take a write lock on HRegion#lock.
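A minimal sketch of the "all or none" visibility argument, assuming a plain `java.util.concurrent.locks.ReentrantReadWriteLock` stands in for the region-wide lock. `LockedRegionSketch` is hypothetical and says nothing about how HRegion actually uses its lock: readers hold the read lock, the bulk load holds the write lock while it publishes every family.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model of a region guarded by one read/write lock.
class LockedRegionSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // family -> HFile path currently visible to readers.
    private final Map<String, String> visible = new HashMap<>();

    // Publishes all families under the write lock, so no reader can
    // observe a state where only some of them are present.
    void bulkLoad(Map<String, String> familyToHFile) {
        lock.writeLock().lock();
        try {
            visible.putAll(familyToHFile);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // A reader (think: scanner) takes the read lock and copies the view.
    Map<String, String> snapshot() {
        lock.readLock().lock();
        try {
            return new HashMap<>(visible);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Any `snapshot()` taken concurrently with a `bulkLoad` of several families contains either none of them or all of them.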
[jira] [Commented] (HBASE-4552) multi-CF bulk load is not atomic across column families
[ https://issues.apache.org/jira/browse/HBASE-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13125987#comment-13125987 ]
Ted Yu commented on HBASE-4552:
The above optimization for reducing calls to the region server can be done in a separate JIRA. server.bulkLoadHFile() expects the name of the region the HFile fits in. Region name resolution needs to call conn.getRegionServerWithRetries().
[jira] [Commented] (HBASE-4552) multi-CF bulk load is not atomic across column families
[ https://issues.apache.org/jira/browse/HBASE-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126044#comment-13126044 ]
Ted Yu commented on HBASE-4552:
Since LoadIncrementalHFiles uses ExecutorService to achieve parallelism, we should use Queue<Pair<byte[], String>> in place of List above so that a concurrent queue can be instantiated.
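To show why a `Queue` type helps here, the following is a small sketch in which several `ExecutorService` workers drain one `ConcurrentLinkedQueue`. It is only an illustration: `QueueDrainSketch` is hypothetical, `Map.Entry<String, String>` stands in for `Pair<byte[], String>`, and "processing" an item is just counting it.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; Map.Entry<family, hfilePath> stands in for Pair.
class QueueDrainSketch {
    // A thread-safe queue that many workers can poll without extra locking.
    static Queue<Map.Entry<String, String>> newQueue() {
        return new ConcurrentLinkedQueue<>();
    }

    // Starts `workers` threads that poll until the queue is empty and
    // returns how many items were processed in total.
    static int drain(Queue<Map.Entry<String, String>> queue, int workers) {
        AtomicInteger processed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.submit(() -> {
                // poll() hands each item to exactly one worker.
                while (queue.poll() != null) {
                    processed.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return processed.get();
    }
}
```

A plain `ArrayList` shared this way would need external synchronization; the concurrent queue does not.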
[jira] [Commented] (HBASE-4552) multi-CF bulk load is not atomic across column families
[ https://issues.apache.org/jira/browse/HBASE-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124506#comment-13124506 ]
Ted Yu commented on HBASE-4552:
One solution is to add the following method to HRegionInterface:
{code}
public void bulkLoadHFile(List<Pair<byte[], String>> familyPaths, byte[] regionName) throws IOException;
{code}
familyPaths is a list of (family, hfilePath) pairs for the same region identified by regionName. LoadIncrementalHFiles would need to group HFiles for the same region together before calling the above method.
[jira] [Commented] (HBASE-4552) multi-CF bulk load is not atomic across column families
[ https://issues.apache.org/jira/browse/HBASE-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124511#comment-13124511 ]
Todd Lipcon commented on HBASE-4552:
Yep, that's what I meant, but the implementation isn't quite trivial, since we have to do the locking at a higher level so that the change is made visible atomically.
[jira] [Commented] (HBASE-4552) multi-CF bulk load is not atomic across column families
[ https://issues.apache.org/jira/browse/HBASE-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124521#comment-13124521 ]
Ted Yu commented on HBASE-4552:
For HRegion, we can introduce the following new method:
{code}
public void bulkLoadHFile(List<Pair<byte[], String>> familyPaths) throws IOException {
{code}
where familyPaths is a list of (family, hfilePath) pairs, each identifying an HFile path and the family the HFile should be loaded into.
[jira] [Commented] (HBASE-4552) multi-CF bulk load is not atomic across column families
[ https://issues.apache.org/jira/browse/HBASE-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124660#comment-13124660 ]
Ted Yu commented on HBASE-4552:
Toward the end of LoadIncrementalHFiles.tryLoad(), we can use the startEndKeys of the underlying table to group (and possibly split) HFiles by their first row keys. The new bulkLoadHFile() method would then be called by doBulkLoad().
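The grouping step described above could look roughly like this. It is a sketch under simplifying assumptions: row keys are Java strings rather than byte[] (so ordering follows `String.compareTo`, not byte order), the first region's start key is the empty string, and HFiles that span a region boundary are not split. `HFileGroupingSketch`, `Item`, and `groupByRegion` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of grouping HFiles by the region that holds
// their first row key.
class HFileGroupingSketch {
    // Minimal description of one HFile to load.
    static final class Item {
        final String family;
        final String path;
        final String firstRowKey;

        Item(String family, String path, String firstRowKey) {
            this.family = family;
            this.path = path;
            this.firstRowKey = firstRowKey;
        }
    }

    // regionStartKeys must contain "" for the first region. Returns
    // region start key -> items whose first row key falls in that region.
    static Map<String, List<Item>> groupByRegion(List<String> regionStartKeys,
                                                 List<Item> items) {
        TreeMap<String, List<Item>> groups = new TreeMap<>();
        for (String start : regionStartKeys) {
            groups.put(start, new ArrayList<>());
        }
        for (Item item : items) {
            // Greatest start key <= firstRowKey identifies the region.
            String start = groups.floorKey(item.firstRowKey);
            groups.get(start).add(item);
        }
        return groups;
    }
}
```

Each resulting group holds the (family, path) pairs that one multi-family bulkLoadHFile() call for that region would carry.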