[jira] [Commented] (HBASE-14608) testWalRollOnLowReplication has some risk to assert failed after HBASE-14600
[ https://issues.apache.org/jira/browse/HBASE-14608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14962934#comment-14962934 ] Heng Chen commented on HBASE-14608: --- {quote} if we add throw e just like you mention, sync abort exception will still interrupt this testcase {quote} {code} if (msg != null && msg.toLowerCase().contains("sync aborted") && i > 50) { return; } + throw re; } {code} Just as I said above, I think we should NOT add {{throw e}}. Otherwise it will break this testcase; we'd better just catch the "Sync aborted" exception. What do you think, [~stack]? > testWalRollOnLowReplication has some risk to assert failed after HBASE-14600 > > > Key: HBASE-14608 > URL: https://issues.apache.org/jira/browse/HBASE-14608 > Project: HBase > Issue Type: Bug >Reporter: Heng Chen >Assignee: Heng Chen > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: HBASE-14608.patch, HBASE-14608_v1.patch > > > After HBASE-14600, we catch the runtime exception if the DN recovers slowly, but there > is still some risk that the assert fails. > For example, https://builds.apache.org/job/HBase-TRUNK/6907/testReport/ > The reason is that we catch the exception, but in {{WALProcedureStore}}, it will > still stop the Procedure. So when we assert stop.isRunning, it will fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
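The pattern under debate can be sketched in isolation. The following is an illustrative Java sketch, not the actual test code: the class and method names and the progress threshold of 50 are assumptions taken from the snippet quoted above.

```java
// Hypothetical sketch of the handling being discussed: a RuntimeException
// whose message contains "sync aborted" is tolerated once the test loop has
// made enough progress (i > 50); anything else is rethrown, which is the
// behavior the addendum's "+ throw re;" line introduces.
public class SyncAbortHandling {

  /** Returns true when the exception may be swallowed as a benign sync abort. */
  static boolean isTolerableSyncAbort(RuntimeException re, int progress) {
    String msg = re.getMessage();
    return msg != null
        && msg.toLowerCase().contains("sync aborted")
        && progress > 50;
  }

  /** Either returns silently (tolerable abort) or rethrows the exception. */
  static void handle(RuntimeException re, int progress) {
    if (isTolerableSyncAbort(re, progress)) {
      return; // enough progress made; treat the aborted sync as acceptable
    }
    throw re; // any other failure still fails the test
  }
}
```

The disagreement in the comment is exactly about the rethrow branch: with it, a sync abort arriving while `progress <= 50` fails the test; without it, any runtime exception would be silently swallowed.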
[jira] [Updated] (HBASE-14604) Improve MoveCostFunction in StochasticLoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-14604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-14604: --- Attachment: HBASE-14604_98_with_ut.diff > Improve MoveCostFunction in StochasticLoadBalancer > -- > > Key: HBASE-14604 > URL: https://issues.apache.org/jira/browse/HBASE-14604 > Project: HBase > Issue Type: Bug > Components: Balancer >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Attachments: HBASE-14604_98.diff, HBASE-14604_98_with_ut.diff > > > The code in MoveCostFunction: > {code} > return scale(0, cluster.numRegions + META_MOVE_COST_MULT, moveCost); > {code} > It uses cluster.numRegions + META_MOVE_COST_MULT as the max value when scaling > moveCost to [0,1]. But this should use maxMoves as the max value when the cluster > has a lot of regions. > Assume a cluster has 10000 regions and maxMoves is 2500; it only scales moveCost > to [0, 0.25]. > Improve moveCost by using maxMoves. > {code} > return scale(0, Math.min(cluster.numRegions, maxMoves) + META_MOVE_COST_MULT, > moveCost); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
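The effect of the proposed change can be sketched outside HBase. Everything below is illustrative, not the actual StochasticLoadBalancer source: the `scale()` helper and the `META_MOVE_COST_MULT` value of 9 are assumptions made so the sketch is self-contained.

```java
// Sketch of the MoveCostFunction scaling fix: capping the scaling
// denominator at maxMoves lets moveCost cover the full [0,1] range even
// when the cluster has far more regions than the balancer may move.
public class MoveCostSketch {
  static final int META_MOVE_COST_MULT = 9; // assumed value for illustration

  /** Linearly maps value from [min, max] into [0, 1]. */
  static double scale(double min, double max, double value) {
    if (max <= min || value <= min) return 0.0;
    if (value >= max) return 1.0;
    return (value - min) / (max - min);
  }

  /** Old behavior: the denominator grows with the cluster size. */
  static double oldCost(int numRegions, int maxMoves, double moveCost) {
    return scale(0, numRegions + META_MOVE_COST_MULT, moveCost);
  }

  /** Proposed behavior: the denominator is capped at maxMoves. */
  static double newCost(int numRegions, int maxMoves, double moveCost) {
    return scale(0, Math.min(numRegions, maxMoves) + META_MOVE_COST_MULT, moveCost);
  }
}
```

With 10000 regions and maxMoves of 2500, moving the full 2500 regions scales to roughly 0.25 under the old formula but close to 1.0 under the new one, which is the point of the patch.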
[jira] [Commented] (HBASE-14631) Region merge request should be audited with request user through proper scope of doAs() calls to region observer notifications
[ https://issues.apache.org/jira/browse/HBASE-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14962949#comment-14962949 ] Hadoop QA commented on HBASE-14631: --- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12767292/14631-branch-1.0.txt against branch-1.0 branch at commit 8e6316a80cf96f4d4cd6bd10f4c647ebf45c7e02. ATTACHMENT ID: 12767292 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 2.7.1) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/16087//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/16087//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/16087//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/16087//console This message is automatically generated. > Region merge request should be audited with request user through proper scope > of doAs() calls to region observer notifications > -- > > Key: HBASE-14631 > URL: https://issues.apache.org/jira/browse/HBASE-14631 > Project: HBase > Issue Type: Bug >Reporter: Ted Yu >Assignee: Ted Yu > Attachments: 14631-branch-0.98.txt, 14631-branch-1.0.txt, > 14631-branch-1.txt, 14631-v1.txt > > > HBASE-14475 and HBASE-14605 narrowed the scope of doAs() calls to region > observer notifications for region splitting. > During review of HBASE-14605, Andrew brought up the case for region merge. > This JIRA is to implement similar scope narrowing technique for region > merging. > The majority of the change would be in RegionMergeTransactionImpl class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
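What "narrowing the scope of doAs()" means can be shown with a small, self-contained sketch. All names here (MergeAuditSketch, runAs, CURRENT_USER) are hypothetical stand-ins: the real code uses Hadoop's user context (UserGroupInformation) inside RegionMergeTransactionImpl.

```java
import java.util.function.Supplier;

// Conceptual sketch: the bulk of the merge transaction runs as the server
// principal, and only the region observer notification is wrapped in a
// doAs()-style call, so the coprocessor hook is audited with the request
// user while the server's own identity is restored afterwards.
public class MergeAuditSketch {
  /** Minimal stand-in for an authenticated user context. */
  static final ThreadLocal<String> CURRENT_USER = ThreadLocal.withInitial(() -> "hbase-server");

  /** Stand-in for doAs(): runs the action under the given identity, then restores the old one. */
  static <T> T runAs(String user, Supplier<T> action) {
    String prev = CURRENT_USER.get();
    CURRENT_USER.set(user);
    try {
      return action.get();
    } finally {
      CURRENT_USER.set(prev);
    }
  }

  static String mergeRegions(String requestUser) {
    // The merge work itself executes as the server principal...
    String prep = "merge prepared by " + CURRENT_USER.get();
    // ...and only the observer notification runs as the requesting user.
    String audited = runAs(requestUser, () -> "preMergeCommit observed by " + CURRENT_USER.get());
    return prep + "; " + audited;
  }
}
```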
[jira] [Commented] (HBASE-14420) Zombie Stomping Session
[ https://issues.apache.org/jira/browse/HBASE-14420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14962921#comment-14962921 ] Hadoop QA commented on HBASE-14420: --- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12767283/none_fix.txt against master branch at commit f1b6355fc54616c08480c0f1b0965a244252. ATTACHMENT ID: 12767283 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation, build, or dev-support patch that doesn't require tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 2.7.1) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/16085//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/16085//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/16085//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/16085//console This message is automatically generated. > Zombie Stomping Session > --- > > Key: HBASE-14420 > URL: https://issues.apache.org/jira/browse/HBASE-14420 > Project: HBase > Issue Type: Umbrella > Components: test >Reporter: stack >Assignee: stack >Priority: Critical > Attachments: hangers.txt, none_fix (1).txt, none_fix.txt, > none_fix.txt, none_fix.txt, none_fix.txt, none_fix.txt, none_fix.txt, > none_fix.txt, none_fix.txt, none_fix.txt, none_fix.txt, none_fix.txt, > none_fix.txt, none_fix.txt, none_fix.txt, none_fix.txt, none_fix.txt, > none_fix.txt, none_fix.txt, none_fix.txt, none_fix.txt > > > Patch build are now failing most of the time because we are dropping zombies. > I confirm we are doing this on non-apache build boxes too. > Left-over zombies consume resources on build boxes (OOME cannot create native > threads). Having to do multiple test runs in the hope that we can get a > non-zombie-making build or making (arbitrary) rulings that the zombies are > 'not related' is a productivity sink. And so on... > This is an umbrella issue for a zombie stomping session that started earlier > this week. Will hang sub-issues of this one. Am running builds back-to-back > on little cluster to turn out the monsters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14641) Move JDO example from Wiki to Ref Guide
[ https://issues.apache.org/jira/browse/HBASE-14641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Misty Stanley-Jones updated HBASE-14641: Attachment: HBASE-14641-v1.patch Removed @author tag > Move JDO example from Wiki to Ref Guide > --- > > Key: HBASE-14641 > URL: https://issues.apache.org/jira/browse/HBASE-14641 > Project: HBase > Issue Type: Sub-task > Components: documentation >Affects Versions: 2.0.0 >Reporter: Misty Stanley-Jones >Assignee: Misty Stanley-Jones > Fix For: 2.0.0 > > Attachments: HBASE-14641-v1.patch, HBASE-14641.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14600) Make #testWalRollOnLowReplication looser still
[ https://issues.apache.org/jira/browse/HBASE-14600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14962936#comment-14962936 ] Heng Chen commented on HBASE-14600: --- {code} if (msg != null && msg.toLowerCase().contains("sync aborted") && i > 50) { return; } + throw re; } {code} Just as I said in HBASE-14608, I think we should NOT add {{throw e}}. Otherwise it will break this testcase when i<50; we'd better just catch the "Sync aborted" exception. What do you think, [~stack]? > Make #testWalRollOnLowReplication looser still > -- > > Key: HBASE-14600 > URL: https://issues.apache.org/jira/browse/HBASE-14600 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack >Assignee: stack > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: 14600.addendum.txt, 14600.txt > > > The parent upped timeouts on testWalRollOnLowReplication. It still fails on > occasion. Chatting w/ [~mbertozzi], he suggested that if we've made progress > in the test, we return the test as completed successfully if we get a > RuntimeException out of the sync call (because the DN is slow to recover). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13153) Bulk Loaded HFile Replication
[ https://issues.apache.org/jira/browse/HBASE-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Singhi updated HBASE-13153: -- Attachment: HBASE-13153-v11.patch > Bulk Loaded HFile Replication > - > > Key: HBASE-13153 > URL: https://issues.apache.org/jira/browse/HBASE-13153 > Project: HBase > Issue Type: New Feature > Components: Replication >Reporter: sunhaitao >Assignee: Ashish Singhi > Fix For: 2.0.0 > > Attachments: HBASE-13153-v1.patch, HBASE-13153-v10.patch, > HBASE-13153-v11.patch, HBASE-13153-v2.patch, HBASE-13153-v3.patch, > HBASE-13153-v4.patch, HBASE-13153-v5.patch, HBASE-13153-v6.patch, > HBASE-13153-v7.patch, HBASE-13153-v8.patch, HBASE-13153-v9.patch, > HBASE-13153.patch, HBase Bulk Load Replication-v1-1.pdf, HBase Bulk Load > Replication-v2.pdf, HBase Bulk Load Replication.pdf > > > Currently we plan to use the HBase Replication feature to deal with a disaster > tolerance scenario. But we encounter an issue: we use bulkload very frequently, > and because bulkload bypasses the write path it will not generate WAL, so > the data will not be replicated to the backup cluster. It's inappropriate to > bulkload twice, on both the active cluster and the backup cluster. So I advise making some > modification to the bulkload feature to enable bulkload to both the active cluster and > the backup cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14604) Improve MoveCostFunction in StochasticLoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-14604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14962978#comment-14962978 ] Guanghao Zhang commented on HBASE-14604: Add ut for MoveCostFunction. --- T E S T S --- Running org.apache.hadoop.hbase.master.TestMasterFailoverBalancerPersistence Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 12.08 sec - in org.apache.hadoop.hbase.master.TestMasterFailoverBalancerPersistence Running org.apache.hadoop.hbase.master.balancer.TestDefaultLoadBalancer Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.812 sec - in org.apache.hadoop.hbase.master.balancer.TestDefaultLoadBalancer Running org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancer Tests run: 14, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 28.241 sec - in org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancer Running org.apache.hadoop.hbase.master.balancer.TestBaseLoadBalancer Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.601 sec - in org.apache.hadoop.hbase.master.balancer.TestBaseLoadBalancer Results : Tests run: 21, Failures: 0, Errors: 0, Skipped: 0 > Improve MoveCostFunction in StochasticLoadBalancer > -- > > Key: HBASE-14604 > URL: https://issues.apache.org/jira/browse/HBASE-14604 > Project: HBase > Issue Type: Bug > Components: Balancer >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Attachments: HBASE-14604_98.diff, HBASE-14604_98_with_ut.diff > > > The code in MoveCostFunction: > {code} > return scale(0, cluster.numRegions + META_MOVE_COST_MULT, moveCost); > {code} > It uses cluster.numRegions + META_MOVE_COST_MULT as the max value when scaling > moveCost to [0,1]. But this should use maxMoves as the max value when the cluster > has a lot of regions. > Assume a cluster has 10000 regions and maxMoves is 2500; it only scales moveCost > to [0, 0.25]. > Improve moveCost by using maxMoves. 
> {code} > return scale(0, Math.min(cluster.numRegions, maxMoves) + META_MOVE_COST_MULT, > moveCost); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14641) Move JDO example from Wiki to Ref Guide
[ https://issues.apache.org/jira/browse/HBASE-14641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14962907#comment-14962907 ] Hadoop QA commented on HBASE-14641: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12767284/HBASE-14641.patch against master branch at commit f1b6355fc54616c08480c0f1b0965a244252. ATTACHMENT ID: 12767284 {color:red}-1 @author{color}. The patch appears to contain 1 @author tags which the Hadoop community has agreed to not allow in code contributions. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 2.7.1) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.thrift.TestThriftHttpServer Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/16086//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/16086//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/16086//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/16086//console This message is automatically generated. > Move JDO example from Wiki to Ref Guide > --- > > Key: HBASE-14641 > URL: https://issues.apache.org/jira/browse/HBASE-14641 > Project: HBase > Issue Type: Sub-task > Components: documentation >Affects Versions: 2.0.0 >Reporter: Misty Stanley-Jones >Assignee: Misty Stanley-Jones > Fix For: 2.0.0 > > Attachments: HBASE-14641.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14642) Disable flakey TestMultiParallel#testActiveThreadsCount
[ https://issues.apache.org/jira/browse/HBASE-14642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14962899#comment-14962899 ] Heng Chen commented on HBASE-14642: --- Any more information? I have time recently and want to dig into it. :) > Disable flakey TestMultiParallel#testActiveThreadsCount > --- > > Key: HBASE-14642 > URL: https://issues.apache.org/jira/browse/HBASE-14642 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack > Attachments: 14642.txt > > > Failed twice in a row on 1.2 build... Disabling for now. Unless someone > wants to dig in and fix it, that is... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13153) Bulk Loaded HFile Replication
[ https://issues.apache.org/jira/browse/HBASE-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Singhi updated HBASE-13153: -- Attachment: (was: HBASE-13153-v11.patch) > Bulk Loaded HFile Replication > - > > Key: HBASE-13153 > URL: https://issues.apache.org/jira/browse/HBASE-13153 > Project: HBase > Issue Type: New Feature > Components: Replication >Reporter: sunhaitao >Assignee: Ashish Singhi > Fix For: 2.0.0 > > Attachments: HBASE-13153-v1.patch, HBASE-13153-v10.patch, > HBASE-13153-v2.patch, HBASE-13153-v3.patch, HBASE-13153-v4.patch, > HBASE-13153-v5.patch, HBASE-13153-v6.patch, HBASE-13153-v7.patch, > HBASE-13153-v8.patch, HBASE-13153-v9.patch, HBASE-13153.patch, HBase Bulk > Load Replication-v1-1.pdf, HBase Bulk Load Replication-v2.pdf, HBase Bulk > Load Replication.pdf > > > Currently we plan to use the HBase Replication feature to deal with a disaster > tolerance scenario. But we encounter an issue: we use bulkload very frequently, > and because bulkload bypasses the write path it will not generate WAL, so > the data will not be replicated to the backup cluster. It's inappropriate to > bulkload twice, on both the active cluster and the backup cluster. So I advise making some > modification to the bulkload feature to enable bulkload to both the active cluster and > the backup cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13153) Bulk Loaded HFile Replication
[ https://issues.apache.org/jira/browse/HBASE-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Singhi updated HBASE-13153: -- Release Note: This jira enhances HBase replication to support replication of bulk loaded data. This is configurable; by default it is set to false, which means the bulk loaded data will not be replicated to the sink(s). To enable it, set "hbase.replication.bulkload.enabled" to true. As part of this we have made the following changes to the LoadIncrementalHFiles class, which is marked as a Public and Stable class: a. Raised the visibility scope of the LoadQueueItem class from package private to public. b. Added a new field splittigDir, which allows the client to tell the tool in which directory it would like to get hfiles split during the operation. c. Added a new method loadHFileQueue, which loads the queue of LoadQueueItem into the table as per the region keys provided. Added release note. Let me know if any update is required. > Bulk Loaded HFile Replication > - > > Key: HBASE-13153 > URL: https://issues.apache.org/jira/browse/HBASE-13153 > Project: HBase > Issue Type: New Feature > Components: Replication >Reporter: sunhaitao >Assignee: Ashish Singhi > Fix For: 2.0.0 > > Attachments: HBASE-13153-v1.patch, HBASE-13153-v10.patch, > HBASE-13153-v11.patch, HBASE-13153-v2.patch, HBASE-13153-v3.patch, > HBASE-13153-v4.patch, HBASE-13153-v5.patch, HBASE-13153-v6.patch, > HBASE-13153-v7.patch, HBASE-13153-v8.patch, HBASE-13153-v9.patch, > HBASE-13153.patch, HBase Bulk Load Replication-v1-1.pdf, HBase Bulk Load > Replication-v2.pdf, HBase Bulk Load Replication.pdf > > > Currently we plan to use the HBase Replication feature to deal with a disaster > tolerance scenario. But we encounter an issue: we use bulkload very frequently, > and because bulkload bypasses the write path it will not generate WAL, so > the data will not be replicated to the backup cluster. It's inappropriate to > bulkload twice, on both the active cluster and the backup cluster. So I advise making some > modification to the bulkload feature to enable bulkload to both the active cluster and > the backup cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14604) Improve MoveCostFunction in StochasticLoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-14604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14963258#comment-14963258 ] Hadoop QA commented on HBASE-14604: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12767319/HBASE-14604_98_with_ut.diff against master branch at commit 8e6316a80cf96f4d4cd6bd10f4c647ebf45c7e02. ATTACHMENT ID: 12767319 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/16094//console This message is automatically generated. > Improve MoveCostFunction in StochasticLoadBalancer > -- > > Key: HBASE-14604 > URL: https://issues.apache.org/jira/browse/HBASE-14604 > Project: HBase > Issue Type: Bug > Components: Balancer >Affects Versions: 0.98.15 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Attachments: HBASE-14604_98.diff, HBASE-14604_98_with_ut.diff > > > The code in MoveCostFunction: > {code} > return scale(0, cluster.numRegions + META_MOVE_COST_MULT, moveCost); > {code} > It uses cluster.numRegions + META_MOVE_COST_MULT as the max value when scaling > moveCost to [0,1]. But this should use maxMoves as the max value when the cluster > has a lot of regions. > Assume a cluster has 10000 regions and maxMoves is 2500; it only scales moveCost > to [0, 0.25]. > Improve moveCost by using Math.min(cluster.numRegions, maxMoves) as the max > cost, so it can scale moveCost to [0,1]. > {code} > return scale(0, Math.min(cluster.numRegions, maxMoves) + META_MOVE_COST_MULT, > moveCost); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14463) Severe performance downgrade when parallel reading a single key from BucketCache
[ https://issues.apache.org/jira/browse/HBASE-14463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14963215#comment-14963215 ] Yu Li commented on HBASE-14463: --- [~stack] boss, anything more to be addressed or could I get your +1 here? Thanks. > Severe performance downgrade when parallel reading a single key from > BucketCache > > > Key: HBASE-14463 > URL: https://issues.apache.org/jira/browse/HBASE-14463 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.14, 1.1.2 >Reporter: Yu Li >Assignee: Yu Li > Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.16 > > Attachments: GC_with_WeakObjectPool.png, HBASE-14463.patch, > HBASE-14463_v11.patch, HBASE-14463_v12.patch, HBASE-14463_v2.patch, > HBASE-14463_v3.patch, HBASE-14463_v4.patch, HBASE-14463_v5.patch, > TestBucketCache-new_with_IdLock.png, > TestBucketCache-new_with_IdReadWriteLock.png, > TestBucketCache_with_IdLock-latest.png, TestBucketCache_with_IdLock.png, > TestBucketCache_with_IdReadWriteLock-latest.png, > TestBucketCache_with_IdReadWriteLock-resolveLockLeak.png, > TestBucketCache_with_IdReadWriteLock.png > > > We store feature data of online items in HBase, do machine learning on these > features, and supply the outputs to our online search engine. In such a > scenario we will launch hundreds of yarn workers and each worker will read > all features of one item (i.e. a single rowkey in HBase), so there'll be heavy > parallel reading on a single rowkey. > We were using LruCache but started to try BucketCache recently to resolve a gc > issue, and just as titled we have observed a severe performance downgrade. > After some analysis we found the root cause is the lock in > BucketCache#getBlock, as shown below > {code} > try { > lockEntry = offsetLock.getLockEntry(bucketEntry.offset()); > // ... > if (bucketEntry.equals(backingMap.get(key))) { > // ... > int len = bucketEntry.getLength(); > Cacheable cachedBlock = ioEngine.read(bucketEntry.offset(), len, > bucketEntry.deserializerReference(this.deserialiserMap)); > {code} > Since ioEngine.read involves an array copy, it's much more costly than the > operation in LruCache. And since we're using synchronized in > IdLock#getLockEntry, parallel reads dropping on the same bucket would be > executed serially, which causes really bad performance. > To resolve the problem, we propose to use ReentrantReadWriteLock in > BucketCache, and introduce a new class called IdReadWriteLock to implement it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
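The core idea behind the proposed IdReadWriteLock can be sketched in a few lines. This is a simplified illustration, not the patch itself: the real implementation pairs this with a WeakObjectPool so unused locks can be reclaimed, and the class name here is a hypothetical stand-in.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// One ReentrantReadWriteLock per id (e.g. bucket-entry offset): concurrent
// readers of the same id share the read lock instead of serializing behind
// a single exclusive lock, which is the bottleneck described above.
public class IdReadWriteLockSketch {
  private final ConcurrentMap<Long, ReentrantReadWriteLock> locks = new ConcurrentHashMap<>();

  /** Returns the read-write lock for the given id, creating it on first use. */
  public ReentrantReadWriteLock getLock(long id) {
    return locks.computeIfAbsent(id, k -> new ReentrantReadWriteLock());
  }
}
```

A reader would take `getLock(offset).readLock()` around the `ioEngine.read` call, so parallel reads of the same hot rowkey proceed concurrently while a writer (e.g. eviction) can still take the write lock for exclusive access.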
[jira] [Updated] (HBASE-14463) Severe performance downgrade when parallel reading a single key from BucketCache
[ https://issues.apache.org/jira/browse/HBASE-14463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Li updated HBASE-14463: -- Attachment: GC_with_WeakObjectPool.png Launched several YCSB workloadc tests against a 4-node dev cluster with zipfian/hotspot request distribution, each round running for around 20 minutes while watching the GC status with VisualVM; the results show no memory leak in the new implementation with WeakObjectPool. See the attached screenshot for more details. > Severe performance downgrade when parallel reading a single key from > BucketCache > > > Key: HBASE-14463 > URL: https://issues.apache.org/jira/browse/HBASE-14463 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.14, 1.1.2 >Reporter: Yu Li >Assignee: Yu Li > Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.16 > > Attachments: GC_with_WeakObjectPool.png, HBASE-14463.patch, > HBASE-14463_v11.patch, HBASE-14463_v12.patch, HBASE-14463_v2.patch, > HBASE-14463_v3.patch, HBASE-14463_v4.patch, HBASE-14463_v5.patch, > TestBucketCache-new_with_IdLock.png, > TestBucketCache-new_with_IdReadWriteLock.png, > TestBucketCache_with_IdLock-latest.png, TestBucketCache_with_IdLock.png, > TestBucketCache_with_IdReadWriteLock-latest.png, > TestBucketCache_with_IdReadWriteLock-resolveLockLeak.png, > TestBucketCache_with_IdReadWriteLock.png > > > We store feature data of online items in HBase, do machine learning on these > features, and supply the outputs to our online search engine. In such a > scenario we will launch hundreds of yarn workers and each worker will read > all features of one item (i.e. a single rowkey in HBase), so there'll be heavy > parallel reading on a single rowkey. > We were using LruCache but started to try BucketCache recently to resolve a gc > issue, and just as titled we have observed a severe performance downgrade. > After some analysis we found the root cause is the lock in > BucketCache#getBlock, as shown below > {code} > try { > lockEntry = offsetLock.getLockEntry(bucketEntry.offset()); > // ... > if (bucketEntry.equals(backingMap.get(key))) { > // ... > int len = bucketEntry.getLength(); > Cacheable cachedBlock = ioEngine.read(bucketEntry.offset(), len, > bucketEntry.deserializerReference(this.deserialiserMap)); > {code} > Since ioEngine.read involves an array copy, it's much more costly than the > operation in LruCache. And since we're using synchronized in > IdLock#getLockEntry, parallel reads dropping on the same bucket would be > executed serially, which causes really bad performance. > To resolve the problem, we propose to use ReentrantReadWriteLock in > BucketCache, and introduce a new class called IdReadWriteLock to implement it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14604) Improve MoveCostFunction in StochasticLoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-14604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-14604: --- Status: Patch Available (was: In Progress) > Improve MoveCostFunction in StochasticLoadBalancer > -- > > Key: HBASE-14604 > URL: https://issues.apache.org/jira/browse/HBASE-14604 > Project: HBase > Issue Type: Bug > Components: Balancer >Affects Versions: 0.98.15 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Attachments: HBASE-14604_98.diff, HBASE-14604_98_with_ut.diff > > > The code in MoveCostFunction: > {code} > return scale(0, cluster.numRegions + META_MOVE_COST_MULT, moveCost); > {code} > It uses cluster.numRegions + META_MOVE_COST_MULT as the max value when scaling > moveCost to [0,1]. But this should use maxMoves as the max value when the cluster > has a lot of regions. > Assume a cluster has 10000 regions and maxMoves is 2500; it only scales moveCost > to [0, 0.25]. > Improve moveCost by using Math.min(cluster.numRegions, maxMoves) as the max > cost, so it can scale moveCost to [0,1]. > {code} > return scale(0, Math.min(cluster.numRegions, maxMoves) + META_MOVE_COST_MULT, > moveCost); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14631) Region merge request should be audited with request user through proper scope of doAs() calls to region observer notifications
[ https://issues.apache.org/jira/browse/HBASE-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14962989#comment-14962989 ] Hadoop QA commented on HBASE-14631: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12767306/14631-branch-0.98.txt against 0.98 branch at commit 8e6316a80cf96f4d4cd6bd10f4c647ebf45c7e02. ATTACHMENT ID: 12767306 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 2.7.1) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 29 warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/16089//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/16089//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/16089//artifact/patchprocess/checkstyle-aggregate.html Javadoc warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/16089//artifact/patchprocess/patchJavadocWarnings.txt Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/16089//console This message is automatically generated. > Region merge request should be audited with request user through proper scope > of doAs() calls to region observer notifications > -- > > Key: HBASE-14631 > URL: https://issues.apache.org/jira/browse/HBASE-14631 > Project: HBase > Issue Type: Bug >Reporter: Ted Yu >Assignee: Ted Yu > Attachments: 14631-branch-0.98.txt, 14631-branch-1.0.txt, > 14631-branch-1.txt, 14631-v1.txt > > > HBASE-14475 and HBASE-14605 narrowed the scope of doAs() calls to region > observer notifications for region splitting. > During review of HBASE-14605, Andrew brought up the case for region merge. > This JIRA is to implement similar scope narrowing technique for region > merging. > The majority of the change would be in RegionMergeTransactionImpl class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14604) Improve MoveCostFunction in StochasticLoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-14604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-14604: --- Affects Version/s: 0.98.15 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14642) Disable flakey TestMultiParallel#testActiveThreadsCount
[ https://issues.apache.org/jira/browse/HBASE-14642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14963001#comment-14963001 ] Hudson commented on HBASE-14642: SUCCESS: Integrated in HBase-1.3-IT #250 (See [https://builds.apache.org/job/HBase-1.3-IT/250/]) HBASE-14642 Disable flakey TestMultiParallel#testActiveThreadsCount (stack: rev 39f6a4eb0b27bf16d3c9ea98b3f5e8db8c594e78) * hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestMultiParallel.java > Disable flakey TestMultiParallel#testActiveThreadsCount > --- > > Key: HBASE-14642 > URL: https://issues.apache.org/jira/browse/HBASE-14642 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack > Attachments: 14642.txt > > > Failed twice in a row on the 1.2 build... Disabling for now. Unless someone > wants to dig in and fix it, that is... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14643) Avoid Splits from once again opening a closed reader for fetching the first and last key
[ https://issues.apache.org/jira/browse/HBASE-14643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-14643: --- Summary: Avoid Splits from once again opening a closed reader for fetching the first and last key (was: Avoid Splits to create a closed reader for fetching the first and last key) > Avoid Splits from once again opening a closed reader for fetching the first > and last key > > > Key: HBASE-14643 > URL: https://issues.apache.org/jira/browse/HBASE-14643 > Project: HBase > Issue Type: Improvement >Affects Versions: 2.0.0 >Reporter: ramkrishna.s.vasudevan > > Currently the split flow is such that when we close the parent region, all its > store file readers are also closed. After that, in order to split the > reference files, we need the first and last keys, for which we once again open the > readers on those store files. This can be a costly operation considering > the fact that it has to contact HDFS for these close and open operations. > This JIRA is to see if we can improve this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14629) Create replication zk node in postLogRoll instead of preLogRoll
[ https://issues.apache.org/jira/browse/HBASE-14629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14963048#comment-14963048 ] He Liangliang commented on HBASE-14629: --- Just can't figure out the reason. In our case, ignoring FNFE is dangerous and it's disabled in our production cluster. So avoiding FNFE is a helpful improvement. [~jdcryans] [~stack] any hints? > Create replication zk node in postLogRoll instead of preLogRoll > --- > > Key: HBASE-14629 > URL: https://issues.apache.org/jira/browse/HBASE-14629 > Project: HBase > Issue Type: Improvement > Components: Replication >Reporter: He Liangliang > > Currently the hlog zk node is added before creating the log file, so it's > possible to raise FileNotFoundException if the server crashes between these two > operations. Moving this step to after file creation can avoid this exception (then > FNFE can certainly be treated as an error). If there is a crash after log > creation but before creating the zk node, the log will be replayed and there > should be no data loss either. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
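The ordering argument above can be made concrete with a toy crash simulation (the step names are hypothetical stand-ins; the real logic lives in the log-roll listeners):

```java
import java.util.ArrayList;
import java.util.List;

// Toy simulation of the two orderings around a log roll. A "dangling"
// znode is one that points at a WAL file that was never created -- the
// FileNotFoundException case described above.
public class LogRollOrderSketch {
  static boolean leavesDanglingZnode(boolean znodeFirst, boolean crashBetween) {
    List<String> completed = new ArrayList<>();
    completed.add(znodeFirst ? "create-znode" : "create-file");
    if (!crashBetween) {
      completed.add(znodeFirst ? "create-file" : "create-znode");
    }
    // Dangling iff the znode exists but the file it names does not.
    return completed.contains("create-znode") && !completed.contains("create-file");
  }
}
```

With the current (preLogRoll) order, a crash in the window leaves a dangling znode; with the proposed (postLogRoll) order it cannot — the worst case is a created file whose znode is missing, which log replay handles without data loss.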
[jira] [Commented] (HBASE-13153) Bulk Loaded HFile Replication
[ https://issues.apache.org/jira/browse/HBASE-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14963077#comment-14963077 ] Hadoop QA commented on HBASE-13153: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12767313/HBASE-13153-v11.patch against master branch at commit 8e6316a80cf96f4d4cd6bd10f4c647ebf45c7e02. ATTACHMENT ID: 12767313 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 41 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 2.7.1) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:red}-1 core tests{color}. 
The patch failed these unit tests: Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/16091//testReport/ Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/16091//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/16091//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/16091//console This message is automatically generated. > Bulk Loaded HFile Replication > - > > Key: HBASE-13153 > URL: https://issues.apache.org/jira/browse/HBASE-13153 > Project: HBase > Issue Type: New Feature > Components: Replication >Reporter: sunhaitao >Assignee: Ashish Singhi > Fix For: 2.0.0 > > Attachments: HBASE-13153-v1.patch, HBASE-13153-v10.patch, > HBASE-13153-v11.patch, HBASE-13153-v2.patch, HBASE-13153-v3.patch, > HBASE-13153-v4.patch, HBASE-13153-v5.patch, HBASE-13153-v6.patch, > HBASE-13153-v7.patch, HBASE-13153-v8.patch, HBASE-13153-v9.patch, > HBASE-13153.patch, HBase Bulk Load Replication-v1-1.pdf, HBase Bulk Load > Replication-v2.pdf, HBase Bulk Load Replication.pdf > > > Currently we plan to use the HBase Replication feature to deal with a disaster > tolerance scenario. But we encounter an issue: we use bulkload very > frequently, and because bulkload bypasses the write path and will not generate WAL, > the data will not be replicated to the backup cluster. It's inappropriate to > bulkload twice, on both the active cluster and the backup cluster. So I advise making some > modifications to the bulkload feature to enable bulkload to both the active cluster and > the backup cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (HBASE-14604) Improve MoveCostFunction in StochasticLoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-14604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-14604 started by Guanghao Zhang. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14604) Improve MoveCostFunction in StochasticLoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-14604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-14604: --- Description: The code in MoveCostFunction: {code} return scale(0, cluster.numRegions + META_MOVE_COST_MULT, moveCost); {code} It uses cluster.numRegions + META_MOVE_COST_MULT as the max value when scaling moveCost to [0,1]. But it should use maxMoves as the max value when the cluster has a lot of regions. Assume a cluster has 10000 regions and maxMoves is 2500; it only scales moveCost to [0, 0.25]. Improve moveCost by using Math.min(cluster.numRegions, maxMoves) as the max cost, so it can scale moveCost to [0,1]. {code} return scale(0, Math.min(cluster.numRegions, maxMoves) + META_MOVE_COST_MULT, moveCost); {code} was: The code in MoveCostFunction: {code} return scale(0, cluster.numRegions + META_MOVE_COST_MULT, moveCost); {code} It uses cluster.numRegions + META_MOVE_COST_MULT as the max value when scaling moveCost to [0,1]. But it should use maxMoves as the max value when the cluster has a lot of regions. Assume a cluster has 10000 regions and maxMoves is 2500; it only scales moveCost to [0, 0.25]. Improve moveCost by using maxMoves. {code} return scale(0, Math.min(cluster.numRegions, maxMoves) + META_MOVE_COST_MULT, moveCost); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14641) Move JDO example from Wiki to Ref Guide
[ https://issues.apache.org/jira/browse/HBASE-14641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14963088#comment-14963088 ] Hadoop QA commented on HBASE-14641: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12767309/HBASE-14641-v1.patch against master branch at commit 8e6316a80cf96f4d4cd6bd10f4c647ebf45c7e02. ATTACHMENT ID: 12767309 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 2.7.1) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:red}-1 core tests{color}. 
The patch failed these unit tests: Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/16090//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/16090//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/16090//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/16090//console This message is automatically generated. > Move JDO example from Wiki to Ref Guide > --- > > Key: HBASE-14641 > URL: https://issues.apache.org/jira/browse/HBASE-14641 > Project: HBase > Issue Type: Sub-task > Components: documentation >Affects Versions: 2.0.0 >Reporter: Misty Stanley-Jones >Assignee: Misty Stanley-Jones > Fix For: 2.0.0 > > Attachments: HBASE-14641-v1.patch, HBASE-14641.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13082) Coarsen StoreScanner locks to RegionScanner
[ https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-13082: --- Attachment: HBASE-13082.pdf A simple one-pager describing the high-level approach to the versioned store file design. > Coarsen StoreScanner locks to RegionScanner > --- > > Key: HBASE-13082 > URL: https://issues.apache.org/jira/browse/HBASE-13082 > Project: HBase > Issue Type: Bug >Reporter: Lars Hofhansl >Assignee: ramkrishna.s.vasudevan > Attachments: 13082-test.txt, 13082-v2.txt, 13082-v3.txt, > 13082-v4.txt, 13082.txt, 13082.txt, HBASE-13082.pdf, HBASE-13082_1_WIP.patch, > HBASE-13082_2_WIP.patch, HBASE-13082_3.patch, HBASE-13082_4.patch, gc.png, > gc.png, gc.png, hits.png, next.png, next.png > > > Continuing where HBASE-10015 left off. > We can avoid locking (and memory fencing) inside StoreScanner by deferring to > the lock already held by the RegionScanner. > In tests this shows quite a scan improvement and reduced CPU (the fences make > the cores wait for memory fetches). > There are some drawbacks too: > * All calls to RegionScanner need to remain synchronized > * Implementors of coprocessors need to be diligent in following the locking > contract. For example Phoenix does not lock RegionScanner.nextRaw() as > required in the documentation (not picking on Phoenix, this one is my fault > as I told them it's OK) > * possible starving of flushes and compactions under heavy read load. > RegionScanner operations would keep getting the locks and the > flushes/compactions would not be able to finalize the set of files. > I'll have a patch soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14643) Avoid Splits to create a closed reader for fetching the first and last key
ramkrishna.s.vasudevan created HBASE-14643: -- Summary: Avoid Splits to create a closed reader for fetching the first and last key Key: HBASE-14643 URL: https://issues.apache.org/jira/browse/HBASE-14643 Project: HBase Issue Type: Improvement Affects Versions: 2.0.0 Reporter: ramkrishna.s.vasudevan Currently the split flow is such that when we close the parent region, all its store file readers are also closed. After that, in order to split the reference files, we need the first and last keys, for which we once again open the readers on those store files. This can be a costly operation considering the fact that it has to contact HDFS for these close and open operations. This JIRA is to see if we can improve this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13082) Coarsen StoreScanner locks to RegionScanner
[ https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14963297#comment-14963297 ] Anoop Sam John commented on HBASE-13082: So we won't reset the heap of the files during the scan. The compaction case is fine.. the versioned approach, plus not moving the old compacted files until the scanners finish, will do as the solution. What about flush? Once the flush is finished, we will have to clear the snapshot of cells in the Memstore, no? Then we will have to open this newly added file for reading.. If we don't open it, we can not release the memstore snapshot. Means we will have to keep the cells until the scanner completes! That will not be acceptable IMO. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14631) Region merge request should be audited with request user through proper scope of doAs() calls to region observer notifications
[ https://issues.apache.org/jira/browse/HBASE-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14963357#comment-14963357 ] Ted Yu commented on HBASE-14631: Planning to integrate to all branches soon, if there are no more review comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13082) Coarsen StoreScanner locks to RegionScanner
[ https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14963400#comment-14963400 ] Anoop Sam John commented on HBASE-13082: Good doc explaining the change. bq. Along with the ref count, we also have CompactionStatus (COMPACTED and NON_COMPACTED) added to these store file readers This status is in the reader? It suits better in StoreFile, no? Even the ref count is in the reader? I would say it is better suited in StoreFile. Rather than calling the state COMPACTED, would it be better to call it a file to be discarded? When we mark a file as COMPACTED, the confusion can be whether this file is a file created out of compaction. (?) bq. Ensure that a background thread runs periodically that scans the list of store files and checks for the state as COMPACTED... This is a new Chore thread added to the system.. Mention it clearly. It will be one thread per RS. Also what is its interval? Is it configurable? How? bq. Hence it requires us to add new APIs which ensure that a split can go ahead even if reference files are present but they are in the COMPACTED state hmm.. making this more complex :-) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
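The ref-count plus COMPACTED-state scheme being discussed above can be sketched as follows (illustrative names only, not the actual StoreFile/reader API):

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch of a ref-counted store file with a compacted-away
// marker; a periodic chore would archive files once isSafeToArchive()
// returns true. Names are assumptions, not the real HBase API.
public class RefCountedStoreFile {
  private final AtomicInteger scannerRefs = new AtomicInteger();
  private final AtomicBoolean compactedAway = new AtomicBoolean(false);

  public void scannerOpened() { scannerRefs.incrementAndGet(); }
  public void scannerClosed() { scannerRefs.decrementAndGet(); }

  /** Called when a compaction has rewritten this file's data elsewhere. */
  public void markCompactedAway() { compactedAway.set(true); }

  /** The cleanup chore archives the file only when no scanner holds it. */
  public boolean isSafeToArchive() {
    return compactedAway.get() && scannerRefs.get() == 0;
  }
}
```

This shows why a scan never has to reset its heap of files: an in-flight scanner pins the old file via its ref count, and the chore only archives it after the last scanner closes.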
[jira] [Commented] (HBASE-14604) Improve MoveCostFunction in StochasticLoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-14604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14963411#comment-14963411 ] Ted Yu commented on HBASE-14604: The classes are in the same package. Please make the new methods package private. Patch for 0.98 branch should contain 0.98 in the filename. Extension should be either .txt or .patch. Please attach patch for master branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14643) Avoid Splits from once again opening a closed reader for fetching the first and last key
[ https://issues.apache.org/jira/browse/HBASE-14643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14963442#comment-14963442 ] Heng Chen commented on HBASE-14643: --- IMO we can get the first and last keys before closing the store files and pass them as params into {{SplitTransactionImpl.splitStoreFiles}}, any concerns? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
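Heng Chen's suggestion — snapshot the boundary keys while the readers are still open, then hand them to the split — could look roughly like this (the Reader interface and all names here are stand-ins for illustration, not HBase's actual StoreFile API):

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in sketch: capture each store file's first/last key while its
// reader is open, so the split step can use the cached keys instead of
// reopening readers on an already-closed region.
public class SplitBoundaryKeys {
  /** Stand-in for a store file reader; not the real HBase type. */
  public interface Reader {
    byte[] firstKey();
    byte[] lastKey();
  }

  /** Capture first/last keys per file name before the region closes. */
  public static Map<String, byte[][]> capture(Map<String, Reader> openReaders) {
    Map<String, byte[][]> keys = new HashMap<>();
    openReaders.forEach((file, r) ->
        keys.put(file, new byte[][] { r.firstKey(), r.lastKey() }));
    return keys; // later handed to the reference-file split step
  }
}
```

The split would then consume this map instead of reopening each store file, avoiding the extra HDFS round trips described in the issue.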
[jira] [Commented] (HBASE-14624) BucketCache.freeBlock is too expensive
[ https://issues.apache.org/jira/browse/HBASE-14624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14963471#comment-14963471 ] Anoop Sam John commented on HBASE-14624: Not able to see from log that bucket cache free call takes time.. Missing some logs? > BucketCache.freeBlock is too expensive > -- > > Key: HBASE-14624 > URL: https://issues.apache.org/jira/browse/HBASE-14624 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 1.0.0 >Reporter: Randy Fox > > Moving regions is unacceptably slow when using bucket cache, as it takes too > long to free all the blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13153) Bulk Loaded HFile Replication
[ https://issues.apache.org/jira/browse/HBASE-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14963386#comment-14963386 ] Anoop Sam John commented on HBASE-13153: Checking the op flow: considering the scenario where the peer cluster is not secure (no secure EP) and the bulk load to the peer cluster needs a split, we will do the split by reading each cell from the remote src cluster's HFile. This will be a costly op. The suggestion is: when we have to do a bulk load to the peer cluster, make sure the big file is copied to the dest peer cluster first, and then do the split and read of the file. Had a call with Ashish and discussed this. He will come back with a change in the flow and the doc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14624) BucketCache.freeBlock is too expensive
[ https://issues.apache.org/jira/browse/HBASE-14624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14963452#comment-14963452 ] Randy Fox commented on HBASE-14624: --- The region was about 1.4G compressed and it took about 2.5 minutes to move. The bucketcache is configed at 72G, but was not close to full. The bucket sizes: 9216,17408,33792,66560 2015-10-15 08:34:28,510 INFO org.apache.hadoop.hbase.regionserver.RSRpcServices: Close dad6c71ed395df19220ef1056a110086, moving to hb17.prod1.connexity.net,60020,125924021 2015-10-15 08:34:28,510 DEBUG org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Processing close of Wildfire_graph3,\x00)"y\x1B\xF0\x13-,1434402364692.dad6c71ed395df19220ef1056a110086. 2015-10-15 08:34:28,511 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Closing Wildfire_graph3,\x00)"y\x1B\xF0\x13-,1434402364692.dad6c71ed395df19220ef1056a110086.: disabling compactions & flushes 2015-10-15 08:34:28,511 INFO org.apache.hadoop.hbase.regionserver.HRegion: Running close preflush of Wildfire_graph3,\x00)"y\x1B\xF0\x13-,1434402364692.dad6c71ed395df19220ef1056a110086. 2015-10-15 08:34:28,511 INFO org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for Wildfire_graph3,\x00)"y\x1B\xF0\x13-,1434402364692.dad6c71ed395df19220ef1056a110086., current region memstore size 44.89 MB, and 1/1 column families' memstores are being flushed. 
2015-10-15 08:34:28,511 WARN org.apache.hadoop.hbase.regionserver.wal.FSHLog: Couldn't find oldest seqNum for the region we are about to flush: [dad6c71ed395df19220ef1056a110086] 2015-10-15 08:34:29,137 INFO org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher: Flushed, sequenceid=70839856845, memsize=44.9 M, hasBloomFilter=false, into tmp file hdfs://woz/hbase/data/default/Wildfire_graph3/dad6c71ed395df19220ef1056a110086/.tmp/3600b47839a945de9733cf17581458e0 2015-10-15 08:34:29,144 DEBUG org.apache.hadoop.hbase.regionserver.HRegionFileSystem: Committing store file hdfs://woz/hbase/data/default/Wildfire_graph3/dad6c71ed395df19220ef1056a110086/.tmp/3600b47839a945de9733cf17581458e0 as hdfs://woz/hbase/data/default/Wildfire_graph3/dad6c71ed395df19220ef1056a110086/L/3600b47839a945de9733cf17581458e0 2015-10-15 08:34:29,150 INFO org.apache.hadoop.hbase.regionserver.HStore: Added hdfs://woz/hbase/data/default/Wildfire_graph3/dad6c71ed395df19220ef1056a110086/L/3600b47839a945de9733cf17581458e0, entries=190131, sequenceid=70839856845, filesize=3.2 M 2015-10-15 08:34:29,151 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~44.89 MB/47066832, currentsize=22.39 KB/22928 for region Wildfire_graph3,\x00)"y\x1B\xF0\x13-,1434402364692.dad6c71ed395df19220ef1056a110086. in 640ms, sequenceid=70839856845, compaction requested=false 2015-10-15 08:34:29,152 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Updates disabled for region Wildfire_graph3,\x00)"y\x1B\xF0\x13-,1434402364692.dad6c71ed395df19220ef1056a110086. 2015-10-15 08:34:29,152 INFO org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for Wildfire_graph3,\x00)"y\x1B\xF0\x13-,1434402364692.dad6c71ed395df19220ef1056a110086., current region memstore size 22.39 KB, and 1/1 column families' memstores are being flushed. 
2015-10-15 08:34:29,152 WARN org.apache.hadoop.hbase.regionserver.wal.FSHLog: Couldn't find oldest seqNum for the region we are about to flush: [dad6c71ed395df19220ef1056a110086] 2015-10-15 08:34:29,225 INFO org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher: Flushed, sequenceid=70839856876, memsize=22.4 K, hasBloomFilter=false, into tmp file hdfs://woz/hbase/data/default/Wildfire_graph3/dad6c71ed395df19220ef1056a110086/.tmp/a8a74549c7a44174b105cbed23cfaea1 2015-10-15 08:34:29,231 DEBUG org.apache.hadoop.hbase.regionserver.HRegionFileSystem: Committing store file hdfs://woz/hbase/data/default/Wildfire_graph3/dad6c71ed395df19220ef1056a110086/.tmp/a8a74549c7a44174b105cbed23cfaea1 as hdfs://woz/hbase/data/default/Wildfire_graph3/dad6c71ed395df19220ef1056a110086/L/a8a74549c7a44174b105cbed23cfaea1 2015-10-15 08:34:29,279 INFO org.apache.hadoop.hbase.regionserver.HStore: Added hdfs://woz/hbase/data/default/Wildfire_graph3/dad6c71ed395df19220ef1056a110086/L/a8a74549c7a44174b105cbed23cfaea1, entries=128, sequenceid=70839856876, filesize=2.6 K 2015-10-15 08:34:29,280 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~22.39 KB/22928, currentsize=0 B/0 for region Wildfire_graph3,\x00)"y\x1B\xF0\x13-,1434402364692.dad6c71ed395df19220ef1056a110086. in 128ms, sequenceid=70839856876, compaction requested=true ... 2015-10-15 08:37:02,945 INFO org.apache.hadoop.hbase.regionserver.HStore: Closed L 2015-10-15 08:37:02,954 DEBUG org.apache.hadoop.hbase.wal.WALSplitter: Wrote region seqId=hdfs://woz/hbase/data/default/Wildfire_graph3/dad6c71ed395df19220ef1056a110086/recovered.edits/70839856879.seqid to
[jira] [Commented] (HBASE-14624) BucketCache.freeBlock is too expensive
[ https://issues.apache.org/jira/browse/HBASE-14624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14963477#comment-14963477 ] Randy Fox commented on HBASE-14624: --- There are no logs for that. I deduced it from the code and the fact that when I added a table to the cache, it also started taking a long time to move when it was previously quick. Vladimir Rodionov suggested I open this JIRA after discussions on the mailing list. > BucketCache.freeBlock is too expensive > -- > > Key: HBASE-14624 > URL: https://issues.apache.org/jira/browse/HBASE-14624 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 1.0.0 >Reporter: Randy Fox > > Moving regions is unacceptably slow when using bucket cache, as it takes too > long to free all the blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
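For readers following the freeBlock discussion: the cost Randy describes comes from evicting a large region's cached blocks one at a time, paying the bookkeeping (and, in the real cache, lock acquisition and free-space accounting) per block. The sketch below contrasts per-block frees with one batched pass; `SimpleBucketCache` and its methods are illustrative stand-ins, not the actual BucketCache API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SimpleBucketCache {
    private final Map<String, Integer> backingMap = new HashMap<>();

    public void cacheBlock(String key, int size) {
        backingMap.put(key, size);
    }

    // Per-block free: one map operation per block, so evicting a region with
    // hundreds of thousands of blocks repeats the bookkeeping that many times.
    public boolean freeBlock(String key) {
        return backingMap.remove(key) != null;
    }

    // Batched free: one pass that collects and evicts every block belonging to
    // a region, amortizing the per-call overhead of freeBlock.
    public int freeAll(String regionPrefix) {
        List<String> doomed = new ArrayList<>();
        for (String k : backingMap.keySet()) {
            if (k.startsWith(regionPrefix)) {
                doomed.add(k);
            }
        }
        for (String k : doomed) {
            backingMap.remove(k);
        }
        return doomed.size();
    }
}
```

Whether a batched eviction path is the right fix for the real BucketCache is exactly what this JIRA is about; the sketch only shows the shape of the cost difference.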
[jira] [Updated] (HBASE-14631) Region merge request should be audited with request user through proper scope of doAs() calls to region observer notifications
[ https://issues.apache.org/jira/browse/HBASE-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-14631: --- Hadoop Flags: Reviewed Fix Version/s: 0.98.16 1.1.3 1.0.3 1.3.0 1.2.0 2.0.0 > Region merge request should be audited with request user through proper scope > of doAs() calls to region observer notifications > -- > > Key: HBASE-14631 > URL: https://issues.apache.org/jira/browse/HBASE-14631 > Project: HBase > Issue Type: Bug >Reporter: Ted Yu >Assignee: Ted Yu > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16 > > Attachments: 14631-branch-0.98.txt, 14631-branch-1.0.txt, > 14631-branch-1.txt, 14631-v1.txt > > > HBASE-14475 and HBASE-14605 narrowed the scope of doAs() calls to region > observer notifications for region splitting. > During review of HBASE-14605, Andrew brought up the case for region merge. > This JIRA is to implement similar scope narrowing technique for region > merging. > The majority of the change would be in RegionMergeTransactionImpl class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HBASE-13082) Coarsen StoreScanner locks to RegionScanner
[ https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14963502#comment-14963502 ] ramkrishna.s.vasudevan edited comment on HBASE-13082 at 10/19/15 3:58 PM: -- bq.What about flush? Once the flush is finished, we will have to clear the snapshot of cells in the Memstore, no? Then we will have to open this newly added file for reading; if we don't open it, we cannot release the memstore snapshot. We do open the reader for sure. One thing to check, though: if we open the reader while the current scanner heap is not yet reset, does that have any impact on the current scan? During a flush, reads can still be served from the snapshot; it is only after the flush completes that this matters. bq.This status is in reader? It suits better in StoreFile no? The reader is the common object here, hence I was referencing the state from there. Initially I had it in StoreFile only, but felt that StoreFile is more volatile. Let me check; it can be done. In fact, I was also not very sure about having the state in the reader. We can change the states, no problem. bq.This is a new Chore thread adding to the system.. Mention about it clearly. It will be one thread per RS. Also what is its interval? Is it configurable? How? I forgot to add the config details. It is per store, and the chore is started by the HStore, not the RS. It could be configured to run, say, once every 2 minutes; currently in the patch it is once every 5 minutes. bq.Hence it requires us to add new APIs which ensures that a split can go ahead even if reference files are present but they are in the COMPACTED state This is a bit of a sticky area. In the case of merge I have tried to forcefully clear the compacted files; maybe we need to do the same with split. But does split explicitly call compact? I was not quite sure about that, but in merge we do. was (Author: ram_krish): bq.What about flush? 
Once the flush is finished, we will have to clear the snapshot of cells in Memstore no? Then we will have to open this newly added file for reading.. If we dont open it, we can not release the memstore snapshot. We do open the reader for sure. But one thing may be to check is that if we open but still the current scanner heap is not reset will it have any impact on the current scan is what needs to be checked? Because during flush the reads can still be served from the snapshot. Only after flush is a point to be noted. bq.This status is in reader? It suits better in StoreFile no? The reader is the common object here. Hence was referencing it from there. Initially had it in storefile only but felt that StoreFile is more volatile. Let me check. Can be done. Infact I was also not very sure of having the state in the reader. We can change the states no problem. bq.This is a new Chore thread adding to the system.. Mention about it clearly. It will be one thread per RS. Also what is its interval? Is it configurable? How? For to add that config details. It is per store and the chore is started by the HStore and not the RS. bq.Hence it requires us to add new APIs which ensures that a split can go ahead even if reference files are present but they are in the COMPACTED state This is bit of sticky area. In case of merge I have tried to forcefully clear the compacted files. May be we need to do the same with split also. But in split do we explicitly call compact? I was not pretty sure on that. But in merge we do. 
> Coarsen StoreScanner locks to RegionScanner > --- > > Key: HBASE-13082 > URL: https://issues.apache.org/jira/browse/HBASE-13082 > Project: HBase > Issue Type: Bug >Reporter: Lars Hofhansl >Assignee: ramkrishna.s.vasudevan > Attachments: 13082-test.txt, 13082-v2.txt, 13082-v3.txt, > 13082-v4.txt, 13082.txt, 13082.txt, HBASE-13082.pdf, HBASE-13082_1_WIP.patch, > HBASE-13082_2_WIP.patch, HBASE-13082_3.patch, HBASE-13082_4.patch, gc.png, > gc.png, gc.png, hits.png, next.png, next.png > > > Continuing where HBASE-10015 left of. > We can avoid locking (and memory fencing) inside StoreScanner by deferring to > the lock already held by the RegionScanner. > In tests this shows quite a scan improvement and reduced CPU (the fences make > the cores wait for memory fetches). > There are some drawbacks too: > * All calls to RegionScanner need to be remain synchronized > * Implementors of coprocessors need to be diligent in following the locking > contract. For example Phoenix does not lock RegionScanner.nextRaw() and > required in the documentation (not picking on Phoenix, this one is my fault > as I told them it's OK) > * possible starving of flushes and compaction with heavy read load. > RegionScanner operations would keep getting the locks and the > flushes/compactions would not be able finalize the set of files. > I'll have a patch soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14643) Avoid Splits from once again opening a closed reader for fetching the first and last key
[ https://issues.apache.org/jira/browse/HBASE-14643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14963543#comment-14963543 ] Heng Chen commented on HBASE-14643: --- Yeah, we can do this in HRegion.close(), but I found that there are a lot of usages of {{HRegion.close}} in the project, so it would change many places. If we do it outside HRegion.close, the change is small. I made a simple patch; could I upload it? > Avoid Splits from once again opening a closed reader for fetching the first > and last key > > > Key: HBASE-14643 > URL: https://issues.apache.org/jira/browse/HBASE-14643 > Project: HBase > Issue Type: Improvement >Affects Versions: 2.0.0 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan > > Currently split flow is such that we close the parent region and all its > store file readers are also closed. After that inorder to split the > reference files we need the first and last keys for which once again open the > readers on those store files. This could be costlier operation considering > the fact that it has to contact the HDFS for this close and open operation. > This JIRA is to see if we can improve this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HBASE-14643) Avoid Splits from once again opening a closed reader for fetching the first and last key
[ https://issues.apache.org/jira/browse/HBASE-14643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan reassigned HBASE-14643: -- Assignee: ramkrishna.s.vasudevan > Avoid Splits from once again opening a closed reader for fetching the first > and last key > > > Key: HBASE-14643 > URL: https://issues.apache.org/jira/browse/HBASE-14643 > Project: HBase > Issue Type: Improvement >Affects Versions: 2.0.0 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan > > Currently split flow is such that we close the parent region and all its > store file readers are also closed. After that inorder to split the > reference files we need the first and last keys for which once again open the > readers on those store files. This could be costlier operation considering > the fact that it has to contact the HDFS for this close and open operation. > This JIRA is to see if we can improve this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14643) Avoid Splits from once again opening a closed reader for fetching the first and last key
[ https://issues.apache.org/jira/browse/HBASE-14643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14963517#comment-14963517 ] ramkrishna.s.vasudevan commented on HBASE-14643: Exactly that was the plan. I noticed this while working on something else. I wanted to change the return type from the HRegion.close() itself so that the same can be used in the split flow. > Avoid Splits from once again opening a closed reader for fetching the first > and last key > > > Key: HBASE-14643 > URL: https://issues.apache.org/jira/browse/HBASE-14643 > Project: HBase > Issue Type: Improvement >Affects Versions: 2.0.0 >Reporter: ramkrishna.s.vasudevan > > Currently split flow is such that we close the parent region and all its > store file readers are also closed. After that inorder to split the > reference files we need the first and last keys for which once again open the > readers on those store files. This could be costlier operation considering > the fact that it has to contact the HDFS for this close and open operation. > This JIRA is to see if we can improve this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
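The idea discussed in this thread — change the return type of HRegion.close() so the split flow can reuse each store file's first/last key without reopening readers against HDFS — can be sketched roughly as follows. The names here (`CloseWithKeys`, `StoreFileInfo`, `closeAndReportKeys`) are illustrative assumptions, not the real HBase signatures.

```java
import java.util.HashMap;
import java.util.Map;

public class CloseWithKeys {
    // Key info captured per store file before its reader is closed.
    public static class StoreFileInfo {
        public final String firstKey;
        public final String lastKey;
        public StoreFileInfo(String firstKey, String lastKey) {
            this.firstKey = firstKey;
            this.lastKey = lastKey;
        }
    }

    // Pretend each open reader knows its first and last key.
    private final Map<String, String[]> openReaders = new HashMap<>();

    public void open(String file, String firstKey, String lastKey) {
        openReaders.put(file, new String[] { firstKey, lastKey });
    }

    // Close all readers, but hand back the keys gathered before closing, so a
    // caller (e.g. the split flow) avoids a second round of HDFS open/close
    // just to fetch the first and last key of each file.
    public Map<String, StoreFileInfo> closeAndReportKeys() {
        Map<String, StoreFileInfo> report = new HashMap<>();
        for (Map.Entry<String, String[]> e : openReaders.entrySet()) {
            report.put(e.getKey(), new StoreFileInfo(e.getValue()[0], e.getValue()[1]));
        }
        openReaders.clear(); // "close" the readers
        return report;
    }
}
```

The trade-off debated above is exactly where this return value lives: changing HRegion.close() itself touches every caller, while collecting the keys outside close() keeps the change small.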
[jira] [Updated] (HBASE-14631) Region merge request should be audited with request user through proper scope of doAs() calls to region observer notifications
[ https://issues.apache.org/jira/browse/HBASE-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-14631: --- Resolution: Fixed Status: Resolved (was: Patch Available) > Region merge request should be audited with request user through proper scope > of doAs() calls to region observer notifications > -- > > Key: HBASE-14631 > URL: https://issues.apache.org/jira/browse/HBASE-14631 > Project: HBase > Issue Type: Bug >Reporter: Ted Yu >Assignee: Ted Yu > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16 > > Attachments: 14631-branch-0.98.txt, 14631-branch-1.0.txt, > 14631-branch-1.txt, 14631-v1.txt > > > HBASE-14475 and HBASE-14605 narrowed the scope of doAs() calls to region > observer notifications for region splitting. > During review of HBASE-14605, Andrew brought up the case for region merge. > This JIRA is to implement similar scope narrowing technique for region > merging. > The majority of the change would be in RegionMergeTransactionImpl class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13082) Coarsen StoreScanner locks to RegionScanner
[ https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14963502#comment-14963502 ] ramkrishna.s.vasudevan commented on HBASE-13082: bq.What about flush? Once the flush is finished, we will have to clear the snapshot of cells in the Memstore, no? Then we will have to open this newly added file for reading; if we don't open it, we cannot release the memstore snapshot. We do open the reader for sure. One thing to check, though: if we open the reader while the current scanner heap is not yet reset, does that have any impact on the current scan? During a flush, reads can still be served from the snapshot; it is only after the flush completes that this matters. bq.This status is in reader? It suits better in StoreFile no? The reader is the common object here, hence I was referencing the state from there. Initially I had it in StoreFile only, but felt that StoreFile is more volatile. Let me check; it can be done. In fact, I was also not very sure about having the state in the reader. We can change the states, no problem. bq.This is a new Chore thread adding to the system.. Mention about it clearly. It will be one thread per RS. Also what is its interval? Is it configurable? How? I forgot to add the config details. It is per store, and the chore is started by the HStore, not the RS. bq.Hence it requires us to add new APIs which ensures that a split can go ahead even if reference files are present but they are in the COMPACTED state This is a bit of a sticky area. In the case of merge I have tried to forcefully clear the compacted files; maybe we need to do the same with split. But does split explicitly call compact? I was not quite sure about that, but in merge we do. 
> Coarsen StoreScanner locks to RegionScanner > --- > > Key: HBASE-13082 > URL: https://issues.apache.org/jira/browse/HBASE-13082 > Project: HBase > Issue Type: Bug >Reporter: Lars Hofhansl >Assignee: ramkrishna.s.vasudevan > Attachments: 13082-test.txt, 13082-v2.txt, 13082-v3.txt, > 13082-v4.txt, 13082.txt, 13082.txt, HBASE-13082.pdf, HBASE-13082_1_WIP.patch, > HBASE-13082_2_WIP.patch, HBASE-13082_3.patch, HBASE-13082_4.patch, gc.png, > gc.png, gc.png, hits.png, next.png, next.png > > > Continuing where HBASE-10015 left of. > We can avoid locking (and memory fencing) inside StoreScanner by deferring to > the lock already held by the RegionScanner. > In tests this shows quite a scan improvement and reduced CPU (the fences make > the cores wait for memory fetches). > There are some drawbacks too: > * All calls to RegionScanner need to be remain synchronized > * Implementors of coprocessors need to be diligent in following the locking > contract. For example Phoenix does not lock RegionScanner.nextRaw() and > required in the documentation (not picking on Phoenix, this one is my fault > as I told them it's OK) > * possible starving of flushes and compaction with heavy read load. > RegionScanner operations would keep getting the locks and the > flushes/compactions would not be able finalize the set of files. > I'll have a patch soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
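The per-store chore described in the comments above — mark store files as compacted away when their content lives on in a compaction output, then periodically archive the ones no scanner still uses — might look roughly like this. The class and field names are illustrative assumptions, not the actual HBASE-13082 patch; the real chore would also move files to the archive directory rather than just dropping them from a list.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class CompactedFilesChore {
    public static class FileEntry {
        public final String name;
        public boolean compactedAway;  // content now lives in a compaction-output file
        public int activeScanners;     // readers still positioned on this file
        public FileEntry(String name) {
            this.name = name;
        }
    }

    public final List<FileEntry> storeFiles = new ArrayList<>();

    // Body the chore would run on its interval (per store, started by the
    // store rather than one global thread per regionserver, per the comment).
    public int archiveCompactedFiles() {
        int archived = 0;
        for (Iterator<FileEntry> it = storeFiles.iterator(); it.hasNext();) {
            FileEntry f = it.next();
            // Only archive files that are both compacted away AND no longer
            // referenced by any scanner; files still in use wait for a later run.
            if (f.compactedAway && f.activeScanners == 0) {
                it.remove();
                archived++;
            }
        }
        return archived;
    }
}
```

The deferred archiving is what lets scans proceed without the StoreScanner lock: a compacted-away file stays readable until every scanner over it finishes, and the chore cleans it up on a later pass.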
[jira] [Commented] (HBASE-14636) Clear HFileScannerImpl#prevBlocks in between Compaction flow
[ https://issues.apache.org/jira/browse/HBASE-14636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14963544#comment-14963544 ] stack commented on HBASE-14636: --- Want me to try something? > Clear HFileScannerImpl#prevBlocks in between Compaction flow > > > Key: HBASE-14636 > URL: https://issues.apache.org/jira/browse/HBASE-14636 > Project: HBase > Issue Type: Sub-task > Components: regionserver, Scanners >Reporter: Anoop Sam John >Assignee: Anoop Sam John >Priority: Blocker > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14645) ServerManager's lock is a serious point of contention
Elliott Clark created HBASE-14645: - Summary: ServerManager's lock is a serious point of contention Key: HBASE-14645 URL: https://issues.apache.org/jira/browse/HBASE-14645 Project: HBase Issue Type: Bug Reporter: Elliott Clark On cluster instability the server manager lock is where all threads go to hang out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-14646) Move TestCellACLs from medium to large category
[ https://issues.apache.org/jira/browse/HBASE-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-14646. --- Resolution: Fixed Fix Version/s: 1.3.0 1.2.0 2.0.0 > Move TestCellACLs from medium to large category > --- > > Key: HBASE-14646 > URL: https://issues.apache.org/jira/browse/HBASE-14646 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack >Assignee: stack >Priority: Minor > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: 14646.txt > > > Move this test to the large category because on my local rig, I got this on a > run: > {code} > org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute > goal org.apache.maven.plugins:maven-surefire-plugin:2.18.1:test > (secondPartTestsExecution) on project hbase-server: There was a timeout or > other error in the fork > {code} > looking at the test output, TestCellACLs seems to be the 'hanging' > test.. the test that is not completing. Lets see if moving the test to > different category helps with these fails because of timeout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14636) Clear HFileScannerImpl#prevBlocks in between Compaction flow
[ https://issues.apache.org/jira/browse/HBASE-14636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964016#comment-14964016 ] Hadoop QA commented on HBASE-14636: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12767386/HBASE-14636.patch against master branch at commit ea0cf399b4b665d6a0daa0c4e616e893e377a283. ATTACHMENT ID: 12767386 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 2.7.1) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/16096//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/16096//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/16096//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/16096//console This message is automatically generated. > Clear HFileScannerImpl#prevBlocks in between Compaction flow > > > Key: HBASE-14636 > URL: https://issues.apache.org/jira/browse/HBASE-14636 > Project: HBase > Issue Type: Sub-task > Components: regionserver, Scanners >Reporter: Anoop Sam John >Assignee: Anoop Sam John >Priority: Blocker > Fix For: 2.0.0 > > Attachments: HBASE-14636.patch, HBASE-14636.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14541) TestHFileOutputFormat.testMRIncrementalLoadWithSplit failed due to too many splits and few retries
[ https://issues.apache.org/jira/browse/HBASE-14541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964075#comment-14964075 ] Hudson commented on HBASE-14541: FAILURE: Integrated in HBase-TRUNK #6928 (See [https://builds.apache.org/job/HBase-TRUNK/6928/]) HBASE-14541 TestHFileOutputFormat.testMRIncrementalLoadWithSplit failed (matteo.bertozzi: rev fb583dd1ea5850f8d826bd39f9f8a61f5053e8e3) * hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java * hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestLoadIncrementalHFiles.java > TestHFileOutputFormat.testMRIncrementalLoadWithSplit failed due to too many > splits and few retries > -- > > Key: HBASE-14541 > URL: https://issues.apache.org/jira/browse/HBASE-14541 > Project: HBase > Issue Type: Bug >Reporter: stack >Assignee: Matteo Bertozzi > Attachments: HBASE-14541-test.patch, HBASE-14541-v0.patch, > HBASE-14541-v0.patch > > > This one seems worth a dig. We seem to be making progress but here is what we > are trying to load which seems weird: > {code} > 2015-10-01 17:19:41,322 INFO [main] mapreduce.LoadIncrementalHFiles(360): > Split occured while grouping HFiles, retry attempt 10 with 4 files remaining > to group or split > 2015-10-01 17:19:41,323 ERROR [main] mapreduce.LoadIncrementalHFiles(402): > - > Bulk load aborted with some files not yet loaded: > - > > hdfs://localhost:39540/user/jenkins/test-data/720ae36a-2495-456b-ba68-19e260685a35/testLocalMRIncrementalLoad/info-B/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/ce11cbe2490d444d8958264004286aff.bottom > > hdfs://localhost:39540/user/jenkins/test-data/720ae36a-2495-456b-ba68-19e260685a35/testLocalMRIncrementalLoad/info-B/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/ce11cbe2490d444d8958264004286aff.top > > 
hdfs://localhost:39540/user/jenkins/test-data/720ae36a-2495-456b-ba68-19e260685a35/testLocalMRIncrementalLoad/info-A/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/30c58eeb23a6464da21117e6e1bc565c.bottom > > hdfs://localhost:39540/user/jenkins/test-data/720ae36a-2495-456b-ba68-19e260685a35/testLocalMRIncrementalLoad/info-A/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/30c58eeb23a6464da21117e6e1bc565c.top > {code} > Whats that about? > Making note here. Will keep an eye on this one. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14631) Region merge request should be audited with request user through proper scope of doAs() calls to region observer notifications
[ https://issues.apache.org/jira/browse/HBASE-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964057#comment-14964057 ] Hudson commented on HBASE-14631: FAILURE: Integrated in HBase-0.98 #1161 (See [https://builds.apache.org/job/HBase-0.98/1161/]) HBASE-14631 Region merge request should be audited with request user (tedyu: rev aaf5653767c5b73ad15eb65cb644e1b985ac21d7) * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionMergeRequest.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/CompactSplitThread.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionMergeTransaction.java * hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerObserver.java > Region merge request should be audited with request user through proper scope > of doAs() calls to region observer notifications > -- > > Key: HBASE-14631 > URL: https://issues.apache.org/jira/browse/HBASE-14631 > Project: HBase > Issue Type: Bug >Reporter: Ted Yu >Assignee: Ted Yu > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16 > > Attachments: 14631-branch-0.98.txt, 14631-branch-1.0.txt, > 14631-branch-1.txt, 14631-v1.txt > > > HBASE-14475 and HBASE-14605 narrowed the scope of doAs() calls to region > observer notifications for region splitting. > During review of HBASE-14605, Andrew brought up the case for region merge. > This JIRA is to implement similar scope narrowing technique for region > merging. > The majority of the change would be in RegionMergeTransactionImpl class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14642) Disable flakey TestMultiParallel#testActiveThreadsCount
[ https://issues.apache.org/jira/browse/HBASE-14642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14963952#comment-14963952 ] stack commented on HBASE-14642: --- It can also fail in a different manner. See https://builds.apache.org/view/H-L/view/HBase/job/HBase-1.2/264/jdk=latest1.7,label=Hadoop/consoleText > Disable flakey TestMultiParallel#testActiveThreadsCount > --- > > Key: HBASE-14642 > URL: https://issues.apache.org/jira/browse/HBASE-14642 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack > Attachments: 14642.txt > > > Failed twice in a row on 1.2 build... Disabling for now Unless someone > wants to dig in and fix it that is... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13082) Coarsen StoreScanner locks to RegionScanner
[ https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964005#comment-14964005 ] stack commented on HBASE-13082: --- The one pager is great ([~lhofhansl]!). Nice summary of the issue up top. Explanation of the approach your fix takes helps. I am trying to understand the COMPACTED vs NON_COMPACTED state. Is NON_COMPACTED a freshly-flushed file? Is a COMPACTED file a file that is made up of N other storefiles and you want to make sure the scan doesn't include duplicated info -- the compacted file and the compactions inputs? Or I think you are saying that when a file is COMPACTED, then its content can be found in another file and so it should not be included in a scan? Should the state be COMPACTED_AWAY or REPLACED (by file...). Do we need the NON_COMPACTED state? It is the default. No need to call this state anything (Active?) How can the following happen? "Now in case of the versioned storefile approach change, there is a chance that there are reference files which are not yet archived after compaction because of the change in the store file management." Where will you write the COMPACTED_AWAY state? In memory or into the file? (I suppose you can't write it to the file because it is already written) Doc is great [~ram_krish] > Coarsen StoreScanner locks to RegionScanner > --- > > Key: HBASE-13082 > URL: https://issues.apache.org/jira/browse/HBASE-13082 > Project: HBase > Issue Type: Bug >Reporter: Lars Hofhansl >Assignee: ramkrishna.s.vasudevan > Attachments: 13082-test.txt, 13082-v2.txt, 13082-v3.txt, > 13082-v4.txt, 13082.txt, 13082.txt, HBASE-13082.pdf, HBASE-13082_1_WIP.patch, > HBASE-13082_2_WIP.patch, HBASE-13082_3.patch, HBASE-13082_4.patch, gc.png, > gc.png, gc.png, hits.png, next.png, next.png > > > Continuing where HBASE-10015 left of. > We can avoid locking (and memory fencing) inside StoreScanner by deferring to > the lock already held by the RegionScanner. 
> In tests this shows quite a scan improvement and reduced CPU (the fences make > the cores wait for memory fetches). > There are some drawbacks too: > * All calls to RegionScanner need to be remain synchronized > * Implementors of coprocessors need to be diligent in following the locking > contract. For example Phoenix does not lock RegionScanner.nextRaw() and > required in the documentation (not picking on Phoenix, this one is my fault > as I told them it's OK) > * possible starving of flushes and compaction with heavy read load. > RegionScanner operations would keep getting the locks and the > flushes/compactions would not be able finalize the set of files. > I'll have a patch soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14636) Clear HFileScannerImpl#prevBlocks in between Compaction flow
[ https://issues.apache.org/jira/browse/HBASE-14636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964056#comment-14964056 ] stack commented on HBASE-14636: --- What about closeCheckInterval? We are already on an interval checking for a close. Could we do the shipped in here on this same interval? Or at least unite the two time-based checks? > Clear HFileScannerImpl#prevBlocks in between Compaction flow > > > Key: HBASE-14636 > URL: https://issues.apache.org/jira/browse/HBASE-14636 > Project: HBase > Issue Type: Sub-task > Components: regionserver, Scanners >Reporter: Anoop Sam John >Assignee: Anoop Sam John >Priority: Blocker > Fix For: 2.0.0 > > Attachments: HBASE-14636.patch, HBASE-14636.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
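Stack's suggestion above — piggyback the periodic shipped()/cleanup call on the existing closeCheckInterval bytes-written check, rather than adding a second timer — can be sketched as one loop with a single interval. `CompactionLoop`, `Progress`, and the 10 MB interval are hypothetical stand-ins; the real compactor and its configured interval differ.

```java
public class CompactionLoop {
    // Stand-in for hbase.hstore.close.check.interval; value chosen for illustration.
    public static final long CLOSE_CHECK_INTERVAL = 10L * 1024 * 1024;

    public interface Progress {
        boolean isStopRequested(); // the existing "should we abort?" close check
        void shipped();            // where prevBlocks cleanup could be released
    }

    // Returns true if compaction ran to completion, false if aborted early.
    public static boolean compact(long[] cellSizes, Progress progress) {
        long bytesSinceCheck = 0;
        for (long size : cellSizes) {
            bytesSinceCheck += size; // "write" the cell to the compaction output
            if (bytesSinceCheck > CLOSE_CHECK_INTERVAL) {
                bytesSinceCheck = 0;
                if (progress.isStopRequested()) {
                    return false;            // existing interval-based close check
                }
                progress.shipped();          // unified: cleanup rides the same interval
            }
        }
        return true;
    }
}
```

The appeal of uniting the two checks is that the compactor already pays for one interval test per batch of bytes; releasing accumulated blocks at the same point adds no new bookkeeping.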
[jira] [Commented] (HBASE-14647) Disable TestWALProcedureStoreOnHDFS#testWalRollOnLowReplication
[ https://issues.apache.org/jira/browse/HBASE-14647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964085#comment-14964085 ] Hudson commented on HBASE-14647: FAILURE: Integrated in HBase-1.3 #281 (See [https://builds.apache.org/job/HBase-1.3/281/]) HBASE-14647 Disable (stack: rev 8940be05978950d18916d7d67d65c1fa915bd802) * hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestWALProcedureStoreOnHDFS.java > Disable TestWALProcedureStoreOnHDFS#testWalRollOnLowReplication > --- > > Key: HBASE-14647 > URL: https://issues.apache.org/jira/browse/HBASE-14647 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack >Assignee: stack > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: 14647.txt > > > It failed on two trunk builds. Even after attempts at making the test looser, > we still fail. Needs work. Disabling for now while trying to stabilize build. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14646) Move TestCellACLs from medium to large category
[ https://issues.apache.org/jira/browse/HBASE-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964086#comment-14964086 ] Hudson commented on HBASE-14646: SUCCESS: Integrated in HBase-1.3-IT #252 (See [https://builds.apache.org/job/HBase-1.3-IT/252/]) HBASE-14646 Move TestCellACLs from medium to large category (stack: rev b921ed422215dbdb3c90a2f3bbda08c5aecee9ab) * hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestCellACLs.java HBASE-14646 Move TestCellACLs from medium to large category; ADDENDUM (stack: rev 74ad3947e998301052503cbb4f7b9aade8ef42ef) * hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestCellACLs.java > Move TestCellACLs from medium to large category > --- > > Key: HBASE-14646 > URL: https://issues.apache.org/jira/browse/HBASE-14646 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack >Assignee: stack >Priority: Minor > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: 14646.txt > > > Move this test to the large category because on my local rig, I got this on a > run: > {code} > org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute > goal org.apache.maven.plugins:maven-surefire-plugin:2.18.1:test > (secondPartTestsExecution) on project hbase-server: There was a timeout or > other error in the fork > {code} > looking at the test output, TestCellACLs seems to be the 'hanging' > test.. the test that is not completing. Lets see if moving the test to > different category helps with these fails because of timeout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HBASE-14636) Clear HFileScannerImpl#prevBlocks in between Compaction flow
[ https://issues.apache.org/jira/browse/HBASE-14636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964042#comment-14964042 ] stack edited comment on HBASE-14636 at 10/19/15 9:06 PM: - The patch seems to be doing well in my testbed (it takes a good few hours to run). Will keep you posted. We got much further than w/o the patch. On the patch: this.curBlock.getMemoryType() == MemoryType.SHARED Can we keep this internal to the block rather than have HFileScannerImpl have to know about SHARED?... Maybe a method on block... Sort of similiar, this bit where we do a check every two seconds.. could we change it to be size based? Maybe not every two seconds seems a little arbitrary. Pity there not a more 'natural' place to do the housekeeping. I had other comments here but edited it out because I remembered how this stuff worked after I'd made the comment... my comment made no sense after I remembered what shipped does. Please excuse. was (Author: stack): The patch seems to be doing well in my testbed (it takes a good few hours to run). Will keep you posted. We got much further than w/o the patch. On the patch: this.curBlock.getMemoryType() == MemoryType.SHARED Can we keep this internal to the block rather than have HFileScannerImpl have to know about SHARED?... Maybe a method on block... Sort of similiar, this bit where we do a check every two seconds.. could we change it to be size based? private static final long COMPACTION_PROGRESS_SHIPPED_CALL_INTERVAL = 2 * 1000; Could it be a method in KeyValueScanner that checks if we need to ship? Could call kvs.shipped(); everytime through and it figures when to ship? Or add a 'ship' method and if it returns true, then called shipped (Is that right? The method is 'shipped'. Have the KVs been shipped at this stage or does this method ship them? If it ships them, the method should be 'ship' rather than 'shipped'?) 
> Clear HFileScannerImpl#prevBlocks in between Compaction flow > > > Key: HBASE-14636 > URL: https://issues.apache.org/jira/browse/HBASE-14636 > Project: HBase > Issue Type: Sub-task > Components: regionserver, Scanners >Reporter: Anoop Sam John >Assignee: Anoop Sam John >Priority: Blocker > Fix For: 2.0.0 > > Attachments: HBASE-14636.patch, HBASE-14636.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14420) Zombie Stomping Session
[ https://issues.apache.org/jira/browse/HBASE-14420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-14420: -- Attachment: none_fix.txt > Zombie Stomping Session > --- > > Key: HBASE-14420 > URL: https://issues.apache.org/jira/browse/HBASE-14420 > Project: HBase > Issue Type: Umbrella > Components: test >Reporter: stack >Assignee: stack >Priority: Critical > Attachments: hangers.txt, none_fix (1).txt, none_fix.txt, > none_fix.txt, none_fix.txt, none_fix.txt, none_fix.txt, none_fix.txt, > none_fix.txt, none_fix.txt, none_fix.txt, none_fix.txt, none_fix.txt, > none_fix.txt, none_fix.txt, none_fix.txt, none_fix.txt, none_fix.txt, > none_fix.txt, none_fix.txt, none_fix.txt, none_fix.txt, none_fix.txt, > none_fix.txt, none_fix.txt > > > Patch build are now failing most of the time because we are dropping zombies. > I confirm we are doing this on non-apache build boxes too. > Left-over zombies consume resources on build boxes (OOME cannot create native > threads). Having to do multiple test runs in the hope that we can get a > non-zombie-making build or making (arbitrary) rulings that the zombies are > 'not related' is a productivity sink. And so on... > This is an umbrella issue for a zombie stomping session that started earlier > this week. Will hang sub-issues of this one. Am running builds back-to-back > on little cluster to turn out the monsters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14636) Clear HFileScannerImpl#prevBlocks in between Compaction flow
[ https://issues.apache.org/jira/browse/HBASE-14636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964042#comment-14964042 ] stack commented on HBASE-14636: --- The patch seems to be doing well in my testbed (it takes a good few hours to run). Will keep you posted. We got much further than w/o the patch. On the patch: this.curBlock.getMemoryType() == MemoryType.SHARED Can we keep this internal to the block rather than have HFileScannerImpl have to know about SHARED?... Maybe a method on block... Sort of similar, this bit where we do a check every two seconds.. could we change it to be size based? private static final long COMPACTION_PROGRESS_SHIPPED_CALL_INTERVAL = 2 * 1000; Could it be a method in KeyValueScanner that checks if we need to ship? Could call kvs.shipped(); every time through and it figures when to ship? Or add a 'ship' method and if it returns true, then call shipped (Is that right? The method is 'shipped'. Have the KVs been shipped at this stage or does this method ship them? If it ships them, the method should be 'ship' rather than 'shipped'?) > Clear HFileScannerImpl#prevBlocks in between Compaction flow > > > Key: HBASE-14636 > URL: https://issues.apache.org/jira/browse/HBASE-14636 > Project: HBase > Issue Type: Sub-task > Components: regionserver, Scanners >Reporter: Anoop Sam John >Assignee: Anoop Sam John >Priority: Blocker > Fix For: 2.0.0 > > Attachments: HBASE-14636.patch, HBASE-14636.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
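The size-based variant floated above might look like this sketch. Names are hypothetical (SketchShippingScanner, maybeShip), not the actual KeyValueScanner interface: the caller invokes maybeShip() on every cell, and the scanner itself decides, by accumulated bytes rather than wall-clock time, when a real ship is due.

```java
// Hypothetical sketch of the size-based idea: the caller invokes
// maybeShip(cellSize) on every cell, and the scanner decides internally
// when enough bytes have accumulated to actually release block references.
class SketchShippingScanner {
    private final long shipThresholdBytes;
    private long accumulated = 0;
    private int realShips = 0;

    SketchShippingScanner(long shipThresholdBytes) {
        this.shipThresholdBytes = shipThresholdBytes;
    }

    /** Called on every cell; returns true when an actual ship happened. */
    boolean maybeShip(long cellSizeBytes) {
        accumulated += cellSizeBytes;
        if (accumulated < shipThresholdBytes) {
            return false;
        }
        accumulated = 0;
        realShips++;      // here the real scanner would drop prevBlocks etc.
        return true;
    }

    int realShips() { return realShips; }
}
```

This also answers the naming question in passing: a method that performs the release is better called "ship" (or maybeShip), while "shipped" reads as a notification that it already happened.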
[jira] [Commented] (HBASE-14646) Move TestCellACLs from medium to large category
[ https://issues.apache.org/jira/browse/HBASE-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964063#comment-14964063 ] Hudson commented on HBASE-14646: FAILURE: Integrated in HBase-1.2 #275 (See [https://builds.apache.org/job/HBase-1.2/275/]) HBASE-14646 Move TestCellACLs from medium to large category; ADDENDUM (stack: rev 443de6ef405887a052b66fbbd56282caf931f5a5) * hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestCellACLs.java > Move TestCellACLs from medium to large category > --- > > Key: HBASE-14646 > URL: https://issues.apache.org/jira/browse/HBASE-14646 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack >Assignee: stack >Priority: Minor > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: 14646.txt > > > Move this test to the large category because on my local rig, I got this on a > run: > {code} > org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute > goal org.apache.maven.plugins:maven-surefire-plugin:2.18.1:test > (secondPartTestsExecution) on project hbase-server: There was a timeout or > other error in the fork > {code} > looking at the test output, TestCellACLs seems to be the 'hanging' > test.. the test that is not completing. Lets see if moving the test to > different category helps with these fails because of timeout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14646) Move TestCellACLs from medium to large category
[ https://issues.apache.org/jira/browse/HBASE-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964074#comment-14964074 ] Hudson commented on HBASE-14646: FAILURE: Integrated in HBase-TRUNK #6928 (See [https://builds.apache.org/job/HBase-TRUNK/6928/]) HBASE-14646 Move TestCellACLs from medium to large category (stack: rev ea0cf399b4b665d6a0daa0c4e616e893e377a283) * hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestCellACLs.java > Move TestCellACLs from medium to large category > --- > > Key: HBASE-14646 > URL: https://issues.apache.org/jira/browse/HBASE-14646 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack >Assignee: stack >Priority: Minor > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: 14646.txt > > > Move this test to the large category because on my local rig, I got this on a > run: > {code} > org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute > goal org.apache.maven.plugins:maven-surefire-plugin:2.18.1:test > (secondPartTestsExecution) on project hbase-server: There was a timeout or > other error in the fork > {code} > looking at the test output, TestCellACLs seems to be the 'hanging' > test.. the test that is not completing. Lets see if moving the test to > different category helps with these fails because of timeout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14647) Disable TestWALProcedureStoreOnHDFS#testWalRollOnLowReplication
[ https://issues.apache.org/jira/browse/HBASE-14647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964076#comment-14964076 ] Hudson commented on HBASE-14647: FAILURE: Integrated in HBase-TRUNK #6928 (See [https://builds.apache.org/job/HBase-TRUNK/6928/]) HBASE-14647 Disable (stack: rev c1f0442045f44fcbb3935f9244794929a5d0caea) * hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestWALProcedureStoreOnHDFS.java > Disable TestWALProcedureStoreOnHDFS#testWalRollOnLowReplication > --- > > Key: HBASE-14647 > URL: https://issues.apache.org/jira/browse/HBASE-14647 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack >Assignee: stack > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: 14647.txt > > > It failed on two trunk builds. Even after attempts at making the test looser, > we still fail. Needs work. Disabling for now while trying to stabilize build. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14646) Move TestCellACLs from medium to large category
[ https://issues.apache.org/jira/browse/HBASE-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964083#comment-14964083 ] Hudson commented on HBASE-14646: FAILURE: Integrated in HBase-1.3 #281 (See [https://builds.apache.org/job/HBase-1.3/281/]) HBASE-14646 Move TestCellACLs from medium to large category (stack: rev b921ed422215dbdb3c90a2f3bbda08c5aecee9ab) * hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestCellACLs.java HBASE-14646 Move TestCellACLs from medium to large category; ADDENDUM (stack: rev 74ad3947e998301052503cbb4f7b9aade8ef42ef) * hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestCellACLs.java > Move TestCellACLs from medium to large category > --- > > Key: HBASE-14646 > URL: https://issues.apache.org/jira/browse/HBASE-14646 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack >Assignee: stack >Priority: Minor > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: 14646.txt > > > Move this test to the large category because on my local rig, I got this on a > run: > {code} > org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute > goal org.apache.maven.plugins:maven-surefire-plugin:2.18.1:test > (secondPartTestsExecution) on project hbase-server: There was a timeout or > other error in the fork > {code} > looking at the test output, TestCellACLs seems to be the 'hanging' > test.. the test that is not completing. Lets see if moving the test to > different category helps with these fails because of timeout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14647) Disable TestWALProcedureStoreOnHDFS#testWalRollOnLowReplication
[ https://issues.apache.org/jira/browse/HBASE-14647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964087#comment-14964087 ] Hudson commented on HBASE-14647: SUCCESS: Integrated in HBase-1.3-IT #252 (See [https://builds.apache.org/job/HBase-1.3-IT/252/]) HBASE-14647 Disable (stack: rev 8940be05978950d18916d7d67d65c1fa915bd802) * hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestWALProcedureStoreOnHDFS.java > Disable TestWALProcedureStoreOnHDFS#testWalRollOnLowReplication > --- > > Key: HBASE-14647 > URL: https://issues.apache.org/jira/browse/HBASE-14647 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack >Assignee: stack > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: 14647.txt > > > It failed on two trunk builds. Even after attempts at making the test looser, > we still fail. Needs work. Disabling for now while trying to stabilize build. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14541) TestHFileOutputFormat.testMRIncrementalLoadWithSplit failed due to too many splits and few retries
[ https://issues.apache.org/jira/browse/HBASE-14541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964084#comment-14964084 ] Hudson commented on HBASE-14541: FAILURE: Integrated in HBase-1.3 #281 (See [https://builds.apache.org/job/HBase-1.3/281/]) HBASE-14541 TestHFileOutputFormat.testMRIncrementalLoadWithSplit failed (matteo.bertozzi: rev d01063c9a33608fd93a3043a3c3a96e83959cdfb) * hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java * hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestLoadIncrementalHFiles.java > TestHFileOutputFormat.testMRIncrementalLoadWithSplit failed due to too many > splits and few retries > -- > > Key: HBASE-14541 > URL: https://issues.apache.org/jira/browse/HBASE-14541 > Project: HBase > Issue Type: Bug >Reporter: stack >Assignee: Matteo Bertozzi > Attachments: HBASE-14541-test.patch, HBASE-14541-v0.patch, > HBASE-14541-v0.patch > > > This one seems worth a dig. We seem to be making progress but here is what we > are trying to load which seems weird: > {code} > 2015-10-01 17:19:41,322 INFO [main] mapreduce.LoadIncrementalHFiles(360): > Split occured while grouping HFiles, retry attempt 10 with 4 files remaining > to group or split > 2015-10-01 17:19:41,323 ERROR [main] mapreduce.LoadIncrementalHFiles(402): > - > Bulk load aborted with some files not yet loaded: > - > > hdfs://localhost:39540/user/jenkins/test-data/720ae36a-2495-456b-ba68-19e260685a35/testLocalMRIncrementalLoad/info-B/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/ce11cbe2490d444d8958264004286aff.bottom > > hdfs://localhost:39540/user/jenkins/test-data/720ae36a-2495-456b-ba68-19e260685a35/testLocalMRIncrementalLoad/info-B/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/ce11cbe2490d444d8958264004286aff.top > > 
hdfs://localhost:39540/user/jenkins/test-data/720ae36a-2495-456b-ba68-19e260685a35/testLocalMRIncrementalLoad/info-A/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/30c58eeb23a6464da21117e6e1bc565c.bottom > > hdfs://localhost:39540/user/jenkins/test-data/720ae36a-2495-456b-ba68-19e260685a35/testLocalMRIncrementalLoad/info-A/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/30c58eeb23a6464da21117e6e1bc565c.top > {code} > Whats that about? > Making note here. Will keep an eye on this one. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
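The ever-deepening _tmp/_tmp/... paths in the log above are the signature of a retry loop that derives each attempt's temp directory from the previous attempt's output directory. A toy illustration with plain path strings (not the actual LoadIncrementalHFiles code, and the real fix may differ):

```java
// Toy illustration of the nesting: if each split retry writes into a "_tmp"
// child of the *current* file's directory, retries stack _tmp dirs forever.
class SketchTmpDirs {
    static String nestingTmpDir(String currentFileDir) {
        return currentFileDir + "/_tmp";            // buggy: relative to last attempt
    }

    static String stableTmpDir(String originalFamilyDir) {
        return originalFamilyDir + "/_tmp";         // fixed: always the family dir
    }

    static String simulateRetries(String familyDir, int retries, boolean buggy) {
        String dir = familyDir;
        for (int i = 0; i < retries; i++) {
            dir = buggy ? nestingTmpDir(dir) : stableTmpDir(familyDir);
        }
        return dir;
    }
}
```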
[jira] [Commented] (HBASE-13082) Coarsen StoreScanner locks to RegionScanner
[ https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964031#comment-14964031 ] stack commented on HBASE-13082: --- bq. When we mark a file as compacted the confusion can be like whether this file is a file created out of compaction. Yeah, that was my first thought too... same as Anoop. Is that a new Chore per Store per Region? Every two minutes seems like a long time to hold on to files? Yeah, this is a good point by [~anoop.hbase]: bq. What about flush? So we won't change the store file's heap during the scan is not really possible? The other thing to consider is getting bulk loaded files in here while scans are going on. Seems like they could go in on open of a new scan. That'll work nicely. A file will be bulk loaded but won't be seen by ongoing scans. The hard thing then is what to do about flush (we've been here before!) On flush, what if we let all Scans complete before letting go the snapshot? More memory pressure. Simpler implementation. Otherwise, flush registers it has happened at the region level. Scanners check for flush events every-so-often (we already have checkpoints in the scan to ensure we don't go over size or time constraints... could check at these times) and when they find one, they swap in the flushed file. When all Scanners have done this, then we let go the snapshot. This might be a different sort of event to the one described in the doc here.. where we swap in compacted files on new scan creation.. but yeah, implementation would be cleaner if swap in of flushed files and compacted files all happened in the one manner. 
> Coarsen StoreScanner locks to RegionScanner > --- > > Key: HBASE-13082 > URL: https://issues.apache.org/jira/browse/HBASE-13082 > Project: HBase > Issue Type: Bug >Reporter: Lars Hofhansl >Assignee: ramkrishna.s.vasudevan > Attachments: 13082-test.txt, 13082-v2.txt, 13082-v3.txt, > 13082-v4.txt, 13082.txt, 13082.txt, HBASE-13082.pdf, HBASE-13082_1_WIP.patch, > HBASE-13082_2_WIP.patch, HBASE-13082_3.patch, HBASE-13082_4.patch, gc.png, > gc.png, gc.png, hits.png, next.png, next.png > > > Continuing where HBASE-10015 left of. > We can avoid locking (and memory fencing) inside StoreScanner by deferring to > the lock already held by the RegionScanner. > In tests this shows quite a scan improvement and reduced CPU (the fences make > the cores wait for memory fetches). > There are some drawbacks too: > * All calls to RegionScanner need to be remain synchronized > * Implementors of coprocessors need to be diligent in following the locking > contract. For example Phoenix does not lock RegionScanner.nextRaw() and > required in the documentation (not picking on Phoenix, this one is my fault > as I told them it's OK) > * possible starving of flushes and compaction with heavy read load. > RegionScanner operations would keep getting the locks and the > flushes/compactions would not be able finalize the set of files. > I'll have a patch soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
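The flush handling proposed in the comment above, as an illustrative sketch (made-up names; not the real patch): the region bumps a flush counter, each scanner compares it against the last value it saw at its existing checkpoints, and the snapshot can only be let go once every open scanner has caught up.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch: scanners notice flushes at their existing checkpoints
// and the memstore snapshot is only released once all of them have caught up.
class SketchFlushTracker {
    private final AtomicLong flushCount = new AtomicLong();

    long currentFlushCount() { return flushCount.get(); }
    void recordFlush() { flushCount.incrementAndGet(); }

    class ScannerView {
        private long seenFlushCount = flushCount.get();

        /** Called at the scanner's existing size/time checkpoints. */
        boolean checkpointSawNewFlush() {
            long now = flushCount.get();
            if (now == seenFlushCount) return false;
            seenFlushCount = now;   // here the scanner would swap in flushed files
            return true;
        }

        boolean caughtUp() { return seenFlushCount == flushCount.get(); }
    }
}
```

The region would release the snapshot only when caughtUp() holds for every open scanner, which is the extra memory-pressure cost the comment calls out.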
[jira] [Commented] (HBASE-14420) Zombie Stomping Session
[ https://issues.apache.org/jira/browse/HBASE-14420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964049#comment-14964049 ] Hadoop QA commented on HBASE-14420: --- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12767392/none_fix.txt against master branch at commit c1f0442045f44fcbb3935f9244794929a5d0caea. ATTACHMENT ID: 12767392 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation, build, or dev-support patch that doesn't require tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 2.7.1) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/16097//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/16097//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/16097//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/16097//console This message is automatically generated. > Zombie Stomping Session > --- > > Key: HBASE-14420 > URL: https://issues.apache.org/jira/browse/HBASE-14420 > Project: HBase > Issue Type: Umbrella > Components: test >Reporter: stack >Assignee: stack >Priority: Critical > Attachments: hangers.txt, none_fix (1).txt, none_fix.txt, > none_fix.txt, none_fix.txt, none_fix.txt, none_fix.txt, none_fix.txt, > none_fix.txt, none_fix.txt, none_fix.txt, none_fix.txt, none_fix.txt, > none_fix.txt, none_fix.txt, none_fix.txt, none_fix.txt, none_fix.txt, > none_fix.txt, none_fix.txt, none_fix.txt, none_fix.txt, none_fix.txt, > none_fix.txt > > > Patch build are now failing most of the time because we are dropping zombies. > I confirm we are doing this on non-apache build boxes too. > Left-over zombies consume resources on build boxes (OOME cannot create native > threads). Having to do multiple test runs in the hope that we can get a > non-zombie-making build or making (arbitrary) rulings that the zombies are > 'not related' is a productivity sink. And so on... > This is an umbrella issue for a zombie stomping session that started earlier > this week. Will hang sub-issues of this one. Am running builds back-to-back > on little cluster to turn out the monsters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14646) Move TestCellACLs from medium to large category
[ https://issues.apache.org/jira/browse/HBASE-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964101#comment-14964101 ] Hudson commented on HBASE-14646: FAILURE: Integrated in HBase-1.2-IT #224 (See [https://builds.apache.org/job/HBase-1.2-IT/224/]) HBASE-14646 Move TestCellACLs from medium to large category (stack: rev 78e6da0c7caf80d0d9af66f75aad98684b9f176f) * hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestCellACLs.java HBASE-14646 Move TestCellACLs from medium to large category; ADDENDUM (stack: rev 443de6ef405887a052b66fbbd56282caf931f5a5) * hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestCellACLs.java > Move TestCellACLs from medium to large category > --- > > Key: HBASE-14646 > URL: https://issues.apache.org/jira/browse/HBASE-14646 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack >Assignee: stack >Priority: Minor > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: 14646.txt > > > Move this test to the large category because on my local rig, I got this on a > run: > {code} > org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute > goal org.apache.maven.plugins:maven-surefire-plugin:2.18.1:test > (secondPartTestsExecution) on project hbase-server: There was a timeout or > other error in the fork > {code} > looking at the test output, TestCellACLs seems to be the 'hanging' > test.. the test that is not completing. Lets see if moving the test to > different category helps with these fails because of timeout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14647) Disable TestWALProcedureStoreOnHDFS#testWalRollOnLowReplication
[ https://issues.apache.org/jira/browse/HBASE-14647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964102#comment-14964102 ] Hudson commented on HBASE-14647: FAILURE: Integrated in HBase-1.2-IT #224 (See [https://builds.apache.org/job/HBase-1.2-IT/224/]) HBASE-14647 Disable (stack: rev 102637f3beafe7faa8badb730762a3642a53940a) * hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestWALProcedureStoreOnHDFS.java > Disable TestWALProcedureStoreOnHDFS#testWalRollOnLowReplication > --- > > Key: HBASE-14647 > URL: https://issues.apache.org/jira/browse/HBASE-14647 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack >Assignee: stack > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: 14647.txt > > > It failed on two trunk builds. Even after attempts at making the test looser, > we still fail. Needs work. Disabling for now while trying to stabilize build. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14647) Disable TestWALProcedureStoreOnHDFS#testWalRollOnLowReplication
[ https://issues.apache.org/jira/browse/HBASE-14647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-14647: -- Fix Version/s: (was: 1.1.3) > Disable TestWALProcedureStoreOnHDFS#testWalRollOnLowReplication > --- > > Key: HBASE-14647 > URL: https://issues.apache.org/jira/browse/HBASE-14647 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack >Assignee: stack > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: 14647.txt > > > It failed on two trunk builds. Even after attempts at making the test looser, > we still fail. Needs work. Disabling for now while trying to stabilize build. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14362) org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky
[ https://issues.apache.org/jira/browse/HBASE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-14362: -- Fix Version/s: (was: 1.1.3) > org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super > duper flaky > - > > Key: HBASE-14362 > URL: https://issues.apache.org/jira/browse/HBASE-14362 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 2.0.0 >Reporter: Dima Spivak >Assignee: Heng Chen >Priority: Critical > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: HBASE-14362.patch, HBASE-14362.patch, > HBASE-14362_v1.patch > > > [As seen in > Jenkins|https://builds.apache.org/job/HBase-TRUNK/lastCompletedBuild/testReport/org.apache.hadoop.hbase.master.procedure/TestWALProcedureStoreOnHDFS/history/], > this test has been super flaky and we should probably address it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14631) Region merge request should be audited with request user through proper scope of doAs() calls to region observer notifications
[ https://issues.apache.org/jira/browse/HBASE-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14963684#comment-14963684 ] Hudson commented on HBASE-14631: SUCCESS: Integrated in HBase-1.3-IT #251 (See [https://builds.apache.org/job/HBase-1.3-IT/251/]) HBASE-14631 Region merge request should be audited with request user (tedyu: rev 61b11e0765206b0acf2099958a46604216287ea7) * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionMergeTransactionImpl.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/CompactSplitThread.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionMergeRequest.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionMergeTransaction.java * hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerObserver.java > Region merge request should be audited with request user through proper scope > of doAs() calls to region observer notifications > -- > > Key: HBASE-14631 > URL: https://issues.apache.org/jira/browse/HBASE-14631 > Project: HBase > Issue Type: Bug >Reporter: Ted Yu >Assignee: Ted Yu > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16 > > Attachments: 14631-branch-0.98.txt, 14631-branch-1.0.txt, > 14631-branch-1.txt, 14631-v1.txt > > > HBASE-14475 and HBASE-14605 narrowed the scope of doAs() calls to region > observer notifications for region splitting. > During review of HBASE-14605, Andrew brought up the case for region merge. > This JIRA is to implement similar scope narrowing technique for region > merging. > The majority of the change would be in RegionMergeTransactionImpl class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-14647) Disable TestWALProcedureStoreOnHDFS#testWalRollOnLowReplication
[ https://issues.apache.org/jira/browse/HBASE-14647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-14647. --- Resolution: Fixed Pushed to branch-1.2+ Resolving > Disable TestWALProcedureStoreOnHDFS#testWalRollOnLowReplication > --- > > Key: HBASE-14647 > URL: https://issues.apache.org/jira/browse/HBASE-14647 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack >Assignee: stack > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: 14647.txt > > > It failed on two trunk builds. Even after attempts at making the test looser, > we still fail. Needs work. Disabling for now while trying to stabilize build. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14646) Move TestCellACLs from medium to large category
[ https://issues.apache.org/jira/browse/HBASE-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-14646: -- Attachment: 14646.txt Pushed to branch-1.2+ > Move TestCellACLs from medium to large category > --- > > Key: HBASE-14646 > URL: https://issues.apache.org/jira/browse/HBASE-14646 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack >Assignee: stack >Priority: Minor > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: 14646.txt > > > Move this test to the large category because on my local rig, I got this on a > run: > {code} > org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute > goal org.apache.maven.plugins:maven-surefire-plugin:2.18.1:test > (secondPartTestsExecution) on project hbase-server: There was a timeout or > other error in the fork > {code} > looking at the test output, TestCellACLs seems to be the 'hanging' > test.. the test that is not completing. Lets see if moving the test to > different category helps with these fails because of timeout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13082) Coarsen StoreScanner locks to RegionScanner
[ https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14963860#comment-14963860 ] ramkrishna.s.vasudevan commented on HBASE-13082: But again if handling flushes is a problem then resetting the heap should be done at any case. > Coarsen StoreScanner locks to RegionScanner > --- > > Key: HBASE-13082 > URL: https://issues.apache.org/jira/browse/HBASE-13082 > Project: HBase > Issue Type: Bug >Reporter: Lars Hofhansl >Assignee: ramkrishna.s.vasudevan > Attachments: 13082-test.txt, 13082-v2.txt, 13082-v3.txt, > 13082-v4.txt, 13082.txt, 13082.txt, HBASE-13082.pdf, HBASE-13082_1_WIP.patch, > HBASE-13082_2_WIP.patch, HBASE-13082_3.patch, HBASE-13082_4.patch, gc.png, > gc.png, gc.png, hits.png, next.png, next.png > > > Continuing where HBASE-10015 left of. > We can avoid locking (and memory fencing) inside StoreScanner by deferring to > the lock already held by the RegionScanner. > In tests this shows quite a scan improvement and reduced CPU (the fences make > the cores wait for memory fetches). > There are some drawbacks too: > * All calls to RegionScanner need to be remain synchronized > * Implementors of coprocessors need to be diligent in following the locking > contract. For example Phoenix does not lock RegionScanner.nextRaw() and > required in the documentation (not picking on Phoenix, this one is my fault > as I told them it's OK) > * possible starving of flushes and compaction with heavy read load. > RegionScanner operations would keep getting the locks and the > flushes/compactions would not be able finalize the set of files. > I'll have a patch soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14642) Disable flakey TestMultiParallel#testActiveThreadsCount
[ https://issues.apache.org/jira/browse/HBASE-14642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14963946#comment-14963946 ] stack commented on HBASE-14642: --- [~chenheng] Thanks. Here is an example fail: https://builds.apache.org/job/HBase-1.3/jdk=latest1.8,label=Hadoop/277/console Says {code} Failed tests: TestMultiParallel.testActiveThreadsCount:160 expected:<5> but was:<4> {code} Above it says branch-1.2 but it should have been branch-1: i.e. 1.3. I googled and found that this test has been failing for years... on and off. Thanks. > Disable flakey TestMultiParallel#testActiveThreadsCount > --- > > Key: HBASE-14642 > URL: https://issues.apache.org/jira/browse/HBASE-14642 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack > Attachments: 14642.txt > > > Failed twice in a row on 1.2 build... Disabling for now... Unless someone > wants to dig in and fix it, that is... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14631) Region merge request should be audited with request user through proper scope of doAs() calls to region observer notifications
[ https://issues.apache.org/jira/browse/HBASE-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14963669#comment-14963669 ] Hudson commented on HBASE-14631: SUCCESS: Integrated in HBase-1.2-IT #223 (See [https://builds.apache.org/job/HBase-1.2-IT/223/]) HBASE-14631 Region merge request should be audited with request user (tedyu: rev f62bbc9b669bdbae4b8267b1728a7743db847f8a) * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionMergeTransaction.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionMergeRequest.java * hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerObserver.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/CompactSplitThread.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionMergeTransactionImpl.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java > Region merge request should be audited with request user through proper scope > of doAs() calls to region observer notifications > -- > > Key: HBASE-14631 > URL: https://issues.apache.org/jira/browse/HBASE-14631 > Project: HBase > Issue Type: Bug >Reporter: Ted Yu >Assignee: Ted Yu > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16 > > Attachments: 14631-branch-0.98.txt, 14631-branch-1.0.txt, > 14631-branch-1.txt, 14631-v1.txt > > > HBASE-14475 and HBASE-14605 narrowed the scope of doAs() calls to region > observer notifications for region splitting. > During review of HBASE-14605, Andrew brought up the case for region merge. > This JIRA is to implement similar scope narrowing technique for region > merging. > The majority of the change would be in RegionMergeTransactionImpl class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13318) RpcServer.Listener.getAddress should be synchronized
[ https://issues.apache.org/jira/browse/HBASE-13318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14963780#comment-14963780 ] Nitin Aggarwal commented on HBASE-13318: We are seeing this issue as well, any updates for it? > RpcServer.Listener.getAddress should be synchronized > > > Key: HBASE-13318 > URL: https://issues.apache.org/jira/browse/HBASE-13318 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.10.1 >Reporter: Lars Hofhansl >Priority: Minor > Labels: thread-safety > > We just saw exceptions like these: > {noformat} > Exception in thread "B.DefaultRpcServer.handler=45,queue=0,port=60020" > java.lang.NullPointerException > at > org.apache.hadoop.hbase.ipc.RpcServer$Listener.getAddress(RpcServer.java:753) > at > org.apache.hadoop.hbase.ipc.RpcServer.getListenerAddress(RpcServer.java:2157) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:146) > at > org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130) > at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107) > at java.lang.Thread.run(Thread.java:745) > {noformat} > Looks like RpcServer$Listener.getAddress should be synchronized > (acceptChannel is set to null upon exiting the thread under in a synchronized > block). > Should be happening very rarely only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
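The race described above (acceptChannel is nulled inside a synchronized block on listener-thread exit, while getAddress dereferences it without holding the lock) can be sketched as follows. This is a simplified, hypothetical illustration, not the actual RpcServer code; the class and field names are stand-ins.

```java
// Hypothetical sketch of the HBASE-13318 race: close() nulls the channel
// under the object monitor, so a getter that dereferences the channel
// without holding the same monitor can NPE, as in the stack trace above.
import java.net.InetSocketAddress;

public class ListenerSketch {
    static class Channel {
        InetSocketAddress local = new InetSocketAddress("localhost", 60020);
    }

    private Channel acceptChannel = new Channel();

    // Unsafe: another thread may null acceptChannel between the read and
    // the dereference, producing a NullPointerException.
    public InetSocketAddress getAddressUnsafe() {
        return acceptChannel.local;
    }

    // Safe: holds the same monitor that close() holds while nulling,
    // and returns null instead of dereferencing a dead channel.
    public synchronized InetSocketAddress getAddress() {
        return acceptChannel == null ? null : acceptChannel.local;
    }

    public synchronized void close() {
        acceptChannel = null; // mirrors the listener thread's exit path
    }
}
```

Synchronizing the getter does not make the address outlive close(); it only guarantees callers see either the live channel or a clean null, never a torn read.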
[jira] [Commented] (HBASE-14631) Region merge request should be audited with request user through proper scope of doAs() calls to region observer notifications
[ https://issues.apache.org/jira/browse/HBASE-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14963832#comment-14963832 ] Hudson commented on HBASE-14631: SUCCESS: Integrated in HBase-TRUNK #6927 (See [https://builds.apache.org/job/HBase-TRUNK/6927/]) HBASE-14631 Region merge request should be audited with request user (tedyu: rev 57fea77074ea319024b00ea1fab9cc6f1068d696) * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionMergeTransaction.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/CompactSplitThread.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionMergeTransactionImpl.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionMergeRequest.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java * hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerObserver.java > Region merge request should be audited with request user through proper scope > of doAs() calls to region observer notifications > -- > > Key: HBASE-14631 > URL: https://issues.apache.org/jira/browse/HBASE-14631 > Project: HBase > Issue Type: Bug >Reporter: Ted Yu >Assignee: Ted Yu > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16 > > Attachments: 14631-branch-0.98.txt, 14631-branch-1.0.txt, > 14631-branch-1.txt, 14631-v1.txt > > > HBASE-14475 and HBASE-14605 narrowed the scope of doAs() calls to region > observer notifications for region splitting. > During review of HBASE-14605, Andrew brought up the case for region merge. > This JIRA is to implement similar scope narrowing technique for region > merging. > The majority of the change would be in RegionMergeTransactionImpl class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14631) Region merge request should be audited with request user through proper scope of doAs() calls to region observer notifications
[ https://issues.apache.org/jira/browse/HBASE-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14963831#comment-14963831 ] Hudson commented on HBASE-14631: FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #1114 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/1114/]) HBASE-14631 Region merge request should be audited with request user (tedyu: rev aaf5653767c5b73ad15eb65cb644e1b985ac21d7) * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionMergeRequest.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/CompactSplitThread.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionMergeTransaction.java * hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerObserver.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java > Region merge request should be audited with request user through proper scope > of doAs() calls to region observer notifications > -- > > Key: HBASE-14631 > URL: https://issues.apache.org/jira/browse/HBASE-14631 > Project: HBase > Issue Type: Bug >Reporter: Ted Yu >Assignee: Ted Yu > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16 > > Attachments: 14631-branch-0.98.txt, 14631-branch-1.0.txt, > 14631-branch-1.txt, 14631-v1.txt > > > HBASE-14475 and HBASE-14605 narrowed the scope of doAs() calls to region > observer notifications for region splitting. > During review of HBASE-14605, Andrew brought up the case for region merge. > This JIRA is to implement similar scope narrowing technique for region > merging. > The majority of the change would be in RegionMergeTransactionImpl class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14648) Reenable TestWALProcedureStoreOnHDFS#testWalRollOnLowReplication
stack created HBASE-14648: - Summary: Reenable TestWALProcedureStoreOnHDFS#testWalRollOnLowReplication Key: HBASE-14648 URL: https://issues.apache.org/jira/browse/HBASE-14648 Project: HBase Issue Type: Sub-task Components: test Reporter: stack -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14647) Disable TestWALProcedureStoreOnHDFS#testWalRollOnLowReplication
[ https://issues.apache.org/jira/browse/HBASE-14647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-14647: -- Attachment: 14647.txt Disabled for now. Needs work. Pushed to branch-1.2+ The push added more logging to try and help with this case: {code} java.lang.RuntimeException: sync aborted at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.pushData(WALProcedureStore.java:492) at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.insert(WALProcedureStore.java:335) at org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS.testWalRollOnLowReplication(TestWALProcedureStoreOnHDFS.java:201) {code} ... looks like i < 50 inserts but I didn't have a log to say how many. The exception is: {code} 2015-10-19 04:32:36,051 WARN [Thread-416] hdfs.DFSOutputStream$DataStreamer(558): DataStreamer Exception org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /test-logs/state-0015.log could only be replicated to 2 nodes instead of minReplication (=3). There are 3 datanode(s) running and 3 node(s) are excluded in this operation. ... {code} So, somehow we are marking all our replicas as bad. Upping the datanodes to 6 or 10 would make this test more likely to pass? For study. Let me file issue to reenable. > Disable TestWALProcedureStoreOnHDFS#testWalRollOnLowReplication > --- > > Key: HBASE-14647 > URL: https://issues.apache.org/jira/browse/HBASE-14647 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack >Assignee: stack > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3 > > Attachments: 14647.txt > > > It failed on two trunk builds. Even after attempts at making the test looser, > we still fail. Needs work. Disabling for now while trying to stabilize build. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
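The looser-assert pattern debated in this thread (tolerate a late "sync aborted" RuntimeException once enough inserts have succeeded, rethrow early ones) can be sketched as below. This is an illustrative standalone sketch, not the actual TestWALProcedureStoreOnHDFS code; the method name, the predicate-based fake store, and the threshold of 50 are stand-ins mirroring the snippet quoted earlier in the thread.

```java
// Illustrative tolerance loop: keep inserting; if the store aborts the
// sync late in the run (i > 50), treat the run as having done enough
// work; an early abort is still rethrown and fails the test.
public class WalRollToleranceSketch {
    /** Returns the number of inserts completed before a tolerated abort. */
    static int insertUntilAbort(java.util.function.IntPredicate insert, int attempts) {
        for (int i = 0; i < attempts; i++) {
            try {
                if (!insert.test(i)) {
                    return i; // store signalled it is done
                }
            } catch (RuntimeException re) {
                String msg = re.getMessage();
                // Tolerate a late "sync aborted": enough inserts succeeded.
                if (msg != null && msg.toLowerCase().contains("sync aborted") && i > 50) {
                    return i;
                }
                throw re; // early aborts still fail the test
            }
        }
        return attempts;
    }

    public static void main(String[] args) {
        // Simulate a store that aborts the sync on the 60th insert.
        int done = insertUntilAbort(i -> {
            if (i == 60) throw new RuntimeException("sync aborted");
            return true;
        }, 100);
        System.out.println(done); // prints 60: the abort at i=60 is tolerated
    }
}
```

Whether the rethrow belongs there at all is exactly the disagreement in the thread; the sketch shows the variant with the rethrow.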
[jira] [Commented] (HBASE-14647) Disable TestWALProcedureStoreOnHDFS#testWalRollOnLowReplication
[ https://issues.apache.org/jira/browse/HBASE-14647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14963841#comment-14963841 ] Hudson commented on HBASE-14647: FAILURE: Integrated in HBase-1.2 #274 (See [https://builds.apache.org/job/HBase-1.2/274/]) HBASE-14647 Disable (stack: rev 102637f3beafe7faa8badb730762a3642a53940a) * hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestWALProcedureStoreOnHDFS.java > Disable TestWALProcedureStoreOnHDFS#testWalRollOnLowReplication > --- > > Key: HBASE-14647 > URL: https://issues.apache.org/jira/browse/HBASE-14647 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack >Assignee: stack > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: 14647.txt > > > It failed on two trunk builds. Even after attempts at making the test looser, > we still fail. Needs work. Disabling for now while trying to stabilize build. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14631) Region merge request should be audited with request user through proper scope of doAs() calls to region observer notifications
[ https://issues.apache.org/jira/browse/HBASE-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14963876#comment-14963876 ] Hudson commented on HBASE-14631: FAILURE: Integrated in HBase-1.3 #280 (See [https://builds.apache.org/job/HBase-1.3/280/]) HBASE-14631 Region merge request should be audited with request user (tedyu: rev 61b11e0765206b0acf2099958a46604216287ea7) * hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerObserver.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionMergeRequest.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/CompactSplitThread.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionMergeTransaction.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionMergeTransactionImpl.java > Region merge request should be audited with request user through proper scope > of doAs() calls to region observer notifications > -- > > Key: HBASE-14631 > URL: https://issues.apache.org/jira/browse/HBASE-14631 > Project: HBase > Issue Type: Bug >Reporter: Ted Yu >Assignee: Ted Yu > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16 > > Attachments: 14631-branch-0.98.txt, 14631-branch-1.0.txt, > 14631-branch-1.txt, 14631-v1.txt > > > HBASE-14475 and HBASE-14605 narrowed the scope of doAs() calls to region > observer notifications for region splitting. > During review of HBASE-14605, Andrew brought up the case for region merge. > This JIRA is to implement similar scope narrowing technique for region > merging. > The majority of the change would be in RegionMergeTransactionImpl class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14646) Move TestCellACLs from medium to large category
stack created HBASE-14646: - Summary: Move TestCellACLs from medium to large category Key: HBASE-14646 URL: https://issues.apache.org/jira/browse/HBASE-14646 Project: HBase Issue Type: Sub-task Components: test Reporter: stack Assignee: stack Priority: Minor Move this test to the large category because on my local rig, I got this on a run: {code} org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.18.1:test (secondPartTestsExecution) on project hbase-server: There was a timeout or other error in the fork {code} Looking at the test output, TestCellACLs seems to be the 'hanging' test... the test that is not completing. Let's see if moving the test to a different category helps with these fails because of timeout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14648) Reenable TestWALProcedureStoreOnHDFS#testWalRollOnLowReplication
[ https://issues.apache.org/jira/browse/HBASE-14648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14963682#comment-14963682 ] stack commented on HBASE-14648: --- See HBASE-14647 for some observations on recent fails. This is a critical test. Needs to work. > Reenable TestWALProcedureStoreOnHDFS#testWalRollOnLowReplication > > > Key: HBASE-14648 > URL: https://issues.apache.org/jira/browse/HBASE-14648 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14631) Region merge request should be audited with request user through proper scope of doAs() calls to region observer notifications
[ https://issues.apache.org/jira/browse/HBASE-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14963710#comment-14963710 ] Hudson commented on HBASE-14631: FAILURE: Integrated in HBase-1.0 #1091 (See [https://builds.apache.org/job/HBase-1.0/1091/]) HBASE-14631 Region merge request should be audited with request user (tedyu: rev 1b816a518572fbfa4928a792bd160cabf948fb4c) * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionMergeTransaction.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/CompactSplitThread.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java * hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerObserver.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionMergeRequest.java > Region merge request should be audited with request user through proper scope > of doAs() calls to region observer notifications > -- > > Key: HBASE-14631 > URL: https://issues.apache.org/jira/browse/HBASE-14631 > Project: HBase > Issue Type: Bug >Reporter: Ted Yu >Assignee: Ted Yu > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16 > > Attachments: 14631-branch-0.98.txt, 14631-branch-1.0.txt, > 14631-branch-1.txt, 14631-v1.txt > > > HBASE-14475 and HBASE-14605 narrowed the scope of doAs() calls to region > observer notifications for region splitting. > During review of HBASE-14605, Andrew brought up the case for region merge. > This JIRA is to implement similar scope narrowing technique for region > merging. > The majority of the change would be in RegionMergeTransactionImpl class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14647) Disable TestWALProcedureStoreOnHDFS#testWalRollOnLowReplication
stack created HBASE-14647: - Summary: Disable TestWALProcedureStoreOnHDFS#testWalRollOnLowReplication Key: HBASE-14647 URL: https://issues.apache.org/jira/browse/HBASE-14647 Project: HBase Issue Type: Sub-task Reporter: stack Assignee: stack It failed on two trunk builds. Even after attempts at making the test looser, we still fail. Needs work. Disabling for now while trying to stabilize build. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14646) Move TestCellACLs from medium to large category
[ https://issues.apache.org/jira/browse/HBASE-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14963840#comment-14963840 ] Hudson commented on HBASE-14646: FAILURE: Integrated in HBase-1.2 #274 (See [https://builds.apache.org/job/HBase-1.2/274/]) HBASE-14646 Move TestCellACLs from medium to large category (stack: rev 78e6da0c7caf80d0d9af66f75aad98684b9f176f) * hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestCellACLs.java > Move TestCellACLs from medium to large category > --- > > Key: HBASE-14646 > URL: https://issues.apache.org/jira/browse/HBASE-14646 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack >Assignee: stack >Priority: Minor > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: 14646.txt > > > Move this test to the large category because on my local rig, I got this on a > run: > {code} > org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute > goal org.apache.maven.plugins:maven-surefire-plugin:2.18.1:test > (secondPartTestsExecution) on project hbase-server: There was a timeout or > other error in the fork > {code} > Looking at the test output, TestCellACLs seems to be the 'hanging' > test... the test that is not completing. Let's see if moving the test to a > different category helps with these fails because of timeout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13082) Coarsen StoreScanner locks to RegionScanner
[ https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14963858#comment-14963858 ] ramkrishna.s.vasudevan commented on HBASE-13082: bq.So we wont change the store file's heap during the scan is not really possible? I think our comments were published at the same time and in between JIRA was down. Still I think we can do this for a pure bulk loaded case. > Coarsen StoreScanner locks to RegionScanner > --- > > Key: HBASE-13082 > URL: https://issues.apache.org/jira/browse/HBASE-13082 > Project: HBase > Issue Type: Bug >Reporter: Lars Hofhansl >Assignee: ramkrishna.s.vasudevan > Attachments: 13082-test.txt, 13082-v2.txt, 13082-v3.txt, > 13082-v4.txt, 13082.txt, 13082.txt, HBASE-13082.pdf, HBASE-13082_1_WIP.patch, > HBASE-13082_2_WIP.patch, HBASE-13082_3.patch, HBASE-13082_4.patch, gc.png, > gc.png, gc.png, hits.png, next.png, next.png > > > Continuing where HBASE-10015 left of. > We can avoid locking (and memory fencing) inside StoreScanner by deferring to > the lock already held by the RegionScanner. > In tests this shows quite a scan improvement and reduced CPU (the fences make > the cores wait for memory fetches). > There are some drawbacks too: > * All calls to RegionScanner need to be remain synchronized > * Implementors of coprocessors need to be diligent in following the locking > contract. For example Phoenix does not lock RegionScanner.nextRaw() and > required in the documentation (not picking on Phoenix, this one is my fault > as I told them it's OK) > * possible starving of flushes and compaction with heavy read load. > RegionScanner operations would keep getting the locks and the > flushes/compactions would not be able finalize the set of files. > I'll have a patch soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14648) Reenable TestWALProcedureStoreOnHDFS#testWalRollOnLowReplication
[ https://issues.apache.org/jira/browse/HBASE-14648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-14648: -- Priority: Critical (was: Major) > Reenable TestWALProcedureStoreOnHDFS#testWalRollOnLowReplication > > > Key: HBASE-14648 > URL: https://issues.apache.org/jira/browse/HBASE-14648 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack >Priority: Critical > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14541) TestHFileOutputFormat.testMRIncrementalLoadWithSplit failed due to too many splits and few retries
[ https://issues.apache.org/jira/browse/HBASE-14541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14963685#comment-14963685 ] Hudson commented on HBASE-14541: SUCCESS: Integrated in HBase-1.3-IT #251 (See [https://builds.apache.org/job/HBase-1.3-IT/251/]) HBASE-14541 TestHFileOutputFormat.testMRIncrementalLoadWithSplit failed (matteo.bertozzi: rev d01063c9a33608fd93a3043a3c3a96e83959cdfb) * hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java * hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestLoadIncrementalHFiles.java > TestHFileOutputFormat.testMRIncrementalLoadWithSplit failed due to too many > splits and few retries > -- > > Key: HBASE-14541 > URL: https://issues.apache.org/jira/browse/HBASE-14541 > Project: HBase > Issue Type: Bug >Reporter: stack >Assignee: Matteo Bertozzi > Attachments: HBASE-14541-test.patch, HBASE-14541-v0.patch, > HBASE-14541-v0.patch > > > This one seems worth a dig. We seem to be making progress but here is what we > are trying to load which seems weird: > {code} > 2015-10-01 17:19:41,322 INFO [main] mapreduce.LoadIncrementalHFiles(360): > Split occured while grouping HFiles, retry attempt 10 with 4 files remaining > to group or split > 2015-10-01 17:19:41,323 ERROR [main] mapreduce.LoadIncrementalHFiles(402): > - > Bulk load aborted with some files not yet loaded: > - > > hdfs://localhost:39540/user/jenkins/test-data/720ae36a-2495-456b-ba68-19e260685a35/testLocalMRIncrementalLoad/info-B/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/ce11cbe2490d444d8958264004286aff.bottom > > hdfs://localhost:39540/user/jenkins/test-data/720ae36a-2495-456b-ba68-19e260685a35/testLocalMRIncrementalLoad/info-B/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/ce11cbe2490d444d8958264004286aff.top > >
hdfs://localhost:39540/user/jenkins/test-data/720ae36a-2495-456b-ba68-19e260685a35/testLocalMRIncrementalLoad/info-A/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/30c58eeb23a6464da21117e6e1bc565c.bottom > > hdfs://localhost:39540/user/jenkins/test-data/720ae36a-2495-456b-ba68-19e260685a35/testLocalMRIncrementalLoad/info-A/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/30c58eeb23a6464da21117e6e1bc565c.top > {code} > What's that about? > Making note here. Will keep an eye on this one. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14631) Region merge request should be audited with request user through proper scope of doAs() calls to region observer notifications
[ https://issues.apache.org/jira/browse/HBASE-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14963702#comment-14963702 ] Hudson commented on HBASE-14631: FAILURE: Integrated in HBase-1.2 #273 (See [https://builds.apache.org/job/HBase-1.2/273/]) HBASE-14631 Region merge request should be audited with request user (tedyu: rev f62bbc9b669bdbae4b8267b1728a7743db847f8a) * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionMergeRequest.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionMergeTransaction.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionMergeTransactionImpl.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/CompactSplitThread.java * hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerObserver.java > Region merge request should be audited with request user through proper scope > of doAs() calls to region observer notifications > -- > > Key: HBASE-14631 > URL: https://issues.apache.org/jira/browse/HBASE-14631 > Project: HBase > Issue Type: Bug >Reporter: Ted Yu >Assignee: Ted Yu > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16 > > Attachments: 14631-branch-0.98.txt, 14631-branch-1.0.txt, > 14631-branch-1.txt, 14631-v1.txt > > > HBASE-14475 and HBASE-14605 narrowed the scope of doAs() calls to region > observer notifications for region splitting. > During review of HBASE-14605, Andrew brought up the case for region merge. > This JIRA is to implement similar scope narrowing technique for region > merging. > The majority of the change would be in RegionMergeTransactionImpl class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14636) Clear HFileScannerImpl#prevBlocks in between Compaction flow
[ https://issues.apache.org/jira/browse/HBASE-14636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14963854#comment-14963854 ] ramkrishna.s.vasudevan commented on HBASE-14636: With 9G heap space and the LRU cache, just after loading 20G of data and running workload c we see pauses of up to 1.5 secs even with the patch applied. So is it something else that is leading to these bigger GCs? > Clear HFileScannerImpl#prevBlocks in between Compaction flow > > > Key: HBASE-14636 > URL: https://issues.apache.org/jira/browse/HBASE-14636 > Project: HBase > Issue Type: Sub-task > Components: regionserver, Scanners >Reporter: Anoop Sam John >Assignee: Anoop Sam John >Priority: Blocker > Fix For: 2.0.0 > > Attachments: HBASE-14636.patch, HBASE-14636.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14387) Compaction improvements: Maximum off-peak compaction size
[ https://issues.apache.org/jira/browse/HBASE-14387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-14387: -- Summary: Compaction improvements: Maximum off-peak compaction size (was: Compaction improvements: Maximum compaction size) > Compaction improvements: Maximum off-peak compaction size > - > > Key: HBASE-14387 > URL: https://issues.apache.org/jira/browse/HBASE-14387 > Project: HBase > Issue Type: Improvement >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov > Fix For: 2.0.0 > > Attachments: HBASE-14387-v1.patch > > > Make max compaction size for peak and off peak separate config options. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14387) Compaction improvements: Maximum compaction size
[ https://issues.apache.org/jira/browse/HBASE-14387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-14387: -- Status: Patch Available (was: Open) > Compaction improvements: Maximum compaction size > > > Key: HBASE-14387 > URL: https://issues.apache.org/jira/browse/HBASE-14387 > Project: HBase > Issue Type: Improvement >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov > Fix For: 2.0.0 > > Attachments: HBASE-14387-v1.patch > > > Make max compaction size for peak and off peak separate config options. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14387) Compaction improvements: Maximum compaction size
[ https://issues.apache.org/jira/browse/HBASE-14387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-14387: -- Attachment: HBASE-14387-v1.patch Patch v1. Added new compaction configuration options: *hbase.hstore.compaction.max.size.offpeak* - maximum selection size eligible for minor compaction during off-peak hours. *hbase.hstore.compaction.max.size* - this is the default max, used if no off-peak hours are defined or if no maximum off-peak size is defined. > Compaction improvements: Maximum compaction size > > > Key: HBASE-14387 > URL: https://issues.apache.org/jira/browse/HBASE-14387 > Project: HBase > Issue Type: Improvement >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov > Fix For: 2.0.0 > > Attachments: HBASE-14387-v1.patch > > > Make max compaction size for peak and off peak separate config options. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
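Together with the existing off-peak window options, the two settings described above would be wired up in hbase-site.xml roughly as follows. This is an illustrative sketch of the proposed configuration; the values shown are arbitrary examples, not recommendations.

```xml
<!-- Illustrative hbase-site.xml fragment for the HBASE-14387 options.
     Off-peak hours use HBase's existing hbase.offpeak.* window. -->
<property>
  <name>hbase.offpeak.start.hour</name>
  <value>0</value>   <!-- off-peak window: midnight... -->
</property>
<property>
  <name>hbase.offpeak.end.hour</name>
  <value>6</value>   <!-- ...to 6am -->
</property>
<property>
  <name>hbase.hstore.compaction.max.size</name>
  <value>10737418240</value>  <!-- example: 10G cap during peak hours -->
</property>
<property>
  <name>hbase.hstore.compaction.max.size.offpeak</name>
  <value>53687091200</value>  <!-- example: allow bigger (50G) minors off-peak -->
</property>
```

The design point is that large minor compactions are cheapest to absorb when client load is low, so the off-peak cap can be set much higher than the peak cap.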
[jira] [Commented] (HBASE-14636) Clear HFileScannerImpl#prevBlocks in between Compaction flow
[ https://issues.apache.org/jira/browse/HBASE-14636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964258#comment-14964258 ] stack commented on HBASE-14636: --- Patch seems to have made it so we survived the loading. Great. I'm trying workloadc again... and then will try again but w/ less memory. Will report back. > Clear HFileScannerImpl#prevBlocks in between Compaction flow > > > Key: HBASE-14636 > URL: https://issues.apache.org/jira/browse/HBASE-14636 > Project: HBase > Issue Type: Sub-task > Components: regionserver, Scanners >Reporter: Anoop Sam John >Assignee: Anoop Sam John >Priority: Blocker > Fix For: 2.0.0 > > Attachments: HBASE-14636.patch, HBASE-14636.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-8939) Hanging unit tests
[ https://issues.apache.org/jira/browse/HBASE-8939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-8939. -- Resolution: Fixed Assignee: stack Resolving parent issue because all subtasks are done and this issue was subsumed by HBASE-14420 > Hanging unit tests > -- > > Key: HBASE-8939 > URL: https://issues.apache.org/jira/browse/HBASE-8939 > Project: HBase > Issue Type: Bug > Components: test >Reporter: stack >Assignee: stack > Attachments: 8939.txt > > > We have hanging tests. Here are a few from this morning's review: > {code} > durruti:0.95 stack$ ./dev-support/findHangingTest.sh > https://builds.apache.org/job/hbase-0.95-on-hadoop2/176/consoleText > % Total% Received % Xferd Average Speed TimeTime Time > Current > Dload Upload Total SpentLeft Speed > 100 3300k0 3300k0 0 508k 0 --:--:-- 0:00:06 --:--:-- 621k > Hanging test: Running org.apache.hadoop.hbase.TestIOFencing > Hanging test: Running org.apache.hadoop.hbase.regionserver.wal.TestLogRolling > {code} > And... 
> {code} > durruti:0.95 stack$ ./dev-support/findHangingTest.sh > http://54.241.6.143/job/HBase-TRUNK-Hadoop-2/396/consoleText > % Total% Received % Xferd Average Speed TimeTime Time > Current > Dload Upload Total SpentLeft Speed > 100 779k0 779k0 0 538k 0 --:--:-- 0:00:01 --:--:-- 559k > Hanging test: Running org.apache.hadoop.hbase.TestIOFencing > Hanging test: Running > org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort > Hanging test: Running org.apache.hadoop.hbase.client.TestFromClientSide3 > {code} > and > {code} > durruti:0.95 stack$ ./dev-support/findHangingTest.sh > http://54.241.6.143/job/HBase-0.95/607/consoleText > % Total% Received % Xferd Average Speed TimeTime Time > Current > Dload Upload Total SpentLeft Speed > 100 445k0 445k0 0 490k 0 --:--:-- --:--:-- --:--:-- 522k > Hanging test: Running > org.apache.hadoop.hbase.replication.TestReplicationDisableInactivePeer > Hanging test: Running org.apache.hadoop.hbase.master.TestAssignmentManager > Hanging test: Running org.apache.hadoop.hbase.util.TestHBaseFsck > Hanging test: Running > org.apache.hadoop.hbase.regionserver.TestStoreFileBlockCacheSummary > Hanging test: Running > org.apache.hadoop.hbase.IntegrationTestDataIngestSlowDeterministic > {code} > and... 
> {code} > durruti:0.95 stack$ ./dev-support/findHangingTest.sh > http://54.241.6.143/job/HBase-0.95-Hadoop-2/607/consoleText > % Total% Received % Xferd Average Speed TimeTime Time > Current > Dload Upload Total SpentLeft Speed > 100 781k0 781k0 0 240k 0 --:--:-- 0:00:03 --:--:-- 244k > Hanging test: Running > org.apache.hadoop.hbase.coprocessor.TestCoprocessorEndpoint > Hanging test: Running org.apache.hadoop.hbase.client.TestFromClientSide > Hanging test: Running org.apache.hadoop.hbase.TestIOFencing > Hanging test: Running > org.apache.hadoop.hbase.master.TestMasterFailoverBalancerPersistence > Hanging test: Running > org.apache.hadoop.hbase.master.TestDistributedLogSplitting > {code}
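The findHangingTest.sh runs quoted above all boil down to scanning a build's console text for tests that started but never reported results. A minimal sketch of that idea (assuming standard Surefire "Running ..." / "... - in ..." console lines; this is not the actual dev-support script):

```shell
# Sketch of the hanging-test detection idea behind dev-support/findHangingTest.sh.
# Assumes Surefire-style console output; NOT the real script's implementation.
find_hanging() {
  # A test is "hanging" if "Running <class>" appears with no matching
  # "Tests run: ... - in <class>" summary line later in the log.
  awk '
    /^Running /  { running[$2] = 1 }
    / - in /     { delete running[$NF] }
    END          { for (t in running) print "Hanging test: Running " t }
  ' "$@"
}

# Typical use against a Jenkins build console, e.g.:
#   curl -s https://builds.apache.org/job/.../consoleText | find_hanging
```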
[jira] [Resolved] (HBASE-12558) Disable TestHCM.testClusterStatus Unexpected exception, expected but was
[ https://issues.apache.org/jira/browse/HBASE-12558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-12558. --- Resolution: Fixed Resolving. This test was disabled. Make a new issue to re-enable it after the flakiness is fixed. > Disable TestHCM.testClusterStatus Unexpected exception, > expected > but was > - > > Key: HBASE-12558 > URL: https://issues.apache.org/jira/browse/HBASE-12558 > Project: HBase > Issue Type: Bug > Components: test >Reporter: stack >Assignee: stack > Fix For: 2.0.0 > > Attachments: 12558-master.patch, 12558.ignore.txt > > > Happens for me reliably on mac os x. I looked at fixing it. The listener is > not noticing the publish for whatever reason. That's where I stopped. > {code} > java.lang.Exception: Unexpected exception, > expected > but was > at junit.framework.Assert.fail(Assert.java:57) > at org.apache.hadoop.hbase.Waiter.waitFor(Waiter.java:193) > at > org.apache.hadoop.hbase.HBaseTestingUtility.waitFor(HBaseTestingUtility.java:3537) > at > org.apache.hadoop.hbase.client.TestHCM.testClusterStatus(TestHCM.java:273) > {code}
[jira] [Resolved] (HBASE-14569) Disable hanging test TestNamespaceAuditor
[ https://issues.apache.org/jira/browse/HBASE-14569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-14569. --- Resolution: Fixed Assignee: stack (was: Vandana Ayyalasomayajula) HBASE-14650 is the issue to re-enable this flaky test once it is fixed. > Disable hanging test TestNamespaceAuditor > - > > Key: HBASE-14569 > URL: https://issues.apache.org/jira/browse/HBASE-14569 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack >Assignee: stack > Attachments: 14569.txt, 14569.txt > > > The test hung here: > https://builds.apache.org/job/PreCommit-HBASE-Build/15893//console It hangs > quite regularly. Any chance of taking a look [~avandana]? Else, I'll just > disable it so we can get clean builds again. Thanks.