[jira] [Reopened] (HBASE-23771) [Flakey Tests] Test TestSplitTransactionOnCluster Again
[ https://issues.apache.org/jira/browse/HBASE-23771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guanghao Zhang reopened HBASE-23771:

Reopen for backport to branch-2.2.

> [Flakey Tests] Test TestSplitTransactionOnCluster Again
> ---
>
> Key: HBASE-23771
> URL: https://issues.apache.org/jira/browse/HBASE-23771
> Project: HBase
> Issue Type: Sub-task
> Reporter: Michael Stack
> Assignee: Michael Stack
> Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.0
>
> Attachments: 0001-HBASE-23771-Flakey-Tests-Test-TestSplitTransactionOn.patch, Screen Shot 2020-01-31 at 8.37.13 AM.png
>
>
> Parent fix had the test failures in GCE go from 35% to 4%. Let me see if I can
> clear the remaining fails.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-23771) [Flakey Tests] Test TestSplitTransactionOnCluster Again
[ https://issues.apache.org/jira/browse/HBASE-23771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guanghao Zhang resolved HBASE-23771.

Fix Version/s: 2.2.6
Resolution: Fixed

Pushed to branch-2.2.

> [Flakey Tests] Test TestSplitTransactionOnCluster Again
> ---
>
> Key: HBASE-23771
> URL: https://issues.apache.org/jira/browse/HBASE-23771
> Project: HBase
> Issue Type: Sub-task
> Reporter: Michael Stack
> Assignee: Michael Stack
> Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.0, 2.2.6
>
> Attachments: 0001-HBASE-23771-Flakey-Tests-Test-TestSplitTransactionOn.patch, Screen Shot 2020-01-31 at 8.37.13 AM.png
>
>
> Parent fix had the test failures in GCE go from 35% to 4%. Let me see if I can
> clear the remaining fails.
[jira] [Created] (HBASE-24417) update copyright notices year to 2020
Guangxu Cheng created HBASE-24417:

Summary: update copyright notices year to 2020
Key: HBASE-24417
URL: https://issues.apache.org/jira/browse/HBASE-24417
Project: HBase
Issue Type: Task
Components: documentation
Reporter: Guangxu Cheng
[VOTE] The first HBase 2.2.5 release candidate (RC0) is available
Please vote on this release candidate (RC) for Apache HBase 2.2.5.

The VOTE will remain open for at least 72 hours.

[ ] +1 Release this package as Apache HBase 2.2.5
[ ] -1 Do not release this package because ...

The tag to be voted on is 2.2.5RC0. The release files, including
signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/hbase/2.2.5RC0/

Maven artifacts are available in a staging repository at:
https://repository.apache.org/content/repositories/orgapachehbase-1392

Signatures used for HBase RCs can be found in this file:
https://dist.apache.org/repos/dist/release/hbase/KEYS

The list of bug fixes going into 2.2.5 can be found in the included
CHANGES.md and RELEASENOTES.md, available here:
https://dist.apache.org/repos/dist/dev/hbase/2.2.5RC0/CHANGES.md
https://dist.apache.org/repos/dist/dev/hbase/2.2.5RC0/RELEASENOTES.md

A detailed source and binary compatibility report for this release is
available at:
https://dist.apache.org/repos/dist/dev/hbase/2.2.5RC0/api_compare_2.2.5RC0_to_2.2.4.html

NOTICE: There are some incompatible changes to the RemoteHTable and
RemoteAdmin interfaces. They are test-only and are now marked private.
See HBASE-24115 for more details.

To learn more about Apache HBase, please see http://hbase.apache.org/

Thanks,
Guanghao Zhang
[jira] [Resolved] (HBASE-24413) HBASE-22259 Removed deprecated getTimeStampOfLastShippedOp method from ReplicationLoadSource, but there were still references to this method on ruby admin.rb
[ https://issues.apache.org/jira/browse/HBASE-24413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wellington Chevreuil resolved HBASE-24413.

Resolution: Fixed

> HBASE-22259 Removed deprecated getTimeStampOfLastShippedOp method from
> ReplicationLoadSource, but there were still references to this method on ruby
> admin.rb
> ---
>
> Key: HBASE-24413
> URL: https://issues.apache.org/jira/browse/HBASE-24413
> Project: HBase
> Issue Type: Bug
> Affects Versions: 3.0.0-alpha-1
> Reporter: Wellington Chevreuil
> Assignee: Wellington Chevreuil
> Priority: Major
> Fix For: 3.0.0-alpha-1
Re: [VOTE] The first HBase 2.2.5 release candidate (RC0) is available
Is something wrong with the compatibility report? It shows that a lot of
methods have been removed from RemoteAdmin and RemoteHTable, but both classes
are declared IA.Private. Could you please check the release script and
generate the report again?

Andrew Purtell wrote on Saturday, May 23, 2020 at 4:51 AM:

> +1 (binding)
Failure: HBase Generate Website
Build status: Failure The HBase website has not been updated to incorporate HBase commit ${CURRENT_HBASE_COMMIT}. See https://builds.apache.org/job/hbase_generate_website/2016/console
[jira] [Created] (HBASE-24420) BulkLoad May Fall Into Unbelievable Retry Attempt in Some case
wuchang created HBASE-24420:

Summary: BulkLoad May Fall Into Unbelievable Retry Attempt in Some case
Key: HBASE-24420
URL: https://issues.apache.org/jira/browse/HBASE-24420
Project: HBase
Issue Type: Bug
Reporter: wuchang

In https://issues.apache.org/jira/browse/HBASE-14541, the retry logic changed from a configurable number of retries (set by the configuration item hbase.bulkload.retries.number) to the logic below, in order to handle region splits that happen during a bulk load:

{code:java}
int maxRetries = getConf().getInt("hbase.bulkload.retries.number", 10);
maxRetries = Math.max(maxRetries, startEndKeys.getFirst().length + 1);
if (maxRetries != 0 && count >= maxRetries) {
  throw new IOException("Retry attempted " + count + " times without completing, bailing out");
}
{code}

This change caused another problem in our cluster. Our table has 2000 regions, and our bulk load failed because of a configuration issue (unrelated to this case, so ignore the failure reason). The bulk load then fell into a retry storm, and after about 200 retries our HDFS crashed with an OOM. During that time, no split of the HBase table ever happened.

I think the patch in HBASE-14541 didn't handle the unrecoverable case. In that case (and I think many causes can lead to an unrecoverable failure), the meaningless retry attempts become a disaster, and the limit is not configurable, because we cannot change the number of regions in our table.
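To make the arithmetic in the report above concrete, here is a minimal sketch in Java. It assumes that `startEndKeys.getFirst().length` is the number of region start keys, i.e. the number of regions in the table. The class and method names are hypothetical and are not part of HBase.

```java
public class BulkLoadRetryBound {

    // Mirrors the expression quoted in the issue: the configured value only
    // acts as a floor, because the region count + 1 wins whenever it is larger.
    static int effectiveMaxRetries(int configuredRetries, int regionCount) {
        return Math.max(configuredRetries, regionCount + 1);
    }

    public static void main(String[] args) {
        int regions = 2000; // table size reported in the issue

        // Default setting (10) on a 2000-region table.
        System.out.println(effectiveMaxRetries(10, regions));

        // Lowering the setting does not lower the bound.
        System.out.println(effectiveMaxRetries(1, regions));

        // Only a setting above regionCount + 1 changes the result.
        System.out.println(effectiveMaxRetries(5000, regions));
    }
}
```

Under these assumptions, a 2000-region table allows up to 2001 attempts regardless of how low hbase.bulkload.retries.number is set, which matches the reporter's observation that the bound cannot be tuned down. Since the result is always at least 1, the `maxRetries != 0` guard in the quoted snippet is always true.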
[jira] [Created] (HBASE-24418) Consolidate Normalizer implementations
Nick Dimiduk created HBASE-24418:

Summary: Consolidate Normalizer implementations
Key: HBASE-24418
URL: https://issues.apache.org/jira/browse/HBASE-24418
Project: HBase
Issue Type: Task
Components: master
Affects Versions: 3.0.0-alpha-1, 2.3.0
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk

After HBASE-22285, we have two implementations of {{RegionNormalizer}} that have different feature sets and different configurations. I think these can be combined into a single implementation, with clear, decoupled configuration parameters. At least on branch-2.3, there are too many subsequent changes for HBASE-22285 to revert cleanly, so I'll use this ticket to consolidate the implementations.

If you have issues with the current normalizer, speak up here and we can include them, or add them as sub-tasks.
[jira] [Created] (HBASE-24419) Normalizer merge plans should account more than 2 regions when possible
Nick Dimiduk created HBASE-24419:

Summary: Normalizer merge plans should account more than 2 regions when possible
Key: HBASE-24419
URL: https://issues.apache.org/jira/browse/HBASE-24419
Project: HBase
Issue Type: Improvement
Components: master
Affects Versions: 3.0.0-alpha-1, 2.3.0
Reporter: Nick Dimiduk

The merge plans produced by the normalizer operate over two regions. Our merge operation supports multiple regions in a single request. When there are multiple merge plans generated over contiguous region space, these should be collapsed into a single merge operation. This should automatically honor whatever existing configuration settings exist limiting the number of regions that can participate in a merge procedure.
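The collapsing described in the issue above could look roughly like the following. This is only a sketch under stated assumptions: each merge plan is modeled as an inclusive range `{first, last}` of region positions in table order, plans arrive sorted by position, and `maxRegionsPerMerge` is a hypothetical stand-in for whatever existing setting limits a merge procedure. None of these names come from the HBase normalizer.

```java
import java.util.ArrayList;
import java.util.List;

public class MergePlanCollapser {

    // Collapses adjacent plans into one range as long as the combined range
    // stays within maxRegionsPerMerge regions. Each int[] is {first, last}.
    static List<int[]> collapse(List<int[]> plans, int maxRegionsPerMerge) {
        List<int[]> collapsed = new ArrayList<>();
        int[] current = null;
        for (int[] plan : plans) {
            // Contiguous means the next plan starts right after the current one ends.
            boolean contiguous = current != null && plan[0] == current[1] + 1;
            int combinedSize = contiguous ? plan[1] - current[0] + 1 : 0;
            if (contiguous && combinedSize <= maxRegionsPerMerge) {
                current = new int[] {current[0], plan[1]};
            } else {
                if (current != null) {
                    collapsed.add(current);
                }
                current = new int[] {plan[0], plan[1]};
            }
        }
        if (current != null) {
            collapsed.add(current);
        }
        return collapsed;
    }

    public static void main(String[] args) {
        List<int[]> plans = new ArrayList<>();
        plans.add(new int[] {0, 1});
        plans.add(new int[] {2, 3});
        plans.add(new int[] {4, 5});
        plans.add(new int[] {8, 9}); // gap before this plan, so it cannot join
        for (int[] range : collapse(plans, 4)) {
            System.out.println(range[0] + ".." + range[1]);
        }
    }
}
```

With a limit of four regions per merge, the three contiguous two-region plans become one four-region merge plus one leftover two-region merge, and the plan after the gap stays separate. A greedy left-to-right pass is the simplest choice here; a real implementation might balance group sizes differently.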
Unit Test Notes
After a bit of work, there are currently no flakies in branch-2.3 and all tests passed over the last ten nightlies (a nightly is a comprehensive build that runs the full test suite once for jdk8+hadoop2, again for jdk8+hadoop3, and again for jdk11+hadoop3). You can see this by looking at our flakies dashboard for branch-2.3 [1][2]. Branch-2 is not too far behind with one flakey and a recent nightly test failure [3].

This 'cleanliness' is a little noteworthy, IMO.

Other branches have not had the same focus so their state varies w/ attention paid.

Attempts were also recently made at speeding up the jenkins test builds by playing w/ the maven forkcount, shrinking test resource usage, and using the maven -T option, which sets the level of maven module build/test parallelism (HBASE-24150, HBASE-24072, etc.). There was little yield to be had here...perhaps a 20% improvement. Complications included: jenkins build slaves allow two executors/builds to run at the same time, so when an hbase build runs, it is sharing the machine w/ another (often another hbase build); host and docker resource constraints; and our module inter-dependency, which constrains how much parallelism is allowed.

As part of the above work in branch-2/branch-2.3, tests were run locally on various hardware. It should come as no surprise that the experience varied w/ environment (less so as flakies were addressed). On better hardware, tests can be made to run more furiously so they use all of the machine and complete faster.

The settings we have as our defaults are configured to suit the Apache Jenkins build environment, which is usually 16CPUs/48G. As said above, Jenkins slaves allow two builds per machine, so halve these resources when an HBase build runs on Apache Infrastructure. So as to be considerate of our companion Apache projects, defaults are relatively 'mild': our forkcount is set to 0.25 of the CPUs in the machine. On Apache Jenkins, 0.25*16CPU == 4 CPUs for the hbase build. We also set -T2, which means up to two modules building in parallel where possible (each with the above configured forkcount). Our test suites on Jenkins continue to take hours.

On a 40CPU linux machine with the below arguments, where we use half the CPUs in the machine (and ulimit -u 40960), all tests run in just under an hour:

  $ x="0.50C" ; nohup mvn -T2 -Dsurefire.firstPartForkCount=$x -Dsurefire.secondPartForkCount=$x test -PrunAllTests

Upping the forkcount on this machine beyond 0.50C tended to bring a rush of tests exiting... (To be investigated). On this machine, tests currently pass about 80% of the time. To be improved.

On an anemic 4CPU VM, I can run the below and it will pass 60% of the time. It takes ~5 hours:

  $ x="1.0C" ; nohup mvn -T2 -Dsurefire.firstPartForkCount=$x -Dsurefire.secondPartForkCount=$x test -PrunAllTests

On a mac w/ 12CPUs, I can run the same command as above. It passes with about the same frequency and takes just over 1 1/2 hours.

On my laptop it is less reliable, passing about 1/3rd of the time in about 2 1/2 hours.

If I use fewer resources (a lower forkcount), the tests complete more often (but take correspondingly longer).

Going forward, we will continue to watch branch-2/branch-2.3. Regarding speedup, there is a bunch to do. A large win is to be had by improving the HDFS mini cluster: adding configuration (lots of resources, such as pool thread counts, are hard-coded to numbers that are too large for a small test run) and speeding up startup times.

oao,
S

1. https://builds.apache.org/view/H-L/view/HBase/job/HBase-Find-Flaky-Tests/job/branch-2.3/lastSuccessfulBuild/artifact/dashboard.html
2. Unfortunately, the nightly list shows reds though all tests passed because of report assemblage issues being addressed by infra: https://issues.apache.org/jira/browse/INFRA-20025
3. https://builds.apache.org/view/H-L/view/HBase/job/HBase-Find-Flaky-Tests/job/branch-2/lastSuccessfulBuild/artifact/dashboard.html
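The fork arithmetic in the notes above can be sketched in Java as follows. This is an illustration, not Surefire's code: it assumes a fork count ending in `C` is multiplied by the number of cores and rounded down (with a floor of one fork), and that `-T2` lets up to two modules each use the full fork count, as the notes describe. The class and method names are hypothetical.

```java
public class ForkCountEstimate {

    // Resolves a fork count such as "0.50C" (a multiple of the core count)
    // or "4" (an absolute number) against a given number of cores.
    static int resolveForkCount(String value, int cores) {
        String trimmed = value.trim();
        if (trimmed.endsWith("C")) {
            double multiplier = Double.parseDouble(trimmed.substring(0, trimmed.length() - 1));
            double scaled = multiplier * cores;
            // Assumption: round down, but never below one fork for a positive value.
            return scaled > 0 ? Math.max((int) scaled, 1) : 0;
        }
        return Integer.parseInt(trimmed);
    }

    // Upper bound on forked test JVMs when several modules build in parallel.
    static int maxConcurrentForks(String forkCount, int cores, int parallelModules) {
        return resolveForkCount(forkCount, cores) * parallelModules;
    }

    public static void main(String[] args) {
        // Apache Jenkins default from the notes: 0.25C on a 16-CPU host.
        System.out.println(resolveForkCount("0.25C", 16));
        // 40-CPU Linux machine with 0.50C and -T2.
        System.out.println(maxConcurrentForks("0.50C", 40, 2));
        // 4-CPU VM with 1.0C and -T2.
        System.out.println(maxConcurrentForks("1.0C", 4, 2));
    }
}
```

Read this way, the 40-CPU run with 0.50C and -T2 can reach as many forked JVMs as there are CPUs. That may be related to why going beyond 0.50C on that machine caused trouble, though the notes leave the cause as still to be investigated.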
Re: [VOTE] The first HBase 2.2.5 release candidate (RC0) is available
+1 (binding)

* Signature: ok
* Checksum : ok
* Rat check (1.8.0_232): ok
 - mvn clean apache-rat:check
* Built from source (1.8.0_232): ok
 - mvn clean install -DskipTests
* Unit tests pass (1.8.0_232): failed, see below
 - mvn package -P runAllTests

Minor test issues, a handful of flakes. First error is likely an interaction with a concurrent test. Second is a timeout, maybe also a cross unit interaction.

Errors:

[ERROR] org.apache.hadoop.hbase.client.TestAsyncTableScanRenewLease.null
[ERROR]   Run 1: TestAsyncTableScanRenewLease.setUp:65 » IO Shutting down
[ERROR]   Run 2: TestAsyncTableScanRenewLease.tearDown:76 NullPointer
[ERROR] org.apache.hadoop.hbase.regionserver.TestRegionReplicas.null
[ERROR]   Run 1: TestRegionReplicas.testVerifySecondaryAbilityToReadWithOnFiles:481 » TestTimedOut
[ERROR]   Run 2: TestRegionReplicas » Appears to be stuck in thread Default-IPC-NioEventLoopGr...

Flakes:

[WARNING] org.apache.hadoop.hbase.master.assignment.TestRegionMoveAndAbandon.test
[ERROR]   Run 1: TestRegionMoveAndAbandon.test:120 » Runtime org.apache.hadoop.hbase.client.Ret...
[INFO]   Run 2: PASS

[WARNING] org.apache.hadoop.hbase.tool.TestCanaryTool.testReadTableTimeouts
[ERROR]   Run 1: TestCanaryTool.testReadTableTimeouts:218
[ERROR]   Run 2: TestCanaryTool.testReadTableTimeouts:218
[INFO]   Run 3: PASS

On Fri, May 22, 2020 at 2:41 AM Guanghao Zhang wrote:

> Please vote on this release candidate (RC) for Apache HBase 2.2.5.

--
Best regards,
Andrew

Words like orphans lost among the crosstalk, meaning torn from truth's decrepit hands
   - A23, Crosstalk
[jira] [Resolved] (HBASE-24407) Correct the comment of clusterRegionLocationMocks in TestStochasticLoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-24407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Stack resolved HBASE-24407.

Fix Version/s: 3.0.0-alpha-1
Hadoop Flags: Reviewed
Resolution: Fixed

Merged. Thanks for the patch [~filtertip]

> Correct the comment of clusterRegionLocationMocks in
> TestStochasticLoadBalancer
> ---
>
> Key: HBASE-24407
> URL: https://issues.apache.org/jira/browse/HBASE-24407
> Project: HBase
> Issue Type: Improvement
> Components: Balancer
> Reporter: Zheng Wang
> Assignee: Zheng Wang
> Priority: Minor
> Fix For: 3.0.0-alpha-1
>
>
> The comment is a little bit inaccurate.
Re: Unit Test Notes
Thank you for sending these detailed notes. I fear I will be duplicating these efforts on branch-1 when (eventually) preparing for 1.7.0.

On Fri, May 22, 2020 at 1:32 PM Stack wrote:

> After a bit of work, there are currently no flakies in branch-2.3 and all
> tests passed over the last ten nightlies (a nightly is a comprehensive
> build that runs the full test suite once for jdk8+hadoop2, again for
> jdk8+hadoop3, and again for jdk11+hadoop3).

--
Best regards,
Andrew

Words like orphans lost among the crosstalk, meaning torn from truth's decrepit hands
   - A23, Crosstalk