[jira] [Created] (HBASE-27555) Process all patch-unit-root.txt in report-flakies.py
Peter Somogyi created HBASE-27555:

Summary: Process all patch-unit-root.txt in report-flakies.py
Key: HBASE-27555
URL: https://issues.apache.org/jira/browse/HBASE-27555
Project: HBase
Issue Type: Bug
Components: test
Reporter: Peter Somogyi

The report-flakies.py script, which parses the output of the unit tests, does not review all the patch-unit-root.txt files. The for loop exits on the first unit test output file it finds and ignores the rest. Earlier the build had only a single unit test run, but the current setup runs multiple builds based on the JDK or Hadoop version. All of these outputs should be parsed by the script.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
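The behaviour requested in HBASE-27555 could look roughly like the sketch below. It assumes the per-build outputs sit in subdirectories under one root directory. The function names and the failed-test regular expression are hypothetical and are not taken from report-flakies.py.

```python
import os
import re
import tempfile

# Hypothetical pattern for a failed-test summary line; the real script's
# parsing may differ.
FAILED_TEST = re.compile(r"^\[ERROR\] Tests run:.*FAILURE!.* in (\S+)$")


def find_unit_outputs(root, name="patch-unit-root.txt"):
    """Return every file called `name` under `root`, not just the first one."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if name in filenames:
            found.append(os.path.join(dirpath, name))
            # No break or return here: keep walking so every build is covered.
    return sorted(found)


def collect_failed_tests(root):
    """Union of failed test classes across all unit test output files."""
    failed = set()
    for path in find_unit_outputs(root):
        with open(path, encoding="utf-8", errors="replace") as handle:
            for line in handle:
                match = FAILED_TEST.match(line.rstrip("\n"))
                if match:
                    failed.add(match.group(1))
    return failed
```

Collecting into a set means a test that fails under several JDK or Hadoop combinations is reported once.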
[jira] [Created] (HBASE-27554) Test failures on branch-2.4 with corrupted exclude list
Peter Somogyi created HBASE-27554:

Summary: Test failures on branch-2.4 with corrupted exclude list
Key: HBASE-27554
URL: https://issues.apache.org/jira/browse/HBASE-27554
Project: HBase
Issue Type: Bug
Components: jenkins
Affects Versions: 2.4.16
Reporter: Peter Somogyi
Assignee: Peter Somogyi

Nightly builds and PRs on branch-2.4 are failing with an invalid exclude list.

Executed unit test command:
{code:java}
/opt/maven/bin/mvn --batch-mode -Dmaven.repo.local=/home/jenkins/jenkins-home/workspace/Base-PreCommit-GitHub-PR_PR-4944/yetus-m2/hbase-branch-2.4-patch-0 --threads=4 -Djava.io.tmpdir=/home/jenkins/jenkins-home/workspace/Base-PreCommit-GitHub-PR_PR-4944/yetus-jdk8-hadoop2-check/src/target -DHBasePatchProcess -PrunAllTests -Dtest.exclude.pattern=**/replication.regionserver.TestMetaRegionReplicaReplicationEndpoint.java,**/client.TestMetaRegionLocationCache.java,**/master.balancer.TestStochasticLoadBalancerRegionReplicaWithRacks.java,**/replication.TestZKReplicationQueueStorageWARNING: All illegal access operations will be denied in a future release.java,**/replication.regionserver.TestBasicWALEntryStreamFSHLog.java -Dsurefire.firstPartForkCount=0.5C -Dsurefire.secondPartForkCount=0.5C clean test -fae
{code}

The latest exclude list contains "WARNING: All illegal access operations will be denied in a future release", and Maven treats this text as a new parameter. As a result, unit tests that rely on the exclude list are failing on CI.

[https://ci-hbase.apache.org/job/HBase-Find-Flaky-Tests/job/branch-2.4/lastSuccessfulBuild/artifact/output/excludes/*view*/]
{noformat}
**/replication.regionserver.TestMetaRegionReplicaReplicationEndpoint.java,**/client.TestMetaRegionLocationCache.java,**/master.balancer.TestStochasticLoadBalancerRegionReplicaWithRacks.java,**/replication.TestZKReplicationQueueStorageWARNING: All illegal access operations will be denied in a future release.java,**/replication.regionserver.TestBasicWALEntryStreamFSHLog.java
{noformat}
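One way to guard against a corrupted exclude list like the one above is to validate each entry before it is passed to Maven. The sketch below is only an illustration: the function name is hypothetical, and the "valid entry" shape is an assumption inferred from the entries shown in this report, not a rule taken from the HBase build scripts.

```python
import re

# Assumed shape of a valid entry, based on the entries shown above:
# "**/" + dotted package path and class name + ".java", with no whitespace.
VALID_ENTRY = re.compile(r"^\*\*/[A-Za-z_][\w.]*\.java$")


def split_exclude_list(raw):
    """Split a comma-separated exclude list into (valid, rejected) entries."""
    valid, rejected = [], []
    for entry in raw.strip().split(","):
        # Entries containing spaces, colons or other stray log text are rejected.
        (valid if VALID_ENTRY.match(entry) else rejected).append(entry)
    return valid, rejected
```

A caller could then fail the job, or drop the rejected entries and log them, instead of handing Maven a pattern that contains spaces.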
[jira] [Created] (HBASE-27553) SlowLog does not include params for Mutations
Bryan Beaudreault created HBASE-27553:

Summary: SlowLog does not include params for Mutations
Key: HBASE-27553
URL: https://issues.apache.org/jira/browse/HBASE-27553
Project: HBase
Issue Type: Bug
Reporter: Bryan Beaudreault
Assignee: Ray Mattingly

SlowLog params are extracted via [ProtobufUtil.getSlowLogParams|https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/shaded/protobuf/ProtobufUtil.java#L2154]. This method has an if/else branch for each request type, but the mutation branch (the line linked above) is incorrect: it currently handles MutationProto, but it should handle MutateRequest. A MutationProto is never passed into this method, only MutateRequests, so any MutateRequest passed in today falls through to the default case, which contains nothing useful about the request.

As part of fixing this, we should also ensure that we extract the region name from the MutateRequest and add it to the SlowLogParams object, as is done for all the other requests.

While we are here, the CoprocessorServiceRequest (handled further down) has a getRegion() method, but that region is not passed into the SlowLogParams either. We should add that too.
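The fall-through described in HBASE-27553 can be illustrated with a small Python model. This is not HBase code: the classes are simplified stand-ins for the protobuf messages, and the dictionary returned here is a hypothetical substitute for SlowLogParams.

```python
from dataclasses import dataclass


@dataclass
class MutationProto:
    """Stand-in for the protobuf message of the same name."""
    row: str


@dataclass
class MutateRequest:
    """Stand-in: wraps a mutation and names the region it targets."""
    region: str
    mutation: MutationProto


def slow_log_params_buggy(message):
    # Checks for the inner type, which callers never pass in.
    if isinstance(message, MutationProto):
        return {"params": message.row, "region": ""}
    # Default case: nothing useful about the request survives.
    return {"params": type(message).__name__, "region": ""}


def slow_log_params_fixed(message):
    # Checks for the request type that is actually passed in,
    # and carries the region name along.
    if isinstance(message, MutateRequest):
        return {"params": message.mutation.row, "region": message.region}
    return {"params": type(message).__name__, "region": ""}
```

With a request such as `MutateRequest("region-a", MutationProto("row-1"))`, the first function returns only the type name, while the second returns the row and the region.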
[jira] [Created] (HBASE-27552) HMaster can not in finish Initialization when hbase:namespace is in ABNORMALLY_CLOSED state and the proc corrupt
chaijunjie created HBASE-27552:

Summary: HMaster can not in finish Initialization when hbase:namespace is in ABNORMALLY_CLOSED state and the proc corrupt
Key: HBASE-27552
URL: https://issues.apache.org/jira/browse/HBASE-27552
Project: HBase
Issue Type: Bug
Components: proc-v2
Affects Versions: 2.4.14
Environment:
2023-01-05 19:56:41,385 | ERROR | master/node-master3MPYe:21300:becomeActiveMaster | Corrupt pid=102, ppid=97, state=SUCCESS; OpenRegionProcedure 1903713b7f970a75db1e7a0e72da21d7, server=node-master2mesq,21302,1672817611868 | org.apache.hadoop.hbase.procedure2.ProcedureExecutor$2.handleCorrupted(ProcedureExecutor.java:343)
2023-01-05 19:56:41,385 | ERROR | master/node-master3MPYe:21300:becomeActiveMaster | Corrupt pid=103, ppid=96, state=SUCCESS; OpenRegionProcedure 6695b9c5ad80249bc43830ddc5259487, server=node-master2mesq,21302,1672817611868 | org.apache.hadoop.hbase.procedure2.ProcedureExecutor$2.handleCorrupted(ProcedureExecutor.java:343)
2023-01-05 19:56:41,402 | ERROR | master/node-master3MPYe:21300:becomeActiveMaster | Corrupt pid=106, ppid=82, state=WAITING:REGION_STATE_TRANSITION_CONFIRM_OPENED; TransitRegionStateProcedure table=ImportTable1, region=050bcf6e15ddd079d750992bbfb53163, ASSIGN | org.apache.hadoop.hbase.procedure2.ProcedureExecutor$2.handleCorrupted(ProcedureExecutor.java:343)
2023-01-05 19:56:41,403 | ERROR | master/node-master3MPYe:21300:becomeActiveMaster | Corrupt pid=107, ppid=82, state=WAITING:REGION_STATE_TRANSITION_CONFIRM_OPENED; TransitRegionStateProcedure table=hbase:hindex, region=6789443c0a98d2b34f891ae60878aac3, ASSIGN | org.apache.hadoop.hbase.procedure2.ProcedureExecutor$2.handleCorrupted(ProcedureExecutor.java:343)
2023-01-05 19:56:41,404 | ERROR | master/node-master3MPYe:21300:becomeActiveMaster | Corrupt pid=108, ppid=82, state=WAITING:REGION_STATE_TRANSITION_CONFIRM_OPENED; TransitRegionStateProcedure table=hbase:acl, region=96a2ec5ea797e6847188c965f8c78ce1, ASSIGN | org.apache.hadoop.hbase.procedure2.ProcedureExecutor$2.handleCorrupted(ProcedureExecutor.java:343)
2023-01-05 19:56:41,404 | ERROR | master/node-master3MPYe:21300:becomeActiveMaster | Corrupt pid=109, ppid=82, state=WAITING:REGION_STATE_TRANSITION_CONFIRM_OPENED; TransitRegionStateProcedure table=ImportTable1, region=24e0cb0a958d242976a790ff435d24b5, ASSIGN | org.apache.hadoop.hbase.procedure2.ProcedureExecutor$2.handleCorrupted(ProcedureExecutor.java:343)
2023-01-05 19:56:41,405 | ERROR | master/node-master3MPYe:21300:becomeActiveMaster | Corrupt pid=110, ppid=82, state=RUNNABLE:REGION_STATE_TRANSITION_OPEN; TransitRegionStateProcedure table=ImportTable1, region=a2e7b85420a3cf98fc731ad93f7129a2, ASSIGN | org.apache.hadoop.hbase.procedure2.ProcedureExecutor$2.handleCorrupted(ProcedureExecutor.java:343)
2023-01-05 19:56:41,405 | ERROR | master/node-master3MPYe:21300:becomeActiveMaster | Corrupt pid=111, ppid=82, state=RUNNABLE:REGION_STATE_TRANSITION_OPEN; TransitRegionStateProcedure table=hbase:namespace, region=9be1542260fa8af4a712ddda322b7b6f, ASSIGN | org.apache.hadoop.hbase.procedure2.ProcedureExecutor$2.handleCorrupted(ProcedureExecutor.java:343)
2023-01-05 19:56:41,406 | ERROR | master/node-master3MPYe:21300:becomeActiveMaster | Corrupt pid=112, ppid=82, state=WAITING:REGION_STATE_TRANSITION_CONFIRM_OPENED; TransitRegionStateProcedure table=hbase:rsgroup, region=eaf1531c6cc0738027def0b4d4615b5f, ASSIGN | org.apache.hadoop.hbase.procedure2.ProcedureExecutor$2.handleCorrupted(ProcedureExecutor.java:343)
2023-01-05 19:56:41,406 | ERROR | master/node-master3MPYe:21300:becomeActiveMaster | Corrupt pid=125, ppid=95, state=SUCCESS; OpenRegionProcedure 85301e5c14a8c3e5ba31822d7db0a6fc, server=node-master3mpye,21302,1672817640502 | org.apache.hadoop.hbase.procedure2.ProcedureExecutor$2.handleCorrupted(ProcedureExecutor.java:343)
2023-01-05 19:56:41,407 | ERROR | master/node-master3MPYe:21300:becomeActiveMaster | Corrupt pid=126, ppid=96, state=SUCCESS; OpenRegionProcedure 6695b9c5ad80249bc43830ddc5259487, server=node-master3mpye,21302,1672817640502 | org.apache.hadoop.hbase.procedure2.ProcedureExecutor$2.handleCorrupted(ProcedureExecutor.java:343)
2023-01-05 19:56:41,408 | ERROR | master/node-master3MPYe:21300:becomeActiveMaster | Corrupt pid=127, ppid=94, state=SUCCESS; OpenRegionProcedure 448b88d503d4e31c47b80ac10d8ef6a4, server=node-master3mpye,21302,1672817640502 | org.apache.hadoop.hbase.procedure2.ProcedureExecutor$2.handleCorrupted(ProcedureExecutor.java:343)
Reporter: chaijunjie
[jira] [Created] (HBASE-27551) Add config options to delay assignment to retain last region location
Wellington Chevreuil created HBASE-27551:

Summary: Add config options to delay assignment to retain last region location
Key: HBASE-27551
URL: https://issues.apache.org/jira/browse/HBASE-27551
Project: HBase
Issue Type: Improvement
Reporter: Wellington Chevreuil
Assignee: Wellington Chevreuil

HBASE-27313 introduced the ability to persist the list of files cached in a given RS, but a temporary RS loss or restart causes regions to be eagerly reassigned to other RSes, making the persisted cache useless. For some use cases, such as ObjectStore-based persistence, the performance degradation caused by cache misses has a worse impact than temporary region unavailability.

This proposes an additional config property (disabled by default) that makes the TRSP wait for a configurable time, checking whether the RS that previously held the region has come back online, before proceeding with the region assignment.
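The waiting behaviour proposed in HBASE-27551 amounts to a bounded poll before falling back to normal assignment. The Python sketch below only illustrates that control flow: the function, its parameters and the callbacks are all hypothetical, and it does not reflect how the TRSP is implemented in HBase.

```python
import time


def choose_assignment_target(previous_server, is_online, pick_other_server,
                             wait_seconds=0.0, poll_seconds=0.1,
                             clock=time.monotonic, sleep=time.sleep):
    """Wait up to `wait_seconds` for `previous_server`, then fall back.

    `wait_seconds=0.0` models the "disabled by default" setting: the previous
    server is checked once and, if it is offline, another server is picked
    immediately.
    """
    deadline = clock() + wait_seconds
    while True:
        if is_online(previous_server):
            return previous_server        # retain the last region location
        if clock() >= deadline:
            return pick_other_server()    # give up and assign elsewhere
        sleep(poll_seconds)
```

The trade-off is the one described in the issue: a longer wait keeps the region unavailable for longer, in exchange for a better chance of reusing the persisted cache.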
[jira] [Reopened] (HBASE-23340) hmaster /hbase/replication/rs session expired (hbase replication default value is true, we don't use ) causes logcleaner can not clean oldWALs, which resulits in oldW
[ https://issues.apache.org/jira/browse/HBASE-23340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Somogyi reopened HBASE-23340:
---

Reopening to backport to branch-2.4. This fix can also be useful on that branch.

> hmaster /hbase/replication/rs session expired (hbase replication default value is true, we don't use ) causes logcleaner can not clean oldWALs, which resulits in oldWALs too large (more than 2TB)
>
> Key: HBASE-23340
> URL: https://issues.apache.org/jira/browse/HBASE-23340
> Project: HBase
> Issue Type: Improvement
> Components: master
> Affects Versions: 3.0.0-alpha-1, 2.2.3
> Reporter: jackylau
> Assignee: Bo Cui
> Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0
>
> Attachments: Snipaste_2019-11-21_10-39-25.png, Snipaste_2019-11-21_14-10-36.png
>
>
> The hmaster /hbase/replication/rs session expired (hbase replication defaults to true, but we do not use it), so the logcleaner cannot clean oldWALs, which results in oldWALs growing too large (more than 2TB).
> !Snipaste_2019-11-21_10-39-25.png!
>
> !Snipaste_2019-11-21_14-10-36.png!
>
> We can solve it in one of the following ways:
> 1) Increase the session timeout (but I think this is not a good idea, because we do not know what value is suitable).
> 2) Disable HBase replication. This is not a good idea either, when our users use this feature.
> 3) Add a retry limit: for example, once it has already happened three times, stop the ReplicationLogCleaner and SnapShotCleaner.
> That is all my ideas. I do not know whether they are suitable; if they are, could I submit a PR?
> Does anyone have a good idea?