[jira] [Created] (HBASE-27555) Process all patch-unit-root.txt in report-flakies.py

2023-01-05 Thread Peter Somogyi (Jira)
Peter Somogyi created HBASE-27555:
-

 Summary: Process all patch-unit-root.txt in report-flakies.py 
 Key: HBASE-27555
 URL: https://issues.apache.org/jira/browse/HBASE-27555
 Project: HBase
  Issue Type: Bug
  Components: test
Reporter: Peter Somogyi


The report-flakies.py script, which parses the output of the unit tests, does not 
review all the patch-unit-root.txt files. The for loop exits on the first unit 
test output file it finds and ignores the rest.

Earlier the build only had a single unit test run but the current setup runs 
multiple builds based on the JDK or Hadoop version. All of these outputs should 
be parsed by the script.
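
A minimal sketch of the intended behaviour, not the actual report-flakies.py code. The function names, the directory walk, and the "FAILED: <test name>" line format are all assumptions made for illustration. The point is only that results are accumulated across every patch-unit-root.txt rather than returned from the first one found.

```python
import os

UNIT_OUTPUT_NAME = "patch-unit-root.txt"


def find_unit_outputs(root):
    """Yield the path of every unit test output file below root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        if UNIT_OUTPUT_NAME in filenames:
            yield os.path.join(dirpath, UNIT_OUTPUT_NAME)


def parse_failed_tests(path):
    """Return test names from lines of the hypothetical form 'FAILED: <name>'."""
    failed = set()
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.startswith("FAILED: "):
                failed.add(line[len("FAILED: "):].strip())
    return failed


def collect_failed_tests(root):
    """Merge failures from all output files instead of stopping at the first."""
    failed = set()
    for path in find_unit_outputs(root):
        # No break or return inside the loop: every file contributes.
        failed |= parse_failed_tests(path)
    return failed
```

With one output directory per JDK or Hadoop combination, a test that fails only in one combination would still be reported.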



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-27554) Test failures on branch-2.4 with corrupted exclude list

2023-01-05 Thread Peter Somogyi (Jira)
Peter Somogyi created HBASE-27554:
-

 Summary: Test failures on branch-2.4 with corrupted exclude list
 Key: HBASE-27554
 URL: https://issues.apache.org/jira/browse/HBASE-27554
 Project: HBase
  Issue Type: Bug
  Components: jenkins
Affects Versions: 2.4.16
Reporter: Peter Somogyi
Assignee: Peter Somogyi


Nightly builds and PRs on branch-2.4 are failing with an invalid exclude list.

Executed unit test command:
{code:java}
/opt/maven/bin/mvn --batch-mode 
-Dmaven.repo.local=/home/jenkins/jenkins-home/workspace/Base-PreCommit-GitHub-PR_PR-4944/yetus-m2/hbase-branch-2.4-patch-0
 --threads=4 
-Djava.io.tmpdir=/home/jenkins/jenkins-home/workspace/Base-PreCommit-GitHub-PR_PR-4944/yetus-jdk8-hadoop2-check/src/target
 -DHBasePatchProcess -PrunAllTests 
-Dtest.exclude.pattern=**/replication.regionserver.TestMetaRegionReplicaReplicationEndpoint.java,**/client.TestMetaRegionLocationCache.java,**/master.balancer.TestStochasticLoadBalancerRegionReplicaWithRacks.java,**/replication.TestZKReplicationQueueStorageWARNING:
 All illegal access operations will be denied in a future 
release.java,**/replication.regionserver.TestBasicWALEntryStreamFSHLog.java 
-Dsurefire.firstPartForkCount=0.5C -Dsurefire.secondPartForkCount=0.5C clean 
test -fae {code}
The latest exclude list contains "WARNING: All illegal access operations will 
be denied in a future release", and Maven treats this as a new parameter. As a 
result, unit tests on CI that rely on the exclude list are failing.

[https://ci-hbase.apache.org/job/HBase-Find-Flaky-Tests/job/branch-2.4/lastSuccessfulBuild/artifact/output/excludes/*view*/]
{noformat}
**/replication.regionserver.TestMetaRegionReplicaReplicationEndpoint.java,**/client.TestMetaRegionLocationCache.java,**/master.balancer.TestStochasticLoadBalancerRegionReplicaWithRacks.java,**/replication.TestZKReplicationQueueStorageWARNING:
 All illegal access operations will be denied in a future 
release.java,**/replication.regionserver.TestBasicWALEntryStreamFSHLog.java 
{noformat}
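
One possible guard against this kind of corruption is to validate each entry before the list is handed to Maven. The sketch below is not what the HBase scripts do. It assumes that every valid entry has the shape `**/<dotted name>.java`, and the regular expression and function name are hypothetical.

```python
import re

# Assumed shape of a valid entry: "**/" + dotted class path + ".java".
VALID_ENTRY = re.compile(r"^\*\*/[A-Za-z0-9_.$]+\.java$")


def split_excludes(raw):
    """Split a comma-separated exclude list into (valid, rejected) entries."""
    valid, rejected = [], []
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue
        if VALID_ENTRY.match(entry):
            valid.append(entry)
        else:
            rejected.append(entry)
    return valid, rejected
```

Applied to the list above, the entry containing the JVM warning text would be rejected because it contains a colon and spaces. Note that this drops the entry rather than repairing it, so the test it was meant to name would no longer be excluded.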





[jira] [Created] (HBASE-27553) SlowLog does not include params for Mutations

2023-01-05 Thread Bryan Beaudreault (Jira)
Bryan Beaudreault created HBASE-27553:
-

 Summary: SlowLog does not include params for Mutations
 Key: HBASE-27553
 URL: https://issues.apache.org/jira/browse/HBASE-27553
 Project: HBase
  Issue Type: Bug
Reporter: Bryan Beaudreault
Assignee: Ray Mattingly


SlowLog params are extracted via 
[ProtobufUtil.getSlowLogParams|https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/shaded/protobuf/ProtobufUtil.java#L2154].
 This method has an if/else branch for each request type, but the mutation 
branch (the line linked above) is incorrect. It currently handles MutationProto, 
but it should handle MutateRequest. A MutationProto is never passed into this 
method, only MutateRequests, so every MutateRequest passed in now falls through 
to the default case, which contains nothing useful about the request.

As part of fixing this, we should also ensure that we extract the region name 
from the MutateRequest to add into the SlowLogParams object like all the other 
requests.

While we are here, the CoprocessorServiceRequest (handled further down) has a 
getRegion() method, but that is not passed into the SlowLogParams either. We 
should add that too.





[jira] [Created] (HBASE-27552) HMaster can not in finish Initialization when hbase:namespace is in ABNORMALLY_CLOSED state and the proc corrupt

2023-01-05 Thread chaijunjie (Jira)
chaijunjie created HBASE-27552:
--

 Summary: HMaster can not in finish Initialization when 
hbase:namespace is in ABNORMALLY_CLOSED state and the proc corrupt
 Key: HBASE-27552
 URL: https://issues.apache.org/jira/browse/HBASE-27552
 Project: HBase
  Issue Type: Bug
  Components: proc-v2
Affects Versions: 2.4.14
 Environment: 2023-01-05 19:56:41,385 | ERROR | 
master/node-master3MPYe:21300:becomeActiveMaster | Corrupt pid=102, ppid=97, 
state=SUCCESS; OpenRegionProcedure 1903713b7f970a75db1e7a0e72da21d7, 
server=node-master2mesq,21302,1672817611868 | 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$2.handleCorrupted(ProcedureExecutor.java:343)
2023-01-05 19:56:41,385 | ERROR | 
master/node-master3MPYe:21300:becomeActiveMaster | Corrupt pid=103, ppid=96, 
state=SUCCESS; OpenRegionProcedure 6695b9c5ad80249bc43830ddc5259487, 
server=node-master2mesq,21302,1672817611868 | 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$2.handleCorrupted(ProcedureExecutor.java:343)
2023-01-05 19:56:41,402 | ERROR | 
master/node-master3MPYe:21300:becomeActiveMaster | Corrupt pid=106, ppid=82, 
state=WAITING:REGION_STATE_TRANSITION_CONFIRM_OPENED; 
TransitRegionStateProcedure table=ImportTable1, 
region=050bcf6e15ddd079d750992bbfb53163, ASSIGN | 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$2.handleCorrupted(ProcedureExecutor.java:343)
2023-01-05 19:56:41,403 | ERROR | 
master/node-master3MPYe:21300:becomeActiveMaster | Corrupt pid=107, ppid=82, 
state=WAITING:REGION_STATE_TRANSITION_CONFIRM_OPENED; 
TransitRegionStateProcedure table=hbase:hindex, 
region=6789443c0a98d2b34f891ae60878aac3, ASSIGN | 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$2.handleCorrupted(ProcedureExecutor.java:343)
2023-01-05 19:56:41,404 | ERROR | 
master/node-master3MPYe:21300:becomeActiveMaster | Corrupt pid=108, ppid=82, 
state=WAITING:REGION_STATE_TRANSITION_CONFIRM_OPENED; 
TransitRegionStateProcedure table=hbase:acl, 
region=96a2ec5ea797e6847188c965f8c78ce1, ASSIGN | 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$2.handleCorrupted(ProcedureExecutor.java:343)
2023-01-05 19:56:41,404 | ERROR | 
master/node-master3MPYe:21300:becomeActiveMaster | Corrupt pid=109, ppid=82, 
state=WAITING:REGION_STATE_TRANSITION_CONFIRM_OPENED; 
TransitRegionStateProcedure table=ImportTable1, 
region=24e0cb0a958d242976a790ff435d24b5, ASSIGN | 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$2.handleCorrupted(ProcedureExecutor.java:343)
2023-01-05 19:56:41,405 | ERROR | 
master/node-master3MPYe:21300:becomeActiveMaster | Corrupt pid=110, ppid=82, 
state=RUNNABLE:REGION_STATE_TRANSITION_OPEN; TransitRegionStateProcedure 
table=ImportTable1, region=a2e7b85420a3cf98fc731ad93f7129a2, ASSIGN | 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$2.handleCorrupted(ProcedureExecutor.java:343)
2023-01-05 19:56:41,405 | ERROR | 
master/node-master3MPYe:21300:becomeActiveMaster | Corrupt pid=111, ppid=82, 
state=RUNNABLE:REGION_STATE_TRANSITION_OPEN; TransitRegionStateProcedure 
table=hbase:namespace, region=9be1542260fa8af4a712ddda322b7b6f, ASSIGN | 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$2.handleCorrupted(ProcedureExecutor.java:343)
2023-01-05 19:56:41,406 | ERROR | 
master/node-master3MPYe:21300:becomeActiveMaster | Corrupt pid=112, ppid=82, 
state=WAITING:REGION_STATE_TRANSITION_CONFIRM_OPENED; 
TransitRegionStateProcedure table=hbase:rsgroup, 
region=eaf1531c6cc0738027def0b4d4615b5f, ASSIGN | 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$2.handleCorrupted(ProcedureExecutor.java:343)
2023-01-05 19:56:41,406 | ERROR | 
master/node-master3MPYe:21300:becomeActiveMaster | Corrupt pid=125, ppid=95, 
state=SUCCESS; OpenRegionProcedure 85301e5c14a8c3e5ba31822d7db0a6fc, 
server=node-master3mpye,21302,1672817640502 | 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$2.handleCorrupted(ProcedureExecutor.java:343)
2023-01-05 19:56:41,407 | ERROR | 
master/node-master3MPYe:21300:becomeActiveMaster | Corrupt pid=126, ppid=96, 
state=SUCCESS; OpenRegionProcedure 6695b9c5ad80249bc43830ddc5259487, 
server=node-master3mpye,21302,1672817640502 | 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$2.handleCorrupted(ProcedureExecutor.java:343)
2023-01-05 19:56:41,408 | ERROR | 
master/node-master3MPYe:21300:becomeActiveMaster | Corrupt pid=127, ppid=94, 
state=SUCCESS; OpenRegionProcedure 448b88d503d4e31c47b80ac10d8ef6a4, 
server=node-master3mpye,21302,1672817640502 | 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$2.handleCorrupted(ProcedureExecutor.java:343)
Reporter: chaijunjie








[jira] [Created] (HBASE-27551) Add config options to delay assignment to retain last region location

2023-01-05 Thread Wellington Chevreuil (Jira)
Wellington Chevreuil created HBASE-27551:


 Summary: Add config options to delay assignment to retain last 
region location
 Key: HBASE-27551
 URL: https://issues.apache.org/jira/browse/HBASE-27551
 Project: HBase
  Issue Type: Improvement
Reporter: Wellington Chevreuil
Assignee: Wellington Chevreuil


HBASE-27313 introduced the ability to persist the list of files cached in a 
given RS, but a temporary RS loss or restart causes regions to be eagerly 
reassigned to other RSes, making the persisted cache useless. For some use 
cases, such as object store based persistence, the performance degradation 
caused by cache misses has a worse impact than temporary region 
unavailability.

This proposes an additional config property (disabled by default) that makes 
the TRSP wait for a configurable time, checking whether the RS that previously 
held the region comes back online, before proceeding with the region assignment.
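
The waiting behaviour described above can be sketched as a bounded polling loop. This is only an illustration under stated assumptions: the function and parameter names are hypothetical, and the real change would live in the Java TRSP code, which may suspend the procedure rather than block a thread.

```python
import time


def wait_for_previous_server(is_online, max_wait_seconds, poll_seconds=0.1,
                             clock=time.monotonic, sleep=time.sleep):
    """Return True if is_online() becomes true within max_wait_seconds.

    A max_wait_seconds of 0 models the feature being disabled (the default):
    the function returns False at once and the caller proceeds with the
    regular assignment.
    """
    deadline = clock() + max_wait_seconds
    while clock() < deadline:
        if is_online():
            return True  # previous server is back; its location can be retained
        sleep(poll_seconds)
    return False  # timed out; fall back to assigning elsewhere
```

The trade-off is the one stated in the issue: a larger wait keeps the region unavailable for longer but gives the previous server a chance to return with its persisted cache intact.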







[jira] [Reopened] (HBASE-23340) hmaster /hbase/replication/rs session expired (hbase replication default value is true, we don't use ) causes logcleaner can not clean oldWALs, which resulits in oldW

2023-01-05 Thread Peter Somogyi (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Somogyi reopened HBASE-23340:
---

Reopening to backport to branch-2.4. This fix can also be useful on that 
branch. 

> hmaster  /hbase/replication/rs  session expired (hbase replication default 
> value is true, we don't use ) causes logcleaner can not clean oldWALs, which 
> resulits in oldWALs too large (more than 2TB)
> -
>
> Key: HBASE-23340
> URL: https://issues.apache.org/jira/browse/HBASE-23340
> Project: HBase
>  Issue Type: Improvement
>  Components: master
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: jackylau
>Assignee: Bo Cui
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0
>
> Attachments: Snipaste_2019-11-21_10-39-25.png, 
> Snipaste_2019-11-21_14-10-36.png
>
>
> The hmaster /hbase/replication/rs session expired (hbase replication is 
> enabled by default, though we don't use it), so the logcleaner cannot clean 
> oldWALs, which results in oldWALs growing too large (more than 2TB).
> !Snipaste_2019-11-21_10-39-25.png!
>  
> !Snipaste_2019-11-21_14-10-36.png!
>  
> We can solve it in one of the following ways:
> 1) Increase the session timeout (but I think this is not a good idea, because 
> we do not know what value would be suitable).
> 2) Disable HBase replication. This is not a good idea either when our users 
> use this feature.
> 3) Add a retry limit; for example, once it has already happened three 
> times, we stop the ReplicationLogCleaner and SnapShotCleaner.
> Those are all my ideas; I do not know whether they are suitable. If they are, 
> could I submit a PR?
> Does anyone have a good idea?


