[jira] [Commented] (HBASE-20690) Moving table to target rsgroup needs to handle TableStateNotFoundException

2018-09-24 Thread Xu Cang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626786#comment-16626786
 ] 

Xu Cang commented on HBASE-20690:
-

Thanks for your great explanation. [~andrewcheng]

non-binding +1 to your patch!

> Moving table to target rsgroup needs to handle TableStateNotFoundException
> --
>
> Key: HBASE-20690
> URL: https://issues.apache.org/jira/browse/HBASE-20690
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Guangxu Cheng
>Priority: Major
> Attachments: HBASE-20690.master.001.patch, 
> HBASE-20690.master.002.patch
>
>
> This is related code:
> {code}
>   if (targetGroup != null) {
> for (TableName table: tables) {
>   if (master.getAssignmentManager().isTableDisabled(table)) {
> LOG.debug("Skipping move regions because the table" + table + " 
> is disabled.");
> continue;
>   }
> {code}
> In a stack trace [~rmani] showed me:
> {code}
> 2018-06-06 07:10:44,893 ERROR 
> [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=2] 
> master.TableStateManager: Unable to get table demo:tbl1 state
> org.apache.hadoop.hbase.master.TableStateManager$TableStateNotFoundException: 
> demo:tbl1
> at 
> org.apache.hadoop.hbase.master.TableStateManager.getTableState(TableStateManager.java:193)
> at 
> org.apache.hadoop.hbase.master.TableStateManager.isTableState(TableStateManager.java:143)
> at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.isTableDisabled(AssignmentManager.java:346)
> at 
> org.apache.hadoop.hbase.rsgroup.RSGroupAdminServer.moveTables(RSGroupAdminServer.java:407)
> at 
> org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.assignTableToGroup(RSGroupAdminEndpoint.java:447)
> at 
> org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.postCreateTable(RSGroupAdminEndpoint.java:470)
> at 
> org.apache.hadoop.hbase.master.MasterCoprocessorHost$12.call(MasterCoprocessorHost.java:334)
> at 
> org.apache.hadoop.hbase.master.MasterCoprocessorHost$12.call(MasterCoprocessorHost.java:331)
> at 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost$ObserverOperationWithoutResult.callObserver(CoprocessorHost.java:540)
> at 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(CoprocessorHost.java:614)
> at 
> org.apache.hadoop.hbase.master.MasterCoprocessorHost.postCreateTable(MasterCoprocessorHost.java:331)
> at org.apache.hadoop.hbase.master.HMaster$3.run(HMaster.java:1768)
> at 
> org.apache.hadoop.hbase.master.procedure.MasterProcedureUtil.submitProcedure(MasterProcedureUtil.java:131)
> at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1750)
> at 
> org.apache.hadoop.hbase.master.MasterRpcServices.createTable(MasterRpcServices.java:593)
> at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> {code}
> The logic should take potential TableStateNotFoundException into account.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21213) [hbck2] bypass leaves behind state in RegionStates when assign/unassign

2018-09-24 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626773#comment-16626773
 ] 

stack commented on HBASE-21213:
---

.006 addresses [~allan163] review changes attribute 'force' name to 'override' 
instead.

> [hbck2] bypass leaves behind state in RegionStates when assign/unassign
> ---
>
> Key: HBASE-21213
> URL: https://issues.apache.org/jira/browse/HBASE-21213
> Project: HBase
>  Issue Type: Bug
>  Components: amv2, hbck2
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 2.1.1
>
> Attachments: HBASE-21213.branch-2.1.001.patch, 
> HBASE-21213.branch-2.1.002.patch, HBASE-21213.branch-2.1.003.patch, 
> HBASE-21213.branch-2.1.004.patch, HBASE-21213.branch-2.1.005.patch, 
> HBASE-21213.branch-2.1.006.patch
>
>
> This is a follow-on from HBASE-21083 which added the 'bypass' functionality. 
> On bypass, there is more state to be cleared if we are allow new Procedures 
> to be scheduled.
> For example, here is a bypass:
> {code}
> 2018-09-20 05:45:43,722 INFO org.apache.hadoop.hbase.procedure2.Procedure: 
> pid=100449, state=RUNNABLE:REGION_TRANSITION_DISPATCH, locked=true, 
> bypass=LOG-REDACTED UnassignProcedure table=hbase:namespace, 
> region=37cc206fe9c4bc1c0a46a34c5f523d16, 
> server=ve1233.halxg.cloudera.com,22101,1537397961664 bypassed, returning null 
> to finish it
> 2018-09-20 05:45:44,022 INFO 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=100449, 
> state=SUCCESS, bypass=LOG-REDACTED UnassignProcedure table=hbase:namespace, 
> region=37cc206fe9c4bc1c0a46a34c5f523d16, 
> server=ve1233.halxg.cloudera.com,22101,1537397961664 in 2mins, 7.618sec
> {code}
> ... but then when I try to assign the bypassed region later, I get this:
> {code}
> 2018-09-20 05:46:31,435 WARN 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure: There is 
> already another procedure running on this region this=pid=100450, 
> state=RUNNABLE:REGION_TRANSITION_QUEUE, locked=true; AssignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16 
> owner=pid=100449, state=SUCCESS, bypass=LOG-REDACTED UnassignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16, 
> server=ve1233.halxg.cloudera.com,22101,1537397961664 pid=100450, 
> state=RUNNABLE:REGION_TRANSITION_QUEUE, locked=true; AssignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16; rit=OPENING, 
> location=ve1233.halxg.cloudera.com,22101,1537397961664
> 2018-09-20 05:46:31,510 INFO 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Rolled back pid=100450, 
> state=ROLLEDBACK, 
> exception=org.apache.hadoop.hbase.procedure2.ProcedureAbortedException via 
> AssignProcedure:org.apache.hadoop.hbase.procedure2.ProcedureAbortedException: 
> There is already another procedure running on this region this=pid=100450, 
> state=RUNNABLE:REGION_TRANSITION_QUEUE, locked=true; AssignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16 
> owner=pid=100449, state=SUCCESS, bypass=LOG-REDACTED UnassignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16, 
> server=ve1233.halxg.cloudera.com,22101,1537397961664; AssignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16 
> exec-time=473msec
> {code}
> ... which is a long-winded way of saying the Unassign Procedure still exists 
> still in RegionStateNodes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21213) [hbck2] bypass leaves behind state in RegionStates when assign/unassign

2018-09-24 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21213:
--
Attachment: HBASE-21213.branch-2.1.006.patch

> [hbck2] bypass leaves behind state in RegionStates when assign/unassign
> ---
>
> Key: HBASE-21213
> URL: https://issues.apache.org/jira/browse/HBASE-21213
> Project: HBase
>  Issue Type: Bug
>  Components: amv2, hbck2
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 2.1.1
>
> Attachments: HBASE-21213.branch-2.1.001.patch, 
> HBASE-21213.branch-2.1.002.patch, HBASE-21213.branch-2.1.003.patch, 
> HBASE-21213.branch-2.1.004.patch, HBASE-21213.branch-2.1.005.patch, 
> HBASE-21213.branch-2.1.006.patch
>
>
> This is a follow-on from HBASE-21083 which added the 'bypass' functionality. 
> On bypass, there is more state to be cleared if we are allow new Procedures 
> to be scheduled.
> For example, here is a bypass:
> {code}
> 2018-09-20 05:45:43,722 INFO org.apache.hadoop.hbase.procedure2.Procedure: 
> pid=100449, state=RUNNABLE:REGION_TRANSITION_DISPATCH, locked=true, 
> bypass=LOG-REDACTED UnassignProcedure table=hbase:namespace, 
> region=37cc206fe9c4bc1c0a46a34c5f523d16, 
> server=ve1233.halxg.cloudera.com,22101,1537397961664 bypassed, returning null 
> to finish it
> 2018-09-20 05:45:44,022 INFO 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=100449, 
> state=SUCCESS, bypass=LOG-REDACTED UnassignProcedure table=hbase:namespace, 
> region=37cc206fe9c4bc1c0a46a34c5f523d16, 
> server=ve1233.halxg.cloudera.com,22101,1537397961664 in 2mins, 7.618sec
> {code}
> ... but then when I try to assign the bypassed region later, I get this:
> {code}
> 2018-09-20 05:46:31,435 WARN 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure: There is 
> already another procedure running on this region this=pid=100450, 
> state=RUNNABLE:REGION_TRANSITION_QUEUE, locked=true; AssignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16 
> owner=pid=100449, state=SUCCESS, bypass=LOG-REDACTED UnassignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16, 
> server=ve1233.halxg.cloudera.com,22101,1537397961664 pid=100450, 
> state=RUNNABLE:REGION_TRANSITION_QUEUE, locked=true; AssignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16; rit=OPENING, 
> location=ve1233.halxg.cloudera.com,22101,1537397961664
> 2018-09-20 05:46:31,510 INFO 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Rolled back pid=100450, 
> state=ROLLEDBACK, 
> exception=org.apache.hadoop.hbase.procedure2.ProcedureAbortedException via 
> AssignProcedure:org.apache.hadoop.hbase.procedure2.ProcedureAbortedException: 
> There is already another procedure running on this region this=pid=100450, 
> state=RUNNABLE:REGION_TRANSITION_QUEUE, locked=true; AssignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16 
> owner=pid=100449, state=SUCCESS, bypass=LOG-REDACTED UnassignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16, 
> server=ve1233.halxg.cloudera.com,22101,1537397961664; AssignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16 
> exec-time=473msec
> {code}
> ... which is a long-winded way of saying the Unassign Procedure still exists 
> still in RegionStateNodes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20636) Introduce two bloom filter type : ROWPREFIX_FIXED_LENGTH and ROWPREFIX_DELIMITED

2018-09-24 Thread Guangxu Cheng (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guangxu Cheng updated HBASE-20636:
--
Release Note: 
Add two bloom filter type : ROWPREFIX_FIXED_LENGTH and ROWPREFIX_DELIMITED
1. ROWPREFIX_FIXED_LENGTH: specify the length of the prefix
2. ROWPREFIX_DELIMITED: specify the delimiter of the prefix
Need to specify parameters for these two types of bloomfilter, otherwise the 
table will fail to create
Example:
create 't1', {NAME => 'f1', BLOOMFILTER => 'ROWPREFIX_FIXED_LENGTH', 
CONFIGURATION => {'RowPrefixBloomFilter.prefix_length' => '10'}}
create 't1', {NAME => 'f1', BLOOMFILTER => 'ROWPREFIX_DELIMITED', CONFIGURATION 
=> {'RowPrefixDelimitedBloomFilter.delimiter' => '#'}}
 Summary: Introduce two bloom filter type : ROWPREFIX_FIXED_LENGTH and 
ROWPREFIX_DELIMITED  (was: Introduce two bloom filter type : ROWPREFIX and 
ROWPREFIX_DELIMITED)

Add Release Note.Thanks [~anoop.hbase]. Thank [~apurtell] for commit. Thanks 
for all the reviews.

> Introduce two bloom filter type : ROWPREFIX_FIXED_LENGTH and 
> ROWPREFIX_DELIMITED
> 
>
> Key: HBASE-20636
> URL: https://issues.apache.org/jira/browse/HBASE-20636
> Project: HBase
>  Issue Type: New Feature
>  Components: HFile, regionserver, scan
>Reporter: Guangxu Cheng
>Assignee: Guangxu Cheng
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-20636.master.001.patch, 
> HBASE-20636.master.002.patch, HBASE-20636.master.003.patch, 
> HBASE-20636.master.004.patch, HBASE-20636.master.005.patch
>
>
> As we all know, HBase uses BloomFilter(ROW and ROWCOL) to filter unnecessary 
> files to improve read performance. But they only support Get and do not 
> support Scan.
> In our company(Tencent), many users need to scan all rows with the same 
> prefix, such as Tencent Game. Game user's some operational record will be 
> written into HBase, each game user will have a lot of records, the rowkey is 
> constructed as userid+'#'+timestamps. So we can scan all records for a given 
> user for a specified period.
> For this scenario, we designed the prefix Bloom filter. If the startRow and 
> stopRow of the Scan has a valid common prefix, the scan will be allowed to 
> use BloomFilter to filter files which will enhance the performance of the 
> scan.
> Now, this feature has been running on our cluster over a year, and scan 
> performance for this scenario has been improved by more than one times than 
> before.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HBASE-20993) [Auth] IPC client fallback to simple auth allowed doesn't work

2018-09-24 Thread Reid Chan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626698#comment-16626698
 ] 

Reid Chan edited comment on HBASE-20993 at 9/25/18 3:41 AM:


It is a pretty useful test, SGTM.

May i ask one question about implementation: how does it run an older version 
client against a patch-applied(may change wire protocol) pseudo distributed 
cluster?
It seems `hbase_version` can be used for controlling the client version, but it 
is from `bin/hbase version`, will it be the same version as the cluster?


was (Author: reidchan):
It is a pretty useful test, SGTM.

May i ask one question about implementation: how does it run an older version 
client against a patch-applied(may change wire protocol) pseudo distributed 
cluster?
It seems `hbase_version` can be used for controlling the client version, but it 
is from `hbase version`, will it be the same version as the cluster?

> [Auth] IPC client fallback to simple auth allowed doesn't work
> --
>
> Key: HBASE-20993
> URL: https://issues.apache.org/jira/browse/HBASE-20993
> Project: HBase
>  Issue Type: Bug
>  Components: Client, IPC/RPC, security
>Affects Versions: 1.2.6, 1.3.2, 1.2.7, 1.4.7
>Reporter: Reid Chan
>Assignee: Jack Bearden
>Priority: Critical
> Fix For: 1.5.0, 1.4.8
>
> Attachments: HBASE-20993.001.patch, 
> HBASE-20993.003.branch-1.flowchart.png, HBASE-20993.branch-1.002.patch, 
> HBASE-20993.branch-1.003.patch, HBASE-20993.branch-1.004.patch, 
> HBASE-20993.branch-1.005.patch, HBASE-20993.branch-1.006.patch, 
> HBASE-20993.branch-1.007.patch, HBASE-20993.branch-1.008.patch, 
> HBASE-20993.branch-1.009.patch, HBASE-20993.branch-1.009.patch, 
> HBASE-20993.branch-1.2.001.patch, HBASE-20993.branch-1.wip.002.patch, 
> HBASE-20993.branch-1.wip.patch, yetus-local-testpatch-output-009.txt
>
>
> It is easily reproducible.
> client's hbase-site.xml: hadoop.security.authentication:kerberos, 
> hbase.security.authentication:kerberos, 
> hbase.ipc.client.fallback-to-simple-auth-allowed:true, keytab and principal 
> are right set
> A simple auth hbase cluster, a kerberized hbase client application. 
> application trying to r/w/c/d table will have following exception:
> {code}
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
>   at 
> com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
>   at 
> org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:179)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSaslConnection(RpcClientImpl.java:617)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.access$700(RpcClientImpl.java:162)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:743)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:740)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:740)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:906)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:873)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1241)
>   at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:227)
>   at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:336)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:58383)
>   at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceStubMaker.isMasterRunning(ConnectionManager.java:1592)
>   at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStubNoRetries(ConnectionManager.java:1530)
>   at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStub(ConnectionManager.java:1552)
>   at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceStubMaker.makeStub(ConnectionManager.java:1581)
>   at 
> 

[jira] [Updated] (HBASE-21221) Ineffective assertion in TestFromClientSide3#testMultiRowMutations

2018-09-24 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21221:
---
Attachment: 21221.v12.txt

> Ineffective assertion in TestFromClientSide3#testMultiRowMutations
> --
>
> Key: HBASE-21221
> URL: https://issues.apache.org/jira/browse/HBASE-21221
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Minor
> Attachments: 21221.v10.txt, 21221.v11.txt, 21221.v12.txt, 
> 21221.v7.txt, 21221.v8.txt, 21221.v9.txt
>
>
> Observed the following in 
> org.apache.hadoop.hbase.util.TestFromClientSide3WoUnsafe-output.txt :
> {code}
> Caused by: 
> org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException): 
> java.io.IOException: Timed out waiting for lock for row: ROW-1 in region 
> 089bdfa75f44d88e596479038a6da18b
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:5816)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$4.lockRowsAndBuildMiniBatch(HRegion.java:7432)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:4008)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3982)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.mutateRowsWithLocks(HRegion.java:7424)
>   at 
> org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint.mutateRows(MultiRowMutationEndpoint.java:116)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.MultiRowMutationProtos$MultiRowMutationService.callMethod(MultiRowMutationProtos.java:2266)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:8182)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:2481)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:2463)
> ...
> Exception in thread "pool-678-thread-1" java.lang.AssertionError: This cp 
> should fail because the target lock is blocked by previous put
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.apache.hadoop.hbase.client.TestFromClientSide3.lambda$testMultiRowMutations$7(TestFromClientSide3.java:861)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> {code}
> Here is related code:
> {code}
>   cpService.execute(() -> {
> ...
> if (!threw) {
>   // Can't call fail() earlier because the catch would eat it.
>   fail("This cp should fail because the target lock is blocked by 
> previous put");
> }
> {code}
> Since the fail() call is executed by the cpService, the assertion had no 
> bearing on the outcome of the test.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20690) Moving table to target rsgroup needs to handle TableStateNotFoundException

2018-09-24 Thread Guangxu Cheng (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626704#comment-16626704
 ] 

Guangxu Cheng commented on HBASE-20690:
---

Attach 002 patch to move operations in {{preCreateTable}} to 
{{preCreateTableAction}}.

> Moving table to target rsgroup needs to handle TableStateNotFoundException
> --
>
> Key: HBASE-20690
> URL: https://issues.apache.org/jira/browse/HBASE-20690
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Major
> Attachments: HBASE-20690.master.001.patch, 
> HBASE-20690.master.002.patch
>
>
> This is related code:
> {code}
>   if (targetGroup != null) {
> for (TableName table: tables) {
>   if (master.getAssignmentManager().isTableDisabled(table)) {
> LOG.debug("Skipping move regions because the table" + table + " 
> is disabled.");
> continue;
>   }
> {code}
> In a stack trace [~rmani] showed me:
> {code}
> 2018-06-06 07:10:44,893 ERROR 
> [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=2] 
> master.TableStateManager: Unable to get table demo:tbl1 state
> org.apache.hadoop.hbase.master.TableStateManager$TableStateNotFoundException: 
> demo:tbl1
> at 
> org.apache.hadoop.hbase.master.TableStateManager.getTableState(TableStateManager.java:193)
> at 
> org.apache.hadoop.hbase.master.TableStateManager.isTableState(TableStateManager.java:143)
> at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.isTableDisabled(AssignmentManager.java:346)
> at 
> org.apache.hadoop.hbase.rsgroup.RSGroupAdminServer.moveTables(RSGroupAdminServer.java:407)
> at 
> org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.assignTableToGroup(RSGroupAdminEndpoint.java:447)
> at 
> org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.postCreateTable(RSGroupAdminEndpoint.java:470)
> at 
> org.apache.hadoop.hbase.master.MasterCoprocessorHost$12.call(MasterCoprocessorHost.java:334)
> at 
> org.apache.hadoop.hbase.master.MasterCoprocessorHost$12.call(MasterCoprocessorHost.java:331)
> at 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost$ObserverOperationWithoutResult.callObserver(CoprocessorHost.java:540)
> at 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(CoprocessorHost.java:614)
> at 
> org.apache.hadoop.hbase.master.MasterCoprocessorHost.postCreateTable(MasterCoprocessorHost.java:331)
> at org.apache.hadoop.hbase.master.HMaster$3.run(HMaster.java:1768)
> at 
> org.apache.hadoop.hbase.master.procedure.MasterProcedureUtil.submitProcedure(MasterProcedureUtil.java:131)
> at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1750)
> at 
> org.apache.hadoop.hbase.master.MasterRpcServices.createTable(MasterRpcServices.java:593)
> at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> {code}
> The logic should take potential TableStateNotFoundException into account.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20690) Moving table to target rsgroup needs to handle TableStateNotFoundException

2018-09-24 Thread Guangxu Cheng (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guangxu Cheng updated HBASE-20690:
--
Assignee: Guangxu Cheng
  Status: Patch Available  (was: Open)

> Moving table to target rsgroup needs to handle TableStateNotFoundException
> --
>
> Key: HBASE-20690
> URL: https://issues.apache.org/jira/browse/HBASE-20690
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Guangxu Cheng
>Priority: Major
> Attachments: HBASE-20690.master.001.patch, 
> HBASE-20690.master.002.patch
>
>
> This is related code:
> {code}
>   if (targetGroup != null) {
> for (TableName table: tables) {
>   if (master.getAssignmentManager().isTableDisabled(table)) {
> LOG.debug("Skipping move regions because the table" + table + " 
> is disabled.");
> continue;
>   }
> {code}
> In a stack trace [~rmani] showed me:
> {code}
> 2018-06-06 07:10:44,893 ERROR 
> [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=2] 
> master.TableStateManager: Unable to get table demo:tbl1 state
> org.apache.hadoop.hbase.master.TableStateManager$TableStateNotFoundException: 
> demo:tbl1
> at 
> org.apache.hadoop.hbase.master.TableStateManager.getTableState(TableStateManager.java:193)
> at 
> org.apache.hadoop.hbase.master.TableStateManager.isTableState(TableStateManager.java:143)
> at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.isTableDisabled(AssignmentManager.java:346)
> at 
> org.apache.hadoop.hbase.rsgroup.RSGroupAdminServer.moveTables(RSGroupAdminServer.java:407)
> at 
> org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.assignTableToGroup(RSGroupAdminEndpoint.java:447)
> at 
> org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.postCreateTable(RSGroupAdminEndpoint.java:470)
> at 
> org.apache.hadoop.hbase.master.MasterCoprocessorHost$12.call(MasterCoprocessorHost.java:334)
> at 
> org.apache.hadoop.hbase.master.MasterCoprocessorHost$12.call(MasterCoprocessorHost.java:331)
> at 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost$ObserverOperationWithoutResult.callObserver(CoprocessorHost.java:540)
> at 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(CoprocessorHost.java:614)
> at 
> org.apache.hadoop.hbase.master.MasterCoprocessorHost.postCreateTable(MasterCoprocessorHost.java:331)
> at org.apache.hadoop.hbase.master.HMaster$3.run(HMaster.java:1768)
> at 
> org.apache.hadoop.hbase.master.procedure.MasterProcedureUtil.submitProcedure(MasterProcedureUtil.java:131)
> at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1750)
> at 
> org.apache.hadoop.hbase.master.MasterRpcServices.createTable(MasterRpcServices.java:593)
> at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> {code}
> The logic should take potential TableStateNotFoundException into account.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20690) Moving table to target rsgroup needs to handle TableStateNotFoundException

2018-09-24 Thread Guangxu Cheng (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guangxu Cheng updated HBASE-20690:
--
Attachment: HBASE-20690.master.002.patch

> Moving table to target rsgroup needs to handle TableStateNotFoundException
> --
>
> Key: HBASE-20690
> URL: https://issues.apache.org/jira/browse/HBASE-20690
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Major
> Attachments: HBASE-20690.master.001.patch, 
> HBASE-20690.master.002.patch
>
>
> This is related code:
> {code}
>   if (targetGroup != null) {
> for (TableName table: tables) {
>   if (master.getAssignmentManager().isTableDisabled(table)) {
> LOG.debug("Skipping move regions because the table" + table + " 
> is disabled.");
> continue;
>   }
> {code}
> In a stack trace [~rmani] showed me:
> {code}
> 2018-06-06 07:10:44,893 ERROR 
> [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=2] 
> master.TableStateManager: Unable to get table demo:tbl1 state
> org.apache.hadoop.hbase.master.TableStateManager$TableStateNotFoundException: 
> demo:tbl1
> at 
> org.apache.hadoop.hbase.master.TableStateManager.getTableState(TableStateManager.java:193)
> at 
> org.apache.hadoop.hbase.master.TableStateManager.isTableState(TableStateManager.java:143)
> at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.isTableDisabled(AssignmentManager.java:346)
> at 
> org.apache.hadoop.hbase.rsgroup.RSGroupAdminServer.moveTables(RSGroupAdminServer.java:407)
> at 
> org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.assignTableToGroup(RSGroupAdminEndpoint.java:447)
> at 
> org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.postCreateTable(RSGroupAdminEndpoint.java:470)
> at 
> org.apache.hadoop.hbase.master.MasterCoprocessorHost$12.call(MasterCoprocessorHost.java:334)
> at 
> org.apache.hadoop.hbase.master.MasterCoprocessorHost$12.call(MasterCoprocessorHost.java:331)
> at 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost$ObserverOperationWithoutResult.callObserver(CoprocessorHost.java:540)
> at 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(CoprocessorHost.java:614)
> at 
> org.apache.hadoop.hbase.master.MasterCoprocessorHost.postCreateTable(MasterCoprocessorHost.java:331)
> at org.apache.hadoop.hbase.master.HMaster$3.run(HMaster.java:1768)
> at 
> org.apache.hadoop.hbase.master.procedure.MasterProcedureUtil.submitProcedure(MasterProcedureUtil.java:131)
> at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1750)
> at 
> org.apache.hadoop.hbase.master.MasterRpcServices.createTable(MasterRpcServices.java:593)
> at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> {code}
> The logic should take potential TableStateNotFoundException into account.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20690) Moving table to target rsgroup needs to handle TableStateNotFoundException

2018-09-24 Thread Guangxu Cheng (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626699#comment-16626699
 ] 

Guangxu Cheng commented on HBASE-20690:
---

Thanks for your review.[~xucang]

 # Let's look at the difference between {{groupAdminServer.moveTables}} and 
{{groupInfoManager.moveTables}} first.
  #* {{groupAdminServer.moveTables}} will update the group information and move 
the regions of the table to the specified group.
  #* {{groupInfoManager.moveTables}} just updates the group information.
The {{preCreateTable}} is executed before the table is built successfully. At 
this time, the related information of the region has not been generated and 
assigned yet, so there is no need to move the region. It is because the table 
has not been created successfully, so there will be TableStateNotFoundException.
 After we add the table to the specified group in advance, during the process 
of creating the table, {{CREATE_TABLE_ASSIGN_REGIONS}} will assign the regions 
to the specified regionservers according to the group information.
 # In the process of creating a table, you can roll back only if an exception 
occurs during  {{CREATE_TABLE_PRE_OPERATION}}.
 {{CREATE_TABLE_PRE_OPERATION}} is to determine whether a table exists.
 If the table already exists, it is possible to change the grouping to which 
the table belongs, and this place really needs to be rolled back.
 We only need to move operations in {{preCreateTable}} to 
{{preCreateTableAction}}.

> Moving table to target rsgroup needs to handle TableStateNotFoundException
> --
>
> Key: HBASE-20690
> URL: https://issues.apache.org/jira/browse/HBASE-20690
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Major
> Attachments: HBASE-20690.master.001.patch
>
>
> This is related code:
> {code}
>   if (targetGroup != null) {
> for (TableName table: tables) {
>   if (master.getAssignmentManager().isTableDisabled(table)) {
> LOG.debug("Skipping move regions because the table" + table + " 
> is disabled.");
> continue;
>   }
> {code}
> In a stack trace [~rmani] showed me:
> {code}
> 2018-06-06 07:10:44,893 ERROR 
> [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=2] 
> master.TableStateManager: Unable to get table demo:tbl1 state
> org.apache.hadoop.hbase.master.TableStateManager$TableStateNotFoundException: 
> demo:tbl1
> at 
> org.apache.hadoop.hbase.master.TableStateManager.getTableState(TableStateManager.java:193)
> at 
> org.apache.hadoop.hbase.master.TableStateManager.isTableState(TableStateManager.java:143)
> at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.isTableDisabled(AssignmentManager.java:346)
> at 
> org.apache.hadoop.hbase.rsgroup.RSGroupAdminServer.moveTables(RSGroupAdminServer.java:407)
> at 
> org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.assignTableToGroup(RSGroupAdminEndpoint.java:447)
> at 
> org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.postCreateTable(RSGroupAdminEndpoint.java:470)
> at 
> org.apache.hadoop.hbase.master.MasterCoprocessorHost$12.call(MasterCoprocessorHost.java:334)
> at 
> org.apache.hadoop.hbase.master.MasterCoprocessorHost$12.call(MasterCoprocessorHost.java:331)
> at 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost$ObserverOperationWithoutResult.callObserver(CoprocessorHost.java:540)
> at 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(CoprocessorHost.java:614)
> at 
> org.apache.hadoop.hbase.master.MasterCoprocessorHost.postCreateTable(MasterCoprocessorHost.java:331)
> at org.apache.hadoop.hbase.master.HMaster$3.run(HMaster.java:1768)
> at 
> org.apache.hadoop.hbase.master.procedure.MasterProcedureUtil.submitProcedure(MasterProcedureUtil.java:131)
> at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1750)
> at 
> org.apache.hadoop.hbase.master.MasterRpcServices.createTable(MasterRpcServices.java:593)
> at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> {code}
> The logic should take potential TableStateNotFoundException into account.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20993) [Auth] IPC client fallback to simple auth allowed doesn't work

2018-09-24 Thread Reid Chan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626698#comment-16626698
 ] 

Reid Chan commented on HBASE-20993:
---

It is a pretty useful test, SGTM.

May i ask one question about implementation: how does it run an older version 
client against a patch-applied(may change wire protocol) pseudo distributed 
cluster?
It seems `hbase_version` can be used for controlling the client version, but it 
is from `hbase version`, will it be the same version as the cluster?

> [Auth] IPC client fallback to simple auth allowed doesn't work
> --
>
> Key: HBASE-20993
> URL: https://issues.apache.org/jira/browse/HBASE-20993
> Project: HBase
>  Issue Type: Bug
>  Components: Client, IPC/RPC, security
>Affects Versions: 1.2.6, 1.3.2, 1.2.7, 1.4.7
>Reporter: Reid Chan
>Assignee: Jack Bearden
>Priority: Critical
> Fix For: 1.5.0, 1.4.8
>
> Attachments: HBASE-20993.001.patch, 
> HBASE-20993.003.branch-1.flowchart.png, HBASE-20993.branch-1.002.patch, 
> HBASE-20993.branch-1.003.patch, HBASE-20993.branch-1.004.patch, 
> HBASE-20993.branch-1.005.patch, HBASE-20993.branch-1.006.patch, 
> HBASE-20993.branch-1.007.patch, HBASE-20993.branch-1.008.patch, 
> HBASE-20993.branch-1.009.patch, HBASE-20993.branch-1.009.patch, 
> HBASE-20993.branch-1.2.001.patch, HBASE-20993.branch-1.wip.002.patch, 
> HBASE-20993.branch-1.wip.patch, yetus-local-testpatch-output-009.txt
>
>
> It is easily reproducible.
> client's hbase-site.xml: hadoop.security.authentication:kerberos, 
> hbase.security.authentication:kerberos, 
> hbase.ipc.client.fallback-to-simple-auth-allowed:true, keytab and principal 
> are right set
> A simple auth hbase cluster, a kerberized hbase client application. 
> application trying to r/w/c/d table will have following exception:
> {code}
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
>   at 
> com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
>   at 
> org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:179)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSaslConnection(RpcClientImpl.java:617)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.access$700(RpcClientImpl.java:162)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:743)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:740)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:740)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:906)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:873)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1241)
>   at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:227)
>   at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:336)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:58383)
>   at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceStubMaker.isMasterRunning(ConnectionManager.java:1592)
>   at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStubNoRetries(ConnectionManager.java:1530)
>   at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStub(ConnectionManager.java:1552)
>   at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceStubMaker.makeStub(ConnectionManager.java:1581)
>   at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getKeepAliveMasterService(ConnectionManager.java:1738)
>   at 
> org.apache.hadoop.hbase.client.MasterCallable.prepare(MasterCallable.java:38)
>   at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:134)
>   at 
> org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:4297)
>   at 
> org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:4289)
>   at 
> 

[jira] [Commented] (HBASE-21221) Ineffective assertion in TestFromClientSide3#testMultiRowMutations

2018-09-24 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626686#comment-16626686
 ] 

Hadoop QA commented on HBASE-21221:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
10s{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue}  0m  
1s{color} | {color:blue} The patch file was not named according to hbase's 
naming conventions. Please see 
https://yetus.apache.org/documentation/0.7.0/precommit-patchnames for 
instructions. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
58s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
51s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
13s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
21s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
19s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
14s{color} | {color:green} hbase-server: The patch generated 0 new + 20 
unchanged - 1 fixed = 20 total (was 21) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
20s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
10m 50s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}125m 
28s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}167m 53s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21221 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12941123/21221.v11.txt |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 2dd1e3b82aa3 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 7ab77518a2 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14487/testReport/ |
| 

[jira] [Updated] (HBASE-21208) Bytes#toShort doesn't work without unsafe

2018-09-24 Thread Chia-Ping Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chia-Ping Tsai updated HBASE-21208:
---
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

> Bytes#toShort doesn't work without unsafe
> -
>
> Key: HBASE-21208
> URL: https://issues.apache.org/jira/browse/HBASE-21208
> Project: HBase
>  Issue Type: Bug
>Reporter: Chia-Ping Tsai
>Assignee: Chia-Ping Tsai
>Priority: Critical
> Fix For: 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21208.v0.patch, HBASE-21208.v1.patch, 
> HBASE-21208.v2.patch
>
>
> seems we put the brackets in the wrong place.
> {code}
>   short n = 0;
>   n = (short) ((n ^ bytes[offset]) & 0xFF);
>   n = (short) (n << 8);
>   n = (short) ((n ^ bytes[offset+1]) & 0xFF);   // this one
>   return n;
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21219) Hbase incremental backup fails with null pointer exception

2018-09-24 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626653#comment-16626653
 ] 

Hadoop QA commented on HBASE-21219:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
11s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange}  
0m  0s{color} | {color:orange} The patch doesn't appear to include any new or 
modified tests. Please justify why no new tests are needed for this patch. Also 
please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
21s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
29s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
21s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
35s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
15s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
 6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 2 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
16s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
12m 14s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 13m 
36s{color} | {color:green} hbase-backup in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 49m 19s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21219 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12941132/HBASE-21219-v1.patch |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 77fb748a30ce 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh
 |
| git revision | master / 7ab77518a2 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| whitespace | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14488/artifact/patchprocess/whitespace-eol.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14488/testReport/ |
| 

[jira] [Commented] (HBASE-21219) Hbase incremental backup fails with null pointer exception

2018-09-24 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626634#comment-16626634
 ] 

Ted Yu commented on HBASE-21219:


{code}
307   LOG.warn("Known hosts (from newestTimestamps):");
308   for (String s: newestTimestamps.keySet()) {
309 LOG.warn(s);
{code}
The above seems to be for debug purpose. Use LOG.debug ?

> Hbase incremental backup fails with null pointer exception
> --
>
> Key: HBASE-21219
> URL: https://issues.apache.org/jira/browse/HBASE-21219
> Project: HBase
>  Issue Type: Bug
>  Components: backuprestore
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-21219-v1.patch
>
>
> hbase backup create incremental hdfs:///bkpHbase_Test/bkpHbase_Test2 -t 
> bkpHbase_Test2 
> 2018-09-21 15:35:31,421 INFO [main] impl.TableBackupClient: Backup 
> backup_1537524313995 started at 1537524331419. 2018-09-21 15:35:31,454 INFO 
> [main] impl.IncrementalBackupManager: Execute roll log procedure for 
> incremental backup ... 2018-09-21 15:35:32,985 ERROR [main] 
> impl.TableBackupClient: Unexpected Exception : java.lang.NullPointerException 
> java.lang.NullPointerException at 
> org.apache.hadoop.hbase.backup.impl.IncrementalBackupManager.getLogFilesForNewBackup(IncrementalBackupManager.java:309)
>  at 
> org.apache.hadoop.hbase.backup.impl.IncrementalBackupManager.getIncrBackupLogFileMap(IncrementalBackupManager.java:103)
>  at 
> org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.execute(IncrementalTableBackupClient.java:276)
>  at 
> org.apache.hadoop.hbase.backup.impl.BackupAdminImpl.backupTables(BackupAdminImpl.java:601)
>  at 
> org.apache.hadoop.hbase.backup.impl.BackupCommands$CreateCommand.execute(BackupCommands.java:347)
>  at 
> org.apache.hadoop.hbase.backup.BackupDriver.parseAndRun(BackupDriver.java:138)
>  at org.apache.hadoop.hbase.backup.BackupDriver.doWork(BackupDriver.java:171) 
> at org.apache.hadoop.hbase.backup.BackupDriver.run(BackupDriver.java:204) at 
> org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) at 
> org.apache.hadoop.hbase.backup.BackupDriver.main(BackupDriver.java:179) 
> 2018-09-21 15:35:32,989 ERROR [main] impl.TableBackupClient: 
> BackupId=backup_1537524313995,startts=1537524331419,failedts=1537524332989,failedphase=PREPARE_INCREMENTAL,failedmessage=null
>  2018-09-21 15:35:57,167 ERROR [main] impl.TableBackupClient: Backup 
> backup_1537524313995 failed. 
> Backup session finished. Status: FAILURE 2018-09-21 15:35:57,175 ERROR [main] 
> backup.BackupDriver: Error running 
> command-line tool java.io.IOException: java.lang.NullPointerException at 
> org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.execute(IncrementalTableBackupClient.java:281)
>  at 
> org.apache.hadoop.hbase.backup.impl.BackupAdminImpl.backupTables(BackupAdminImpl.java:601)
>  at 
> org.apache.hadoop.hbase.backup.impl.BackupCommands$CreateCommand.execute(BackupCommands.java:347)
>  at 
> org.apache.hadoop.hbase.backup.BackupDriver.parseAndRun(BackupDriver.java:138)
>  at org.apache.hadoop.hbase.backup.BackupDriver.doWork(BackupDriver.java:171) 
> at org.apache.hadoop.hbase.backup.BackupDriver.run(BackupDriver.java:204) at 
> org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) at 
> org.apache.hadoop.hbase.backup.BackupDriver.main(BackupDriver.java:179) 
> Caused by: java.lang.NullPointerException at 
> org.apache.hadoop.hbase.backup.impl.IncrementalBackupManager.getLogFilesForNewBackup(IncrementalBackupManager.java:309)
>  at 
> org.apache.hadoop.hbase.backup.impl.IncrementalBackupManager.getIncrBackupLogFileMap(IncrementalBackupManager.java:103)
>  at 
> org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.execute(IncrementalTableBackupClient.java:276)
>  ... 7 more



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21219) Hbase incremental backup fails with null pointer exception

2018-09-24 Thread Vladimir Rodionov (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Rodionov updated HBASE-21219:
--
Status: Patch Available  (was: Open)

> Hbase incremental backup fails with null pointer exception
> --
>
> Key: HBASE-21219
> URL: https://issues.apache.org/jira/browse/HBASE-21219
> Project: HBase
>  Issue Type: Bug
>  Components: backuprestore
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-21219-v1.patch
>
>
> hbase backup create incremental hdfs:///bkpHbase_Test/bkpHbase_Test2 -t 
> bkpHbase_Test2 
> 2018-09-21 15:35:31,421 INFO [main] impl.TableBackupClient: Backup 
> backup_1537524313995 started at 1537524331419. 2018-09-21 15:35:31,454 INFO 
> [main] impl.IncrementalBackupManager: Execute roll log procedure for 
> incremental backup ... 2018-09-21 15:35:32,985 ERROR [main] 
> impl.TableBackupClient: Unexpected Exception : java.lang.NullPointerException 
> java.lang.NullPointerException at 
> org.apache.hadoop.hbase.backup.impl.IncrementalBackupManager.getLogFilesForNewBackup(IncrementalBackupManager.java:309)
>  at 
> org.apache.hadoop.hbase.backup.impl.IncrementalBackupManager.getIncrBackupLogFileMap(IncrementalBackupManager.java:103)
>  at 
> org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.execute(IncrementalTableBackupClient.java:276)
>  at 
> org.apache.hadoop.hbase.backup.impl.BackupAdminImpl.backupTables(BackupAdminImpl.java:601)
>  at 
> org.apache.hadoop.hbase.backup.impl.BackupCommands$CreateCommand.execute(BackupCommands.java:347)
>  at 
> org.apache.hadoop.hbase.backup.BackupDriver.parseAndRun(BackupDriver.java:138)
>  at org.apache.hadoop.hbase.backup.BackupDriver.doWork(BackupDriver.java:171) 
> at org.apache.hadoop.hbase.backup.BackupDriver.run(BackupDriver.java:204) at 
> org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) at 
> org.apache.hadoop.hbase.backup.BackupDriver.main(BackupDriver.java:179) 
> 2018-09-21 15:35:32,989 ERROR [main] impl.TableBackupClient: 
> BackupId=backup_1537524313995,startts=1537524331419,failedts=1537524332989,failedphase=PREPARE_INCREMENTAL,failedmessage=null
>  2018-09-21 15:35:57,167 ERROR [main] impl.TableBackupClient: Backup 
> backup_1537524313995 failed. 
> Backup session finished. Status: FAILURE 2018-09-21 15:35:57,175 ERROR [main] 
> backup.BackupDriver: Error running 
> command-line tool java.io.IOException: java.lang.NullPointerException at 
> org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.execute(IncrementalTableBackupClient.java:281)
>  at 
> org.apache.hadoop.hbase.backup.impl.BackupAdminImpl.backupTables(BackupAdminImpl.java:601)
>  at 
> org.apache.hadoop.hbase.backup.impl.BackupCommands$CreateCommand.execute(BackupCommands.java:347)
>  at 
> org.apache.hadoop.hbase.backup.BackupDriver.parseAndRun(BackupDriver.java:138)
>  at org.apache.hadoop.hbase.backup.BackupDriver.doWork(BackupDriver.java:171) 
> at org.apache.hadoop.hbase.backup.BackupDriver.run(BackupDriver.java:204) at 
> org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) at 
> org.apache.hadoop.hbase.backup.BackupDriver.main(BackupDriver.java:179) 
> Caused by: java.lang.NullPointerException at 
> org.apache.hadoop.hbase.backup.impl.IncrementalBackupManager.getLogFilesForNewBackup(IncrementalBackupManager.java:309)
>  at 
> org.apache.hadoop.hbase.backup.impl.IncrementalBackupManager.getIncrBackupLogFileMap(IncrementalBackupManager.java:103)
>  at 
> org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.execute(IncrementalTableBackupClient.java:276)
>  ... 7 more



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21219) Hbase incremental backup fails with null pointer exception

2018-09-24 Thread Vladimir Rodionov (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Rodionov updated HBASE-21219:
--
Attachment: HBASE-21219-v1.patch

> Hbase incremental backup fails with null pointer exception
> --
>
> Key: HBASE-21219
> URL: https://issues.apache.org/jira/browse/HBASE-21219
> Project: HBase
>  Issue Type: Bug
>  Components: backuprestore
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-21219-v1.patch
>
>
> hbase backup create incremental hdfs:///bkpHbase_Test/bkpHbase_Test2 -t 
> bkpHbase_Test2 
> 2018-09-21 15:35:31,421 INFO [main] impl.TableBackupClient: Backup 
> backup_1537524313995 started at 1537524331419. 2018-09-21 15:35:31,454 INFO 
> [main] impl.IncrementalBackupManager: Execute roll log procedure for 
> incremental backup ... 2018-09-21 15:35:32,985 ERROR [main] 
> impl.TableBackupClient: Unexpected Exception : java.lang.NullPointerException 
> java.lang.NullPointerException at 
> org.apache.hadoop.hbase.backup.impl.IncrementalBackupManager.getLogFilesForNewBackup(IncrementalBackupManager.java:309)
>  at 
> org.apache.hadoop.hbase.backup.impl.IncrementalBackupManager.getIncrBackupLogFileMap(IncrementalBackupManager.java:103)
>  at 
> org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.execute(IncrementalTableBackupClient.java:276)
>  at 
> org.apache.hadoop.hbase.backup.impl.BackupAdminImpl.backupTables(BackupAdminImpl.java:601)
>  at 
> org.apache.hadoop.hbase.backup.impl.BackupCommands$CreateCommand.execute(BackupCommands.java:347)
>  at 
> org.apache.hadoop.hbase.backup.BackupDriver.parseAndRun(BackupDriver.java:138)
>  at org.apache.hadoop.hbase.backup.BackupDriver.doWork(BackupDriver.java:171) 
> at org.apache.hadoop.hbase.backup.BackupDriver.run(BackupDriver.java:204) at 
> org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) at 
> org.apache.hadoop.hbase.backup.BackupDriver.main(BackupDriver.java:179) 
> 2018-09-21 15:35:32,989 ERROR [main] impl.TableBackupClient: 
> BackupId=backup_1537524313995,startts=1537524331419,failedts=1537524332989,failedphase=PREPARE_INCREMENTAL,failedmessage=null
>  2018-09-21 15:35:57,167 ERROR [main] impl.TableBackupClient: Backup 
> backup_1537524313995 failed. 
> Backup session finished. Status: FAILURE 2018-09-21 15:35:57,175 ERROR [main] 
> backup.BackupDriver: Error running 
> command-line tool java.io.IOException: java.lang.NullPointerException at 
> org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.execute(IncrementalTableBackupClient.java:281)
>  at 
> org.apache.hadoop.hbase.backup.impl.BackupAdminImpl.backupTables(BackupAdminImpl.java:601)
>  at 
> org.apache.hadoop.hbase.backup.impl.BackupCommands$CreateCommand.execute(BackupCommands.java:347)
>  at 
> org.apache.hadoop.hbase.backup.BackupDriver.parseAndRun(BackupDriver.java:138)
>  at org.apache.hadoop.hbase.backup.BackupDriver.doWork(BackupDriver.java:171) 
> at org.apache.hadoop.hbase.backup.BackupDriver.run(BackupDriver.java:204) at 
> org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) at 
> org.apache.hadoop.hbase.backup.BackupDriver.main(BackupDriver.java:179) 
> Caused by: java.lang.NullPointerException at 
> org.apache.hadoop.hbase.backup.impl.IncrementalBackupManager.getLogFilesForNewBackup(IncrementalBackupManager.java:309)
>  at 
> org.apache.hadoop.hbase.backup.impl.IncrementalBackupManager.getIncrBackupLogFileMap(IncrementalBackupManager.java:103)
>  at 
> org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.execute(IncrementalTableBackupClient.java:276)
>  ... 7 more



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21221) Ineffective assertion in TestFromClientSide3#testMultiRowMutations

2018-09-24 Thread Mingliang Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626610#comment-16626610
 ] 

Mingliang Liu commented on HBASE-21221:
---

# {code:java}
  if (!exceptionDuringMutateRows.get()) {
fail("This cp should fail because the target lock is blocked by previous 
put");
  }
{code}
Can be {{assertTrue()}}.
# {code}
WaitingForMultiMutationsObserver observer = find(tableName,...
{code}
The change can be reverted.
# {code}
LOG.debug("encountered " + ex);
{code}
Can be
{code}
LOG.error("encountered unexpected exception", ex);
{code}

Otherwise +1 (non-binding)

> Ineffective assertion in TestFromClientSide3#testMultiRowMutations
> --
>
> Key: HBASE-21221
> URL: https://issues.apache.org/jira/browse/HBASE-21221
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Minor
> Attachments: 21221.v10.txt, 21221.v11.txt, 21221.v7.txt, 
> 21221.v8.txt, 21221.v9.txt
>
>
> Observed the following in 
> org.apache.hadoop.hbase.util.TestFromClientSide3WoUnsafe-output.txt :
> {code}
> Caused by: 
> org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException): 
> java.io.IOException: Timed out waiting for lock for row: ROW-1 in region 
> 089bdfa75f44d88e596479038a6da18b
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:5816)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$4.lockRowsAndBuildMiniBatch(HRegion.java:7432)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:4008)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3982)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.mutateRowsWithLocks(HRegion.java:7424)
>   at 
> org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint.mutateRows(MultiRowMutationEndpoint.java:116)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.MultiRowMutationProtos$MultiRowMutationService.callMethod(MultiRowMutationProtos.java:2266)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:8182)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:2481)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:2463)
> ...
> Exception in thread "pool-678-thread-1" java.lang.AssertionError: This cp 
> should fail because the target lock is blocked by previous put
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.apache.hadoop.hbase.client.TestFromClientSide3.lambda$testMultiRowMutations$7(TestFromClientSide3.java:861)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> {code}
> Here is related code:
> {code}
>   cpService.execute(() -> {
> ...
> if (!threw) {
>   // Can't call fail() earlier because the catch would eat it.
>   fail("This cp should fail because the target lock is blocked by 
> previous put");
> }
> {code}
> Since the fail() call is executed by the cpService, the assertion had no 
> bearing on the outcome of the test.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21221) Ineffective assertion in TestFromClientSide3#testMultiRowMutations

2018-09-24 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21221:
---
Attachment: 21221.v11.txt

> Ineffective assertion in TestFromClientSide3#testMultiRowMutations
> --
>
> Key: HBASE-21221
> URL: https://issues.apache.org/jira/browse/HBASE-21221
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Minor
> Attachments: 21221.v10.txt, 21221.v11.txt, 21221.v7.txt, 
> 21221.v8.txt, 21221.v9.txt
>
>
> Observed the following in 
> org.apache.hadoop.hbase.util.TestFromClientSide3WoUnsafe-output.txt :
> {code}
> Caused by: 
> org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException): 
> java.io.IOException: Timed out waiting for lock for row: ROW-1 in region 
> 089bdfa75f44d88e596479038a6da18b
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:5816)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$4.lockRowsAndBuildMiniBatch(HRegion.java:7432)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:4008)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3982)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.mutateRowsWithLocks(HRegion.java:7424)
>   at 
> org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint.mutateRows(MultiRowMutationEndpoint.java:116)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.MultiRowMutationProtos$MultiRowMutationService.callMethod(MultiRowMutationProtos.java:2266)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:8182)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:2481)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:2463)
> ...
> Exception in thread "pool-678-thread-1" java.lang.AssertionError: This cp 
> should fail because the target lock is blocked by previous put
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.apache.hadoop.hbase.client.TestFromClientSide3.lambda$testMultiRowMutations$7(TestFromClientSide3.java:861)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> {code}
> Here is related code:
> {code}
>   cpService.execute(() -> {
> ...
> if (!threw) {
>   // Can't call fail() earlier because the catch would eat it.
>   fail("This cp should fail because the target lock is blocked by 
> previous put");
> }
> {code}
> Since the fail() call is executed by the cpService, the assertion had no 
> bearing on the outcome of the test.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21221) Ineffective assertion in TestFromClientSide3#testMultiRowMutations

2018-09-24 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626570#comment-16626570
 ] 

Hadoop QA commented on HBASE-21221:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
11s{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue}  0m  
1s{color} | {color:blue} The patch file was not named according to hbase's 
naming conventions. Please see 
https://yetus.apache.org/documentation/0.7.0/precommit-patchnames for 
instructions. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  3s{color} 
| {color:red} HBASE-21221 does not apply to master. Rebase required? Wrong 
Branch? See https://yetus.apache.org/documentation/0.7.0/precommit-patchnames 
for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21221 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12941120/21221.v10.txt |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14486/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.



> Ineffective assertion in TestFromClientSide3#testMultiRowMutations
> --
>
> Key: HBASE-21221
> URL: https://issues.apache.org/jira/browse/HBASE-21221
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Minor
> Attachments: 21221.v10.txt, 21221.v7.txt, 21221.v8.txt, 21221.v9.txt
>
>
> Observed the following in 
> org.apache.hadoop.hbase.util.TestFromClientSide3WoUnsafe-output.txt :
> {code}
> Caused by: 
> org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException): 
> java.io.IOException: Timed out waiting for lock for row: ROW-1 in region 
> 089bdfa75f44d88e596479038a6da18b
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:5816)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$4.lockRowsAndBuildMiniBatch(HRegion.java:7432)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:4008)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3982)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.mutateRowsWithLocks(HRegion.java:7424)
>   at 
> org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint.mutateRows(MultiRowMutationEndpoint.java:116)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.MultiRowMutationProtos$MultiRowMutationService.callMethod(MultiRowMutationProtos.java:2266)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:8182)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:2481)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:2463)
> ...
> Exception in thread "pool-678-thread-1" java.lang.AssertionError: This cp 
> should fail because the target lock is blocked by previous put
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.apache.hadoop.hbase.client.TestFromClientSide3.lambda$testMultiRowMutations$7(TestFromClientSide3.java:861)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> {code}
> Here is related code:
> {code}
>   cpService.execute(() -> {
> ...
> if (!threw) {
>   // Can't call fail() earlier because the catch would eat it.
>   fail("This cp should fail because the target lock is blocked by 
> previous put");
> }
> {code}
> Since the fail() call is executed by the cpService, the assertion had no 
> bearing on the outcome of the test.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21221) Ineffective assertion in TestFromClientSide3#testMultiRowMutations

2018-09-24 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626563#comment-16626563
 ] 

Ted Yu commented on HBASE-21221:


bq. the ex in RecordingEndpoint should be safely published

The custom endpoint is removed in patch v10. So the above is not needed.

> Ineffective assertion in TestFromClientSide3#testMultiRowMutations
> --
>
> Key: HBASE-21221
> URL: https://issues.apache.org/jira/browse/HBASE-21221
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Minor
> Attachments: 21221.v10.txt, 21221.v7.txt, 21221.v8.txt, 21221.v9.txt
>
>
> Observed the following in 
> org.apache.hadoop.hbase.util.TestFromClientSide3WoUnsafe-output.txt :
> {code}
> Caused by: 
> org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException): 
> java.io.IOException: Timed out waiting for lock for row: ROW-1 in region 
> 089bdfa75f44d88e596479038a6da18b
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:5816)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$4.lockRowsAndBuildMiniBatch(HRegion.java:7432)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:4008)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3982)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.mutateRowsWithLocks(HRegion.java:7424)
>   at 
> org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint.mutateRows(MultiRowMutationEndpoint.java:116)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.MultiRowMutationProtos$MultiRowMutationService.callMethod(MultiRowMutationProtos.java:2266)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:8182)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:2481)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:2463)
> ...
> Exception in thread "pool-678-thread-1" java.lang.AssertionError: This cp 
> should fail because the target lock is blocked by previous put
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.apache.hadoop.hbase.client.TestFromClientSide3.lambda$testMultiRowMutations$7(TestFromClientSide3.java:861)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> {code}
> Here is related code:
> {code}
>   cpService.execute(() -> {
> ...
> if (!threw) {
>   // Can't call fail() earlier because the catch would eat it.
>   fail("This cp should fail because the target lock is blocked by 
> previous put");
> }
> {code}
> Since the fail() call is executed by the cpService, the assertion had no 
> bearing on the outcome of the test.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21221) Ineffective assertion in TestFromClientSide3#testMultiRowMutations

2018-09-24 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21221:
---
Attachment: 21221.v10.txt

> Ineffective assertion in TestFromClientSide3#testMultiRowMutations
> --
>
> Key: HBASE-21221
> URL: https://issues.apache.org/jira/browse/HBASE-21221
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Minor
> Attachments: 21221.v10.txt, 21221.v7.txt, 21221.v8.txt, 21221.v9.txt
>
>
> Observed the following in 
> org.apache.hadoop.hbase.util.TestFromClientSide3WoUnsafe-output.txt :
> {code}
> Caused by: 
> org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException): 
> java.io.IOException: Timed out waiting for lock for row: ROW-1 in region 
> 089bdfa75f44d88e596479038a6da18b
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:5816)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$4.lockRowsAndBuildMiniBatch(HRegion.java:7432)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:4008)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3982)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.mutateRowsWithLocks(HRegion.java:7424)
>   at 
> org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint.mutateRows(MultiRowMutationEndpoint.java:116)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.MultiRowMutationProtos$MultiRowMutationService.callMethod(MultiRowMutationProtos.java:2266)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:8182)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:2481)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:2463)
> ...
> Exception in thread "pool-678-thread-1" java.lang.AssertionError: This cp 
> should fail because the target lock is blocked by previous put
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.apache.hadoop.hbase.client.TestFromClientSide3.lambda$testMultiRowMutations$7(TestFromClientSide3.java:861)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> {code}
> Here is related code:
> {code}
>   cpService.execute(() -> {
> ...
> if (!threw) {
>   // Can't call fail() earlier because the catch would eat it.
>   fail("This cp should fail because the target lock is blocked by 
> previous put");
> }
> {code}
> Since the fail() call is executed by the cpService, the assertion had no 
> bearing on the outcome of the test.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21221) Ineffective assertion in TestFromClientSide3#testMultiRowMutations

2018-09-24 Thread Mingliang Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626498#comment-16626498
 ] 

Mingliang Liu commented on HBASE-21221:
---

[~yuzhih...@gmail.com], the root cause analysis makes sense to me. For the 
patch, I think the {{ex}} in {{RecordingEndpoint}} should be safely published 
across multiple threads, e.g. using volatile keyword.

I'm thinking if the test would be simpler by checking the controller exception 
in the {{callable}} we created and passed to the {{table.coprocessorService}}.
{code:java}
   
CoprocessorRpcUtils.BlockingRpcCallback
 rpcCallback = new CoprocessorRpcUtils.BlockingRpcCallback<>();
   exe.mutateRows(controller, request, rpcCallback);
+  if (controller.failedOnException()) {
+exceptionOnMutateRows.set(true); // exceptionOnMutateRows is 
AtomicBoolean declared in the test method; will assert this value.
+  }
   return rpcCallback.get();
 });
{code}

> Ineffective assertion in TestFromClientSide3#testMultiRowMutations
> --
>
> Key: HBASE-21221
> URL: https://issues.apache.org/jira/browse/HBASE-21221
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Minor
> Attachments: 21221.v7.txt, 21221.v8.txt, 21221.v9.txt
>
>
> Observed the following in 
> org.apache.hadoop.hbase.util.TestFromClientSide3WoUnsafe-output.txt :
> {code}
> Caused by: 
> org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException): 
> java.io.IOException: Timed out waiting for lock for row: ROW-1 in region 
> 089bdfa75f44d88e596479038a6da18b
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:5816)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$4.lockRowsAndBuildMiniBatch(HRegion.java:7432)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:4008)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3982)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.mutateRowsWithLocks(HRegion.java:7424)
>   at 
> org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint.mutateRows(MultiRowMutationEndpoint.java:116)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.MultiRowMutationProtos$MultiRowMutationService.callMethod(MultiRowMutationProtos.java:2266)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:8182)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:2481)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:2463)
> ...
> Exception in thread "pool-678-thread-1" java.lang.AssertionError: This cp 
> should fail because the target lock is blocked by previous put
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.apache.hadoop.hbase.client.TestFromClientSide3.lambda$testMultiRowMutations$7(TestFromClientSide3.java:861)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> {code}
> Here is related code:
> {code}
>   cpService.execute(() -> {
> ...
> if (!threw) {
>   // Can't call fail() earlier because the catch would eat it.
>   fail("This cp should fail because the target lock is blocked by 
> previous put");
> }
> {code}
> Since the fail() call is executed by the cpService, the assertion had no 
> bearing on the outcome of the test.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20940) HStore.cansplit should not allow split to happen if it has references

2018-09-24 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626333#comment-16626333
 ] 

Hudson commented on HBASE-20940:


FAILURE: Integrated in Jenkins build PreCommit-PHOENIX-Build #2059 (See 
[https://builds.apache.org/job/PreCommit-PHOENIX-Build/2059/])
After HBASE-20940 any local index query will open all HFiles of every (larsh: 
rev df998e6d7840db4669a395fa6460c42c434af633)
* (edit) 
phoenix-core/src/main/java/org/apache/phoenix/iterate/RegionScannerFactory.java


> HStore.cansplit should not allow split to happen if it has references
> -
>
> Key: HBASE-20940
> URL: https://issues.apache.org/jira/browse/HBASE-20940
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.2
>Reporter: Vishal Khandelwal
>Assignee: Vishal Khandelwal
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 1.3.3, 2.2.0, 2.1.1, 1.4.7, 2.0.3
>
> Attachments: HBASE-20940-branch-1-addendum.patch, 
> HBASE-20940.branch-1.3.v1.patch, HBASE-20940.branch-1.3.v2.patch, 
> HBASE-20940.branch-1.v1.patch, HBASE-20940.branch-1.v2.patch, 
> HBASE-20940.branch-1.v3.patch, HBASE-20940.branch-1.v5.patch, 
> HBASE-20940.v1.patch, HBASE-20940.v2.patch, HBASE-20940.v3.patch, 
> HBASE-20940.v4.patch, result_HBASE-20940.branch-1.v2.log
>
>
> When split happens and immediately another split happens, it may result into 
> a split of a region who still has references to its parent. More details 
> about scenario can be found here HBASE-20933
> HStore.hasReferences should check from fs.storefile rather than in memory 
> objects.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21217) Revisit the executeProcedure method for open/close region

2018-09-24 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626122#comment-16626122
 ] 

Hadoop QA commented on HBASE-21217:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
11s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
 4s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
46s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
17s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
16s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
59s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
19s{color} | {color:green} hbase-server: The patch generated 0 new + 315 
unchanged - 6 fixed = 315 total (was 321) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
15s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
10m 29s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}120m 
28s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}162m  7s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21217 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12941052/HBASE-21217-v2.patch |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 85dae2e34597 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 7ab77518a2 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14485/testReport/ |
| Max. process+thread count | 5184 (vs. ulimit of 1) |
| modules | C: hbase-server U: hbase-server |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14485/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message 

[jira] [Commented] (HBASE-21208) Bytes#toShort doesn't work without unsafe

2018-09-24 Thread Chia-Ping Tsai (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626115#comment-16626115
 ] 

Chia-Ping Tsai commented on HBASE-21208:


v2 will be merged tomorrow if no objections.

> Bytes#toShort doesn't work without unsafe
> -
>
> Key: HBASE-21208
> URL: https://issues.apache.org/jira/browse/HBASE-21208
> Project: HBase
>  Issue Type: Bug
>Reporter: Chia-Ping Tsai
>Assignee: Chia-Ping Tsai
>Priority: Critical
> Fix For: 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21208.v0.patch, HBASE-21208.v1.patch, 
> HBASE-21208.v2.patch
>
>
> seems we put the brackets in the wrong place.
> {code}
>   short n = 0;
>   n = (short) ((n ^ bytes[offset]) & 0xFF);
>   n = (short) (n << 8);
>   n = (short) ((n ^ bytes[offset+1]) & 0xFF);   // this one
>   return n;
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21217) Revisit the executeProcedure method for open/close region

2018-09-24 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626047#comment-16626047
 ] 

stack commented on HBASE-21217:
---

Ugh. Just saw this. Thanks lads.

> Revisit the executeProcedure method for open/close region
> -
>
> Key: HBASE-21217
> URL: https://issues.apache.org/jira/browse/HBASE-21217
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2, proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21217-v1.patch, HBASE-21217-v2.patch, 
> HBASE-21217.patch
>
>
> Currently we just call openRegion and closeRegion directly, which is a bit 
> buggy. For example, in order to not fail all the open region requests while 
> there is only one failure, we will catch the exception and set a flag in the 
> return value. But for executeProcedures call, the return value will be 
> ignored, and we expect the openRegion method will always call 
> reportRegionStateTransition to report the failure but in fact it does not...
> And after HBASE-20881, we can confirm that the race could happen, where we 
> send a close request to a region which is opening(HBASE-21199), and vice 
> visa. So I think here we need to revisit the implementation of 
> executeProcedures to make it more stable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (HBASE-16627) AssignmentManager#isDisabledorDisablingRegionInRIT should check whether table exists

2018-09-24 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-16627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved HBASE-16627.

Resolution: Later

> AssignmentManager#isDisabledorDisablingRegionInRIT should check whether table 
> exists
> 
>
> Key: HBASE-16627
> URL: https://issues.apache.org/jira/browse/HBASE-16627
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Stephen Yuan Jiang
>Priority: Minor
>
> [~stack] first reported this issue when he played with backup feature.
> The following exception can be observed in backup unit tests:
> {code}
> 2016-09-13 16:21:57,661 ERROR [ProcedureExecutor-3] 
> master.TableStateManager(134): Unable to get table hbase:backup state
> org.apache.hadoop.hbase.TableNotFoundException: hbase:backup
> at 
> org.apache.hadoop.hbase.master.TableStateManager.getTableState(TableStateManager.java:174)
> at 
> org.apache.hadoop.hbase.master.TableStateManager.isTableState(TableStateManager.java:131)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.isDisabledorDisablingRegionInRIT(AssignmentManager.java:1221)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:739)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1567)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1546)
> at 
> org.apache.hadoop.hbase.util.ModifyRegionUtils.assignRegions(ModifyRegionUtils.java:254)
> at 
> org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.assignRegions(CreateTableProcedure.java:430)
> at 
> org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.executeFromState(CreateTableProcedure.java:127)
> at 
> org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.executeFromState(CreateTableProcedure.java:57)
> at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:119)
> at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:452)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1066)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:855)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:808)
> {code}
> AssignmentManager#isDisabledorDisablingRegionInRIT should take table 
> existence into account.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-18580) Allow gdb to attach to selected process in docker

2018-09-24 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-18580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-18580:
---
Labels: debugger  (was: )

> Allow gdb to attach to selected process in docker
> -
>
> Key: HBASE-18580
> URL: https://issues.apache.org/jira/browse/HBASE-18580
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Priority: Major
>  Labels: debugger
>
> In the current docker image, if you want to attach gdb to a process, you 
> would see:
> bq. ptrace: Operation not permitted
> We should provide better support for gdb in docker.
> This article is a start:
> https://thirld.com/blog/2016/08/15/war-stories-debugging-julia-gdb-docker/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request

2018-09-24 Thread Mike Drob (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625924#comment-16625924
 ] 

Mike Drob commented on HBASE-18451:
---

{noformat}
+  LOG.info(getName() + " requesting flush of " +
+  r.getRegionInfo().getRegionNameAsString() + " because " +
+  whyFlush.toString() +
+  " after random delay " + randomDelay + "ms");
{noformat}
nit: can we switch this to parameterized logging?

{noformat}
  @Override
  public boolean requestDelayedFlush(HRegion r, long delay, boolean 
forceFlushAllStores) {
r.incrementFlushesQueuedCount();
synchronized (regionsInQueue) {
  if (!regionsInQueue.containsKey(r)) {
// This entry has some delay
FlushRegionEntry fqe =
new FlushRegionEntry(r, forceFlushAllStores, 
FlushLifeCycleTracker.DUMMY);
fqe.requeue(delay);
this.regionsInQueue.put(r, fqe);
this.flushQueue.add(fqe);
return true;
  }
  return false;
}
  }
{noformat}
Is that call to {{incrementFlushesQueuedCount}} correct? We attempt to queue 
one, but don't always add one to the queue, so the metric is going to be 
over-inflated. Same thing happens in {{requestFlush}}.

> PeriodicMemstoreFlusher should inspect the queue before adding a delayed 
> flush request
> --
>
> Key: HBASE-18451
> URL: https://issues.apache.org/jira/browse/HBASE-18451
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 2.0.0-alpha-1
>Reporter: Jean-Marc Spaggiari
>Assignee: Xu Cang
>Priority: Major
> Attachments: HBASE-18451.branch-1.001.patch, 
> HBASE-18451.master.002.patch, HBASE-18451.master.patch
>
>
> If you run a big job every 4 hours, impacting many tables (they have 150 
> regions per server), ad the end all the regions might have some data to be 
> flushed, and we want, after one hour, trigger a periodic flush. That's 
> totally fine.
> Now, to avoid a flush storm, when we detect a region to be flushed, we add a 
> "randomDelay" to the delayed flush, that way we spread them away.
> RANGE_OF_DELAY is 5 minutes. So we spread the flush over the next 5 minutes, 
> which is very good.
> However, because we don't check if there is already a request in the queue, 
> 10 seconds after, we create a new request, with a new randomDelay.
> If you generate a randomDelay every 10 seconds, at some point, you will end 
> up having a small one, and the flush will be triggered almost immediatly.
> As a result, instead of spreading all the flush within the next 5 minutes, 
> you end-up getting them all way more quickly. Like within the first minute. 
> Which not only feed the queue to to many flush requests, but also defeats the 
> purpose of the randomDelay.
> {code}
> @Override
> protected void chore() {
>   final StringBuffer whyFlush = new StringBuffer();
>   for (Region r : this.server.onlineRegions.values()) {
> if (r == null) continue;
> if (((HRegion)r).shouldFlush(whyFlush)) {
>   FlushRequester requester = server.getFlushRequester();
>   if (requester != null) {
> long randomDelay = RandomUtils.nextInt(RANGE_OF_DELAY) + 
> MIN_DELAY_TIME;
> LOG.info(getName() + " requesting flush of " +
>   r.getRegionInfo().getRegionNameAsString() + " because " +
>   whyFlush.toString() +
>   " after random delay " + randomDelay + "ms");
> //Throttle the flushes by putting a delay. If we don't throttle, 
> and there
> //is a balanced write-load on the regions in a table, we might 
> end up
> //overwhelming the filesystem with too many flushes at once.
> requester.requestDelayedFlush(r, randomDelay, false);
>   }
> }
>   }
> }
> {code}
> {code}
> 2017-07-24 18:44:33,338 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 270785ms
> 2017-07-24 18:44:43,328 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 200143ms
> 2017-07-24 18:44:53,954 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old 

[jira] [Updated] (HBASE-21217) Revisit the executeProcedure method for open/close region

2018-09-24 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21217:
--
Attachment: HBASE-21217-v2.patch

> Revisit the executeProcedure method for open/close region
> -
>
> Key: HBASE-21217
> URL: https://issues.apache.org/jira/browse/HBASE-21217
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2, proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21217-v1.patch, HBASE-21217-v2.patch, 
> HBASE-21217.patch
>
>
> Currently we just call openRegion and closeRegion directly, which is a bit 
> buggy. For example, in order to not fail all the open region requests while 
> there is only one failure, we will catch the exception and set a flag in the 
> return value. But for executeProcedures call, the return value will be 
> ignored, and we expect the openRegion method will always call 
> reportRegionStateTransition to report the failure but in fact it does not...
> And after HBASE-20881, we can confirm that the race could happen, where we 
> send a close request to a region which is opening(HBASE-21199), and vice 
> visa. So I think here we need to revisit the implementation of 
> executeProcedures to make it more stable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request

2018-09-24 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625867#comment-16625867
 ] 

Allan Yang commented on HBASE-18451:


+1 for the patch. The compaction queue has the same issue. [~xucang] maybe you 
can take a look. We may queue the same Store in the compaction queue to compact 
over and over again, making the compaction queue very big. But it may not cause 
big trouble like this one here. So it is not a urgent thing to do.

> PeriodicMemstoreFlusher should inspect the queue before adding a delayed 
> flush request
> --
>
> Key: HBASE-18451
> URL: https://issues.apache.org/jira/browse/HBASE-18451
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 2.0.0-alpha-1
>Reporter: Jean-Marc Spaggiari
>Assignee: Xu Cang
>Priority: Major
> Attachments: HBASE-18451.branch-1.001.patch, 
> HBASE-18451.master.002.patch, HBASE-18451.master.patch
>
>
> If you run a big job every 4 hours, impacting many tables (they have 150 
> regions per server), ad the end all the regions might have some data to be 
> flushed, and we want, after one hour, trigger a periodic flush. That's 
> totally fine.
> Now, to avoid a flush storm, when we detect a region to be flushed, we add a 
> "randomDelay" to the delayed flush, that way we spread them away.
> RANGE_OF_DELAY is 5 minutes. So we spread the flush over the next 5 minutes, 
> which is very good.
> However, because we don't check if there is already a request in the queue, 
> 10 seconds after, we create a new request, with a new randomDelay.
> If you generate a randomDelay every 10 seconds, at some point, you will end 
> up having a small one, and the flush will be triggered almost immediatly.
> As a result, instead of spreading all the flush within the next 5 minutes, 
> you end-up getting them all way more quickly. Like within the first minute. 
> Which not only feed the queue to to many flush requests, but also defeats the 
> purpose of the randomDelay.
> {code}
> @Override
> protected void chore() {
>   final StringBuffer whyFlush = new StringBuffer();
>   for (Region r : this.server.onlineRegions.values()) {
> if (r == null) continue;
> if (((HRegion)r).shouldFlush(whyFlush)) {
>   FlushRequester requester = server.getFlushRequester();
>   if (requester != null) {
> long randomDelay = RandomUtils.nextInt(RANGE_OF_DELAY) + 
> MIN_DELAY_TIME;
> LOG.info(getName() + " requesting flush of " +
>   r.getRegionInfo().getRegionNameAsString() + " because " +
>   whyFlush.toString() +
>   " after random delay " + randomDelay + "ms");
> //Throttle the flushes by putting a delay. If we don't throttle, 
> and there
> //is a balanced write-load on the regions in a table, we might 
> end up
> //overwhelming the filesystem with too many flushes at once.
> requester.requestDelayedFlush(r, randomDelay, false);
>   }
> }
>   }
> }
> {code}
> {code}
> 2017-07-24 18:44:33,338 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 270785ms
> 2017-07-24 18:44:43,328 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 200143ms
> 2017-07-24 18:44:53,954 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 191082ms
> 2017-07-24 18:45:03,528 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 92532ms
> 2017-07-24 18:45:14,201 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 238780ms
> 2017-07-24 18:45:24,195 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> 

[jira] [Created] (HBASE-21223) [amv2] Remove abort_procedure from shell

2018-09-24 Thread stack (JIRA)
stack created HBASE-21223:
-

 Summary: [amv2] Remove abort_procedure from shell
 Key: HBASE-21223
 URL: https://issues.apache.org/jira/browse/HBASE-21223
 Project: HBase
  Issue Type: Bug
  Components: amv2, hbck2
Reporter: stack
Assignee: stack


Remove this command. It will cause more damage than it could ever solve. It 
should exist, it should be out in hbck2, not here in user-space.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21222) [amv2] Closing region on a non-existent server creates STUCK regions

2018-09-24 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625816#comment-16625816
 ] 

stack commented on HBASE-21222:
---

Yes. A workaround is clearing out the old location. That seems to work. I'll 
write it up.

> [amv2] Closing region on a non-existent server creates STUCK regions
> 
>
> Key: HBASE-21222
> URL: https://issues.apache.org/jira/browse/HBASE-21222
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Reporter: stack
>Assignee: stack
>Priority: Major
>
> Ran into this one where a Region had been on a server but after a bunch of 
> crashing and meddling in Master Proc WALs, any attempt at unassign has the 
> procedure fail (see below) and then report the region as STUCK.
> I broke the lock w/ new hbck2 tooling and then tried to offline again but 
> same thing happened. Bug. Fix.
> {code}
> 2018-09-22 18:36:41,900 INFO 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure: Dispatch 
> pid=138650, ppid=121871, state=RUNNABLE:REGION_TRANSITION_DISPATCH, 
> locked=true; UnassignProcedure 
> table=IntegrationTestBigLinkedList_20180614072614, 
> region=51cdade76ca7217ec191f39e5f56c61c, 
> server=vd0637.halxg.cloudera.com,22101,1537397969558; rit=CLOSING, 
> location=vd0637.halxg.cloudera.com,22101,1537397969558
> 2018-09-22 18:36:41,899 INFO 
> org.apache.hadoop.hbase.master.procedure.MasterProcedureScheduler: 
> pid=138646, ppid=121871, state=RUNNABLE:REGION_TRANSITION_DISPATCH; 
> UnassignProcedure table=IntegrationTestBigLinkedList_20180614072614, 
> region=0780467efe4c5901887fb12bfa406fa7, 
> server=vc1228.halxg.cloudera.com,22101,1537578279837 checking lock on 
> 0780467efe4c5901887fb12bfa406fa7
> 2018-09-22 18:36:41,900 WARN 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure: Remote 
> call failed vd0637.halxg.cloudera.com,22101,1537397969558; pid=138650, 
> ppid=121871, state=RUNNABLE:REGION_TRANSITION_DISPATCH, locked=true; 
> UnassignProcedure table=IntegrationTestBigLinkedList_20180614072614, 
> region=51cdade76ca7217ec191f39e5f56c61c, 
> server=vd0637.halxg.cloudera.com,22101,1537397969558; rit=CLOSING, 
> location=vd0637.halxg.cloudera.com,22101,1537397969558; 
> exception=NoServerDispatchException
> org.apache.hadoop.hbase.procedure2.NoServerDispatchException: 
> vd0637.halxg.cloudera.com,22101,1537397969558; pid=138650, ppid=121871, 
> state=RUNNABLE:REGION_TRANSITION_DISPATCH, locked=true; UnassignProcedure 
> table=IntegrationTestBigLinkedList_20180614072614, 
> region=51cdade76ca7217ec191f39e5f56c61c, 
> server=vd0637.halxg.cloudera.com,22101,1537397969558
> at 
> org.apache.hadoop.hbase.procedure2.RemoteProcedureDispatcher.addOperationToNode(RemoteProcedureDispatcher.java:177)
> at 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.addToRemoteDispatcher(RegionTransitionProcedure.java:277)
> at 
> org.apache.hadoop.hbase.master.assignment.UnassignProcedure.updateTransition(UnassignProcedure.java:202)
> at 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:370)
> at 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:97)
> at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:924)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1684)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1471)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:77)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1983)
> 2018-09-22 18:36:41,903 WARN 
> org.apache.hadoop.hbase.master.assignment.UnassignProcedure: Expiring 
> vd0637.halxg.cloudera.com,22101,1537397969558, pid=138650, ppid=121871, 
> state=RUNNABLE:REGION_TRANSITION_DISPATCH, locked=true; UnassignProcedure 
> table=IntegrationTestBigLinkedList_20180614072614, 
> region=51cdade76ca7217ec191f39e5f56c61c, 
> server=vd0637.halxg.cloudera.com,22101,1537397969558 rit=CLOSING, 
> location=vd0637.halxg.cloudera.com,22101,1537397969558; 
> exception=NoServerDispatchException
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21222) [amv2] Closing region on a non-existent server creates STUCK regions

2018-09-24 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625812#comment-16625812
 ] 

Duo Zhang commented on HBASE-21222:
---

Got it. So we need a tool in HBCK2 to handle this case.

> [amv2] Closing region on a non-existent server creates STUCK regions
> 
>
> Key: HBASE-21222
> URL: https://issues.apache.org/jira/browse/HBASE-21222
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Reporter: stack
>Assignee: stack
>Priority: Major
>
> Ran into this one where a Region had been on a server but after a bunch of 
> crashing and meddling in Master Proc WALs, any attempt at unassign has the 
> procedure fail (see below) and then report the region as STUCK.
> I broke the lock w/ new hbck2 tooling and then tried to offline again but 
> same thing happened. Bug. Fix.
> {code}
> 2018-09-22 18:36:41,900 INFO 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure: Dispatch 
> pid=138650, ppid=121871, state=RUNNABLE:REGION_TRANSITION_DISPATCH, 
> locked=true; UnassignProcedure 
> table=IntegrationTestBigLinkedList_20180614072614, 
> region=51cdade76ca7217ec191f39e5f56c61c, 
> server=vd0637.halxg.cloudera.com,22101,1537397969558; rit=CLOSING, 
> location=vd0637.halxg.cloudera.com,22101,1537397969558
> 2018-09-22 18:36:41,899 INFO 
> org.apache.hadoop.hbase.master.procedure.MasterProcedureScheduler: 
> pid=138646, ppid=121871, state=RUNNABLE:REGION_TRANSITION_DISPATCH; 
> UnassignProcedure table=IntegrationTestBigLinkedList_20180614072614, 
> region=0780467efe4c5901887fb12bfa406fa7, 
> server=vc1228.halxg.cloudera.com,22101,1537578279837 checking lock on 
> 0780467efe4c5901887fb12bfa406fa7
> 2018-09-22 18:36:41,900 WARN 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure: Remote 
> call failed vd0637.halxg.cloudera.com,22101,1537397969558; pid=138650, 
> ppid=121871, state=RUNNABLE:REGION_TRANSITION_DISPATCH, locked=true; 
> UnassignProcedure table=IntegrationTestBigLinkedList_20180614072614, 
> region=51cdade76ca7217ec191f39e5f56c61c, 
> server=vd0637.halxg.cloudera.com,22101,1537397969558; rit=CLOSING, 
> location=vd0637.halxg.cloudera.com,22101,1537397969558; 
> exception=NoServerDispatchException
> org.apache.hadoop.hbase.procedure2.NoServerDispatchException: 
> vd0637.halxg.cloudera.com,22101,1537397969558; pid=138650, ppid=121871, 
> state=RUNNABLE:REGION_TRANSITION_DISPATCH, locked=true; UnassignProcedure 
> table=IntegrationTestBigLinkedList_20180614072614, 
> region=51cdade76ca7217ec191f39e5f56c61c, 
> server=vd0637.halxg.cloudera.com,22101,1537397969558
> at 
> org.apache.hadoop.hbase.procedure2.RemoteProcedureDispatcher.addOperationToNode(RemoteProcedureDispatcher.java:177)
> at 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.addToRemoteDispatcher(RegionTransitionProcedure.java:277)
> at 
> org.apache.hadoop.hbase.master.assignment.UnassignProcedure.updateTransition(UnassignProcedure.java:202)
> at 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:370)
> at 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:97)
> at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:924)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1684)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1471)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:77)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1983)
> 2018-09-22 18:36:41,903 WARN 
> org.apache.hadoop.hbase.master.assignment.UnassignProcedure: Expiring 
> vd0637.halxg.cloudera.com,22101,1537397969558, pid=138650, ppid=121871, 
> state=RUNNABLE:REGION_TRANSITION_DISPATCH, locked=true; UnassignProcedure 
> table=IntegrationTestBigLinkedList_20180614072614, 
> region=51cdade76ca7217ec191f39e5f56c61c, 
> server=vd0637.halxg.cloudera.com,22101,1537397969558 rit=CLOSING, 
> location=vd0637.halxg.cloudera.com,22101,1537397969558; 
> exception=NoServerDispatchException
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21222) [amv2] Closing region on a non-existent server creates STUCK regions

2018-09-24 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625801#comment-16625801
 ] 

stack commented on HBASE-21222:
---

In this case it is because WALs were deleted.

Been thinking about this. It should not happen during usual operation but we 
should have some defense in place just in case it does manage to bubble-up.

> [amv2] Closing region on a non-existent server creates STUCK regions
> 
>
> Key: HBASE-21222
> URL: https://issues.apache.org/jira/browse/HBASE-21222
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Reporter: stack
>Assignee: stack
>Priority: Major
>
> Ran into this one where a Region had been on a server but after a bunch of 
> crashing and meddling in Master Proc WALs, any attempt at unassign has the 
> procedure fail (see below) and then report the region as STUCK.
> I broke the lock w/ new hbck2 tooling and then tried to offline again but 
> same thing happened. Bug. Fix.
> {code}
> 2018-09-22 18:36:41,900 INFO 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure: Dispatch 
> pid=138650, ppid=121871, state=RUNNABLE:REGION_TRANSITION_DISPATCH, 
> locked=true; UnassignProcedure 
> table=IntegrationTestBigLinkedList_20180614072614, 
> region=51cdade76ca7217ec191f39e5f56c61c, 
> server=vd0637.halxg.cloudera.com,22101,1537397969558; rit=CLOSING, 
> location=vd0637.halxg.cloudera.com,22101,1537397969558
> 2018-09-22 18:36:41,899 INFO 
> org.apache.hadoop.hbase.master.procedure.MasterProcedureScheduler: 
> pid=138646, ppid=121871, state=RUNNABLE:REGION_TRANSITION_DISPATCH; 
> UnassignProcedure table=IntegrationTestBigLinkedList_20180614072614, 
> region=0780467efe4c5901887fb12bfa406fa7, 
> server=vc1228.halxg.cloudera.com,22101,1537578279837 checking lock on 
> 0780467efe4c5901887fb12bfa406fa7
> 2018-09-22 18:36:41,900 WARN 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure: Remote 
> call failed vd0637.halxg.cloudera.com,22101,1537397969558; pid=138650, 
> ppid=121871, state=RUNNABLE:REGION_TRANSITION_DISPATCH, locked=true; 
> UnassignProcedure table=IntegrationTestBigLinkedList_20180614072614, 
> region=51cdade76ca7217ec191f39e5f56c61c, 
> server=vd0637.halxg.cloudera.com,22101,1537397969558; rit=CLOSING, 
> location=vd0637.halxg.cloudera.com,22101,1537397969558; 
> exception=NoServerDispatchException
> org.apache.hadoop.hbase.procedure2.NoServerDispatchException: 
> vd0637.halxg.cloudera.com,22101,1537397969558; pid=138650, ppid=121871, 
> state=RUNNABLE:REGION_TRANSITION_DISPATCH, locked=true; UnassignProcedure 
> table=IntegrationTestBigLinkedList_20180614072614, 
> region=51cdade76ca7217ec191f39e5f56c61c, 
> server=vd0637.halxg.cloudera.com,22101,1537397969558
> at 
> org.apache.hadoop.hbase.procedure2.RemoteProcedureDispatcher.addOperationToNode(RemoteProcedureDispatcher.java:177)
> at 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.addToRemoteDispatcher(RegionTransitionProcedure.java:277)
> at 
> org.apache.hadoop.hbase.master.assignment.UnassignProcedure.updateTransition(UnassignProcedure.java:202)
> at 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:370)
> at 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:97)
> at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:924)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1684)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1471)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:77)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1983)
> 2018-09-22 18:36:41,903 WARN 
> org.apache.hadoop.hbase.master.assignment.UnassignProcedure: Expiring 
> vd0637.halxg.cloudera.com,22101,1537397969558, pid=138650, ppid=121871, 
> state=RUNNABLE:REGION_TRANSITION_DISPATCH, locked=true; UnassignProcedure 
> table=IntegrationTestBigLinkedList_20180614072614, 
> region=51cdade76ca7217ec191f39e5f56c61c, 
> server=vd0637.halxg.cloudera.com,22101,1537397969558 rit=CLOSING, 
> location=vd0637.halxg.cloudera.com,22101,1537397969558; 
> exception=NoServerDispatchException
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (HBASE-21193) Retrying Callable doesn't take max retries from current context; uses defaults instead

2018-09-24 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-21193.
---
Resolution: Cannot Reproduce

> Retrying Callable doesn't take max retries from current context; uses 
> defaults instead
> --
>
> Key: HBASE-21193
> URL: https://issues.apache.org/jira/browse/HBASE-21193
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: Xu Cang
>Priority: Major
>
> This makes it hard to change retry count on a read of meta for instance.
> I noticed this when trying to change the defaults for a meta read. I made a 
> customer Connection inside in the master with a new Configuration that had 
> rpc retries and timings upped radically. My reads nonetheless were finishing 
> at the usual retry point (31 tries after 60 seconds or so) because it looked 
> like the Retrying Callable that does the read was taking  max retries from 
> defaults rather than reading the passed in Configuration.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21193) Retrying Callable doesn't take max retries from current context; uses defaults instead

2018-09-24 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625771#comment-16625771
 ] 

stack commented on HBASE-21193:
---

Thank you [~xucang] for taking a look. I'm glad the maxattempts is working. I 
need to do better explanation of the phenomenon I saw why I thought it 
wasn't working. I've lost my context though now and rather than leave this 
issue open, let me resolve it. I'll assign it to you since you did some nice 
work.  Thank you.

> Retrying Callable doesn't take max retries from current context; uses 
> defaults instead
> --
>
> Key: HBASE-21193
> URL: https://issues.apache.org/jira/browse/HBASE-21193
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Priority: Major
>
> This makes it hard to change retry count on a read of meta for instance.
> I noticed this when trying to change the defaults for a meta read. I made a 
> customer Connection inside in the master with a new Configuration that had 
> rpc retries and timings upped radically. My reads nonetheless were finishing 
> at the usual retry point (31 tries after 60 seconds or so) because it looked 
> like the Retrying Callable that does the read was taking  max retries from 
> defaults rather than reading the passed in Configuration.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HBASE-21193) Retrying Callable doesn't take max retries from current context; uses defaults instead

2018-09-24 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack reassigned HBASE-21193:
-

Assignee: Xu Cang

> Retrying Callable doesn't take max retries from current context; uses 
> defaults instead
> --
>
> Key: HBASE-21193
> URL: https://issues.apache.org/jira/browse/HBASE-21193
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: Xu Cang
>Priority: Major
>
> This makes it hard to change retry count on a read of meta for instance.
> I noticed this when trying to change the defaults for a meta read. I made a 
> customer Connection inside in the master with a new Configuration that had 
> rpc retries and timings upped radically. My reads nonetheless were finishing 
> at the usual retry point (31 tries after 60 seconds or so) because it looked 
> like the Retrying Callable that does the read was taking  max retries from 
> defaults rather than reading the passed in Configuration.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21222) [amv2] Closing region on a non-existent server creates STUCK regions

2018-09-24 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625769#comment-16625769
 ] 

Duo Zhang commented on HBASE-21222:
---

Is this because you delete all the master proc wals? Or it could happen after 
crashing and failover? If the latter I think there are critical bugs?

> [amv2] Closing region on a non-existent server creates STUCK regions
> 
>
> Key: HBASE-21222
> URL: https://issues.apache.org/jira/browse/HBASE-21222
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Reporter: stack
>Assignee: stack
>Priority: Major
>
> Ran into this one where a Region had been on a server but after a bunch of 
> crashing and meddling in Master Proc WALs, any attempt at unassign has the 
> procedure fail (see below) and then report the region as STUCK.
> I broke the lock w/ new hbck2 tooling and then tried to offline again but 
> same thing happened. Bug. Fix.
> {code}
> 2018-09-22 18:36:41,900 INFO 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure: Dispatch 
> pid=138650, ppid=121871, state=RUNNABLE:REGION_TRANSITION_DISPATCH, 
> locked=true; UnassignProcedure 
> table=IntegrationTestBigLinkedList_20180614072614, 
> region=51cdade76ca7217ec191f39e5f56c61c, 
> server=vd0637.halxg.cloudera.com,22101,1537397969558; rit=CLOSING, 
> location=vd0637.halxg.cloudera.com,22101,1537397969558
> 2018-09-22 18:36:41,899 INFO 
> org.apache.hadoop.hbase.master.procedure.MasterProcedureScheduler: 
> pid=138646, ppid=121871, state=RUNNABLE:REGION_TRANSITION_DISPATCH; 
> UnassignProcedure table=IntegrationTestBigLinkedList_20180614072614, 
> region=0780467efe4c5901887fb12bfa406fa7, 
> server=vc1228.halxg.cloudera.com,22101,1537578279837 checking lock on 
> 0780467efe4c5901887fb12bfa406fa7
> 2018-09-22 18:36:41,900 WARN 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure: Remote 
> call failed vd0637.halxg.cloudera.com,22101,1537397969558; pid=138650, 
> ppid=121871, state=RUNNABLE:REGION_TRANSITION_DISPATCH, locked=true; 
> UnassignProcedure table=IntegrationTestBigLinkedList_20180614072614, 
> region=51cdade76ca7217ec191f39e5f56c61c, 
> server=vd0637.halxg.cloudera.com,22101,1537397969558; rit=CLOSING, 
> location=vd0637.halxg.cloudera.com,22101,1537397969558; 
> exception=NoServerDispatchException
> org.apache.hadoop.hbase.procedure2.NoServerDispatchException: 
> vd0637.halxg.cloudera.com,22101,1537397969558; pid=138650, ppid=121871, 
> state=RUNNABLE:REGION_TRANSITION_DISPATCH, locked=true; UnassignProcedure 
> table=IntegrationTestBigLinkedList_20180614072614, 
> region=51cdade76ca7217ec191f39e5f56c61c, 
> server=vd0637.halxg.cloudera.com,22101,1537397969558
> at 
> org.apache.hadoop.hbase.procedure2.RemoteProcedureDispatcher.addOperationToNode(RemoteProcedureDispatcher.java:177)
> at 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.addToRemoteDispatcher(RegionTransitionProcedure.java:277)
> at 
> org.apache.hadoop.hbase.master.assignment.UnassignProcedure.updateTransition(UnassignProcedure.java:202)
> at 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:370)
> at 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:97)
> at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:924)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1684)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1471)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:77)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1983)
> 2018-09-22 18:36:41,903 WARN 
> org.apache.hadoop.hbase.master.assignment.UnassignProcedure: Expiring 
> vd0637.halxg.cloudera.com,22101,1537397969558, pid=138650, ppid=121871, 
> state=RUNNABLE:REGION_TRANSITION_DISPATCH, locked=true; UnassignProcedure 
> table=IntegrationTestBigLinkedList_20180614072614, 
> region=51cdade76ca7217ec191f39e5f56c61c, 
> server=vd0637.halxg.cloudera.com,22101,1537397969558 rit=CLOSING, 
> location=vd0637.halxg.cloudera.com,22101,1537397969558; 
> exception=NoServerDispatchException
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20993) [Auth] IPC client fallback to simple auth allowed doesn't work

2018-09-24 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625584#comment-16625584
 ] 

Sean Busbey commented on HBASE-20993:
-

bq. Good reminder that we lack a unit test for wire compatibility. I wonder how 
hard it would be to grab the 1.2 shaded client artifact and use it to talk with 
the server code at head of branch.

We could add a nightly test that did this pretty easily. Essentially we could 
just add it as an additional step in [the test that starts up a 1-node cluster 
and runs an example 
program|https://github.com/apache/hbase/blob/master/dev-support/hbase_nightly_pseudo-distributed-test.sh].

> [Auth] IPC client fallback to simple auth allowed doesn't work
> --
>
> Key: HBASE-20993
> URL: https://issues.apache.org/jira/browse/HBASE-20993
> Project: HBase
>  Issue Type: Bug
>  Components: Client, IPC/RPC, security
>Affects Versions: 1.2.6, 1.3.2, 1.2.7, 1.4.7
>Reporter: Reid Chan
>Assignee: Jack Bearden
>Priority: Critical
> Fix For: 1.5.0, 1.4.8
>
> Attachments: HBASE-20993.001.patch, 
> HBASE-20993.003.branch-1.flowchart.png, HBASE-20993.branch-1.002.patch, 
> HBASE-20993.branch-1.003.patch, HBASE-20993.branch-1.004.patch, 
> HBASE-20993.branch-1.005.patch, HBASE-20993.branch-1.006.patch, 
> HBASE-20993.branch-1.007.patch, HBASE-20993.branch-1.008.patch, 
> HBASE-20993.branch-1.009.patch, HBASE-20993.branch-1.009.patch, 
> HBASE-20993.branch-1.2.001.patch, HBASE-20993.branch-1.wip.002.patch, 
> HBASE-20993.branch-1.wip.patch, yetus-local-testpatch-output-009.txt
>
>
> It is easily reproducible.
> client's hbase-site.xml: hadoop.security.authentication:kerberos, 
> hbase.security.authentication:kerberos, 
> hbase.ipc.client.fallback-to-simple-auth-allowed:true, keytab and principal 
> are right set
> A simple auth hbase cluster, a kerberized hbase client application. 
> application trying to r/w/c/d table will have following exception:
> {code}
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
>   at 
> com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
>   at 
> org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:179)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSaslConnection(RpcClientImpl.java:617)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.access$700(RpcClientImpl.java:162)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:743)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:740)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:740)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:906)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:873)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1241)
>   at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:227)
>   at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:336)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:58383)
>   at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceStubMaker.isMasterRunning(ConnectionManager.java:1592)
>   at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStubNoRetries(ConnectionManager.java:1530)
>   at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStub(ConnectionManager.java:1552)
>   at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceStubMaker.makeStub(ConnectionManager.java:1581)
>   at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getKeepAliveMasterService(ConnectionManager.java:1738)
>   at 
> org.apache.hadoop.hbase.client.MasterCallable.prepare(MasterCallable.java:38)
>   at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:134)
>   at 
> 

[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request

2018-09-24 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625557#comment-16625557
 ] 

Hadoop QA commented on HBASE-18451:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
15s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
1s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} branch-1 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
46s{color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} branch-1 passed with JDK v1.8.0_181 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} branch-1 passed with JDK v1.7.0_191 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
21s{color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  2m 
34s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} branch-1 passed with JDK v1.8.0_181 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
35s{color} | {color:green} branch-1 passed with JDK v1.7.0_191 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed with JDK v1.8.0_181 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed with JDK v1.7.0_191 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  2m 
34s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
1m 34s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed with JDK v1.8.0_181 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed with JDK v1.7.0_191 {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}118m 19s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
28s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}136m 52s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.util.TestHBaseFsck |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:61288f8 |
| JIRA Issue | HBASE-18451 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12941012/HBASE-18451.branch-1.001.patch
 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 324b4fa3dcc5 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 

[jira] [Commented] (HBASE-21193) Retrying Callable doesn't take max retries from current context; uses defaults instead

2018-09-24 Thread Xu Cang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625534#comment-16625534
 ] 

Xu Cang commented on HBASE-21193:
-

 

[~stack]

tried a bit I think it's working for me unless I misunderstand what you mean.

 

I added a logline here for debugging:

[https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/RpcRetryingCallerImpl.java#L79]

LOG.info("maxAttemps is : " + maxAttempts);

Then I modified hbas-site.config with as below

37 
38 hbase.client.retries.number
39 *12*
40 

Got this in log 

276913 2018-09-24 01:39:56,012 INFO  
[master/192.168.0.9:16000:becomeActiveMaster] client.RpcRetryingCallerImpl: 
*maxAttemps is : 37*

 

Then change config to this

37 
38 hbase.client.retries.number
 *39 5*
40 

Got maxAttemps changed as below:

279046 2018-09-24 01:41:55,552 INFO  
[master/192.168.0.9:16000:becomeActiveMaster] client.RpcRetryingCallerImpl: 
*maxAttemps is : 16*

> Retrying Callable doesn't take max retries from current context; uses 
> defaults instead
> --
>
> Key: HBASE-21193
> URL: https://issues.apache.org/jira/browse/HBASE-21193
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Priority: Major
>
> This makes it hard to change retry count on a read of meta for instance.
> I noticed this when trying to change the defaults for a meta read. I made a 
> customer Connection inside in the master with a new Configuration that had 
> rpc retries and timings upped radically. My reads nonetheless were finishing 
> at the usual retry point (31 tries after 60 seconds or so) because it looked 
> like the Retrying Callable that does the read was taking  max retries from 
> defaults rather than reading the passed in Configuration.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20940) HStore.cansplit should not allow split to happen if it has references

2018-09-24 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625533#comment-16625533
 ] 

Hudson commented on HBASE-20940:


SUCCESS: Integrated in Jenkins build Phoenix-4.x-HBase-1.3 #210 (See 
[https://builds.apache.org/job/Phoenix-4.x-HBase-1.3/210/])
After HBASE-20940 any local index query will open all HFiles of every (larsh: 
rev ed4366063b983270767757d12daf3a8f4b126897)
* (edit) 
phoenix-core/src/main/java/org/apache/phoenix/iterate/RegionScannerFactory.java


> HStore.cansplit should not allow split to happen if it has references
> -
>
> Key: HBASE-20940
> URL: https://issues.apache.org/jira/browse/HBASE-20940
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.2
>Reporter: Vishal Khandelwal
>Assignee: Vishal Khandelwal
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 1.3.3, 2.2.0, 2.1.1, 1.4.7, 2.0.3
>
> Attachments: HBASE-20940-branch-1-addendum.patch, 
> HBASE-20940.branch-1.3.v1.patch, HBASE-20940.branch-1.3.v2.patch, 
> HBASE-20940.branch-1.v1.patch, HBASE-20940.branch-1.v2.patch, 
> HBASE-20940.branch-1.v3.patch, HBASE-20940.branch-1.v5.patch, 
> HBASE-20940.v1.patch, HBASE-20940.v2.patch, HBASE-20940.v3.patch, 
> HBASE-20940.v4.patch, result_HBASE-20940.branch-1.v2.log
>
>
> When split happens and immediately another split happens, it may result into 
> a split of a region who still has references to its parent. More details 
> about scenario can be found here HBASE-20933
> HStore.hasReferences should check from fs.storefile rather than in memory 
> objects.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21212) Wrong flush time when update flush metric

2018-09-24 Thread Xu Cang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625445#comment-16625445
 ] 

Xu Cang commented on HBASE-21212:
-

[~allan163]

I checked branch-1 code, your fix could be perfectly applied to branch-1 too. 
Thanks.

> Wrong flush time when update flush metric
> -
>
> Key: HBASE-21212
> URL: https://issues.apache.org/jira/browse/HBASE-21212
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.1.0, 2.0.2
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Minor
> Attachments: HBASE-21212.branch-2.0.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request

2018-09-24 Thread Xu Cang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Cang reassigned HBASE-18451:
---

Assignee: Xu Cang

> PeriodicMemstoreFlusher should inspect the queue before adding a delayed 
> flush request
> --
>
> Key: HBASE-18451
> URL: https://issues.apache.org/jira/browse/HBASE-18451
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 2.0.0-alpha-1
>Reporter: Jean-Marc Spaggiari
>Assignee: Xu Cang
>Priority: Major
> Attachments: HBASE-18451.branch-1.001.patch, 
> HBASE-18451.master.002.patch, HBASE-18451.master.patch
>
>
> If you run a big job every 4 hours, impacting many tables (they have 150 
> regions per server), ad the end all the regions might have some data to be 
> flushed, and we want, after one hour, trigger a periodic flush. That's 
> totally fine.
> Now, to avoid a flush storm, when we detect a region to be flushed, we add a 
> "randomDelay" to the delayed flush, that way we spread them away.
> RANGE_OF_DELAY is 5 minutes. So we spread the flush over the next 5 minutes, 
> which is very good.
> However, because we don't check if there is already a request in the queue, 
> 10 seconds after, we create a new request, with a new randomDelay.
> If you generate a randomDelay every 10 seconds, at some point, you will end 
> up having a small one, and the flush will be triggered almost immediatly.
> As a result, instead of spreading all the flush within the next 5 minutes, 
> you end-up getting them all way more quickly. Like within the first minute. 
> Which not only feed the queue to to many flush requests, but also defeats the 
> purpose of the randomDelay.
> {code}
> @Override
> protected void chore() {
>   final StringBuffer whyFlush = new StringBuffer();
>   for (Region r : this.server.onlineRegions.values()) {
> if (r == null) continue;
> if (((HRegion)r).shouldFlush(whyFlush)) {
>   FlushRequester requester = server.getFlushRequester();
>   if (requester != null) {
> long randomDelay = RandomUtils.nextInt(RANGE_OF_DELAY) + 
> MIN_DELAY_TIME;
> LOG.info(getName() + " requesting flush of " +
>   r.getRegionInfo().getRegionNameAsString() + " because " +
>   whyFlush.toString() +
>   " after random delay " + randomDelay + "ms");
> //Throttle the flushes by putting a delay. If we don't throttle, 
> and there
> //is a balanced write-load on the regions in a table, we might 
> end up
> //overwhelming the filesystem with too many flushes at once.
> requester.requestDelayedFlush(r, randomDelay, false);
>   }
> }
>   }
> }
> {code}
> {code}
> 2017-07-24 18:44:33,338 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 270785ms
> 2017-07-24 18:44:43,328 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 200143ms
> 2017-07-24 18:44:53,954 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 191082ms
> 2017-07-24 18:45:03,528 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 92532ms
> 2017-07-24 18:45:14,201 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 238780ms
> 2017-07-24 18:45:24,195 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 35390ms
> 2017-07-24 18:45:33,362 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> 

[jira] [Updated] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request

2018-09-24 Thread Xu Cang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Cang updated HBASE-18451:

Attachment: HBASE-18451.branch-1.001.patch
HBASE-18451.master.002.patch
Status: Patch Available  (was: Open)

> PeriodicMemstoreFlusher should inspect the queue before adding a delayed 
> flush request
> --
>
> Key: HBASE-18451
> URL: https://issues.apache.org/jira/browse/HBASE-18451
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 2.0.0-alpha-1
>Reporter: Jean-Marc Spaggiari
>Assignee: Xu Cang
>Priority: Major
> Attachments: HBASE-18451.branch-1.001.patch, 
> HBASE-18451.master.002.patch, HBASE-18451.master.patch
>
>
> If you run a big job every 4 hours, impacting many tables (they have 150 
> regions per server), ad the end all the regions might have some data to be 
> flushed, and we want, after one hour, trigger a periodic flush. That's 
> totally fine.
> Now, to avoid a flush storm, when we detect a region to be flushed, we add a 
> "randomDelay" to the delayed flush, that way we spread them away.
> RANGE_OF_DELAY is 5 minutes. So we spread the flush over the next 5 minutes, 
> which is very good.
> However, because we don't check if there is already a request in the queue, 
> 10 seconds after, we create a new request, with a new randomDelay.
> If you generate a randomDelay every 10 seconds, at some point, you will end 
> up having a small one, and the flush will be triggered almost immediatly.
> As a result, instead of spreading all the flush within the next 5 minutes, 
> you end-up getting them all way more quickly. Like within the first minute. 
> Which not only feed the queue to to many flush requests, but also defeats the 
> purpose of the randomDelay.
> {code}
> @Override
> protected void chore() {
>   final StringBuffer whyFlush = new StringBuffer();
>   for (Region r : this.server.onlineRegions.values()) {
> if (r == null) continue;
> if (((HRegion)r).shouldFlush(whyFlush)) {
>   FlushRequester requester = server.getFlushRequester();
>   if (requester != null) {
> long randomDelay = RandomUtils.nextInt(RANGE_OF_DELAY) + 
> MIN_DELAY_TIME;
> LOG.info(getName() + " requesting flush of " +
>   r.getRegionInfo().getRegionNameAsString() + " because " +
>   whyFlush.toString() +
>   " after random delay " + randomDelay + "ms");
> //Throttle the flushes by putting a delay. If we don't throttle, 
> and there
> //is a balanced write-load on the regions in a table, we might 
> end up
> //overwhelming the filesystem with too many flushes at once.
> requester.requestDelayedFlush(r, randomDelay, false);
>   }
> }
>   }
> }
> {code}
> {code}
> 2017-07-24 18:44:33,338 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 270785ms
> 2017-07-24 18:44:43,328 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 200143ms
> 2017-07-24 18:44:53,954 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 191082ms
> 2017-07-24 18:45:03,528 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 92532ms
> 2017-07-24 18:45:14,201 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 238780ms
> 2017-07-24 18:45:24,195 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 

[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request

2018-09-24 Thread Xu Cang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625428#comment-16625428
 ] 

Xu Cang commented on HBASE-18451:
-

rebased [~nihed] 's patch and uploaded for both master and branch-1.

 

> PeriodicMemstoreFlusher should inspect the queue before adding a delayed 
> flush request
> --
>
> Key: HBASE-18451
> URL: https://issues.apache.org/jira/browse/HBASE-18451
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 2.0.0-alpha-1
>Reporter: Jean-Marc Spaggiari
>Priority: Major
> Attachments: HBASE-18451.branch-1.001.patch, 
> HBASE-18451.master.002.patch, HBASE-18451.master.patch
>
>
> If you run a big job every 4 hours, impacting many tables (they have 150 
> regions per server), ad the end all the regions might have some data to be 
> flushed, and we want, after one hour, trigger a periodic flush. That's 
> totally fine.
> Now, to avoid a flush storm, when we detect a region to be flushed, we add a 
> "randomDelay" to the delayed flush, that way we spread them away.
> RANGE_OF_DELAY is 5 minutes. So we spread the flush over the next 5 minutes, 
> which is very good.
> However, because we don't check if there is already a request in the queue, 
> 10 seconds after, we create a new request, with a new randomDelay.
> If you generate a randomDelay every 10 seconds, at some point, you will end 
> up having a small one, and the flush will be triggered almost immediatly.
> As a result, instead of spreading all the flush within the next 5 minutes, 
> you end-up getting them all way more quickly. Like within the first minute. 
> Which not only feed the queue to to many flush requests, but also defeats the 
> purpose of the randomDelay.
> {code}
> @Override
> protected void chore() {
>   final StringBuffer whyFlush = new StringBuffer();
>   for (Region r : this.server.onlineRegions.values()) {
> if (r == null) continue;
> if (((HRegion)r).shouldFlush(whyFlush)) {
>   FlushRequester requester = server.getFlushRequester();
>   if (requester != null) {
> long randomDelay = RandomUtils.nextInt(RANGE_OF_DELAY) + 
> MIN_DELAY_TIME;
> LOG.info(getName() + " requesting flush of " +
>   r.getRegionInfo().getRegionNameAsString() + " because " +
>   whyFlush.toString() +
>   " after random delay " + randomDelay + "ms");
> //Throttle the flushes by putting a delay. If we don't throttle, 
> and there
> //is a balanced write-load on the regions in a table, we might 
> end up
> //overwhelming the filesystem with too many flushes at once.
> requester.requestDelayedFlush(r, randomDelay, false);
>   }
> }
>   }
> }
> {code}
> {code}
> 2017-07-24 18:44:33,338 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 270785ms
> 2017-07-24 18:44:43,328 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 200143ms
> 2017-07-24 18:44:53,954 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 191082ms
> 2017-07-24 18:45:03,528 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 92532ms
> 2017-07-24 18:45:14,201 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 238780ms
> 2017-07-24 18:45:24,195 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 35390ms
> 2017-07-24 18:45:33,362 INFO 
>