[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request

2018-09-28 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632810#comment-16632810
 ] 

Hudson commented on HBASE-18451:


Results for branch branch-2.1
[build #392 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/392/]: 
(/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/392//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/392//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/392//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> PeriodicMemstoreFlusher should inspect the queue before adding a delayed 
> flush request
> --
>
> Key: HBASE-18451
> URL: https://issues.apache.org/jira/browse/HBASE-18451
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 2.0.0-alpha-1
>Reporter: Jean-Marc Spaggiari
>Assignee: Xu Cang
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.8, 2.1.1
>
> Attachments: HBASE-18451.branch-1.001.patch, 
> HBASE-18451.branch-1.002.patch, HBASE-18451.branch-1.002.patch, 
> HBASE-18451.master.002.patch, HBASE-18451.master.003.patch, 
> HBASE-18451.master.004.patch, HBASE-18451.master.004.patch, 
> HBASE-18451.master.patch
>
>
> If you run a big job every 4 hours, impacting many tables (they have 150 
> regions per server), at the end all the regions might have some data to be 
> flushed, and we want to trigger a periodic flush after one hour. That's 
> totally fine.
> Now, to avoid a flush storm, when we detect a region to be flushed, we add a 
> "randomDelay" to the delayed flush; that way we spread the flushes out.
> RANGE_OF_DELAY is 5 minutes, so we spread the flushes over the next 5 
> minutes, which is very good.
> However, because we don't check whether there is already a request in the 
> queue, 10 seconds later we create a new request with a new randomDelay.
> If you generate a randomDelay every 10 seconds, at some point you will end 
> up with a small one, and the flush will be triggered almost immediately.
> As a result, instead of spreading all the flushes over the next 5 minutes, 
> you end up getting them all much more quickly, likely within the first 
> minute. This not only feeds the queue with too many flush requests, but also 
> defeats the purpose of the randomDelay.
> {code}
> @Override
> protected void chore() {
>   final StringBuffer whyFlush = new StringBuffer();
>   for (Region r : this.server.onlineRegions.values()) {
>     if (r == null) continue;
>     if (((HRegion) r).shouldFlush(whyFlush)) {
>       FlushRequester requester = server.getFlushRequester();
>       if (requester != null) {
>         long randomDelay = RandomUtils.nextInt(RANGE_OF_DELAY) + MIN_DELAY_TIME;
>         LOG.info(getName() + " requesting flush of " +
>           r.getRegionInfo().getRegionNameAsString() + " because " +
>           whyFlush.toString() +
>           " after random delay " + randomDelay + "ms");
>         // Throttle the flushes by putting a delay. If we don't throttle, and there
>         // is a balanced write-load on the regions in a table, we might end up
>         // overwhelming the filesystem with too many flushes at once.
>         requester.requestDelayedFlush(r, randomDelay, false);
>       }
>     }
>   }
> }
> {code}
> {code}
> 2017-07-24 18:44:33,338 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 270785ms
> 2017-07-24 18:44:43,328 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 200143ms
> 2017-07-24 18:44:53,954 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of 
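
The quoted log excerpt above is cut off by the digest. For illustration only: 
the guard the issue title asks for, checking whether a region already has a 
delayed flush queued before drawing a fresh randomDelay, could look roughly 
like the standalone sketch below. The class and member names are hypothetical 
and this is not the attached patch.

{code}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

// Editorial sketch, not the HBase patch: remembers which regions already have
// a delayed flush queued so a new random delay is not drawn on every chore run.
public class DedupedDelayedFlusher {
  private static final long RANGE_OF_DELAY_MS = 5 * 60 * 1000L; // the 5 minute spread window
  private static final long MIN_DELAY_MS = 10 * 1000L;          // illustrative minimum delay

  private final ScheduledExecutorService pool = Executors.newScheduledThreadPool(2);
  // Regions with a flush already scheduled; consulted before queueing another one.
  private final Set<String> pending = ConcurrentHashMap.newKeySet();

  /** Called by the periodic chore for every region that reports it should flush. */
  public void requestDelayedFlush(String regionName, Runnable flush) {
    if (!pending.add(regionName)) {
      return; // a delayed flush is already queued; keep the previously drawn delay
    }
    long randomDelay = MIN_DELAY_MS + ThreadLocalRandom.current().nextLong(RANGE_OF_DELAY_MS);
    pool.schedule(() -> {
      try {
        flush.run();
      } finally {
        pending.remove(regionName); // later chore runs may queue this region again
      }
    }, randomDelay, TimeUnit.MILLISECONDS);
  }
}
{code}

With a guard of this kind, re-running the chore every 10 seconds no longer 
shortens the effective delay, so the flushes stay spread over the intended 
window.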

[jira] [Commented] (HBASE-19418) RANGE_OF_DELAY in PeriodicMemstoreFlusher should be configurable.

2018-09-28 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-19418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632811#comment-16632811
 ] 

Hudson commented on HBASE-19418:


Results for branch branch-2.1
[build #392 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/392/]: 
(/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/392//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/392//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/392//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> RANGE_OF_DELAY in PeriodicMemstoreFlusher should be configurable.
> -
>
> Key: HBASE-19418
> URL: https://issues.apache.org/jira/browse/HBASE-19418
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha-4
>Reporter: Jean-Marc Spaggiari
>Assignee: Ramie Raufdeen
>Priority: Minor
> Fix For: 3.0.0, 1.5.0, 1.3.3, 2.2.0, 1.4.8, 2.1.1
>
> Attachments: HBASE-19418.master.000.patch
>
>
> When RSs have a LOT of regions and CFs, flushing everything within 5 minutes 
> is not always doable. It might be interesting to be able to increase the 
> RANGE_OF_DELAY. 
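
As a rough illustration of what making RANGE_OF_DELAY configurable would mean, 
the sketch below reads the spread window from the HBase Configuration. The 
property key and default are assumptions for illustration, not the names used 
by the attached patch.

{code}
import org.apache.hadoop.conf.Configuration;

// Editorial sketch: expose the flush-spreading window as a site setting so
// operators with many regions/CFs per RS can widen it beyond 5 minutes.
public class FlushDelayConfig {
  // Hypothetical property key; the real patch may name it differently.
  public static final String RANGE_OF_DELAY_KEY =
      "hbase.regionserver.periodicflush.range.ms";
  public static final long DEFAULT_RANGE_OF_DELAY_MS = 5L * 60 * 1000; // current hard-coded value

  public static long rangeOfDelayMs(Configuration conf) {
    return conf.getLong(RANGE_OF_DELAY_KEY, DEFAULT_RANGE_OF_DELAY_MS);
  }
}
{code}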



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21196) HTableMultiplexer clears the meta cache after every put operation

2018-09-28 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632814#comment-16632814
 ] 

Hudson commented on HBASE-21196:


Results for branch branch-2.1
[build #392 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/392/]: 
(/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/392//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/392//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/392//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> HTableMultiplexer clears the meta cache after every put operation
> -
>
> Key: HBASE-21196
> URL: https://issues.apache.org/jira/browse/HBASE-21196
> Project: HBase
>  Issue Type: Bug
>  Components: Performance
>Affects Versions: 3.0.0, 1.3.3, 2.2.0
>Reporter: Nihal Jain
>Assignee: Nihal Jain
>Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21196.master.001.patch, 
> HBASE-21196.master.001.patch, HBASE-21196.master.002.patch, 
> HTableMultiplexer1000Puts.UT.txt
>
>
> *Problem:* Operations that call the 
> {{AsyncRequestFutureImpl.receiveMultiAction(MultiAction, ServerName, 
> MultiResponse, int)}} API with the table name set to null reset the meta 
> cache of the corresponding server after each call. One such operation is the 
> put operation of HTableMultiplexer (it might not be the only one). This may 
> severely impact the performance of the system: because the cache becomes 
> empty after every HTableMultiplexer put, all new ops directed to that server 
> have to go to ZK first to get the meta table address and then look up the 
> location of the table region.
> From the logs below, one can see that after every other put the cached 
> region locations are cleared. As a side effect, before every put the client 
> needs to contact ZK for the meta table location and then read meta to get 
> the region locations of the table.
> {noformat}
> 2018-09-13 22:21:15,467 TRACE [htable-pool11-t1] client.MetaCache(283): 
> Removed all cached region locations that map to 
> root1-thinkpad-t440p,35811,1536857446588
> 2018-09-13 22:21:15,467 DEBUG [HTableFlushWorker-5] 
> client.HTableMultiplexer$FlushWorker(632): Processed 1 put requests for 
> root1-ThinkPad-T440p:35811 and 0 failed, latency for this send: 5
> 2018-09-13 22:21:15,515 TRACE 
> [RpcServer.reader=1,bindAddress=root1-ThinkPad-T440p,port=35811] 
> ipc.RpcServer$Connection(1954): RequestHeader call_id: 218 method_name: "Get" 
> request_param: true priority: 0 timeout: 6 totalRequestSize: 137 bytes
> 2018-09-13 22:21:15,515 TRACE 
> [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] 
> ipc.CallRunner(105): callId: 218 service: ClientService methodName: Get size: 
> 137 connection: 127.0.0.1:42338 executing as root1
> 2018-09-13 22:21:15,515 TRACE 
> [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] 
> ipc.RpcServer(2356): callId: 218 service: ClientService methodName: Get size: 
> 137 connection: 127.0.0.1:42338 param: region= 
> testHTableMultiplexer_1,,1536857451720.304d914b641a738624937c7f9b4d684f., 
> row=\x00\x00\x00\xC4 connection: 127.0.0.1:42338, response result { 
> associated_cell_count: 1 stale: false } queueTime: 0 processingTime: 0 
> totalTime: 0
> 2018-09-13 22:21:15,516 TRACE 
> [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] 
> io.BoundedByteBufferPool(106): runningAverage=16384, totalCapacity=0, 
> count=0, allocations=1
> 2018-09-13 22:21:15,516 TRACE [main] ipc.AbstractRpcClient(236): Call: Get, 
> callTime: 2ms
> 2018-09-13 22:21:15,516 TRACE [main] client.ClientScanner(122): Scan 
> table=hbase:meta, 
> startRow=testHTableMultiplexer_1,\x00\x00\x00\xC5,99
> 2018-09-13 22:21:15,516 TRACE [main] client.ClientSmallReversedScanner(179): 
> Advancing internal small scanner to startKey at 
> 'testHTableMultiplexer_1,\x00\x00\x00\xC5,99'
> 2018-09-13 22:21:15,517 TRACE [main] client.ZooKeeperRegistry(59): Looking up 
> meta region location in ZK, 
> connection=org.apache.hadoop.hbase.client.ZooKeeperRegistry@599f571f
> {noformat}
> From the minicluster logs [^HTableMultiplexer1000Puts.UT.txt] one can see 
> that the string "Removed all cached region 
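
The quoted description is cut off above by the digest. As context only, a 
driver of the kind that reproduces the reported TRACE output might look like 
the sketch below; the table and column family names are placeholders, the 
buffer size is arbitrary, and this is not code from the attached patch.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HTableMultiplexer;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

// Editorial sketch: issue many puts through HTableMultiplexer. With TRACE
// logging enabled for org.apache.hadoop.hbase.client.MetaCache, affected
// versions log "Removed all cached region locations ..." after the puts.
public class MultiplexerCacheRepro {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTableMultiplexer multiplexer = new HTableMultiplexer(conf, 1000);
    TableName table = TableName.valueOf("testHTableMultiplexer_1");
    for (int i = 0; i < 1000; i++) {
      Put put = new Put(Bytes.toBytes(i));
      put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes(i));
      // put() returns false when the per-server queue is full; retry briefly.
      while (!multiplexer.put(table, put)) {
        Thread.sleep(10);
      }
    }
  }
}
{code}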

[jira] [Commented] (HBASE-21207) Add client side sorting functionality in master web UI for table and region server details.

2018-09-28 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632812#comment-16632812
 ] 

Hudson commented on HBASE-21207:


Results for branch branch-2.1
[build #392 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/392/]: 
(/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/392//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/392//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/392//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Add client side sorting functionality in master web UI for table and region 
> server details.
> ---
>
> Key: HBASE-21207
> URL: https://issues.apache.org/jira/browse/HBASE-21207
> Project: HBase
>  Issue Type: Improvement
>  Components: master, monitoring, UI, Usability
>Reporter: Archana Katiyar
>Assignee: Archana Katiyar
>Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.8
>
> Attachments: 14926e82-b929-11e8-8bdd-4ce4621f1118.png, 
> 2724afd8-b929-11e8-8171-8b5b2ba3084e.png, HBASE-21207-branch-1.patch, 
> HBASE-21207-branch-1.v1.patch, HBASE-21207-branch-2.v1.patch, 
> HBASE-21207.patch, HBASE-21207.patch, HBASE-21207.v1.patch, 
> edc5c812-b928-11e8-87e2-ce6396629bbc.png
>
>
> In the Master UI, we can see region server details like requests per second 
> and the number of regions, etc. Similarly, for tables we can see online 
> regions and offline regions.
> It would help ops people determine hot spotting if we provided sort 
> functionality in the UI.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20857) JMX - add Balancer status = enabled / disabled

2018-09-28 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632813#comment-16632813
 ] 

Hudson commented on HBASE-20857:


Results for branch branch-2.1
[build #392 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/392/]: 
(/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/392//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/392//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/392//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> JMX - add Balancer status = enabled / disabled
> --
>
> Key: HBASE-20857
> URL: https://issues.apache.org/jira/browse/HBASE-20857
> Project: HBase
>  Issue Type: Improvement
>  Components: API, master, metrics, REST, tooling, Usability
>Reporter: Hari Sekhon
>Assignee: Kiran Kumar Maturi
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.8, 2.1.1
>
> Attachments: HBASE-20857.branch-1.4.001.patch, 
> HBASE-20857.branch-1.4.002.patch
>
>
> Add HBase Balancer enabled/disabled status to JMX API on HMaster.
> Right now the HMaster shows a warning near the top of the HMaster UI if the 
> balancer is disabled, but scraping this for monitoring integration is not 
> nice; the status should be available in the JMX API, as there is already a 
> Master,sub=Balancer bean with metrics for the balancer ops etc.
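
For readers who want to wire this into monitoring once it lands: a minimal JMX 
probe of the existing Master,sub=Balancer bean might look like the sketch 
below. The attribute name is a hypothetical placeholder for whatever the patch 
actually registers, and the bean name assumes the standard Hadoop metrics2 
naming.

{code}
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Editorial sketch: read a balancer-status attribute from the local MBean
// server. Attribute name "isBalancerActive" is an assumption for illustration.
public class BalancerStatusProbe {
  public static boolean isBalancerEnabled() throws Exception {
    MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
    ObjectName balancer =
        new ObjectName("Hadoop:service=HBase,name=Master,sub=Balancer");
    Object value = mbs.getAttribute(balancer, "isBalancerActive");
    return Boolean.parseBoolean(String.valueOf(value));
  }
}
{code}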



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21248) Implement exponential backoff when retrying for ModifyPeerProcedure

2018-09-28 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21248:
--
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Pushed to branch-2.1+. Thanks [~zghaobac] for reviewing.
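
For context, "exponential backoff when retrying" here means the usual capped, 
jittered backoff schedule between retries of the procedure; a minimal sketch 
follows, with illustrative constants that are not taken from the patch.

{code}
import java.util.concurrent.ThreadLocalRandom;

// Editorial sketch of capped exponential backoff with jitter; the real
// procedure retry logic lives in the HBase procedure framework.
public final class RetryBackoff {
  private static final long BASE_MS = 1_000L;  // delay before the first retry
  private static final long MAX_MS = 60_000L;  // cap so the delay stops growing

  /** Delay in milliseconds before the given retry attempt (0-based). */
  public static long backoffMillis(int attempt) {
    long exponential = BASE_MS << Math.min(attempt, 6); // 1s, 2s, 4s, ... then capped
    long capped = Math.min(exponential, MAX_MS);
    // Jitter: pick a value in [capped/2, capped] to avoid synchronized retries.
    return capped / 2 + ThreadLocalRandom.current().nextLong(capped / 2 + 1);
  }
}
{code}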

> Implement exponential backoff when retrying for ModifyPeerProcedure
> ---
>
> Key: HBASE-21248
> URL: https://issues.apache.org/jira/browse/HBASE-21248
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2, Replication
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1
>
> Attachments: HBASE-21248-v1.patch, HBASE-21248-v2.patch, 
> HBASE-21248.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21248) Implement exponential backoff when retrying for ModifyPeerProcedure

2018-09-28 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632789#comment-16632789
 ] 

Hadoop QA commented on HBASE-21248:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
10s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
27s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m  
1s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
23s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
55s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
39s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
45s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
 5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
52s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
12m 10s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}133m 
47s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}184m 41s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21248 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12941791/HBASE-21248-v2.patch |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 1c8061c68f26 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / ab6ec1f9e4 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14536/testReport/ |
| Max. process+thread count | 5156 (vs. ulimit of 1) |
| modules | C: hbase-server U: hbase-server |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14536/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Implement exponential backoff 

[jira] [Commented] (HBASE-21196) HTableMultiplexer clears the meta cache after every put operation

2018-09-28 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632794#comment-16632794
 ] 

Hudson commented on HBASE-21196:


Results for branch branch-2
[build #1317 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1317/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1317//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1317//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1317//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> HTableMultiplexer clears the meta cache after every put operation
> -
>
> Key: HBASE-21196
> URL: https://issues.apache.org/jira/browse/HBASE-21196
> Project: HBase
>  Issue Type: Bug
>  Components: Performance
>Affects Versions: 3.0.0, 1.3.3, 2.2.0
>Reporter: Nihal Jain
>Assignee: Nihal Jain
>Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21196.master.001.patch, 
> HBASE-21196.master.001.patch, HBASE-21196.master.002.patch, 
> HTableMultiplexer1000Puts.UT.txt
>
>
> *Problem:* Operations that call the 
> {{AsyncRequestFutureImpl.receiveMultiAction(MultiAction, ServerName, 
> MultiResponse, int)}} API with the table name set to null reset the meta 
> cache of the corresponding server after each call. One such operation is the 
> put operation of HTableMultiplexer (it might not be the only one). This may 
> severely impact the performance of the system: because the cache becomes 
> empty after every HTableMultiplexer put, all new ops directed to that server 
> have to go to ZK first to get the meta table address and then look up the 
> location of the table region.
> From the logs below, one can see that after every other put the cached 
> region locations are cleared. As a side effect, before every put the client 
> needs to contact ZK for the meta table location and then read meta to get 
> the region locations of the table.
> {noformat}
> 2018-09-13 22:21:15,467 TRACE [htable-pool11-t1] client.MetaCache(283): 
> Removed all cached region locations that map to 
> root1-thinkpad-t440p,35811,1536857446588
> 2018-09-13 22:21:15,467 DEBUG [HTableFlushWorker-5] 
> client.HTableMultiplexer$FlushWorker(632): Processed 1 put requests for 
> root1-ThinkPad-T440p:35811 and 0 failed, latency for this send: 5
> 2018-09-13 22:21:15,515 TRACE 
> [RpcServer.reader=1,bindAddress=root1-ThinkPad-T440p,port=35811] 
> ipc.RpcServer$Connection(1954): RequestHeader call_id: 218 method_name: "Get" 
> request_param: true priority: 0 timeout: 6 totalRequestSize: 137 bytes
> 2018-09-13 22:21:15,515 TRACE 
> [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] 
> ipc.CallRunner(105): callId: 218 service: ClientService methodName: Get size: 
> 137 connection: 127.0.0.1:42338 executing as root1
> 2018-09-13 22:21:15,515 TRACE 
> [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] 
> ipc.RpcServer(2356): callId: 218 service: ClientService methodName: Get size: 
> 137 connection: 127.0.0.1:42338 param: region= 
> testHTableMultiplexer_1,,1536857451720.304d914b641a738624937c7f9b4d684f., 
> row=\x00\x00\x00\xC4 connection: 127.0.0.1:42338, response result { 
> associated_cell_count: 1 stale: false } queueTime: 0 processingTime: 0 
> totalTime: 0
> 2018-09-13 22:21:15,516 TRACE 
> [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] 
> io.BoundedByteBufferPool(106): runningAverage=16384, totalCapacity=0, 
> count=0, allocations=1
> 2018-09-13 22:21:15,516 TRACE [main] ipc.AbstractRpcClient(236): Call: Get, 
> callTime: 2ms
> 2018-09-13 22:21:15,516 TRACE [main] client.ClientScanner(122): Scan 
> table=hbase:meta, 
> startRow=testHTableMultiplexer_1,\x00\x00\x00\xC5,99
> 2018-09-13 22:21:15,516 TRACE [main] client.ClientSmallReversedScanner(179): 
> Advancing internal small scanner to startKey at 
> 'testHTableMultiplexer_1,\x00\x00\x00\xC5,99'
> 2018-09-13 22:21:15,517 TRACE [main] client.ZooKeeperRegistry(59): Looking up 
> meta region location in ZK, 
> connection=org.apache.hadoop.hbase.client.ZooKeeperRegistry@599f571f
> {noformat}
> From the minicluster logs [^HTableMultiplexer1000Puts.UT.txt] one can see 
> that the string "Removed all cached region locations that 

[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request

2018-09-28 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632790#comment-16632790
 ] 

Hudson commented on HBASE-18451:


Results for branch branch-2
[build #1317 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1317/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1317//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1317//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1317//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> PeriodicMemstoreFlusher should inspect the queue before adding a delayed 
> flush request
> --
>
> Key: HBASE-18451
> URL: https://issues.apache.org/jira/browse/HBASE-18451
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 2.0.0-alpha-1
>Reporter: Jean-Marc Spaggiari
>Assignee: Xu Cang
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.8, 2.1.1
>
> Attachments: HBASE-18451.branch-1.001.patch, 
> HBASE-18451.branch-1.002.patch, HBASE-18451.branch-1.002.patch, 
> HBASE-18451.master.002.patch, HBASE-18451.master.003.patch, 
> HBASE-18451.master.004.patch, HBASE-18451.master.004.patch, 
> HBASE-18451.master.patch
>
>
> If you run a big job every 4 hours, impacting many tables (they have 150 
> regions per server), at the end all the regions might have some data to be 
> flushed, and we want to trigger a periodic flush after one hour. That's 
> totally fine.
> Now, to avoid a flush storm, when we detect a region to be flushed, we add a 
> "randomDelay" to the delayed flush; that way we spread the flushes out.
> RANGE_OF_DELAY is 5 minutes, so we spread the flushes over the next 5 
> minutes, which is very good.
> However, because we don't check whether there is already a request in the 
> queue, 10 seconds later we create a new request with a new randomDelay.
> If you generate a randomDelay every 10 seconds, at some point you will end 
> up with a small one, and the flush will be triggered almost immediately.
> As a result, instead of spreading all the flushes over the next 5 minutes, 
> you end up getting them all much more quickly, likely within the first 
> minute. This not only feeds the queue with too many flush requests, but also 
> defeats the purpose of the randomDelay.
> {code}
> @Override
> protected void chore() {
>   final StringBuffer whyFlush = new StringBuffer();
>   for (Region r : this.server.onlineRegions.values()) {
>     if (r == null) continue;
>     if (((HRegion) r).shouldFlush(whyFlush)) {
>       FlushRequester requester = server.getFlushRequester();
>       if (requester != null) {
>         long randomDelay = RandomUtils.nextInt(RANGE_OF_DELAY) + MIN_DELAY_TIME;
>         LOG.info(getName() + " requesting flush of " +
>           r.getRegionInfo().getRegionNameAsString() + " because " +
>           whyFlush.toString() +
>           " after random delay " + randomDelay + "ms");
>         // Throttle the flushes by putting a delay. If we don't throttle, and there
>         // is a balanced write-load on the regions in a table, we might end up
>         // overwhelming the filesystem with too many flushes at once.
>         requester.requestDelayedFlush(r, randomDelay, false);
>       }
>     }
>   }
> }
> {code}
> {code}
> 2017-07-24 18:44:33,338 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 270785ms
> 2017-07-24 18:44:43,328 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 200143ms
> 2017-07-24 18:44:53,954 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of 

[jira] [Commented] (HBASE-21207) Add client side sorting functionality in master web UI for table and region server details.

2018-09-28 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632792#comment-16632792
 ] 

Hudson commented on HBASE-21207:


Results for branch branch-2
[build #1317 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1317/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1317//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1317//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1317//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Add client side sorting functionality in master web UI for table and region 
> server details.
> ---
>
> Key: HBASE-21207
> URL: https://issues.apache.org/jira/browse/HBASE-21207
> Project: HBase
>  Issue Type: Improvement
>  Components: master, monitoring, UI, Usability
>Reporter: Archana Katiyar
>Assignee: Archana Katiyar
>Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.8
>
> Attachments: 14926e82-b929-11e8-8bdd-4ce4621f1118.png, 
> 2724afd8-b929-11e8-8171-8b5b2ba3084e.png, HBASE-21207-branch-1.patch, 
> HBASE-21207-branch-1.v1.patch, HBASE-21207-branch-2.v1.patch, 
> HBASE-21207.patch, HBASE-21207.patch, HBASE-21207.v1.patch, 
> edc5c812-b928-11e8-87e2-ce6396629bbc.png
>
>
> In the Master UI, we can see region server details like requests per second 
> and the number of regions, etc. Similarly, for tables we can see online 
> regions and offline regions.
> It would help ops people determine hot spotting if we provided sort 
> functionality in the UI.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-19418) RANGE_OF_DELAY in PeriodicMemstoreFlusher should be configurable.

2018-09-28 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-19418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632791#comment-16632791
 ] 

Hudson commented on HBASE-19418:


Results for branch branch-2
[build #1317 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1317/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1317//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1317//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1317//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> RANGE_OF_DELAY in PeriodicMemstoreFlusher should be configurable.
> -
>
> Key: HBASE-19418
> URL: https://issues.apache.org/jira/browse/HBASE-19418
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha-4
>Reporter: Jean-Marc Spaggiari
>Assignee: Ramie Raufdeen
>Priority: Minor
> Fix For: 3.0.0, 1.5.0, 1.3.3, 2.2.0, 1.4.8, 2.1.1
>
> Attachments: HBASE-19418.master.000.patch
>
>
> When RSs have a LOT of regions and CFs, flushing everything within 5 minutes 
> is not always doable. It might be interesting to be able to increase the 
> RANGE_OF_DELAY. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20857) JMX - add Balancer status = enabled / disabled

2018-09-28 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632793#comment-16632793
 ] 

Hudson commented on HBASE-20857:


Results for branch branch-2
[build #1317 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1317/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1317//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1317//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1317//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> JMX - add Balancer status = enabled / disabled
> --
>
> Key: HBASE-20857
> URL: https://issues.apache.org/jira/browse/HBASE-20857
> Project: HBase
>  Issue Type: Improvement
>  Components: API, master, metrics, REST, tooling, Usability
>Reporter: Hari Sekhon
>Assignee: Kiran Kumar Maturi
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.8, 2.1.1
>
> Attachments: HBASE-20857.branch-1.4.001.patch, 
> HBASE-20857.branch-1.4.002.patch
>
>
> Add HBase Balancer enabled/disabled status to JMX API on HMaster.
> Right now the HMaster shows a warning near the top of the HMaster UI if the 
> balancer is disabled, but scraping this for monitoring integration is not 
> nice; the status should be available in the JMX API, as there is already a 
> Master,sub=Balancer bean with metrics for the balancer ops etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21245) Add exponential backoff when retrying for sync replication related procedures

2018-09-28 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632788#comment-16632788
 ] 

Hadoop QA commented on HBASE-21245:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange}  
0m  0s{color} | {color:orange} The patch doesn't appear to include any new or 
modified tests. Please justify why no new tests are needed for this patch. Also 
please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
17s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m  
1s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
17s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
25s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
12s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
39s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
57s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 
16s{color} | {color:red} hbase-server: The patch generated 2 new + 0 unchanged 
- 0 fixed = 2 total (was 0) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
26s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
11m 58s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}122m 
44s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}170m 35s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21245 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12941793/HBASE-21245.master.001.patch
 |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 02ace6e83779 4.4.0-134-generic #160~14.04.1-Ubuntu SMP Fri Aug 
17 11:07:07 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / ab6ec1f9e4 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14537/artifact/patchprocess/diff-checkstyle-hbase-server.txt
 |
|  Test Results | 

[jira] [Commented] (HBASE-20952) Re-visit the WAL API

2018-09-28 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632786#comment-16632786
 ] 

Hudson commented on HBASE-20952:


Results for branch HBASE-20952
[build #2 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/2/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/2//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/2//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/2//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Re-visit the WAL API
> 
>
> Key: HBASE-20952
> URL: https://issues.apache.org/jira/browse/HBASE-20952
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Josh Elser
>Priority: Major
> Attachments: 20952.v1.txt
>
>
> Take a step back from the current WAL implementations and think about what an 
> HBase WAL API should look like. What are the primitive calls that we require 
> to guarantee durability of writes with a high degree of performance?
> The API needs to take the current implementations into consideration. We 
> should also keep in mind what is happening in the Ratis LogService (but the 
> LogService should not dictate what HBase's WAL API looks like, see RATIS-272).
> Other "systems" inside of HBase that use WALs are replication and backup. 
> Replication has the use-case of "tail"ing the WAL, which we should provide 
> via our new API. Backup doesn't do anything fancy (IIRC). We should make sure 
> all consumers are generally going to be OK with the API we create.
> The API may be "OK" (or OK in part). We also need to consider other methods 
> which were "bolted" on, such as {{AbstractFSWAL}} and 
> {{WALFileLengthProvider}}. Other corners of "WAL use" (like {{WALSplitter}}) 
> should also be looked at so that they use only the WAL API.
> We also need to make sure that adequate interface audience and stability 
> annotations are chosen.
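
To make "primitive calls" concrete, a hypothetical minimal surface covering 
durable append, sync, and the replication tailing use-case might look like the 
sketch below. It is an editorial illustration, not a proposal from this issue, 
and the method names are assumptions.

{code}
import java.io.Closeable;
import java.io.IOException;
import java.util.Iterator;

// Editorial sketch of the kind of primitives the discussion is about.
public interface MinimalWriteAheadLog extends Closeable {
  /** Append an edit; returns a sequence id once the edit is accepted. */
  long append(byte[] edit) throws IOException;

  /** Block until every edit up to the given sequence id is durable. */
  void sync(long sequenceId) throws IOException;

  /** Tail edits starting from a sequence id (the replication use-case). */
  Iterator<byte[]> tail(long fromSequenceId) throws IOException;
}
{code}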



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21196) HTableMultiplexer clears the meta cache after every put operation

2018-09-28 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632778#comment-16632778
 ] 

Hudson commented on HBASE-21196:


Results for branch branch-2.0
[build #879 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/879/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/879//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/879//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/879//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


> HTableMultiplexer clears the meta cache after every put operation
> -
>
> Key: HBASE-21196
> URL: https://issues.apache.org/jira/browse/HBASE-21196
> Project: HBase
>  Issue Type: Bug
>  Components: Performance
>Affects Versions: 3.0.0, 1.3.3, 2.2.0
>Reporter: Nihal Jain
>Assignee: Nihal Jain
>Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21196.master.001.patch, 
> HBASE-21196.master.001.patch, HBASE-21196.master.002.patch, 
> HTableMultiplexer1000Puts.UT.txt
>
>
> *Problem:* Operations that call the 
> {{AsyncRequestFutureImpl.receiveMultiAction(MultiAction, ServerName, 
> MultiResponse, int)}} API with the table name set to null reset the meta 
> cache of the corresponding server after each call. One such operation is the 
> put operation of HTableMultiplexer (it might not be the only one). This may 
> severely impact the performance of the system: because the cache becomes 
> empty after every HTableMultiplexer put, all new ops directed to that server 
> have to go to ZK first to get the meta table address and then look up the 
> location of the table region.
> From the logs below, one can see that after every other put the cached 
> region locations are cleared. As a side effect, before every put the client 
> needs to contact ZK for the meta table location and then read meta to get 
> the region locations of the table.
> {noformat}
> 2018-09-13 22:21:15,467 TRACE [htable-pool11-t1] client.MetaCache(283): 
> Removed all cached region locations that map to 
> root1-thinkpad-t440p,35811,1536857446588
> 2018-09-13 22:21:15,467 DEBUG [HTableFlushWorker-5] 
> client.HTableMultiplexer$FlushWorker(632): Processed 1 put requests for 
> root1-ThinkPad-T440p:35811 and 0 failed, latency for this send: 5
> 2018-09-13 22:21:15,515 TRACE 
> [RpcServer.reader=1,bindAddress=root1-ThinkPad-T440p,port=35811] 
> ipc.RpcServer$Connection(1954): RequestHeader call_id: 218 method_name: "Get" 
> request_param: true priority: 0 timeout: 6 totalRequestSize: 137 bytes
> 2018-09-13 22:21:15,515 TRACE 
> [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] 
> ipc.CallRunner(105): callId: 218 service: ClientService methodName: Get size: 
> 137 connection: 127.0.0.1:42338 executing as root1
> 2018-09-13 22:21:15,515 TRACE 
> [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] 
> ipc.RpcServer(2356): callId: 218 service: ClientService methodName: Get size: 
> 137 connection: 127.0.0.1:42338 param: region= 
> testHTableMultiplexer_1,,1536857451720.304d914b641a738624937c7f9b4d684f., 
> row=\x00\x00\x00\xC4 connection: 127.0.0.1:42338, response result { 
> associated_cell_count: 1 stale: false } queueTime: 0 processingTime: 0 
> totalTime: 0
> 2018-09-13 22:21:15,516 TRACE 
> [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] 
> io.BoundedByteBufferPool(106): runningAverage=16384, totalCapacity=0, 
> count=0, allocations=1
> 2018-09-13 22:21:15,516 TRACE [main] ipc.AbstractRpcClient(236): Call: Get, 
> callTime: 2ms
> 2018-09-13 22:21:15,516 TRACE [main] client.ClientScanner(122): Scan 
> table=hbase:meta, 
> startRow=testHTableMultiplexer_1,\x00\x00\x00\xC5,99
> 2018-09-13 22:21:15,516 TRACE [main] client.ClientSmallReversedScanner(179): 
> Advancing internal small scanner to startKey at 
> 'testHTableMultiplexer_1,\x00\x00\x00\xC5,99'
> 2018-09-13 22:21:15,517 TRACE [main] client.ZooKeeperRegistry(59): Looking up 
> meta region location in ZK, 
> connection=org.apache.hadoop.hbase.client.ZooKeeperRegistry@599f571f
> {noformat}
> From the minicluster logs [^HTableMultiplexer1000Puts.UT.txt] one can see 
> that the string "Removed all cached region locations that map" and "Looking 
> up meta region location in 

[jira] [Updated] (HBASE-21213) [hbck2] bypass leaves behind state in RegionStates when assign/unassign

2018-09-28 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21213:
--
Attachment: HBASE-21213.branch-2.1.010.patch

> [hbck2] bypass leaves behind state in RegionStates when assign/unassign
> ---
>
> Key: HBASE-21213
> URL: https://issues.apache.org/jira/browse/HBASE-21213
> Project: HBase
>  Issue Type: Bug
>  Components: amv2, hbck2
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 2.1.1
>
> Attachments: HBASE-21213.branch-2.1.001.patch, 
> HBASE-21213.branch-2.1.002.patch, HBASE-21213.branch-2.1.003.patch, 
> HBASE-21213.branch-2.1.004.patch, HBASE-21213.branch-2.1.005.patch, 
> HBASE-21213.branch-2.1.006.patch, HBASE-21213.branch-2.1.007.patch, 
> HBASE-21213.branch-2.1.007.patch, HBASE-21213.branch-2.1.008.patch, 
> HBASE-21213.branch-2.1.009.patch, HBASE-21213.branch-2.1.010.patch
>
>
> This is a follow-on from HBASE-21083, which added the 'bypass' functionality. 
> On bypass, there is more state to be cleared if we are to allow new 
> Procedures to be scheduled.
> For example, here is a bypass:
> {code}
> 2018-09-20 05:45:43,722 INFO org.apache.hadoop.hbase.procedure2.Procedure: 
> pid=100449, state=RUNNABLE:REGION_TRANSITION_DISPATCH, locked=true, 
> bypass=LOG-REDACTED UnassignProcedure table=hbase:namespace, 
> region=37cc206fe9c4bc1c0a46a34c5f523d16, 
> server=ve1233.halxg.cloudera.com,22101,1537397961664 bypassed, returning null 
> to finish it
> 2018-09-20 05:45:44,022 INFO 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=100449, 
> state=SUCCESS, bypass=LOG-REDACTED UnassignProcedure table=hbase:namespace, 
> region=37cc206fe9c4bc1c0a46a34c5f523d16, 
> server=ve1233.halxg.cloudera.com,22101,1537397961664 in 2mins, 7.618sec
> {code}
> ... but then when I try to assign the bypassed region later, I get this:
> {code}
> 2018-09-20 05:46:31,435 WARN 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure: There is 
> already another procedure running on this region this=pid=100450, 
> state=RUNNABLE:REGION_TRANSITION_QUEUE, locked=true; AssignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16 
> owner=pid=100449, state=SUCCESS, bypass=LOG-REDACTED UnassignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16, 
> server=ve1233.halxg.cloudera.com,22101,1537397961664 pid=100450, 
> state=RUNNABLE:REGION_TRANSITION_QUEUE, locked=true; AssignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16; rit=OPENING, 
> location=ve1233.halxg.cloudera.com,22101,1537397961664
> 2018-09-20 05:46:31,510 INFO 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Rolled back pid=100450, 
> state=ROLLEDBACK, 
> exception=org.apache.hadoop.hbase.procedure2.ProcedureAbortedException via 
> AssignProcedure:org.apache.hadoop.hbase.procedure2.ProcedureAbortedException: 
> There is already another procedure running on this region this=pid=100450, 
> state=RUNNABLE:REGION_TRANSITION_QUEUE, locked=true; AssignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16 
> owner=pid=100449, state=SUCCESS, bypass=LOG-REDACTED UnassignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16, 
> server=ve1233.halxg.cloudera.com,22101,1537397961664; AssignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16 
> exec-time=473msec
> {code}
> ... which is a long-winded way of saying the Unassign Procedure still exists 
> in RegionStateNodes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21250) Refactor WALProcedureStore and add more comments for better understanding the implementation

2018-09-28 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632756#comment-16632756
 ] 

Duo Zhang commented on HBASE-21250:
---

I have to say that it is really difficult to understand...

And it seems that there is a bug? In the resetTo method of 
ProcedureStoreTracker, we do not clear the old BitSetNode map before adding the 
new ones, while in resetToProto we call reset first to clear the old map. But 
in resetToProto we do not update the minProcId and maxProcId; it may be OK, as 
this will only be called when restarting and it seems we will use another data 
structure to calculate the minProcId and maxProcId, but there is no comment 
saying this...

> Refactor WALProcedureStore and add more comments for better understanding the 
> implementation
> 
>
> Key: HBASE-21250
> URL: https://issues.apache.org/jira/browse/HBASE-21250
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
>
> The implementation is complicated and lacks comments explaining how it works.
> {code}
> /**
>  * WAL implementation of the ProcedureStore.
>  * @see ProcedureWALPrettyPrinter for printing content of a single WAL.
>  * @see #main(String[]) to parse a directory of MasterWALProcs.
>  */
> {code}
> I think at least we can move the sub-classes to separate files to make the 
> class smaller, and add more comments to describe what is going on here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21213) [hbck2] bypass leaves behind state in RegionStates when assign/unassign

2018-09-28 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632739#comment-16632739
 ] 

Hadoop QA commented on HBASE-21213:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 6 new or modified test 
files. {color} |
|| || || || {color:brown} branch-2.1 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m  
6s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
45s{color} | {color:green} branch-2.1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 17m 
25s{color} | {color:green} branch-2.1 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
48s{color} | {color:green} branch-2.1 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
45s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m 
11s{color} | {color:green} branch-2.1 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
46s{color} | {color:green} branch-2.1 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
 4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 17m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 17m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 17m  
6s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
13s{color} | {color:red} hbase-procedure: The patch generated 1 new + 20 
unchanged - 0 fixed = 21 total (was 20) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 
17s{color} | {color:red} hbase-server: The patch generated 1 new + 326 
unchanged - 1 fixed = 327 total (was 327) {color} |
| {color:red}-1{color} | {color:red} rubocop {color} | {color:red}  0m  
7s{color} | {color:red} The patch generated 3 new + 20 unchanged - 2 fixed = 23 
total (was 22) {color} |
| {color:orange}-0{color} | {color:orange} ruby-lint {color} | {color:orange}  
0m  3s{color} | {color:orange} The patch generated 1 new + 41 unchanged - 0 
fixed = 42 total (was 41) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
31s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
8m 42s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 
or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green}  
2m 15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m 
19s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
12s{color} | {color:red} hbase-procedure generated 1 new + 0 unchanged - 0 
fixed = 1 total (was 0) {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
29s{color} | {color:green} hbase-protocol-shaded in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
59s{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
56s{color} | {color:green} hbase-procedure 

[jira] [Created] (HBASE-21254) Need to find a way to limit the number of proc wal files

2018-09-28 Thread Duo Zhang (JIRA)
Duo Zhang created HBASE-21254:
-

 Summary: Need to find a way to limit the number of proc wal files
 Key: HBASE-21254
 URL: https://issues.apache.org/jira/browse/HBASE-21254
 Project: HBase
  Issue Type: Sub-task
  Components: proc-v2
Reporter: Duo Zhang
 Fix For: 3.0.0, 2.2.0


For the regionserver we have a maximum WAL file limit: when we reach it, we 
trigger a flush on specific regions so that old WAL files can be deleted. For 
proc WALs we have no such mechanism, and it will get worse after HBASE-21233: 
if there is an old procedure which cannot make progress and does not persist 
its state, we have to keep the old proc WAL file forever...
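
As an illustration only, a hypothetical sketch of the kind of cap described 
above (every class and method name below is an assumption, not the 
WALProcedureStore API): once the number of procedure WAL files passes a limit, 
force the procedures pinning the oldest file to re-persist their state so that 
the file can be deleted and the count stays bounded.

{code:java}
import java.util.Deque;
import java.util.List;

// Hypothetical types standing in for the proc WAL machinery.
interface ProcWalFile {
  List<Long> pinnedProcIds();          // procedures whose only persisted state is here
}

interface ProcStore {
  void forceStateUpdate(long procId);  // rewrite the procedure state into the newest file
  void deleteFile(ProcWalFile file);
}

class ProcWalLimiter {
  private final int maxProcWalFiles;   // assumed config knob

  ProcWalLimiter(int maxProcWalFiles) {
    this.maxProcWalFiles = maxProcWalFiles;
  }

  void maybeCleanup(Deque<ProcWalFile> walFiles, ProcStore store) {
    if (walFiles.size() <= maxProcWalFiles) {
      return;                          // under the cap, nothing to do
    }
    ProcWalFile oldest = walFiles.peekFirst();
    for (long procId : oldest.pinnedProcIds()) {
      store.forceStateUpdate(procId);  // analogous to flushing regions for RS WALs
    }
    walFiles.pollFirst();              // the oldest file no longer holds live state
    store.deleteFile(oldest);
  }
}
{code}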



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21245) Add exponential backoff when retrying for sync replication related procedures

2018-09-28 Thread Guanghao Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-21245:
---
Status: Patch Available  (was: Open)

> Add exponential backoff when retrying for sync replication related procedures
> -
>
> Key: HBASE-21245
> URL: https://issues.apache.org/jira/browse/HBASE-21245
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Duo Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Attachments: HBASE-21245.master.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21245) Add exponential backoff when retrying for sync replication related procedures

2018-09-28 Thread Guanghao Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-21245:
---
Attachment: HBASE-21245.master.001.patch

> Add exponential backoff when retrying for sync replication related procedures
> -
>
> Key: HBASE-21245
> URL: https://issues.apache.org/jira/browse/HBASE-21245
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Duo Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Attachments: HBASE-21245.master.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21248) Implement exponential backoff when retrying for ModifyPeerProcedure

2018-09-28 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21248:
--
Attachment: HBASE-21248-v2.patch

> Implement exponential backoff when retrying for ModifyPeerProcedure
> ---
>
> Key: HBASE-21248
> URL: https://issues.apache.org/jira/browse/HBASE-21248
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2, Replication
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1
>
> Attachments: HBASE-21248-v1.patch, HBASE-21248-v2.patch, 
> HBASE-21248.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21248) Implement exponential backoff when retrying for ModifyPeerProcedure

2018-09-28 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632703#comment-16632703
 ] 

Duo Zhang commented on HBASE-21248:
---

It seems I pushed the patch to master by accident... Reverted; let me try the 
pre-commit again.

> Implement exponential backoff when retrying for ModifyPeerProcedure
> ---
>
> Key: HBASE-21248
> URL: https://issues.apache.org/jira/browse/HBASE-21248
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2, Replication
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1
>
> Attachments: HBASE-21248-v1.patch, HBASE-21248.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21213) [hbck2] bypass leaves behind state in RegionStates when assign/unassign

2018-09-28 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632692#comment-16632692
 ] 

stack commented on HBASE-21213:
---

.009 fixes the TestShell failure and adds an EXPENSIVE recursive call as 
discussed above. It seems I need it in my tests on this big cluster. I need to 
figure out why, but in the meantime it should let me fix up the extreme cases.

> [hbck2] bypass leaves behind state in RegionStates when assign/unassign
> ---
>
> Key: HBASE-21213
> URL: https://issues.apache.org/jira/browse/HBASE-21213
> Project: HBase
>  Issue Type: Bug
>  Components: amv2, hbck2
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 2.1.1
>
> Attachments: HBASE-21213.branch-2.1.001.patch, 
> HBASE-21213.branch-2.1.002.patch, HBASE-21213.branch-2.1.003.patch, 
> HBASE-21213.branch-2.1.004.patch, HBASE-21213.branch-2.1.005.patch, 
> HBASE-21213.branch-2.1.006.patch, HBASE-21213.branch-2.1.007.patch, 
> HBASE-21213.branch-2.1.007.patch, HBASE-21213.branch-2.1.008.patch, 
> HBASE-21213.branch-2.1.009.patch
>
>
> This is a follow-on from HBASE-21083 which added the 'bypass' functionality. 
> On bypass, there is more state to be cleared if we are to allow new Procedures 
> to be scheduled.
> For example, here is a bypass:
> {code}
> 2018-09-20 05:45:43,722 INFO org.apache.hadoop.hbase.procedure2.Procedure: 
> pid=100449, state=RUNNABLE:REGION_TRANSITION_DISPATCH, locked=true, 
> bypass=LOG-REDACTED UnassignProcedure table=hbase:namespace, 
> region=37cc206fe9c4bc1c0a46a34c5f523d16, 
> server=ve1233.halxg.cloudera.com,22101,1537397961664 bypassed, returning null 
> to finish it
> 2018-09-20 05:45:44,022 INFO 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=100449, 
> state=SUCCESS, bypass=LOG-REDACTED UnassignProcedure table=hbase:namespace, 
> region=37cc206fe9c4bc1c0a46a34c5f523d16, 
> server=ve1233.halxg.cloudera.com,22101,1537397961664 in 2mins, 7.618sec
> {code}
> ... but then when I try to assign the bypassed region later, I get this:
> {code}
> 2018-09-20 05:46:31,435 WARN 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure: There is 
> already another procedure running on this region this=pid=100450, 
> state=RUNNABLE:REGION_TRANSITION_QUEUE, locked=true; AssignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16 
> owner=pid=100449, state=SUCCESS, bypass=LOG-REDACTED UnassignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16, 
> server=ve1233.halxg.cloudera.com,22101,1537397961664 pid=100450, 
> state=RUNNABLE:REGION_TRANSITION_QUEUE, locked=true; AssignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16; rit=OPENING, 
> location=ve1233.halxg.cloudera.com,22101,1537397961664
> 2018-09-20 05:46:31,510 INFO 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Rolled back pid=100450, 
> state=ROLLEDBACK, 
> exception=org.apache.hadoop.hbase.procedure2.ProcedureAbortedException via 
> AssignProcedure:org.apache.hadoop.hbase.procedure2.ProcedureAbortedException: 
> There is already another procedure running on this region this=pid=100450, 
> state=RUNNABLE:REGION_TRANSITION_QUEUE, locked=true; AssignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16 
> owner=pid=100449, state=SUCCESS, bypass=LOG-REDACTED UnassignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16, 
> server=ve1233.halxg.cloudera.com,22101,1537397961664; AssignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16 
> exec-time=473msec
> {code}
> ... which is a long-winded way of saying the Unassign Procedure still exists 
> in RegionStateNodes.
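
To illustrate what "still exists in RegionStateNodes" implies for a fix, a 
rough sketch (the class and fields below are assumptions, not the actual AMv2 
code): whatever node tracks the region's in-flight procedure has to forget the 
bypassed pid, otherwise the next assign is rejected exactly as in the log above.

{code:java}
// Assumed stand-in for the per-region state node kept by the assignment manager.
class RegionStateNodeSketch {
  private Long transitionProcId;     // pid of the procedure currently owning the region

  synchronized void setProcedure(long procId) {
    this.transitionProcId = procId;
  }

  // Called when a procedure is bypassed: only clear the reference if the
  // bypassed pid is the one actually registered here, so later Assign/Unassign
  // procedures can take the region again.
  synchronized void onBypass(long bypassedProcId) {
    if (transitionProcId != null && transitionProcId == bypassedProcId) {
      transitionProcId = null;
    }
  }

  synchronized boolean hasRunningProcedure() {
    return transitionProcId != null;
  }
}
{code}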



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21067) Backport HBASE-17519 (Rollback the removed cells) to branch-1.3

2018-09-28 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632690#comment-16632690
 ] 

Hudson commented on HBASE-21067:


SUCCESS: Integrated in Jenkins build HBase-1.3-IT #487 (See 
[https://builds.apache.org/job/HBase-1.3-IT/487/])
HBASE-21067 Backport HBASE-17519 (Rollback the removed cells) (apurtell: rev 
171f8f066ec072475ae4454e9b3f5d545cee73a3)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* (add) 
hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestRollbackFromClient.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestDefaultMemStore.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/DefaultMemStore.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java


> Backport HBASE-17519 (Rollback the removed cells) to branch-1.3
> ---
>
> Key: HBASE-21067
> URL: https://issues.apache.org/jira/browse/HBASE-21067
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.3
>Reporter: Nihal Jain
>Assignee: Nihal Jain
>Priority: Major
> Fix For: 1.3.3
>
> Attachments: HBASE-21067.branch-1.3.001.patch, 
> HBASE-21067.branch-1.3.002.patch
>
>
> Backport HBASE-17519 (Rollback the removed cells) to branch-1.3, which 
> handles rollback of append/increment completely in case of failure



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-17519) Rollback the removed cells

2018-09-28 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632691#comment-16632691
 ] 

Hudson commented on HBASE-17519:


SUCCESS: Integrated in Jenkins build HBase-1.3-IT #487 (See 
[https://builds.apache.org/job/HBase-1.3-IT/487/])
HBASE-21067 Backport HBASE-17519 (Rollback the removed cells) (apurtell: rev 
171f8f066ec072475ae4454e9b3f5d545cee73a3)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* (add) 
hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestRollbackFromClient.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestDefaultMemStore.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/DefaultMemStore.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java


> Rollback the removed cells
> --
>
> Key: HBASE-17519
> URL: https://issues.apache.org/jira/browse/HBASE-17519
> Project: HBase
>  Issue Type: Bug
>Reporter: Chia-Ping Tsai
>Assignee: Chia-Ping Tsai
>Priority: Major
> Fix For: 1.4.0
>
> Attachments: HBASE-17519.branch-1.v0.patch, 
> HBASE-17519.branch-1.v1.patch, HBASE-17519.branch-1.v1.patch, 
> HBASE-17519.branch-1.v2.patch, HBASE-17519.branch-1.v2.patch, 
> HBASE-17519.branch-1.v2.patch
>
>
> The Store#upsert removes the old cells but we don’t roll back the removed 
> cells when failing.
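
A minimal sketch of the intended behaviour (simplified types, not the actual 
DefaultMemStore/HStore code): remember which cells the upsert removed so they 
can be put back if the operation fails part-way through, which is what the 
patch addresses.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

class UpsertSketch {
  // Cells are reduced to Strings here; the real code works on Cell objects.
  static void upsert(NavigableSet<String> cellSet, String newCell, List<String> oldCells) {
    List<String> removed = new ArrayList<>();
    try {
      for (String old : oldCells) {
        if (cellSet.remove(old)) {
          removed.add(old);            // keep what we removed for a possible rollback
        }
      }
      cellSet.add(newCell);            // the step that may fail in the real code
    } catch (RuntimeException e) {
      cellSet.addAll(removed);         // roll back the removed cells on failure
      throw e;
    }
  }
}
{code}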



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21067) Backport HBASE-17519 (Rollback the removed cells) to branch-1.3

2018-09-28 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-21067:
---
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Memstore, store, and append/increment tests pass, as did the test flagged in 
the precommit. Committing to branch-1.3. 

> Backport HBASE-17519 (Rollback the removed cells) to branch-1.3
> ---
>
> Key: HBASE-21067
> URL: https://issues.apache.org/jira/browse/HBASE-21067
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.3
>Reporter: Nihal Jain
>Assignee: Nihal Jain
>Priority: Major
> Fix For: 1.3.3
>
> Attachments: HBASE-21067.branch-1.3.001.patch, 
> HBASE-21067.branch-1.3.002.patch
>
>
> Backport HBASE-17519 (Rollback the removed cells) to branch-1.3, which 
> handles rollback of append/increment completely in case of failure



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21194) Add TestCopyTable which exercises MOB feature

2018-09-28 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21194:
---
Labels: mob  (was: )

> Add TestCopyTable which exercises MOB feature
> -
>
> Key: HBASE-21194
> URL: https://issues.apache.org/jira/browse/HBASE-21194
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Priority: Minor
>  Labels: mob
>
> Currently TestCopyTable doesn't cover table(s) with the MOB feature enabled.
> We should add a variant that enables MOB on the table being copied and verify 
> that MOB content is copied correctly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21247) Allow WAL Provider to be specified by configuration without explicit enum in Providers

2018-09-28 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632670#comment-16632670
 ] 

Hadoop QA commented on HBASE-21247:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
26s{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue}  0m  
2s{color} | {color:blue} The patch file was not named according to hbase's 
naming conventions. Please see 
https://yetus.apache.org/documentation/0.8.0/precommit-patchnames for 
instructions. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
59s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
19s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
25s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 
30s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
35s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
53s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 
38s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
13m 39s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}195m 
31s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}251m 45s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21247 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12941726/21247.v4.txt |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux f3d0aa6d488f 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 6bc7089f9e |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14534/testReport/ |
| Max. process+thread count | 4275 (vs. ulimit of 1) |
| 

[jira] [Commented] (HBASE-21067) Backport HBASE-17519 (Rollback the removed cells) to branch-1.3

2018-09-28 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632663#comment-16632663
 ] 

Andrew Purtell commented on HBASE-21067:


Let me try to commit this after some local checks. 

> Backport HBASE-17519 (Rollback the removed cells) to branch-1.3
> ---
>
> Key: HBASE-21067
> URL: https://issues.apache.org/jira/browse/HBASE-21067
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.3
>Reporter: Nihal Jain
>Assignee: Nihal Jain
>Priority: Major
> Fix For: 1.3.3
>
> Attachments: HBASE-21067.branch-1.3.001.patch, 
> HBASE-21067.branch-1.3.002.patch
>
>
> Backport HBASE-17519 (Rollback the removed cells) to branch-1.3, which 
> handles rollback of append/increment completely in case of failure



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21196) HTableMultiplexer clears the meta cache after every put operation

2018-09-28 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632662#comment-16632662
 ] 

Andrew Purtell commented on HBASE-21196:


Sure thing Ted!

> HTableMultiplexer clears the meta cache after every put operation
> -
>
> Key: HBASE-21196
> URL: https://issues.apache.org/jira/browse/HBASE-21196
> Project: HBase
>  Issue Type: Bug
>  Components: Performance
>Affects Versions: 3.0.0, 1.3.3, 2.2.0
>Reporter: Nihal Jain
>Assignee: Nihal Jain
>Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21196.master.001.patch, 
> HBASE-21196.master.001.patch, HBASE-21196.master.002.patch, 
> HTableMultiplexer1000Puts.UT.txt
>
>
> *Problem:* Operations which use 
> {{AsyncRequestFutureImpl.receiveMultiAction(MultiAction, ServerName, 
> MultiResponse, int)}} API with tablename set to null reset the meta cache of 
> the corresponding server after each call. One such operation is put operation 
> of HTableMultiplexer (Might not be the only one). This may impact the 
> performance of the system severely as all new ops directed to that server 
> will have to go to zk first to get the meta table address and then get the 
> location of the table region as it will become empty after every 
> htablemultiplexer put.
> From the logs below, one can see after every other put the cached region 
> locations are cleared. As a side effect of this, before every put the server 
> needs to contact zk and get meta table location and read meta to get region 
> locations of the table.
> {noformat}
> 2018-09-13 22:21:15,467 TRACE [htable-pool11-t1] client.MetaCache(283): 
> Removed all cached region locations that map to 
> root1-thinkpad-t440p,35811,1536857446588
> 2018-09-13 22:21:15,467 DEBUG [HTableFlushWorker-5] 
> client.HTableMultiplexer$FlushWorker(632): Processed 1 put requests for 
> root1-ThinkPad-T440p:35811 and 0 failed, latency for this send: 5
> 2018-09-13 22:21:15,515 TRACE 
> [RpcServer.reader=1,bindAddress=root1-ThinkPad-T440p,port=35811] 
> ipc.RpcServer$Connection(1954): RequestHeader call_id: 218 method_name: "Get" 
> request_param: true priority: 0 timeout: 6 totalRequestSize: 137 bytes
> 2018-09-13 22:21:15,515 TRACE 
> [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] 
> ipc.CallRunner(105): callId: 218 service: ClientService methodName: Get size: 
> 137 connection: 127.0.0.1:42338 executing as root1
> 2018-09-13 22:21:15,515 TRACE 
> [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] 
> ipc.RpcServer(2356): callId: 218 service: ClientService methodName: Get size: 
> 137 connection: 127.0.0.1:42338 param: region= 
> testHTableMultiplexer_1,,1536857451720.304d914b641a738624937c7f9b4d684f., 
> row=\x00\x00\x00\xC4 connection: 127.0.0.1:42338, response result { 
> associated_cell_count: 1 stale: false } queueTime: 0 processingTime: 0 
> totalTime: 0
> 2018-09-13 22:21:15,516 TRACE 
> [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] 
> io.BoundedByteBufferPool(106): runningAverage=16384, totalCapacity=0, 
> count=0, allocations=1
> 2018-09-13 22:21:15,516 TRACE [main] ipc.AbstractRpcClient(236): Call: Get, 
> callTime: 2ms
> 2018-09-13 22:21:15,516 TRACE [main] client.ClientScanner(122): Scan 
> table=hbase:meta, 
> startRow=testHTableMultiplexer_1,\x00\x00\x00\xC5,99
> 2018-09-13 22:21:15,516 TRACE [main] client.ClientSmallReversedScanner(179): 
> Advancing internal small scanner to startKey at 
> 'testHTableMultiplexer_1,\x00\x00\x00\xC5,99'
> 2018-09-13 22:21:15,517 TRACE [main] client.ZooKeeperRegistry(59): Looking up 
> meta region location in ZK, 
> connection=org.apache.hadoop.hbase.client.ZooKeeperRegistry@599f571f
> {noformat}
> From the minicluster logs [^HTableMultiplexer1000Puts.UT.txt] one can see 
> that the string "Removed all cached region locations that map" and "Looking 
> up meta region location in ZK" are present for every put.
> *Analysis:*
>  The problem occurs because the {{cleanServerCache}} method always clears 
> the server cache when the tablename is null and the exception is null. See 
> [AsyncRequestFutureImpl.java#L918|https://github.com/apache/hbase/blob/5d14c1af65c02f4e87059337c35e4431505de91c/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncRequestFutureImpl.java#L918]
> {code:java}
> private void cleanServerCache(ServerName server, Throwable regionException) {
> if (tableName == null && 
> ClientExceptionsUtil.isMetaClearingException(regionException)) {
>   // For multi-actions, we don't have a table name, but we want to make 
> sure to clear the
>   // cache in case there were location-related exceptions. We don't to 
> clear the cache
>   // for every possible exception that comes 

[jira] [Updated] (HBASE-21196) HTableMultiplexer clears the meta cache after every put operation

2018-09-28 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-21196:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.0.3
   2.1.1
   2.2.0
   Status: Resolved  (was: Patch Available)

Committed to master and all of the branch-2s

> HTableMultiplexer clears the meta cache after every put operation
> -
>
> Key: HBASE-21196
> URL: https://issues.apache.org/jira/browse/HBASE-21196
> Project: HBase
>  Issue Type: Bug
>  Components: Performance
>Affects Versions: 3.0.0, 1.3.3, 2.2.0
>Reporter: Nihal Jain
>Assignee: Nihal Jain
>Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21196.master.001.patch, 
> HBASE-21196.master.001.patch, HBASE-21196.master.002.patch, 
> HTableMultiplexer1000Puts.UT.txt
>
>
> *Problem:* Operations which use 
> {{AsyncRequestFutureImpl.receiveMultiAction(MultiAction, ServerName, 
> MultiResponse, int)}} API with tablename set to null reset the meta cache of 
> the corresponding server after each call. One such operation is put operation 
> of HTableMultiplexer (Might not be the only one). This may impact the 
> performance of the system severely as all new ops directed to that server 
> will have to go to zk first to get the meta table address and then get the 
> location of the table region as it will become empty after every 
> htablemultiplexer put.
> From the logs below, one can see after every other put the cached region 
> locations are cleared. As a side effect of this, before every put the server 
> needs to contact zk and get meta table location and read meta to get region 
> locations of the table.
> {noformat}
> 2018-09-13 22:21:15,467 TRACE [htable-pool11-t1] client.MetaCache(283): 
> Removed all cached region locations that map to 
> root1-thinkpad-t440p,35811,1536857446588
> 2018-09-13 22:21:15,467 DEBUG [HTableFlushWorker-5] 
> client.HTableMultiplexer$FlushWorker(632): Processed 1 put requests for 
> root1-ThinkPad-T440p:35811 and 0 failed, latency for this send: 5
> 2018-09-13 22:21:15,515 TRACE 
> [RpcServer.reader=1,bindAddress=root1-ThinkPad-T440p,port=35811] 
> ipc.RpcServer$Connection(1954): RequestHeader call_id: 218 method_name: "Get" 
> request_param: true priority: 0 timeout: 6 totalRequestSize: 137 bytes
> 2018-09-13 22:21:15,515 TRACE 
> [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] 
> ipc.CallRunner(105): callId: 218 service: ClientService methodName: Get size: 
> 137 connection: 127.0.0.1:42338 executing as root1
> 2018-09-13 22:21:15,515 TRACE 
> [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] 
> ipc.RpcServer(2356): callId: 218 service: ClientService methodName: Get size: 
> 137 connection: 127.0.0.1:42338 param: region= 
> testHTableMultiplexer_1,,1536857451720.304d914b641a738624937c7f9b4d684f., 
> row=\x00\x00\x00\xC4 connection: 127.0.0.1:42338, response result { 
> associated_cell_count: 1 stale: false } queueTime: 0 processingTime: 0 
> totalTime: 0
> 2018-09-13 22:21:15,516 TRACE 
> [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] 
> io.BoundedByteBufferPool(106): runningAverage=16384, totalCapacity=0, 
> count=0, allocations=1
> 2018-09-13 22:21:15,516 TRACE [main] ipc.AbstractRpcClient(236): Call: Get, 
> callTime: 2ms
> 2018-09-13 22:21:15,516 TRACE [main] client.ClientScanner(122): Scan 
> table=hbase:meta, 
> startRow=testHTableMultiplexer_1,\x00\x00\x00\xC5,99
> 2018-09-13 22:21:15,516 TRACE [main] client.ClientSmallReversedScanner(179): 
> Advancing internal small scanner to startKey at 
> 'testHTableMultiplexer_1,\x00\x00\x00\xC5,99'
> 2018-09-13 22:21:15,517 TRACE [main] client.ZooKeeperRegistry(59): Looking up 
> meta region location in ZK, 
> connection=org.apache.hadoop.hbase.client.ZooKeeperRegistry@599f571f
> {noformat}
> From the minicluster logs [^HTableMultiplexer1000Puts.UT.txt] one can see 
> that the string "Removed all cached region locations that map" and "Looking 
> up meta region location in ZK" are present for every put.
> *Analysis:*
>  The problem occurs because the {{cleanServerCache}} method always clears 
> the server cache when the tablename is null and the exception is null. See 
> [AsyncRequestFutureImpl.java#L918|https://github.com/apache/hbase/blob/5d14c1af65c02f4e87059337c35e4431505de91c/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncRequestFutureImpl.java#L918]
> {code:java}
> private void cleanServerCache(ServerName server, Throwable regionException) {
> if (tableName == null && 
> ClientExceptionsUtil.isMetaClearingException(regionException)) {
>   // For multi-actions, we don't have a table name, but we want to make 
> 

[jira] [Commented] (HBASE-21196) HTableMultiplexer clears the meta cache after every put operation

2018-09-28 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632658#comment-16632658
 ] 

Ted Yu commented on HBASE-21196:


Thanks for the commit, Andrew.

> HTableMultiplexer clears the meta cache after every put operation
> -
>
> Key: HBASE-21196
> URL: https://issues.apache.org/jira/browse/HBASE-21196
> Project: HBase
>  Issue Type: Bug
>  Components: Performance
>Affects Versions: 3.0.0, 1.3.3, 2.2.0
>Reporter: Nihal Jain
>Assignee: Nihal Jain
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HBASE-21196.master.001.patch, 
> HBASE-21196.master.001.patch, HBASE-21196.master.002.patch, 
> HTableMultiplexer1000Puts.UT.txt
>
>
> *Problem:* Operations which use 
> {{AsyncRequestFutureImpl.receiveMultiAction(MultiAction, ServerName, 
> MultiResponse, int)}} API with tablename set to null reset the meta cache of 
> the corresponding server after each call. One such operation is put operation 
> of HTableMultiplexer (Might not be the only one). This may impact the 
> performance of the system severely as all new ops directed to that server 
> will have to go to zk first to get the meta table address and then get the 
> location of the table region as it will become empty after every 
> htablemultiplexer put.
> From the logs below, one can see after every other put the cached region 
> locations are cleared. As a side effect of this, before every put the server 
> needs to contact zk and get meta table location and read meta to get region 
> locations of the table.
> {noformat}
> 2018-09-13 22:21:15,467 TRACE [htable-pool11-t1] client.MetaCache(283): 
> Removed all cached region locations that map to 
> root1-thinkpad-t440p,35811,1536857446588
> 2018-09-13 22:21:15,467 DEBUG [HTableFlushWorker-5] 
> client.HTableMultiplexer$FlushWorker(632): Processed 1 put requests for 
> root1-ThinkPad-T440p:35811 and 0 failed, latency for this send: 5
> 2018-09-13 22:21:15,515 TRACE 
> [RpcServer.reader=1,bindAddress=root1-ThinkPad-T440p,port=35811] 
> ipc.RpcServer$Connection(1954): RequestHeader call_id: 218 method_name: "Get" 
> request_param: true priority: 0 timeout: 6 totalRequestSize: 137 bytes
> 2018-09-13 22:21:15,515 TRACE 
> [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] 
> ipc.CallRunner(105): callId: 218 service: ClientService methodName: Get size: 
> 137 connection: 127.0.0.1:42338 executing as root1
> 2018-09-13 22:21:15,515 TRACE 
> [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] 
> ipc.RpcServer(2356): callId: 218 service: ClientService methodName: Get size: 
> 137 connection: 127.0.0.1:42338 param: region= 
> testHTableMultiplexer_1,,1536857451720.304d914b641a738624937c7f9b4d684f., 
> row=\x00\x00\x00\xC4 connection: 127.0.0.1:42338, response result { 
> associated_cell_count: 1 stale: false } queueTime: 0 processingTime: 0 
> totalTime: 0
> 2018-09-13 22:21:15,516 TRACE 
> [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] 
> io.BoundedByteBufferPool(106): runningAverage=16384, totalCapacity=0, 
> count=0, allocations=1
> 2018-09-13 22:21:15,516 TRACE [main] ipc.AbstractRpcClient(236): Call: Get, 
> callTime: 2ms
> 2018-09-13 22:21:15,516 TRACE [main] client.ClientScanner(122): Scan 
> table=hbase:meta, 
> startRow=testHTableMultiplexer_1,\x00\x00\x00\xC5,99
> 2018-09-13 22:21:15,516 TRACE [main] client.ClientSmallReversedScanner(179): 
> Advancing internal small scanner to startKey at 
> 'testHTableMultiplexer_1,\x00\x00\x00\xC5,99'
> 2018-09-13 22:21:15,517 TRACE [main] client.ZooKeeperRegistry(59): Looking up 
> meta region location in ZK, 
> connection=org.apache.hadoop.hbase.client.ZooKeeperRegistry@599f571f
> {noformat}
> From the minicluster logs [^HTableMultiplexer1000Puts.UT.txt] one can see 
> that the string "Removed all cached region locations that map" and "Looking 
> up meta region location in ZK" are present for every put.
> *Analysis:*
>  The problem occurs because the {{cleanServerCache}} method always clears 
> the server cache when the tablename is null and the exception is null. See 
> [AsyncRequestFutureImpl.java#L918|https://github.com/apache/hbase/blob/5d14c1af65c02f4e87059337c35e4431505de91c/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncRequestFutureImpl.java#L918]
> {code:java}
> private void cleanServerCache(ServerName server, Throwable regionException) {
> if (tableName == null && 
> ClientExceptionsUtil.isMetaClearingException(regionException)) {
>   // For multi-actions, we don't have a table name, but we want to make 
> sure to clear the
>   // cache in case there were location-related exceptions. We don't to 
> clear the cache
>   // for every possible exception that comes through, however.
>   

[jira] [Commented] (HBASE-21249) Add jitter for ProcedureUtil.getBackoffTimeMs

2018-09-28 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632650#comment-16632650
 ] 

Hudson commented on HBASE-21249:


Results for branch branch-2.0
[build #878 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/878/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/878//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/878//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/878//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


> Add jitter for ProcedureUtil.getBackoffTimeMs
> -
>
> Key: HBASE-21249
> URL: https://issues.apache.org/jira/browse/HBASE-21249
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Yi Mei
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21249.master.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21196) HTableMultiplexer clears the meta cache after every put operation

2018-09-28 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632647#comment-16632647
 ] 

Andrew Purtell commented on HBASE-21196:


Hmm, this looks like it was dropped on the floor. Let me try to commit. 

> HTableMultiplexer clears the meta cache after every put operation
> -
>
> Key: HBASE-21196
> URL: https://issues.apache.org/jira/browse/HBASE-21196
> Project: HBase
>  Issue Type: Bug
>  Components: Performance
>Affects Versions: 3.0.0, 1.3.3, 2.2.0
>Reporter: Nihal Jain
>Assignee: Nihal Jain
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HBASE-21196.master.001.patch, 
> HBASE-21196.master.001.patch, HBASE-21196.master.002.patch, 
> HTableMultiplexer1000Puts.UT.txt
>
>
> *Problem:* Operations which use 
> {{AsyncRequestFutureImpl.receiveMultiAction(MultiAction, ServerName, 
> MultiResponse, int)}} API with tablename set to null reset the meta cache of 
> the corresponding server after each call. One such operation is put operation 
> of HTableMultiplexer (Might not be the only one). This may impact the 
> performance of the system severely as all new ops directed to that server 
> will have to go to zk first to get the meta table address and then get the 
> location of the table region as it will become empty after every 
> htablemultiplexer put.
> From the logs below, one can see after every other put the cached region 
> locations are cleared. As a side effect of this, before every put the server 
> needs to contact zk and get meta table location and read meta to get region 
> locations of the table.
> {noformat}
> 2018-09-13 22:21:15,467 TRACE [htable-pool11-t1] client.MetaCache(283): 
> Removed all cached region locations that map to 
> root1-thinkpad-t440p,35811,1536857446588
> 2018-09-13 22:21:15,467 DEBUG [HTableFlushWorker-5] 
> client.HTableMultiplexer$FlushWorker(632): Processed 1 put requests for 
> root1-ThinkPad-T440p:35811 and 0 failed, latency for this send: 5
> 2018-09-13 22:21:15,515 TRACE 
> [RpcServer.reader=1,bindAddress=root1-ThinkPad-T440p,port=35811] 
> ipc.RpcServer$Connection(1954): RequestHeader call_id: 218 method_name: "Get" 
> request_param: true priority: 0 timeout: 6 totalRequestSize: 137 bytes
> 2018-09-13 22:21:15,515 TRACE 
> [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] 
> ipc.CallRunner(105): callId: 218 service: ClientService methodName: Get size: 
> 137 connection: 127.0.0.1:42338 executing as root1
> 2018-09-13 22:21:15,515 TRACE 
> [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] 
> ipc.RpcServer(2356): callId: 218 service: ClientService methodName: Get size: 
> 137 connection: 127.0.0.1:42338 param: region= 
> testHTableMultiplexer_1,,1536857451720.304d914b641a738624937c7f9b4d684f., 
> row=\x00\x00\x00\xC4 connection: 127.0.0.1:42338, response result { 
> associated_cell_count: 1 stale: false } queueTime: 0 processingTime: 0 
> totalTime: 0
> 2018-09-13 22:21:15,516 TRACE 
> [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] 
> io.BoundedByteBufferPool(106): runningAverage=16384, totalCapacity=0, 
> count=0, allocations=1
> 2018-09-13 22:21:15,516 TRACE [main] ipc.AbstractRpcClient(236): Call: Get, 
> callTime: 2ms
> 2018-09-13 22:21:15,516 TRACE [main] client.ClientScanner(122): Scan 
> table=hbase:meta, 
> startRow=testHTableMultiplexer_1,\x00\x00\x00\xC5,99
> 2018-09-13 22:21:15,516 TRACE [main] client.ClientSmallReversedScanner(179): 
> Advancing internal small scanner to startKey at 
> 'testHTableMultiplexer_1,\x00\x00\x00\xC5,99'
> 2018-09-13 22:21:15,517 TRACE [main] client.ZooKeeperRegistry(59): Looking up 
> meta region location in ZK, 
> connection=org.apache.hadoop.hbase.client.ZooKeeperRegistry@599f571f
> {noformat}
> From the minicluster logs [^HTableMultiplexer1000Puts.UT.txt] one can see 
> that the string "Removed all cached region locations that map" and "Looking 
> up meta region location in ZK" are present for every put.
> *Analysis:*
>  The problem occurs because the {{cleanServerCache}} method always clears 
> the server cache when the tablename is null and the exception is null. See 
> [AsyncRequestFutureImpl.java#L918|https://github.com/apache/hbase/blob/5d14c1af65c02f4e87059337c35e4431505de91c/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncRequestFutureImpl.java#L918]
> {code:java}
> private void cleanServerCache(ServerName server, Throwable regionException) {
> if (tableName == null && 
> ClientExceptionsUtil.isMetaClearingException(regionException)) {
>   // For multi-actions, we don't have a table name, but we want to make 
> sure to clear the
>   // cache in case there were location-related exceptions. We don't to 
> clear the cache
>   // for every 

[jira] [Commented] (HBASE-21033) Separate the StoreHeap from StoreFileHeap

2018-09-28 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632645#comment-16632645
 ] 

Andrew Purtell commented on HBASE-21033:


What do the master / branch-2 changes look like? The same thing, basically? 

> Separate the StoreHeap from StoreFileHeap
> -
>
> Key: HBASE-21033
> URL: https://issues.apache.org/jira/browse/HBASE-21033
> Project: HBase
>  Issue Type: Improvement
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
>Priority: Minor
> Attachments: 21033-branch-1-minimal.txt, 21033-branch-1-v0.txt
>
>
> Currently KeyValueHeap is used both for heaps of StoreScanners at the Region 
> level and for heaps of StoreFileScanners (and a MemstoreScanner) at the 
> Store level.
> This causes various problems:
>  # Some incorrect method usage can only be deduced at runtime via runtime 
> exception.
>  # In profiling sessions it's hard to distinguish the two.
>  # It's just not clean :)
>  
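
A sketch of the proposed split (names are assumptions, not the attached patch): 
a shared generic base plus two concrete heap types, so that mixing the two 
levels becomes a compile-time error and they show up separately in profiles.

{code:java}
import java.util.Comparator;
import java.util.PriorityQueue;

interface RegionLevelScanner {}        // stand-in for StoreScanner
interface StoreLevelScanner {}         // stand-in for StoreFileScanner / memstore scanner

abstract class ScannerHeap<S> {
  protected final PriorityQueue<S> heap;

  protected ScannerHeap(Comparator<S> cmp) {
    this.heap = new PriorityQueue<>(cmp);
  }

  void add(S scanner) { heap.add(scanner); }
  S peek() { return heap.peek(); }
}

// Region level: a heap of StoreScanners only.
final class StoreHeap extends ScannerHeap<RegionLevelScanner> {
  StoreHeap(Comparator<RegionLevelScanner> cmp) { super(cmp); }
}

// Store level: a heap of StoreFileScanners plus the memstore scanner only.
final class StoreFileHeap extends ScannerHeap<StoreLevelScanner> {
  StoreFileHeap(Comparator<StoreLevelScanner> cmp) { super(cmp); }
}
{code}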



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20857) JMX - add Balancer status = enabled / disabled

2018-09-28 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-20857:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.1.1
   1.4.8
   2.2.0
   1.5.0
   3.0.0
   Status: Resolved  (was: Patch Available)

Pushed to master, branch-2, branch-2.1, branch-1, and branch-1.4. Thanks for 
the contribution [~kiran.maturi] 

> JMX - add Balancer status = enabled / disabled
> --
>
> Key: HBASE-20857
> URL: https://issues.apache.org/jira/browse/HBASE-20857
> Project: HBase
>  Issue Type: Improvement
>  Components: API, master, metrics, REST, tooling, Usability
>Reporter: Hari Sekhon
>Assignee: Kiran Kumar Maturi
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.8, 2.1.1
>
> Attachments: HBASE-20857.branch-1.4.001.patch, 
> HBASE-20857.branch-1.4.002.patch
>
>
> Add HBase Balancer enabled/disabled status to JMX API on HMaster.
> Right now the HMaster will give a warning near the top of the HMaster UI if 
> the balancer is disabled, but scraping that for monitoring integration is not 
> nice; the status should be available in the JMX API, as there is already a 
> Master,sub=Balancer bean with metrics for the balancer ops etc.
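
For illustration, a generic standard-MBean sketch (the attribute name and 
ObjectName below are hypothetical, not the wiring in the attached patches) of 
exposing such an enabled/disabled flag over JMX so monitoring can read it 
instead of scraping the UI. Each public type would live in its own file in a 
real project.

{code:java}
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Management interface for a standard MBean: must be public and named <Class>MBean.
public interface BalancerStatusMBean {
  boolean isBalancerEnabled();
}

public class BalancerStatus implements BalancerStatusMBean {
  private volatile boolean enabled = true;

  public void setEnabled(boolean enabled) { this.enabled = enabled; }

  @Override
  public boolean isBalancerEnabled() { return enabled; }

  public static void main(String[] args) throws Exception {
    MBeanServer server = ManagementFactory.getPlatformMBeanServer();
    // Hypothetical ObjectName; the real flag would hang off the existing
    // Master,sub=Balancer bean mentioned above.
    server.registerMBean(new BalancerStatus(),
        new ObjectName("example:type=Master,sub=Balancer"));
    Thread.sleep(Long.MAX_VALUE);      // keep the JVM up so jconsole/monitoring can poll it
  }
}
{code}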



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21242) [amv2] Miscellaneous minor log and assign procedure create improvements

2018-09-28 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632633#comment-16632633
 ] 

Hadoop QA commented on HBASE-21242:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange}  
0m  0s{color} | {color:orange} The patch doesn't appear to include any new or 
modified tests. Please justify why no new tests are needed for this patch. Also 
please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} branch-2.1 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
21s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
53s{color} | {color:green} branch-2.1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m 
14s{color} | {color:green} branch-2.1 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
15s{color} | {color:green} branch-2.1 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
 8s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
14s{color} | {color:green} branch-2.1 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
20s{color} | {color:green} branch-2.1 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
16s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
 8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
30s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
8m 31s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 
or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
55s{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m  
2s{color} | {color:green} hbase-procedure in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}124m  
4s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  1m 
10s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}181m 36s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:42ca976 |
| JIRA Issue | HBASE-21242 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12941729/HBASE-21242.branch-2.1.002.patch
 |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 9dc1403a259f 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 GNU/Linux |
| Build 

[jira] [Commented] (HBASE-20857) JMX - add Balancer status = enabled / disabled

2018-09-28 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632628#comment-16632628
 ] 

Andrew Purtell commented on HBASE-20857:


Hmm, looks easy, proceeding

> JMX - add Balancer status = enabled / disabled
> --
>
> Key: HBASE-20857
> URL: https://issues.apache.org/jira/browse/HBASE-20857
> Project: HBase
>  Issue Type: Improvement
>  Components: API, master, metrics, REST, tooling, Usability
>Reporter: Hari Sekhon
>Assignee: Kiran Kumar Maturi
>Priority: Major
> Attachments: HBASE-20857.branch-1.4.001.patch, 
> HBASE-20857.branch-1.4.002.patch
>
>
> Add HBase Balancer enabled/disabled status to JMX API on HMaster.
> Right now the HMaster will give a warning near the top of the HMaster UI if 
> the balancer is disabled, but scraping that for monitoring integration is not 
> nice; the status should be available in the JMX API, as there is already a 
> Master,sub=Balancer bean with metrics for the balancer ops etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20857) JMX - add Balancer status = enabled / disabled

2018-09-28 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632626#comment-16632626
 ] 

Andrew Purtell commented on HBASE-20857:


This looks fine to me, and will apply to branch-1.4 and branch-1, but what 
about more recent versions? Let me see how hard it is to apply to branch-2 and 
up. 

> JMX - add Balancer status = enabled / disabled
> --
>
> Key: HBASE-20857
> URL: https://issues.apache.org/jira/browse/HBASE-20857
> Project: HBase
>  Issue Type: Improvement
>  Components: API, master, metrics, REST, tooling, Usability
>Reporter: Hari Sekhon
>Assignee: Kiran Kumar Maturi
>Priority: Major
> Attachments: HBASE-20857.branch-1.4.001.patch, 
> HBASE-20857.branch-1.4.002.patch
>
>
> Add HBase Balancer enabled/disabled status to JMX API on HMaster.
> Right now the HMaster will give a warning near the top of the HMaster UI if 
> the balancer is disabled, but scraping that for monitoring integration is not 
> nice; the status should be available in the JMX API, as there is already a 
> Master,sub=Balancer bean with metrics for the balancer ops etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21249) Add jitter for ProcedureUtil.getBackoffTimeMs

2018-09-28 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632622#comment-16632622
 ] 

Hudson commented on HBASE-21249:


Results for branch branch-2.1
[build #391 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/391/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/391//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/391//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/391//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Add jitter for ProcedureUtil.getBackoffTimeMs
> -
>
> Key: HBASE-21249
> URL: https://issues.apache.org/jira/browse/HBASE-21249
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Yi Mei
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21249.master.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21244) Skip persistence when retrying for assignment related procedures

2018-09-28 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632612#comment-16632612
 ] 

Hudson commented on HBASE-21244:


Results for branch branch-2
[build #1316 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1316/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1316//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1316//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1316//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Skip persistence when retrying for assignment related procedures
> 
>
> Key: HBASE-21244
> URL: https://issues.apache.org/jira/browse/HBASE-21244
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2, Performance, proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21244.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21249) Add jitter for ProcedureUtil.getBackoffTimeMs

2018-09-28 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632613#comment-16632613
 ] 

Hudson commented on HBASE-21249:


Results for branch branch-2
[build #1316 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1316/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1316//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1316//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1316//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Add jitter for ProcedureUtil.getBackoffTimeMs
> -
>
> Key: HBASE-21249
> URL: https://issues.apache.org/jira/browse/HBASE-21249
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Yi Mei
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21249.master.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21207) Add client side sorting functionality in master web UI for table and region server details.

2018-09-28 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-21207:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 1.4.8
   2.2.0
   1.5.0
   3.0.0
   Status: Resolved  (was: Patch Available)

Pushed to master, branch-2, branch-2.1, branch-1, and branch-1.4

> Add client side sorting functionality in master web UI for table and region 
> server details.
> ---
>
> Key: HBASE-21207
> URL: https://issues.apache.org/jira/browse/HBASE-21207
> Project: HBase
>  Issue Type: Improvement
>  Components: master, monitoring, UI, Usability
>Reporter: Archana Katiyar
>Assignee: Archana Katiyar
>Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.8
>
> Attachments: 14926e82-b929-11e8-8bdd-4ce4621f1118.png, 
> 2724afd8-b929-11e8-8171-8b5b2ba3084e.png, HBASE-21207-branch-1.patch, 
> HBASE-21207-branch-1.v1.patch, HBASE-21207-branch-2.v1.patch, 
> HBASE-21207.patch, HBASE-21207.patch, HBASE-21207.v1.patch, 
> edc5c812-b928-11e8-87e2-ce6396629bbc.png
>
>
> In the Master UI, we can see region server details like requests per second 
> and number of regions, etc. Similarly, for tables we can see online regions 
> and offline regions.
> It would help ops people in determining hot spotting if we provided sort 
> functionality in the UI.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21117) Backport HBASE-18350 (fix RSGroups) to branch-1 (Only port the part fixing table locking issue.)

2018-09-28 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632602#comment-16632602
 ] 

Andrew Purtell commented on HBASE-21117:


Applying this to branch-1 and testing afterward with {{mvn clean install 
-Dtest=TestRSGroups}}, I get the below failure. If I go back one revision in 
history from the current head of the branch and test again, all units pass. 

[ERROR] Tests run: 25, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 
227.777 s <<< FAILURE! - in org.apache.hadoop.hbase.rsgroup.TestRSGroups
[ERROR] testNamespaceConstraint(org.apache.hadoop.hbase.rsgroup.TestRSGroups)  
Time elapsed: 2.404 s  <<< FAILURE!
java.lang.AssertionError
at 
org.apache.hadoop.hbase.rsgroup.TestRSGroups.testNamespaceConstraint(TestRSGroups.java:254)

Same with branch-1.4 

> Backport HBASE-18350  (fix RSGroups)  to branch-1 (Only port the part fixing 
> table locking issue.)
> --
>
> Key: HBASE-21117
> URL: https://issues.apache.org/jira/browse/HBASE-21117
> Project: HBase
>  Issue Type: Bug
>  Components: backport, rsgroup, shell
>Affects Versions: 1.3.2
>Reporter: Xu Cang
>Assignee: Xu Cang
>Priority: Major
>  Labels: backport
> Attachments: HBASE-21117-branch-1.001.patch, 
> HBASE-21117-branch-1.002.patch
>
>
> When working on HBASE-20666, I found out that HBASE-18350 did not get ported 
> to branch-1, which sometimes causes a procedure to hang when #moveTables is 
> called. After looking into the 18350 patch, it seems important since it fixes 
> 4 issues. This Jira is an attempt to backport it to branch-1.
>  
>  
> Edited: Aug 26.
> After reviewing the HBASE-18350 patch, I decided to only port part 2 of the 
> patch, because part 1 and part 3 are AMv2 related. I won't touch those since 
> AMv2 is only for branch-2.
>  
> {quote} 
> Subject: [PATCH] HBASE-18350 RSGroups are broken under AMv2
> - Table moving to RSG was buggy, because it left the table unassigned.
>   Now it is fixed we immediately assign to an appropriate RS
>   (MoveRegionProcedure).
> *- Table was locked while moving, but unassign operation hung, because*
>   *locked table queues are not scheduled while locked. Fixed.    port 
> this one.*
> - ProcedureSyncWait was buggy, because it searched the procId in
>   executor, but executor does not store the return values of internal
>   operations (they are stored, but immediately removed by the cleaner).
> - list_rsgroups in the shell show also the assigned tables and servers.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21117) Backport HBASE-18350 (fix RSGroups) to branch-1 (Only port the part fixing table locking issue.)

2018-09-28 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-21117:
---
Status: Open  (was: Patch Available)

> Backport HBASE-18350  (fix RSGroups)  to branch-1 (Only port the part fixing 
> table locking issue.)
> --
>
> Key: HBASE-21117
> URL: https://issues.apache.org/jira/browse/HBASE-21117
> Project: HBase
>  Issue Type: Bug
>  Components: backport, rsgroup, shell
>Affects Versions: 1.3.2
>Reporter: Xu Cang
>Assignee: Xu Cang
>Priority: Major
>  Labels: backport
> Attachments: HBASE-21117-branch-1.001.patch, 
> HBASE-21117-branch-1.002.patch
>
>
> When working on HBASE-20666, I found out that HBASE-18350 did not get ported 
> to branch-1, which sometimes causes a procedure to hang when #moveTables is 
> called. After looking into the 18350 patch, it seems important since it fixes 
> 4 issues. This Jira is an attempt to backport it to branch-1.
>  
>  
> Edited: Aug 26.
> After reviewing the HBASE-18350 patch, I decided to only port part 2 of the 
> patch, because part 1 and part 3 are AMv2 related. I won't touch those since 
> AMv2 is only for branch-2.
>  
> {quote} 
> Subject: [PATCH] HBASE-18350 RSGroups are broken under AMv2
> - Table moving to RSG was buggy, because it left the table unassigned.
>   Now it is fixed we immediately assign to an appropriate RS
>   (MoveRegionProcedure).
> *- Table was locked while moving, but unassign operation hung, because*
>   *locked table queues are not scheduled while locked. Fixed.    port 
> this one.*
> - ProcedureSyncWait was buggy, because it searched the procId in
>   executor, but executor does not store the return values of internal
>   operations (they are stored, but immediately removed by the cleaner).
> - list_rsgroups in the shell show also the assigned tables and servers.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-19418) RANGE_OF_DELAY in PeriodicMemstoreFlusher should be configurable.

2018-09-28 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-19418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632600#comment-16632600
 ] 

Hudson commented on HBASE-19418:


SUCCESS: Integrated in Jenkins build HBase-1.3-IT #486 (See 
[https://builds.apache.org/job/HBase-1.3-IT/486/])
HBASE-19418 configurable range of delay in PeriodicMemstoreFlusher (apurtell: 
rev f91a912474551d136cfe8328b572a55b9fcdba3b)
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerMetrics.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java


> RANGE_OF_DELAY in PeriodicMemstoreFlusher should be configurable.
> -
>
> Key: HBASE-19418
> URL: https://issues.apache.org/jira/browse/HBASE-19418
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha-4
>Reporter: Jean-Marc Spaggiari
>Assignee: Ramie Raufdeen
>Priority: Minor
> Fix For: 3.0.0, 1.5.0, 1.3.3, 2.2.0, 1.4.8, 2.1.1
>
> Attachments: HBASE-19418.master.000.patch
>
>
> When RSs have a LOT of regions and CFs, flushing everything within 5 minutes 
> is not always doable. It would be useful to be able to increase the 
> RANGE_OF_DELAY. 
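A sketch of the general pattern, reading the delay range from configuration with 
the old hard-coded 5 minutes as the default; this is not the committed patch, and 
the property name and constants here are illustrative only:

{code}
import java.util.concurrent.ThreadLocalRandom;
import org.apache.hadoop.conf.Configuration;

public class ConfigurableFlushDelay {
  // Hypothetical property name; the key added by the actual patch may differ.
  static final String RANGE_OF_DELAY_KEY = "hbase.regionserver.periodicflusher.rangeofdelay";
  static final int DEFAULT_RANGE_OF_DELAY_MS = 5 * 60 * 1000; // old hard-coded 5 minutes
  static final int MIN_DELAY_MS = 10 * 1000;                  // illustrative minimum delay

  static long pickRandomDelay(Configuration conf) {
    int range = conf.getInt(RANGE_OF_DELAY_KEY, DEFAULT_RANGE_OF_DELAY_MS);
    return ThreadLocalRandom.current().nextInt(range) + MIN_DELAY_MS;
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setInt(RANGE_OF_DELAY_KEY, 10 * 60 * 1000); // operator raises the range to 10 minutes
    System.out.println("delay = " + pickRandomDelay(conf) + " ms");
  }
}
{code}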



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-19418) RANGE_OF_DELAY in PeriodicMemstoreFlusher should be configurable.

2018-09-28 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-19418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-19418:
---
Priority: Minor  (was: Major)

> RANGE_OF_DELAY in PeriodicMemstoreFlusher should be configurable.
> -
>
> Key: HBASE-19418
> URL: https://issues.apache.org/jira/browse/HBASE-19418
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha-4
>Reporter: Jean-Marc Spaggiari
>Assignee: Ramie Raufdeen
>Priority: Minor
> Fix For: 3.0.0, 1.5.0, 1.3.3, 2.2.0, 1.4.8, 2.1.1
>
> Attachments: HBASE-19418.master.000.patch
>
>
> When RSs have a LOT of regions and CFs, flushing everything within 5 minutes 
> is not always doable. It might be interesting to be able to increase the 
> RANGE_OF_DELAY. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21213) [hbck2] bypass leaves behind state in RegionStates when assign/unassign

2018-09-28 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21213:
--
Attachment: HBASE-21213.branch-2.1.009.patch

> [hbck2] bypass leaves behind state in RegionStates when assign/unassign
> ---
>
> Key: HBASE-21213
> URL: https://issues.apache.org/jira/browse/HBASE-21213
> Project: HBase
>  Issue Type: Bug
>  Components: amv2, hbck2
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 2.1.1
>
> Attachments: HBASE-21213.branch-2.1.001.patch, 
> HBASE-21213.branch-2.1.002.patch, HBASE-21213.branch-2.1.003.patch, 
> HBASE-21213.branch-2.1.004.patch, HBASE-21213.branch-2.1.005.patch, 
> HBASE-21213.branch-2.1.006.patch, HBASE-21213.branch-2.1.007.patch, 
> HBASE-21213.branch-2.1.007.patch, HBASE-21213.branch-2.1.008.patch, 
> HBASE-21213.branch-2.1.009.patch
>
>
> This is a follow-on from HBASE-21083, which added the 'bypass' functionality. 
> On bypass, there is more state to be cleared if we are to allow new Procedures 
> to be scheduled.
> For example, here is a bypass:
> {code}
> 2018-09-20 05:45:43,722 INFO org.apache.hadoop.hbase.procedure2.Procedure: 
> pid=100449, state=RUNNABLE:REGION_TRANSITION_DISPATCH, locked=true, 
> bypass=LOG-REDACTED UnassignProcedure table=hbase:namespace, 
> region=37cc206fe9c4bc1c0a46a34c5f523d16, 
> server=ve1233.halxg.cloudera.com,22101,1537397961664 bypassed, returning null 
> to finish it
> 2018-09-20 05:45:44,022 INFO 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=100449, 
> state=SUCCESS, bypass=LOG-REDACTED UnassignProcedure table=hbase:namespace, 
> region=37cc206fe9c4bc1c0a46a34c5f523d16, 
> server=ve1233.halxg.cloudera.com,22101,1537397961664 in 2mins, 7.618sec
> {code}
> ... but then when I try to assign the bypassed region later, I get this:
> {code}
> 2018-09-20 05:46:31,435 WARN 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure: There is 
> already another procedure running on this region this=pid=100450, 
> state=RUNNABLE:REGION_TRANSITION_QUEUE, locked=true; AssignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16 
> owner=pid=100449, state=SUCCESS, bypass=LOG-REDACTED UnassignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16, 
> server=ve1233.halxg.cloudera.com,22101,1537397961664 pid=100450, 
> state=RUNNABLE:REGION_TRANSITION_QUEUE, locked=true; AssignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16; rit=OPENING, 
> location=ve1233.halxg.cloudera.com,22101,1537397961664
> 2018-09-20 05:46:31,510 INFO 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Rolled back pid=100450, 
> state=ROLLEDBACK, 
> exception=org.apache.hadoop.hbase.procedure2.ProcedureAbortedException via 
> AssignProcedure:org.apache.hadoop.hbase.procedure2.ProcedureAbortedException: 
> There is already another procedure running on this region this=pid=100450, 
> state=RUNNABLE:REGION_TRANSITION_QUEUE, locked=true; AssignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16 
> owner=pid=100449, state=SUCCESS, bypass=LOG-REDACTED UnassignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16, 
> server=ve1233.halxg.cloudera.com,22101,1537397961664; AssignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16 
> exec-time=473msec
> {code}
> ... which is a long-winded way of saying the Unassign Procedure still exists 
> in RegionStateNodes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (HBASE-19418) RANGE_OF_DELAY in PeriodicMemstoreFlusher should be configurable.

2018-09-28 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-19418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell resolved HBASE-19418.

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.1.1
   1.4.8
   2.2.0
   1.3.3
   1.5.0
   3.0.0

Pushed up, thanks for the contribution [~ramatronics]

> RANGE_OF_DELAY in PeriodicMemstoreFlusher should be configurable.
> -
>
> Key: HBASE-19418
> URL: https://issues.apache.org/jira/browse/HBASE-19418
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha-4
>Reporter: Jean-Marc Spaggiari
>Assignee: Ramie Raufdeen
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 1.3.3, 2.2.0, 1.4.8, 2.1.1
>
> Attachments: HBASE-19418.master.000.patch
>
>
> When RSs have a LOT of regions and CFs, flushing everything within 5 minutes 
> is not always doable. It might be interesting to be able to increase the 
> RANGE_OF_DELAY. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21213) [hbck2] bypass leaves behind state in RegionStates when assign/unassign

2018-09-28 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632574#comment-16632574
 ] 

Hadoop QA commented on HBASE-21213:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
|| || || || {color:brown} branch-2.1 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m  
5s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
59s{color} | {color:green} branch-2.1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 17m  
7s{color} | {color:green} branch-2.1 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
33s{color} | {color:green} branch-2.1 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
29s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
50s{color} | {color:green} branch-2.1 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
39s{color} | {color:green} branch-2.1 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
 7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 17m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 17m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 17m 
37s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 
19s{color} | {color:red} hbase-server: The patch generated 1 new + 326 
unchanged - 1 fixed = 327 total (was 327) {color} |
| {color:red}-1{color} | {color:red} rubocop {color} | {color:red}  0m  
7s{color} | {color:red} The patch generated 3 new + 20 unchanged - 2 fixed = 23 
total (was 22) {color} |
| {color:orange}-0{color} | {color:orange} ruby-lint {color} | {color:orange}  
0m  2s{color} | {color:orange} The patch generated 1 new + 41 unchanged - 0 
fixed = 42 total (was 41) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
39s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
8m 25s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 
or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green}  
2m 10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
33s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
29s{color} | {color:green} hbase-protocol-shaded in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m  
1s{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
46s{color} | {color:green} hbase-procedure in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}118m 58s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 

[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request

2018-09-28 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632532#comment-16632532
 ] 

Hadoop QA commented on HBASE-18451:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
14s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
10s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
52s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
14s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
28s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
10s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
24s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
10m 52s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}124m 
32s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}167m 26s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-18451 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12941715/HBASE-18451.master.004.patch
 |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 272ce9242570 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 6bc7089f9e |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14531/testReport/ |
| Max. process+thread count | 5240 (vs. ulimit of 1) |
| modules | C: hbase-server U: hbase-server |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14531/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.




[jira] [Updated] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request

2018-09-28 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-18451:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.1.1
   1.4.8
   2.2.0
   1.5.0
   3.0.0
   Status: Resolved  (was: Patch Available)

> PeriodicMemstoreFlusher should inspect the queue before adding a delayed 
> flush request
> --
>
> Key: HBASE-18451
> URL: https://issues.apache.org/jira/browse/HBASE-18451
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 2.0.0-alpha-1
>Reporter: Jean-Marc Spaggiari
>Assignee: Xu Cang
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.8, 2.1.1
>
> Attachments: HBASE-18451.branch-1.001.patch, 
> HBASE-18451.branch-1.002.patch, HBASE-18451.branch-1.002.patch, 
> HBASE-18451.master.002.patch, HBASE-18451.master.003.patch, 
> HBASE-18451.master.004.patch, HBASE-18451.master.004.patch, 
> HBASE-18451.master.patch
>
>
> If you run a big job every 4 hours, impacting many tables (they have 150 
> regions per server), ad the end all the regions might have some data to be 
> flushed, and we want, after one hour, trigger a periodic flush. That's 
> totally fine.
> Now, to avoid a flush storm, when we detect a region to be flushed, we add a 
> "randomDelay" to the delayed flush, that way we spread them away.
> RANGE_OF_DELAY is 5 minutes. So we spread the flush over the next 5 minutes, 
> which is very good.
> However, because we don't check if there is already a request in the queue, 
> 10 seconds after, we create a new request, with a new randomDelay.
> If you generate a randomDelay every 10 seconds, at some point, you will end 
> up having a small one, and the flush will be triggered almost immediatly.
> As a result, instead of spreading all the flush within the next 5 minutes, 
> you end-up getting them all way more quickly. Like within the first minute. 
> Which not only feed the queue to to many flush requests, but also defeats the 
> purpose of the randomDelay.
> {code}
> @Override
> protected void chore() {
>   final StringBuffer whyFlush = new StringBuffer();
>   for (Region r : this.server.onlineRegions.values()) {
> if (r == null) continue;
> if (((HRegion)r).shouldFlush(whyFlush)) {
>   FlushRequester requester = server.getFlushRequester();
>   if (requester != null) {
> long randomDelay = RandomUtils.nextInt(RANGE_OF_DELAY) + 
> MIN_DELAY_TIME;
> LOG.info(getName() + " requesting flush of " +
>   r.getRegionInfo().getRegionNameAsString() + " because " +
>   whyFlush.toString() +
>   " after random delay " + randomDelay + "ms");
> //Throttle the flushes by putting a delay. If we don't throttle, 
> and there
> //is a balanced write-load on the regions in a table, we might 
> end up
> //overwhelming the filesystem with too many flushes at once.
> requester.requestDelayedFlush(r, randomDelay, false);
>   }
> }
>   }
> }
> {code}
> {code}
> 2017-07-24 18:44:33,338 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 270785ms
> 2017-07-24 18:44:43,328 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 200143ms
> 2017-07-24 18:44:53,954 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 191082ms
> 2017-07-24 18:45:03,528 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 92532ms
> 2017-07-24 18:45:14,201 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after 

[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request

2018-09-28 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632526#comment-16632526
 ] 

Andrew Purtell commented on HBASE-18451:


Done, pushed to master, branch-2, branch-2.1, branch-1, branch-1.4. 
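
A minimal sketch of the queue-inspection idea described below, not the committed 
patch, using stand-in types: track regions that already have a pending delayed 
flush and skip re-requesting until that flush actually fires.

{code}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only, with stand-in types; the real Region/FlushRequester APIs differ.
public class DedupingFlushChore {
  private final Set<String> pendingDelayedFlushes = ConcurrentHashMap.newKeySet();

  interface FlushRequester {
    void requestDelayedFlush(String regionName, long delayMs);
  }

  void maybeRequestFlush(String regionName, FlushRequester requester, long randomDelayMs) {
    // Only enqueue a delayed flush if this region does not already have one pending;
    // otherwise every chore tick re-rolls the random delay and defeats the spreading.
    if (pendingDelayedFlushes.add(regionName)) {
      requester.requestDelayedFlush(regionName, randomDelayMs);
    }
  }

  // Called when the delayed flush actually runs, so a later chore tick may schedule a new one.
  void onFlushExecuted(String regionName) {
    pendingDelayedFlushes.remove(regionName);
  }
}
{code}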

> PeriodicMemstoreFlusher should inspect the queue before adding a delayed 
> flush request
> --
>
> Key: HBASE-18451
> URL: https://issues.apache.org/jira/browse/HBASE-18451
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 2.0.0-alpha-1
>Reporter: Jean-Marc Spaggiari
>Assignee: Xu Cang
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.8, 2.1.1
>
> Attachments: HBASE-18451.branch-1.001.patch, 
> HBASE-18451.branch-1.002.patch, HBASE-18451.branch-1.002.patch, 
> HBASE-18451.master.002.patch, HBASE-18451.master.003.patch, 
> HBASE-18451.master.004.patch, HBASE-18451.master.004.patch, 
> HBASE-18451.master.patch
>
>
> If you run a big job every 4 hours, impacting many tables (they have 150 
> regions per server), ad the end all the regions might have some data to be 
> flushed, and we want, after one hour, trigger a periodic flush. That's 
> totally fine.
> Now, to avoid a flush storm, when we detect a region to be flushed, we add a 
> "randomDelay" to the delayed flush, that way we spread them away.
> RANGE_OF_DELAY is 5 minutes. So we spread the flush over the next 5 minutes, 
> which is very good.
> However, because we don't check if there is already a request in the queue, 
> 10 seconds after, we create a new request, with a new randomDelay.
> If you generate a randomDelay every 10 seconds, at some point, you will end 
> up having a small one, and the flush will be triggered almost immediatly.
> As a result, instead of spreading all the flush within the next 5 minutes, 
> you end-up getting them all way more quickly. Like within the first minute. 
> Which not only feed the queue to to many flush requests, but also defeats the 
> purpose of the randomDelay.
> {code}
> @Override
> protected void chore() {
>   final StringBuffer whyFlush = new StringBuffer();
>   for (Region r : this.server.onlineRegions.values()) {
> if (r == null) continue;
> if (((HRegion)r).shouldFlush(whyFlush)) {
>   FlushRequester requester = server.getFlushRequester();
>   if (requester != null) {
> long randomDelay = RandomUtils.nextInt(RANGE_OF_DELAY) + 
> MIN_DELAY_TIME;
> LOG.info(getName() + " requesting flush of " +
>   r.getRegionInfo().getRegionNameAsString() + " because " +
>   whyFlush.toString() +
>   " after random delay " + randomDelay + "ms");
> //Throttle the flushes by putting a delay. If we don't throttle, 
> and there
> //is a balanced write-load on the regions in a table, we might 
> end up
> //overwhelming the filesystem with too many flushes at once.
> requester.requestDelayedFlush(r, randomDelay, false);
>   }
> }
>   }
> }
> {code}
> {code}
> 2017-07-24 18:44:33,338 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 270785ms
> 2017-07-24 18:44:43,328 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 200143ms
> 2017-07-24 18:44:53,954 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 191082ms
> 2017-07-24 18:45:03,528 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 92532ms
> 2017-07-24 18:45:14,201 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 238780ms
> 2017-07-24 18:45:24,195 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> 

[jira] [Commented] (HBASE-21216) TestSnapshotFromMaster#testSnapshotHFileArchiving is flaky

2018-09-28 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632510#comment-16632510
 ] 

Ted Yu commented on HBASE-21216:


Looped the test 30 times locally with the patch; all runs passed.

> TestSnapshotFromMaster#testSnapshotHFileArchiving is flaky
> --
>
> Key: HBASE-21216
> URL: https://issues.apache.org/jira/browse/HBASE-21216
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Attachments: 21216.v1.txt
>
>
> From 
> https://builds.apache.org/job/HBase-Flaky-Tests/job/branch-2/794/testReport/junit/org.apache.hadoop.hbase.master.cleaner/TestSnapshotFromMaster/testSnapshotHFileArchiving/
>  :
> {code}
> java.lang.AssertionError: Archived hfiles [] and table hfiles 
> [9ca09392705f425f9c916beedc10d63c] is missing snapshot 
> file:6739a09747e54189a4112a6d8f37e894
>   at 
> org.apache.hadoop.hbase.master.cleaner.TestSnapshotFromMaster.testSnapshotHFileArchiving(TestSnapshotFromMaster.java:370)
> {code}
> The file appeared in archive dir before hfile cleaners were run:
> {code}
> 2018-09-20 10:38:53,187 DEBUG [Time-limited test] util.CommonFSUtils(771): 
> |-archive/
> 2018-09-20 10:38:53,188 DEBUG [Time-limited test] util.CommonFSUtils(771): 
> |data/
> 2018-09-20 10:38:53,189 DEBUG [Time-limited test] util.CommonFSUtils(771): 
> |---default/
> 2018-09-20 10:38:53,190 DEBUG [Time-limited test] util.CommonFSUtils(771): 
> |--test/
> 2018-09-20 10:38:53,191 DEBUG [Time-limited test] util.CommonFSUtils(771): 
> |-1237d57b63a7bdf067a930441a02514a/
> 2018-09-20 10:38:53,192 DEBUG [Time-limited test] util.CommonFSUtils(771): 
> |recovered.edits/
> 2018-09-20 10:38:53,193 DEBUG [Time-limited test] util.CommonFSUtils(774): 
> |---4.seqid
> 2018-09-20 10:38:53,193 DEBUG [Time-limited test] util.CommonFSUtils(771): 
> |-29e1700e09b51223ad2f5811105a4d51/
> 2018-09-20 10:38:53,194 DEBUG [Time-limited test] util.CommonFSUtils(771): 
> |fam/
> 2018-09-20 10:38:53,195 DEBUG [Time-limited test] util.CommonFSUtils(774): 
> |---2c66a18f6c1a4074b84ffbb3245268c4
> 2018-09-20 10:38:53,196 DEBUG [Time-limited test] util.CommonFSUtils(774): 
> |---45bb396c6a5e49629e45a4d56f1e9b14
> 2018-09-20 10:38:53,196 DEBUG [Time-limited test] util.CommonFSUtils(774): 
> |---6739a09747e54189a4112a6d8f37e894
> {code}
> However, the archive dir became empty after hfile cleaners were run:
> {code}
> 2018-09-20 10:38:53,312 DEBUG [Time-limited test] util.CommonFSUtils(771): 
> |-archive/
> 2018-09-20 10:38:53,313 DEBUG [Time-limited test] util.CommonFSUtils(771): 
> |-corrupt/
> {code}
> Leading to the assertion failure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HBASE-21216) TestSnapshotFromMaster#testSnapshotHFileArchiving is flaky

2018-09-28 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu reassigned HBASE-21216:
--

Assignee: Ted Yu

> TestSnapshotFromMaster#testSnapshotHFileArchiving is flaky
> --
>
> Key: HBASE-21216
> URL: https://issues.apache.org/jira/browse/HBASE-21216
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Attachments: 21216.v1.txt
>
>
> From 
> https://builds.apache.org/job/HBase-Flaky-Tests/job/branch-2/794/testReport/junit/org.apache.hadoop.hbase.master.cleaner/TestSnapshotFromMaster/testSnapshotHFileArchiving/
>  :
> {code}
> java.lang.AssertionError: Archived hfiles [] and table hfiles 
> [9ca09392705f425f9c916beedc10d63c] is missing snapshot 
> file:6739a09747e54189a4112a6d8f37e894
>   at 
> org.apache.hadoop.hbase.master.cleaner.TestSnapshotFromMaster.testSnapshotHFileArchiving(TestSnapshotFromMaster.java:370)
> {code}
> The file appeared in archive dir before hfile cleaners were run:
> {code}
> 2018-09-20 10:38:53,187 DEBUG [Time-limited test] util.CommonFSUtils(771): 
> |-archive/
> 2018-09-20 10:38:53,188 DEBUG [Time-limited test] util.CommonFSUtils(771): 
> |data/
> 2018-09-20 10:38:53,189 DEBUG [Time-limited test] util.CommonFSUtils(771): 
> |---default/
> 2018-09-20 10:38:53,190 DEBUG [Time-limited test] util.CommonFSUtils(771): 
> |--test/
> 2018-09-20 10:38:53,191 DEBUG [Time-limited test] util.CommonFSUtils(771): 
> |-1237d57b63a7bdf067a930441a02514a/
> 2018-09-20 10:38:53,192 DEBUG [Time-limited test] util.CommonFSUtils(771): 
> |recovered.edits/
> 2018-09-20 10:38:53,193 DEBUG [Time-limited test] util.CommonFSUtils(774): 
> |---4.seqid
> 2018-09-20 10:38:53,193 DEBUG [Time-limited test] util.CommonFSUtils(771): 
> |-29e1700e09b51223ad2f5811105a4d51/
> 2018-09-20 10:38:53,194 DEBUG [Time-limited test] util.CommonFSUtils(771): 
> |fam/
> 2018-09-20 10:38:53,195 DEBUG [Time-limited test] util.CommonFSUtils(774): 
> |---2c66a18f6c1a4074b84ffbb3245268c4
> 2018-09-20 10:38:53,196 DEBUG [Time-limited test] util.CommonFSUtils(774): 
> |---45bb396c6a5e49629e45a4d56f1e9b14
> 2018-09-20 10:38:53,196 DEBUG [Time-limited test] util.CommonFSUtils(774): 
> |---6739a09747e54189a4112a6d8f37e894
> {code}
> However, the archive dir became empty after hfile cleaners were run:
> {code}
> 2018-09-20 10:38:53,312 DEBUG [Time-limited test] util.CommonFSUtils(771): 
> |-archive/
> 2018-09-20 10:38:53,313 DEBUG [Time-limited test] util.CommonFSUtils(771): 
> |-corrupt/
> {code}
> Leading to the assertion failure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21242) [amv2] Miscellaneous minor log and assign procedure create improvements

2018-09-28 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21242:
--
Attachment: HBASE-21242.branch-2.1.002.patch

> [amv2] Miscellaneous minor log and assign procedure create improvements
> ---
>
> Key: HBASE-21242
> URL: https://issues.apache.org/jira/browse/HBASE-21242
> Project: HBase
>  Issue Type: Bug
>  Components: amv2, Operability
>Reporter: stack
>Assignee: stack
>Priority: Minor
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21242.branch-2.1.001.patch, 
> HBASE-21242.branch-2.1.001.patch, HBASE-21242.branch-2.1.001.patch, 
> HBASE-21242.branch-2.1.002.patch
>
>
> Some minor fixups:
> {code}
> For RIT Duration, do better than print ms/seconds. Remove redundant UI
> column dedicated to duration when we log it in the status field too.
> Make bypass log at INFO level -- when DEBUG we can miss important
> fixup detail like why we failed.
> Make it so on complete of subprocedure, we note count of outstanding
> siblings so we have a clue how much further the parent has to go before
> it is done (Helpful when hundreds of servers doing SCP).
> Have the SCP run the AP preflight check before creating an AP; saves
> creation of hundreds of thousands of APs during fixup of this big cluster
> of mine.
> Don't log tablename three times when reporting remote call failed.
> If lock is held already, note who has it. Also log after we get lock
> or if we have to wait rather than log on entrance though we may
> later have to wait (or we may have just picked up the lock).
> {code}
> Posting patch in a sec but let me try it on cluster too.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21242) [amv2] Miscellaneous minor log and assign procedure create improvements

2018-09-28 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21242:
--
Attachment: HBASE-21242.branch-2.1.001.patch

> [amv2] Miscellaneous minor log and assign procedure create improvements
> ---
>
> Key: HBASE-21242
> URL: https://issues.apache.org/jira/browse/HBASE-21242
> Project: HBase
>  Issue Type: Bug
>  Components: amv2, Operability
>Reporter: stack
>Assignee: stack
>Priority: Minor
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21242.branch-2.1.001.patch, 
> HBASE-21242.branch-2.1.001.patch, HBASE-21242.branch-2.1.001.patch
>
>
> Some minor fixups:
> {code}
> For RIT Duration, do better than print ms/seconds. Remove redundant UI
> column dedicated to duration when we log it in the status field too.
> Make bypass log at INFO level -- when DEBUG we can miss important
> fixup detail like why we failed.
> Make it so on complete of subprocedure, we note count of outstanding
> siblings so we have a clue how much further the parent has to go before
> it is done (Helpful when hundreds of servers doing SCP).
> Have the SCP run the AP preflight check before creating an AP; saves
> creation of hundreds of thousands of APs during fixup of this big cluster
> of mine.
> Don't log tablename three times when reporting remote call failed.
> If lock is held already, note who has it. Also log after we get lock
> or if we have to wait rather than log on entrance though we may
> later have to wait (or we may have just picked up the lock).
> {code}
> Posting patch in a sec but let me try it on cluster too.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21242) [amv2] Miscellaneous minor log and assign procedure create improvements

2018-09-28 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632260#comment-16632260
 ] 

stack commented on HBASE-21242:
---

The test failures seem unrelated and pass locally. Will retry.

> [amv2] Miscellaneous minor log and assign procedure create improvements
> ---
>
> Key: HBASE-21242
> URL: https://issues.apache.org/jira/browse/HBASE-21242
> Project: HBase
>  Issue Type: Bug
>  Components: amv2, Operability
>Reporter: stack
>Assignee: stack
>Priority: Minor
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21242.branch-2.1.001.patch, 
> HBASE-21242.branch-2.1.001.patch
>
>
> Some minor fixups:
> {code}
> For RIT Duration, do better than print ms/seconds. Remove redundant UI
> column dedicated to duration when we log it in the status field too.
> Make bypass log at INFO level -- when DEBUG we can miss important
> fixup detail like why we failed.
> Make it so on complete of subprocedure, we note count of outstanding
> siblings so we have a clue how much further the parent has to go before
> it is done (Helpful when hundreds of servers doing SCP).
> Have the SCP run the AP preflight check before creating an AP; saves
> creation of hundreds of thousands of APs during fixup of this big cluster
> of mine.
> Don't log tablename three times when reporting remote call failed.
> If lock is held already, note who has it. Also log after we get lock
> or if we have to wait rather than log on entrance though we may
> later have to wait (or we may have just picked up the lock).
> {code}
> Posting patch in a sec but let me try it on cluster too.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21242) [amv2] Miscellaneous minor log and assign procedure create improvements

2018-09-28 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632258#comment-16632258
 ] 

stack commented on HBASE-21242:
---

Thanks for taking a look [~mdrob]. 

bq. Do we want to include the proc id in any of the new logging?

It is there, no? A proc toString includes procId. Or perhaps you have a 
particular line in mind.

> [amv2] Miscellaneous minor log and assign procedure create improvements
> ---
>
> Key: HBASE-21242
> URL: https://issues.apache.org/jira/browse/HBASE-21242
> Project: HBase
>  Issue Type: Bug
>  Components: amv2, Operability
>Reporter: stack
>Assignee: stack
>Priority: Minor
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21242.branch-2.1.001.patch, 
> HBASE-21242.branch-2.1.001.patch
>
>
> Some minor fixups:
> {code}
> For RIT Duration, do better than print ms/seconds. Remove redundant UI
> column dedicated to duration when we log it in the status field too.
> Make bypass log at INFO level -- when DEBUG we can miss important
> fixup detail like why we failed.
> Make it so on complete of subprocedure, we note count of outstanding
> siblings so we have a clue how much further the parent has to go before
> it is done (Helpful when hundreds of servers doing SCP).
> Have the SCP run the AP preflight check before creating an AP; saves
> creation of hundreds of thousands of APs during fixup of this big cluster
> of mine.
> Don't log tablename three times when reporting remote call failed.
> If lock is held already, note who has it. Also log after we get lock
> or if we have to wait rather than log on entrance though we may
> later have to wait (or we may have just picked up the lock).
> {code}
> Posting patch in a sec but let me try it on cluster too.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21247) Allow WAL Provider to be specified by configuration without explicit enum in Providers

2018-09-28 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21247:
---
Attachment: 21247.v4.txt

> Allow WAL Provider to be specified by configuration without explicit enum in 
> Providers
> --
>
> Key: HBASE-21247
> URL: https://issues.apache.org/jira/browse/HBASE-21247
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: 21247.v1.txt, 21247.v2.txt, 21247.v3.txt, 21247.v4.txt
>
>
> Currently all the WAL Providers acceptable to HBase are specified in the 
> Providers enum of WALFactory.
> This restricts the ability to supply additional WAL Providers by class name.
> This issue introduces additional config which allows a new WAL Provider to be 
> specified through its class name.
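A generic sketch of the "load a provider class by name" pattern, not the patch; 
the Provider interface and names here are made up for illustration (the real code 
deals with HBase's WALProvider via WALFactory):

{code}
// Resolve a provider from configuration: in the real code a Providers enum lookup
// would come first; this goes straight to reflective loading to illustrate the
// "specify by class name" part.
public class ProviderResolver {

  // Stand-in for the real provider interface.
  public interface Provider {
    String name();
  }

  public static Provider load(String configuredClassName) throws Exception {
    Class<? extends Provider> clazz =
        Class.forName(configuredClassName).asSubclass(Provider.class);
    return clazz.getDeclaredConstructor().newInstance();
  }

  // Example provider so the sketch is self-contained.
  public static class EchoProvider implements Provider {
    @Override
    public String name() {
      return "echo";
    }
  }

  public static void main(String[] args) throws Exception {
    Provider p = load("ProviderResolver$EchoProvider");
    System.out.println(p.name());
  }
}
{code}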



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-19418) RANGE_OF_DELAY in PeriodicMemstoreFlusher should be configurable.

2018-09-28 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-19418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632256#comment-16632256
 ] 

Andrew Purtell commented on HBASE-19418:


lgtm
That test failure is not related. 
Let me see about committing this today.

> RANGE_OF_DELAY in PeriodicMemstoreFlusher should be configurable.
> -
>
> Key: HBASE-19418
> URL: https://issues.apache.org/jira/browse/HBASE-19418
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha-4
>Reporter: Jean-Marc Spaggiari
>Assignee: Ramie Raufdeen
>Priority: Major
> Attachments: HBASE-19418.master.000.patch
>
>
> When RSs have a LOT of regions and CFs, flushing everything within 5 minutes 
> is not always doable. It might be interesting to be able to increase the 
> RANGE_OF_DELAY. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21117) Backport HBASE-18350 (fix RSGroups) to branch-1 (Only port the part fixing table locking issue.)

2018-09-28 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632254#comment-16632254
 ] 

Andrew Purtell commented on HBASE-21117:


lgtm
Let me try to commit this today.


> Backport HBASE-18350  (fix RSGroups)  to branch-1 (Only port the part fixing 
> table locking issue.)
> --
>
> Key: HBASE-21117
> URL: https://issues.apache.org/jira/browse/HBASE-21117
> Project: HBase
>  Issue Type: Bug
>  Components: backport, rsgroup, shell
>Affects Versions: 1.3.2
>Reporter: Xu Cang
>Assignee: Xu Cang
>Priority: Major
>  Labels: backport
> Attachments: HBASE-21117-branch-1.001.patch, 
> HBASE-21117-branch-1.002.patch
>
>
> When working on HBASE-20666, I found out that HBASE-18350 did not get ported 
> to branch-1, which sometimes causes a procedure to hang when #moveTables is 
> called. After looking into the 18350 patch, it seems important since it fixes 
> 4 issues. This Jira is an attempt to backport it to branch-1.
>  
>  
> Edited: Aug 26.
> After reviewing the HBASE-18350 patch, I decided to only port part 2 of the 
> patch, because part 1 and part 3 are AMv2 related. I won't touch those since 
> AMv2 is only for branch-2.
>  
> {quote} 
> Subject: [PATCH] HBASE-18350 RSGroups are broken under AMv2
> - Table moving to RSG was buggy, because it left the table unassigned.
>   Now it is fixed we immediately assign to an appropriate RS
>   (MoveRegionProcedure).
> *- Table was locked while moving, but unassign operation hung, because*
>   *locked table queues are not scheduled while locked. Fixed.    port 
> this one.*
> - ProcedureSyncWait was buggy, because it searched the procId in
>   executor, but executor does not store the return values of internal
>   operations (they are stored, but immediately removed by the cleaner).
> - list_rsgroups in the shell show also the assigned tables and servers.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request

2018-09-28 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632249#comment-16632249
 ] 

Andrew Purtell commented on HBASE-18451:


Doing a few local checks now

> PeriodicMemstoreFlusher should inspect the queue before adding a delayed 
> flush request
> --
>
> Key: HBASE-18451
> URL: https://issues.apache.org/jira/browse/HBASE-18451
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 2.0.0-alpha-1
>Reporter: Jean-Marc Spaggiari
>Assignee: Xu Cang
>Priority: Major
> Attachments: HBASE-18451.branch-1.001.patch, 
> HBASE-18451.branch-1.002.patch, HBASE-18451.branch-1.002.patch, 
> HBASE-18451.master.002.patch, HBASE-18451.master.003.patch, 
> HBASE-18451.master.004.patch, HBASE-18451.master.004.patch, 
> HBASE-18451.master.patch
>
>
> If you run a big job every 4 hours, impacting many tables (they have 150 
> regions per server), ad the end all the regions might have some data to be 
> flushed, and we want, after one hour, trigger a periodic flush. That's 
> totally fine.
> Now, to avoid a flush storm, when we detect a region to be flushed, we add a 
> "randomDelay" to the delayed flush, that way we spread them away.
> RANGE_OF_DELAY is 5 minutes. So we spread the flush over the next 5 minutes, 
> which is very good.
> However, because we don't check if there is already a request in the queue, 
> 10 seconds after, we create a new request, with a new randomDelay.
> If you generate a randomDelay every 10 seconds, at some point, you will end 
> up having a small one, and the flush will be triggered almost immediatly.
> As a result, instead of spreading all the flush within the next 5 minutes, 
> you end-up getting them all way more quickly. Like within the first minute. 
> Which not only feed the queue to to many flush requests, but also defeats the 
> purpose of the randomDelay.
> {code}
> @Override
> protected void chore() {
>   final StringBuffer whyFlush = new StringBuffer();
>   for (Region r : this.server.onlineRegions.values()) {
> if (r == null) continue;
> if (((HRegion)r).shouldFlush(whyFlush)) {
>   FlushRequester requester = server.getFlushRequester();
>   if (requester != null) {
> long randomDelay = RandomUtils.nextInt(RANGE_OF_DELAY) + 
> MIN_DELAY_TIME;
> LOG.info(getName() + " requesting flush of " +
>   r.getRegionInfo().getRegionNameAsString() + " because " +
>   whyFlush.toString() +
>   " after random delay " + randomDelay + "ms");
> //Throttle the flushes by putting a delay. If we don't throttle, 
> and there
> //is a balanced write-load on the regions in a table, we might 
> end up
> //overwhelming the filesystem with too many flushes at once.
> requester.requestDelayedFlush(r, randomDelay, false);
>   }
> }
>   }
> }
> {code}
> {code}
> 2017-07-24 18:44:33,338 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 270785ms
> 2017-07-24 18:44:43,328 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 200143ms
> 2017-07-24 18:44:53,954 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 191082ms
> 2017-07-24 18:45:03,528 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 92532ms
> 2017-07-24 18:45:14,201 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 238780ms
> 2017-07-24 18:45:24,195 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of 

[jira] [Commented] (HBASE-21207) Add client side sorting functionality in master web UI for table and region server details.

2018-09-28 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632240#comment-16632240
 ] 

Andrew Purtell commented on HBASE-21207:


Thanks [~archana.katiyar]

> Add client side sorting functionality in master web UI for table and region 
> server details.
> ---
>
> Key: HBASE-21207
> URL: https://issues.apache.org/jira/browse/HBASE-21207
> Project: HBase
>  Issue Type: Improvement
>  Components: master, monitoring, UI, Usability
>Reporter: Archana Katiyar
>Assignee: Archana Katiyar
>Priority: Minor
> Attachments: 14926e82-b929-11e8-8bdd-4ce4621f1118.png, 
> 2724afd8-b929-11e8-8171-8b5b2ba3084e.png, HBASE-21207-branch-1.patch, 
> HBASE-21207-branch-1.v1.patch, HBASE-21207-branch-2.v1.patch, 
> HBASE-21207.patch, HBASE-21207.patch, HBASE-21207.v1.patch, 
> edc5c812-b928-11e8-87e2-ce6396629bbc.png
>
>
> In the Master UI, we can see region server details like requests per second and 
> the number of regions, etc. Similarly, for tables we can also see online regions 
> and offline regions.
> It will help ops people in identifying hot-spotting if we can provide sort 
> functionality in the UI.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request

2018-09-28 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632239#comment-16632239
 ] 

Andrew Purtell commented on HBASE-18451:


Ok, I'll do the commit now unless someone beats me to it.

> PeriodicMemstoreFlusher should inspect the queue before adding a delayed 
> flush request
> --
>
> Key: HBASE-18451
> URL: https://issues.apache.org/jira/browse/HBASE-18451
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 2.0.0-alpha-1
>Reporter: Jean-Marc Spaggiari
>Assignee: Xu Cang
>Priority: Major
> Attachments: HBASE-18451.branch-1.001.patch, 
> HBASE-18451.branch-1.002.patch, HBASE-18451.branch-1.002.patch, 
> HBASE-18451.master.002.patch, HBASE-18451.master.003.patch, 
> HBASE-18451.master.004.patch, HBASE-18451.master.004.patch, 
> HBASE-18451.master.patch
>
>
> If you run a big job every 4 hours, impacting many tables (they have 150 
> regions per server), at the end all the regions might have some data to be 
> flushed, and we want, after one hour, to trigger a periodic flush. That's 
> totally fine.
> Now, to avoid a flush storm, when we detect a region to be flushed, we add a 
> "randomDelay" to the delayed flush; that way we spread them out.
> RANGE_OF_DELAY is 5 minutes, so we spread the flushes over the next 5 minutes, 
> which is very good.
> However, because we don't check whether there is already a request in the 
> queue, 10 seconds later we create a new request, with a new randomDelay.
> If you generate a randomDelay every 10 seconds, at some point you will end 
> up with a small one, and the flush will be triggered almost immediately.
> As a result, instead of spreading all the flushes over the next 5 minutes, 
> you end up getting them all much more quickly, often within the first minute. 
> This not only floods the queue with too many flush requests, but also defeats 
> the purpose of the randomDelay.
> {code}
> @Override
> protected void chore() {
>   final StringBuffer whyFlush = new StringBuffer();
>   for (Region r : this.server.onlineRegions.values()) {
> if (r == null) continue;
> if (((HRegion)r).shouldFlush(whyFlush)) {
>   FlushRequester requester = server.getFlushRequester();
>   if (requester != null) {
> long randomDelay = RandomUtils.nextInt(RANGE_OF_DELAY) + 
> MIN_DELAY_TIME;
> LOG.info(getName() + " requesting flush of " +
>   r.getRegionInfo().getRegionNameAsString() + " because " +
>   whyFlush.toString() +
>   " after random delay " + randomDelay + "ms");
> //Throttle the flushes by putting a delay. If we don't throttle, 
> and there
> //is a balanced write-load on the regions in a table, we might 
> end up
> //overwhelming the filesystem with too many flushes at once.
> requester.requestDelayedFlush(r, randomDelay, false);
>   }
> }
>   }
> }
> {code}
> {code}
> 2017-07-24 18:44:33,338 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 270785ms
> 2017-07-24 18:44:43,328 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 200143ms
> 2017-07-24 18:44:53,954 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 191082ms
> 2017-07-24 18:45:03,528 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 92532ms
> 2017-07-24 18:45:14,201 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 238780ms
> 2017-07-24 18:45:24,195 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 

[jira] [Updated] (HBASE-21231) Add documentation for MajorCompactor

2018-09-28 Thread Mike Drob (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Drob updated HBASE-21231:
--
Status: Open  (was: Patch Available)

moving out of patch available pending feedback on RB

> Add documentation for MajorCompactor
> 
>
> Key: HBASE-21231
> URL: https://issues.apache.org/jira/browse/HBASE-21231
> Project: HBase
>  Issue Type: Task
>  Components: documentation
>Affects Versions: 3.0.0
>Reporter: Balazs Meszaros
>Assignee: Balazs Meszaros
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-21231.master.001.patch
>
>
> HBASE-19528 added a new MajorCompactor tool, but it lacks documentation. 
> Let's document it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21242) [amv2] Miscellaneous minor log and assign procedure create improvements

2018-09-28 Thread Mike Drob (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632232#comment-16632232
 ] 

Mike Drob commented on HBASE-21242:
---

Do we want to include the proc id in any of the new logging?
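For illustration, a pid-carrying log line could look like the sketch below (assuming an 
slf4j logger; the class, method, and parameter names are made up, not the patch's code):

{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Sketch only: the shape of pid-prefixed logging for the new messages.
public class PidLoggingSketch {
  private static final Logger LOG = LoggerFactory.getLogger(PidLoggingSketch.class);

  static void logRemoteFailure(long procId, String procName, String serverName) {
    // Carrying the pid lets operators correlate log lines with the procedures view.
    LOG.info("pid={} {} failed remote call to {}; retrying", procId, procName, serverName);
  }
}
{code}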

> [amv2] Miscellaneous minor log and assign procedure create improvements
> ---
>
> Key: HBASE-21242
> URL: https://issues.apache.org/jira/browse/HBASE-21242
> Project: HBase
>  Issue Type: Bug
>  Components: amv2, Operability
>Reporter: stack
>Assignee: stack
>Priority: Minor
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21242.branch-2.1.001.patch, 
> HBASE-21242.branch-2.1.001.patch
>
>
> Some minor fixups:
> {code}
> For RIT Duration, do better than print ms/seconds. Remove redundant UI
> column dedicated to duration when we log it in the status field too.
> Make bypass log at INFO level -- when DEBUG we can miss important
> fixup detail like why we failed.
> Make it so on complete of subprocedure, we note count of outstanding
> siblings so we have a clue how much further the parent has to go before
> it is done (Helpful when hundreds of servers doing SCP).
> Have the SCP run the AP preflight check before creating an AP; saves
> creation of hundreds of thousands of APs during fixup of this big cluster
> of mine.
> Don't log tablename three times when reporting remote call failed.
> If lock is held already, note who has it. Also log after we get lock
> or if we have to wait rather than log on entrance though we may
> later have to wait (or we may have just picked up the lock).
> {code}
> Posting patch in a sec but let me try it on cluster too.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20952) Re-visit the WAL API

2018-09-28 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632225#comment-16632225
 ] 

stack commented on HBASE-20952:
---

I find that I am responding to Ted Yu up on the doc. I've stopped commenting. I 
do not, by choice, want to work with him. Generally, I only review his work 
when it looks like something of his is about to be committed. IMO, he does more 
damage than good in the project and so for the good of the project, I review 
his work when I can. Thanks.

> Re-visit the WAL API
> 
>
> Key: HBASE-20952
> URL: https://issues.apache.org/jira/browse/HBASE-20952
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Josh Elser
>Priority: Major
> Attachments: 20952.v1.txt
>
>
> Take a step back from the current WAL implementations and think about what an 
> HBase WAL API should look like. What are the primitive calls that we require 
> to guarantee durability of writes with a high degree of performance?
> The API needs to take the current implementations into consideration. We 
> should also have a mind for what is happening in the Ratis LogService (but 
> the LogService should not dictate what HBase's WAL API looks like RATIS-272).
> Other "systems" inside of HBase that use WALs are replication and 
> backup Replication has the use-case for "tail"'ing the WAL which we 
> should provide via our new API. B doesn't do anything fancy (IIRC). We 
> should make sure all consumers are generally going to be OK with the API we 
> create.
> The API may be "OK" (or OK in a part). We need to also consider other methods 
> which were "bolted" on such as {{AbstractFSWAL}} and 
> {{WALFileLengthProvider}}. Other corners of "WAL use" (like the 
> {{WALSplitter}} should also be looked at to use WAL-APIs only).
> We also need to make sure that adequate interface audience and stability 
> annotations are chosen.
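As a purely illustrative aside (this is not the API proposed in the attached document), 
the kind of "primitive calls" the description asks about could be sketched as follows; 
every name here is an assumption for illustration only:

{code}
import java.io.IOException;

// Illustrative sketch only -- not the proposed HBase WAL API. It just names the
// primitives discussed above: append, durability (sync), and tailing for
// consumers such as replication.
public interface WriteAheadLogSketch extends AutoCloseable {

  /** Append an edit for a region; the returned sequence id is durable only after sync(). */
  long append(byte[] encodedRegionName, byte[] serializedEdit) throws IOException;

  /** Block until every append up to (at least) the given sequence id is durable. */
  void sync(long sequenceId) throws IOException;

  /** Open a reader positioned after the given sequence id, e.g. for replication tailing. */
  TailReader openTail(long afterSequenceId) throws IOException;

  interface TailReader extends AutoCloseable {
    /** Next serialized edit, or null if nothing new is available yet. */
    byte[] next() throws IOException;
  }
}
{code}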



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21225) Having RPC & Space quota on a table/Namespace doesn't allow space quota to be removed using 'NONE'

2018-09-28 Thread Sakthi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sakthi updated HBASE-21225:
---
Description: 
A part of HBASE-20705 is still unresolved. In that Jira the problem was assumed 
to be: when a table having both rpc & space quotas is dropped (with 
hbase.quota.remove.on.table.delete set to true), the rpc quota is not set to be 
dropped along with the table, and the space quota could not be removed 
completely because of the "EMPTY" row that the rpc quota left behind even after 
removal. 

The proposed solution was to make sure that the rpc quota didn't leave empty 
rows after quota removal, and to set up automatic removal of the rpc quota on 
table drops. That ensured that space quotas could be recreated/removed.

But all of this was under the assumption that hbase.quota.remove.on.table.delete 
is set to true. When it is set to false, the same issue can be reproduced. The 
steps shown below can also be used to reproduce the issue without table drops.
{noformat}
hbase(main):005:0> create 't2','cf'
Created table t2
Took 0.7619 seconds
=> Hbase::Table - t2
hbase(main):006:0> set_quota TYPE => THROTTLE, TABLE => 't2', LIMIT => '10M/sec'
Took 0.0514 seconds
hbase(main):007:0> set_quota TYPE => SPACE, TABLE => 't2', LIMIT => '1G', 
POLICY => NO_WRITES
Took 0.0162 seconds
hbase(main):008:0> list_quotas
OWNER  QUOTAS
 TABLE => t2   TYPE => THROTTLE, THROTTLE_TYPE => REQUEST_SIZE, 
LIMIT => 10M/sec, SCOPE =>
   MACHINE
 TABLE => t2   TYPE => SPACE, TABLE => t2, LIMIT => 1073741824, 
VIOLATION_POLICY => NO_WRIT
   ES
2 row(s)
Took 0.0716 seconds
hbase(main):009:0> set_quota TYPE => SPACE, TABLE => 't2', LIMIT => NONE
Took 0.0082 seconds
hbase(main):010:0> list_quotas
OWNER   QUOTAS
 TABLE => t2TYPE => THROTTLE, THROTTLE_TYPE => 
REQUEST_SIZE, LIMIT => 10M/sec, SCOPE => MACHINE
 TABLE => t2TYPE => SPACE, TABLE => t2, REMOVE => true
2 row(s)
Took 0.0254 seconds
hbase(main):011:0> set_quota TYPE => SPACE, TABLE => 't2', LIMIT => '1G', 
POLICY => NO_WRITES
Took 0.0082 seconds
hbase(main):012:0> list_quotas
OWNER   QUOTAS
 TABLE => t2TYPE => THROTTLE, THROTTLE_TYPE => 
REQUEST_SIZE, LIMIT => 10M/sec, SCOPE => MACHINE
 TABLE => t2TYPE => SPACE, TABLE => t2, REMOVE => true
2 row(s)
Took 0.0411 seconds
{noformat}

  was:
A part of HBASE-20705 is still unresolved. In that Jira it was assumed that 
problem is: when table having both rpc & space quotas is dropped (with 
hbase.delete.on.table.remove set as true), the quotas are still present in the 
quotas table. 
{noformat}
hbase(main):005:0> create 't2','cf'
Created table t2
Took 0.7619 seconds
=> Hbase::Table - t2
hbase(main):006:0> set_quota TYPE => THROTTLE, TABLE => 't2', LIMIT => '10M/sec'
Took 0.0514 seconds
hbase(main):007:0> set_quota TYPE => SPACE, TABLE => 't2', LIMIT => '1G', 
POLICY => NO_WRITES
Took 0.0162 seconds
hbase(main):008:0> list_quotas
OWNER  QUOTAS
 TABLE => t2   TYPE => THROTTLE, THROTTLE_TYPE => REQUEST_SIZE, 
LIMIT => 10M/sec, SCOPE =>
   MACHINE
 TABLE => t2   TYPE => SPACE, TABLE => t2, LIMIT => 1073741824, 
VIOLATION_POLICY => NO_WRIT
   ES
2 row(s)
Took 0.0716 seconds
hbase(main):009:0> set_quota TYPE => SPACE, TABLE => 't2', LIMIT => NONE
Took 0.0082 seconds
hbase(main):010:0> list_quotas
OWNER   QUOTAS
 TABLE => t2TYPE => THROTTLE, THROTTLE_TYPE => 
REQUEST_SIZE, LIMIT => 10M/sec, SCOPE => MACHINE
 TABLE => t2TYPE => SPACE, TABLE => t2, REMOVE => true
2 row(s)
Took 0.0254 seconds
hbase(main):011:0> set_quota TYPE => SPACE, TABLE => 't2', LIMIT => '1G', 
POLICY => NO_WRITES
Took 0.0082 seconds
hbase(main):012:0> list_quotas
OWNER   QUOTAS
 TABLE => t2TYPE => THROTTLE, THROTTLE_TYPE => 
REQUEST_SIZE, LIMIT => 10M/sec, SCOPE => MACHINE
 TABLE => t2TYPE => SPACE, TABLE => t2, REMOVE => true
2 row(s)
Took 0.0411 seconds
{noformat}
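For reference, the shell command `set_quota TYPE => SPACE, TABLE => 't2', LIMIT => NONE` 
used in the reproduction above corresponds roughly to the client call sketched below 
(a sketch assuming the 2.x quota API; it goes through the same removal path and would 
presumably show the same behavior while a throttle is also present):

{code}
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.quotas.QuotaSettingsFactory;

// Sketch: programmatic equivalent of `set_quota TYPE => SPACE, TABLE => 't2', LIMIT => NONE`.
public class RemoveSpaceQuotaSketch {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = conn.getAdmin()) {
      // Builds a "remove the space limit" setting for table t2 and applies it.
      admin.setQuota(QuotaSettingsFactory.removeTableSpaceLimit(TableName.valueOf("t2")));
    }
  }
}
{code}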


> Having RPC & Space quota on a table/Namespace doesn't allow space quota to be 
> removed using 'NONE'
> --
>
> Key: HBASE-21225
> URL: https://issues.apache.org/jira/browse/HBASE-21225
> Project: HBase
>  Issue Type: Bug
>Reporter: Sakthi
>Assignee: Sakthi
>Priority: Major
> Attachments: hbase-21225.master.001.patch
>
>
> A part of HBASE-20705 is still unresolved. In that Jira the problem was assumed 
> to be: when a table having both rpc & space quotas is dropped (with 
> 

[jira] [Updated] (HBASE-21225) Having RPC & Space quota on a table/Namespace doesn't allow space quota to be removed using 'NONE'

2018-09-28 Thread Sakthi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sakthi updated HBASE-21225:
---
Description: 
A part of HBASE-20705 is still unresolved. In that Jira it was assumed that 
problem is: when table having both rpc & space quotas is dropped (with 
hbase.delete.on.table.remove set as true), the quotas are still present in the 
quotas table. 
{noformat}
hbase(main):005:0> create 't2','cf'
Created table t2
Took 0.7619 seconds
=> Hbase::Table - t2
hbase(main):006:0> set_quota TYPE => THROTTLE, TABLE => 't2', LIMIT => '10M/sec'
Took 0.0514 seconds
hbase(main):007:0> set_quota TYPE => SPACE, TABLE => 't2', LIMIT => '1G', 
POLICY => NO_WRITES
Took 0.0162 seconds
hbase(main):008:0> list_quotas
OWNER  QUOTAS
 TABLE => t2   TYPE => THROTTLE, THROTTLE_TYPE => REQUEST_SIZE, 
LIMIT => 10M/sec, SCOPE =>
   MACHINE
 TABLE => t2   TYPE => SPACE, TABLE => t2, LIMIT => 1073741824, 
VIOLATION_POLICY => NO_WRIT
   ES
2 row(s)
Took 0.0716 seconds
hbase(main):009:0> set_quota TYPE => SPACE, TABLE => 't2', LIMIT => NONE
Took 0.0082 seconds
hbase(main):010:0> list_quotas
OWNER   QUOTAS
 TABLE => t2TYPE => THROTTLE, THROTTLE_TYPE => 
REQUEST_SIZE, LIMIT => 10M/sec, SCOPE => MACHINE
 TABLE => t2TYPE => SPACE, TABLE => t2, REMOVE => true
2 row(s)
Took 0.0254 seconds
hbase(main):011:0> set_quota TYPE => SPACE, TABLE => 't2', LIMIT => '1G', 
POLICY => NO_WRITES
Took 0.0082 seconds
hbase(main):012:0> list_quotas
OWNER   QUOTAS
 TABLE => t2TYPE => THROTTLE, THROTTLE_TYPE => 
REQUEST_SIZE, LIMIT => 10M/sec, SCOPE => MACHINE
 TABLE => t2TYPE => SPACE, TABLE => t2, REMOVE => true
2 row(s)
Took 0.0411 seconds
{noformat}

  was:
A part of HBASE-20705 is still unresolved
{noformat}
hbase(main):005:0> create 't2','cf'
Created table t2
Took 0.7619 seconds
=> Hbase::Table - t2
hbase(main):006:0> set_quota TYPE => THROTTLE, TABLE => 't2', LIMIT => '10M/sec'
Took 0.0514 seconds
hbase(main):007:0> set_quota TYPE => SPACE, TABLE => 't2', LIMIT => '1G', 
POLICY => NO_WRITES
Took 0.0162 seconds
hbase(main):008:0> list_quotas
OWNER  QUOTAS
 TABLE => t2   TYPE => THROTTLE, THROTTLE_TYPE => REQUEST_SIZE, 
LIMIT => 10M/sec, SCOPE =>
   MACHINE
 TABLE => t2   TYPE => SPACE, TABLE => t2, LIMIT => 1073741824, 
VIOLATION_POLICY => NO_WRIT
   ES
2 row(s)
Took 0.0716 seconds
hbase(main):009:0> set_quota TYPE => SPACE, TABLE => 't2', LIMIT => NONE
Took 0.0082 seconds
hbase(main):010:0> list_quotas
OWNER   QUOTAS
 TABLE => t2TYPE => THROTTLE, THROTTLE_TYPE => 
REQUEST_SIZE, LIMIT => 10M/sec, SCOPE => MACHINE
 TABLE => t2TYPE => SPACE, TABLE => t2, REMOVE => true
2 row(s)
Took 0.0254 seconds
hbase(main):011:0> set_quota TYPE => SPACE, TABLE => 't2', LIMIT => '1G', 
POLICY => NO_WRITES
Took 0.0082 seconds
hbase(main):012:0> list_quotas
OWNER   QUOTAS
 TABLE => t2TYPE => THROTTLE, THROTTLE_TYPE => 
REQUEST_SIZE, LIMIT => 10M/sec, SCOPE => MACHINE
 TABLE => t2TYPE => SPACE, TABLE => t2, REMOVE => true
2 row(s)
Took 0.0411 seconds
{noformat}


> Having RPC & Space quota on a table/Namespace doesn't allow space quota to be 
> removed using 'NONE'
> --
>
> Key: HBASE-21225
> URL: https://issues.apache.org/jira/browse/HBASE-21225
> Project: HBase
>  Issue Type: Bug
>Reporter: Sakthi
>Assignee: Sakthi
>Priority: Major
> Attachments: hbase-21225.master.001.patch
>
>
> A part of HBASE-20705 is still unresolved. In that Jira it was assumed that 
> problem is: when table having both rpc & space quotas is dropped (with 
> hbase.delete.on.table.remove set as true), the quotas are still present in 
> the quotas table. 
> {noformat}
> hbase(main):005:0> create 't2','cf'
> Created table t2
> Took 0.7619 seconds
> => Hbase::Table - t2
> hbase(main):006:0> set_quota TYPE => THROTTLE, TABLE => 't2', LIMIT => 
> '10M/sec'
> Took 0.0514 seconds
> hbase(main):007:0> set_quota TYPE => SPACE, TABLE => 't2', LIMIT => '1G', 
> POLICY => NO_WRITES
> Took 0.0162 seconds
> hbase(main):008:0> list_quotas
> OWNER  QUOTAS
>  TABLE => t2   TYPE => THROTTLE, THROTTLE_TYPE => REQUEST_SIZE, 
> LIMIT => 10M/sec, SCOPE =>
>MACHINE
>  TABLE => t2   TYPE => SPACE, TABLE => t2, LIMIT => 1073741824, 
> VIOLATION_POLICY => NO_WRIT
>ES
> 2 row(s)
> Took 0.0716 seconds
> 

[jira] [Commented] (HBASE-21248) Implement exponential backoff when retrying for ModifyPeerProcedure

2018-09-28 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632197#comment-16632197
 ] 

Hadoop QA commented on HBASE-21248:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
27s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
22s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
47s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 3s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
50s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
5s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
35s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
42s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m  
3s{color} | {color:red} hbase-server: The patch generated 1 new + 0 unchanged - 
0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
57s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
9m 57s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 
or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}287m  4s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
33s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}329m  4s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.client.TestMobRestoreSnapshotFromClient |
|   | hadoop.hbase.client.TestAsyncTableAdminApi |
|   | hadoop.hbase.util.TestFromClientSide3WoUnsafe |
|   | hadoop.hbase.replication.TestReplicationSmallTests |
|   | hadoop.hbase.client.TestFromClientSideWithCoprocessor |
|   | hadoop.hbase.namespace.TestNamespaceAuditor |
|   | hadoop.hbase.regionserver.TestRegionReplicaFailover |
|   | hadoop.hbase.client.TestFromClientSide3 |
|   | hadoop.hbase.replication.TestReplicationKillSlaveRS |
|   | hadoop.hbase.replication.TestReplicationSmallTestsSync |
|   | hadoop.hbase.replication.TestMasterReplication |
|   | hadoop.hbase.client.replication.TestReplicationAdminWithClusters |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21248 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12941680/HBASE-21248-v1.patch |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux aa1d382729e6 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 
07:31:43 UTC 2018 

[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request

2018-09-28 Thread Mike Drob (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632195#comment-16632195
 ] 

Mike Drob commented on HBASE-18451:
---

For branch-1 patch, the javac warnings are showing up because the indent level 
changed so the line number filtering doesn't work as expected. Fine to ignore 
that for now as a false positive.

+1 for both

> PeriodicMemstoreFlusher should inspect the queue before adding a delayed 
> flush request
> --
>
> Key: HBASE-18451
> URL: https://issues.apache.org/jira/browse/HBASE-18451
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 2.0.0-alpha-1
>Reporter: Jean-Marc Spaggiari
>Assignee: Xu Cang
>Priority: Major
> Attachments: HBASE-18451.branch-1.001.patch, 
> HBASE-18451.branch-1.002.patch, HBASE-18451.branch-1.002.patch, 
> HBASE-18451.master.002.patch, HBASE-18451.master.003.patch, 
> HBASE-18451.master.004.patch, HBASE-18451.master.004.patch, 
> HBASE-18451.master.patch
>
>
> If you run a big job every 4 hours, impacting many tables (they have 150 
> regions per server), at the end all the regions might have some data to be 
> flushed, and we want, after one hour, to trigger a periodic flush. That's 
> totally fine.
> Now, to avoid a flush storm, when we detect a region to be flushed, we add a 
> "randomDelay" to the delayed flush; that way we spread them out.
> RANGE_OF_DELAY is 5 minutes, so we spread the flushes over the next 5 minutes, 
> which is very good.
> However, because we don't check whether there is already a request in the 
> queue, 10 seconds later we create a new request, with a new randomDelay.
> If you generate a randomDelay every 10 seconds, at some point you will end 
> up with a small one, and the flush will be triggered almost immediately.
> As a result, instead of spreading all the flushes over the next 5 minutes, 
> you end up getting them all much more quickly, often within the first minute. 
> This not only floods the queue with too many flush requests, but also defeats 
> the purpose of the randomDelay.
> {code}
> @Override
> protected void chore() {
>   final StringBuffer whyFlush = new StringBuffer();
>   for (Region r : this.server.onlineRegions.values()) {
> if (r == null) continue;
> if (((HRegion)r).shouldFlush(whyFlush)) {
>   FlushRequester requester = server.getFlushRequester();
>   if (requester != null) {
> long randomDelay = RandomUtils.nextInt(RANGE_OF_DELAY) + 
> MIN_DELAY_TIME;
> LOG.info(getName() + " requesting flush of " +
>   r.getRegionInfo().getRegionNameAsString() + " because " +
>   whyFlush.toString() +
>   " after random delay " + randomDelay + "ms");
> //Throttle the flushes by putting a delay. If we don't throttle, 
> and there
> //is a balanced write-load on the regions in a table, we might 
> end up
> //overwhelming the filesystem with too many flushes at once.
> requester.requestDelayedFlush(r, randomDelay, false);
>   }
> }
>   }
> }
> {code}
> {code}
> 2017-07-24 18:44:33,338 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 270785ms
> 2017-07-24 18:44:43,328 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 200143ms
> 2017-07-24 18:44:53,954 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 191082ms
> 2017-07-24 18:45:03,528 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 92532ms
> 2017-07-24 18:45:14,201 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 238780ms
> 2017-07-24 18:45:24,195 INFO 

[jira] [Commented] (HBASE-21213) [hbck2] bypass leaves behind state in RegionStates when assign/unassign

2018-09-28 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632190#comment-16632190
 ] 

stack commented on HBASE-21213:
---

.008 Fix checkstyle and findbugs.

> [hbck2] bypass leaves behind state in RegionStates when assign/unassign
> ---
>
> Key: HBASE-21213
> URL: https://issues.apache.org/jira/browse/HBASE-21213
> Project: HBase
>  Issue Type: Bug
>  Components: amv2, hbck2
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 2.1.1
>
> Attachments: HBASE-21213.branch-2.1.001.patch, 
> HBASE-21213.branch-2.1.002.patch, HBASE-21213.branch-2.1.003.patch, 
> HBASE-21213.branch-2.1.004.patch, HBASE-21213.branch-2.1.005.patch, 
> HBASE-21213.branch-2.1.006.patch, HBASE-21213.branch-2.1.007.patch, 
> HBASE-21213.branch-2.1.007.patch, HBASE-21213.branch-2.1.008.patch
>
>
> This is a follow-on from HBASE-21083 which added the 'bypass' functionality. 
> On bypass, there is more state to be cleared if we are to allow new Procedures 
> to be scheduled.
> For example, here is a bypass:
> {code}
> 2018-09-20 05:45:43,722 INFO org.apache.hadoop.hbase.procedure2.Procedure: 
> pid=100449, state=RUNNABLE:REGION_TRANSITION_DISPATCH, locked=true, 
> bypass=LOG-REDACTED UnassignProcedure table=hbase:namespace, 
> region=37cc206fe9c4bc1c0a46a34c5f523d16, 
> server=ve1233.halxg.cloudera.com,22101,1537397961664 bypassed, returning null 
> to finish it
> 2018-09-20 05:45:44,022 INFO 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=100449, 
> state=SUCCESS, bypass=LOG-REDACTED UnassignProcedure table=hbase:namespace, 
> region=37cc206fe9c4bc1c0a46a34c5f523d16, 
> server=ve1233.halxg.cloudera.com,22101,1537397961664 in 2mins, 7.618sec
> {code}
> ... but then when I try to assign the bypassed region later, I get this:
> {code}
> 2018-09-20 05:46:31,435 WARN 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure: There is 
> already another procedure running on this region this=pid=100450, 
> state=RUNNABLE:REGION_TRANSITION_QUEUE, locked=true; AssignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16 
> owner=pid=100449, state=SUCCESS, bypass=LOG-REDACTED UnassignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16, 
> server=ve1233.halxg.cloudera.com,22101,1537397961664 pid=100450, 
> state=RUNNABLE:REGION_TRANSITION_QUEUE, locked=true; AssignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16; rit=OPENING, 
> location=ve1233.halxg.cloudera.com,22101,1537397961664
> 2018-09-20 05:46:31,510 INFO 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Rolled back pid=100450, 
> state=ROLLEDBACK, 
> exception=org.apache.hadoop.hbase.procedure2.ProcedureAbortedException via 
> AssignProcedure:org.apache.hadoop.hbase.procedure2.ProcedureAbortedException: 
> There is already another procedure running on this region this=pid=100450, 
> state=RUNNABLE:REGION_TRANSITION_QUEUE, locked=true; AssignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16 
> owner=pid=100449, state=SUCCESS, bypass=LOG-REDACTED UnassignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16, 
> server=ve1233.halxg.cloudera.com,22101,1537397961664; AssignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16 
> exec-time=473msec
> {code}
> ... which is a long-winded way of saying the Unassign Procedure still exists 
> in the RegionStateNodes.
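A toy model of that failure mode is sketched below; these are not actual HBase classes 
and the names are made up. The node keeps a reference to its owning procedure, and a 
bypass that does not clear it blocks any later AssignProcedure from attaching:

{code}
// Toy model of the failure mode above -- not actual HBase classes.
final class RegionNodeSketch {
  private Long ownerProcId; // pid of the procedure currently attached, if any

  synchronized void attach(long procId) {
    if (ownerProcId != null && ownerProcId != procId) {
      throw new IllegalStateException(
          "There is already another procedure running on this region: pid=" + ownerProcId);
    }
    ownerProcId = procId;
  }

  /** What a fixed bypass must also do: release the node so new procedures can be scheduled. */
  synchronized void detach(long procId) {
    if (ownerProcId != null && ownerProcId == procId) {
      ownerProcId = null;
    }
  }
}
{code}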



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21213) [hbck2] bypass leaves behind state in RegionStates when assign/unassign

2018-09-28 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21213:
--
Attachment: HBASE-21213.branch-2.1.008.patch

> [hbck2] bypass leaves behind state in RegionStates when assign/unassign
> ---
>
> Key: HBASE-21213
> URL: https://issues.apache.org/jira/browse/HBASE-21213
> Project: HBase
>  Issue Type: Bug
>  Components: amv2, hbck2
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 2.1.1
>
> Attachments: HBASE-21213.branch-2.1.001.patch, 
> HBASE-21213.branch-2.1.002.patch, HBASE-21213.branch-2.1.003.patch, 
> HBASE-21213.branch-2.1.004.patch, HBASE-21213.branch-2.1.005.patch, 
> HBASE-21213.branch-2.1.006.patch, HBASE-21213.branch-2.1.007.patch, 
> HBASE-21213.branch-2.1.007.patch, HBASE-21213.branch-2.1.008.patch
>
>
> This is a follow-on from HBASE-21083 which added the 'bypass' functionality. 
> On bypass, there is more state to be cleared if we are to allow new Procedures 
> to be scheduled.
> For example, here is a bypass:
> {code}
> 2018-09-20 05:45:43,722 INFO org.apache.hadoop.hbase.procedure2.Procedure: 
> pid=100449, state=RUNNABLE:REGION_TRANSITION_DISPATCH, locked=true, 
> bypass=LOG-REDACTED UnassignProcedure table=hbase:namespace, 
> region=37cc206fe9c4bc1c0a46a34c5f523d16, 
> server=ve1233.halxg.cloudera.com,22101,1537397961664 bypassed, returning null 
> to finish it
> 2018-09-20 05:45:44,022 INFO 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=100449, 
> state=SUCCESS, bypass=LOG-REDACTED UnassignProcedure table=hbase:namespace, 
> region=37cc206fe9c4bc1c0a46a34c5f523d16, 
> server=ve1233.halxg.cloudera.com,22101,1537397961664 in 2mins, 7.618sec
> {code}
> ... but then when I try to assign the bypassed region later, I get this:
> {code}
> 2018-09-20 05:46:31,435 WARN 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure: There is 
> already another procedure running on this region this=pid=100450, 
> state=RUNNABLE:REGION_TRANSITION_QUEUE, locked=true; AssignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16 
> owner=pid=100449, state=SUCCESS, bypass=LOG-REDACTED UnassignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16, 
> server=ve1233.halxg.cloudera.com,22101,1537397961664 pid=100450, 
> state=RUNNABLE:REGION_TRANSITION_QUEUE, locked=true; AssignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16; rit=OPENING, 
> location=ve1233.halxg.cloudera.com,22101,1537397961664
> 2018-09-20 05:46:31,510 INFO 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Rolled back pid=100450, 
> state=ROLLEDBACK, 
> exception=org.apache.hadoop.hbase.procedure2.ProcedureAbortedException via 
> AssignProcedure:org.apache.hadoop.hbase.procedure2.ProcedureAbortedException: 
> There is already another procedure running on this region this=pid=100450, 
> state=RUNNABLE:REGION_TRANSITION_QUEUE, locked=true; AssignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16 
> owner=pid=100449, state=SUCCESS, bypass=LOG-REDACTED UnassignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16, 
> server=ve1233.halxg.cloudera.com,22101,1537397961664; AssignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16 
> exec-time=473msec
> {code}
> ... which is a long-winded way of saying the Unassign Procedure still exists 
> in the RegionStateNodes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request

2018-09-28 Thread Xu Cang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Cang updated HBASE-18451:

Attachment: HBASE-18451.master.004.patch

> PeriodicMemstoreFlusher should inspect the queue before adding a delayed 
> flush request
> --
>
> Key: HBASE-18451
> URL: https://issues.apache.org/jira/browse/HBASE-18451
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 2.0.0-alpha-1
>Reporter: Jean-Marc Spaggiari
>Assignee: Xu Cang
>Priority: Major
> Attachments: HBASE-18451.branch-1.001.patch, 
> HBASE-18451.branch-1.002.patch, HBASE-18451.branch-1.002.patch, 
> HBASE-18451.master.002.patch, HBASE-18451.master.003.patch, 
> HBASE-18451.master.004.patch, HBASE-18451.master.004.patch, 
> HBASE-18451.master.patch
>
>
> If you run a big job every 4 hours, impacting many tables (they have 150 
> regions per server), at the end all the regions might have some data to be 
> flushed, and we want, after one hour, to trigger a periodic flush. That's 
> totally fine.
> Now, to avoid a flush storm, when we detect a region to be flushed, we add a 
> "randomDelay" to the delayed flush; that way we spread them out.
> RANGE_OF_DELAY is 5 minutes, so we spread the flushes over the next 5 minutes, 
> which is very good.
> However, because we don't check whether there is already a request in the 
> queue, 10 seconds later we create a new request, with a new randomDelay.
> If you generate a randomDelay every 10 seconds, at some point you will end 
> up with a small one, and the flush will be triggered almost immediately.
> As a result, instead of spreading all the flushes over the next 5 minutes, 
> you end up getting them all much more quickly, often within the first minute. 
> This not only floods the queue with too many flush requests, but also defeats 
> the purpose of the randomDelay.
> {code}
> @Override
> protected void chore() {
>   final StringBuffer whyFlush = new StringBuffer();
>   for (Region r : this.server.onlineRegions.values()) {
> if (r == null) continue;
> if (((HRegion)r).shouldFlush(whyFlush)) {
>   FlushRequester requester = server.getFlushRequester();
>   if (requester != null) {
> long randomDelay = RandomUtils.nextInt(RANGE_OF_DELAY) + 
> MIN_DELAY_TIME;
> LOG.info(getName() + " requesting flush of " +
>   r.getRegionInfo().getRegionNameAsString() + " because " +
>   whyFlush.toString() +
>   " after random delay " + randomDelay + "ms");
> //Throttle the flushes by putting a delay. If we don't throttle, 
> and there
> //is a balanced write-load on the regions in a table, we might 
> end up
> //overwhelming the filesystem with too many flushes at once.
> requester.requestDelayedFlush(r, randomDelay, false);
>   }
> }
>   }
> }
> {code}
> {code}
> 2017-07-24 18:44:33,338 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 270785ms
> 2017-07-24 18:44:43,328 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 200143ms
> 2017-07-24 18:44:53,954 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 191082ms
> 2017-07-24 18:45:03,528 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 92532ms
> 2017-07-24 18:45:14,201 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 238780ms
> 2017-07-24 18:45:24,195 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of 

[jira] [Updated] (HBASE-21220) Port HBASE-20636 (Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED) to branch-1

2018-09-28 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-21220:
---
Status: Patch Available  (was: Open)

Resubmit patch

> Port HBASE-20636 (Introduce two bloom filter type : ROWPREFIX and 
> ROWPREFIX_DELIMITED) to branch-1
> --
>
> Key: HBASE-21220
> URL: https://issues.apache.org/jira/browse/HBASE-21220
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Major
> Fix For: 1.5.0
>
> Attachments: HBASE-21220-branch-1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21220) Port HBASE-20636 (Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED) to branch-1

2018-09-28 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-21220:
---
Status: Open  (was: Patch Available)

> Port HBASE-20636 (Introduce two bloom filter type : ROWPREFIX and 
> ROWPREFIX_DELIMITED) to branch-1
> --
>
> Key: HBASE-21220
> URL: https://issues.apache.org/jira/browse/HBASE-21220
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Major
> Fix For: 1.5.0
>
> Attachments: HBASE-21220-branch-1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20952) Re-visit the WAL API

2018-09-28 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632147#comment-16632147
 ] 

Josh Elser commented on HBASE-20952:


{quote}This aspect is thin to absent (IMO).
{quote}
That's fair. It's hard to get this all correct and down on paper without 
sending a (literal) book out for review. I see Ted is already making some 
modifications, and I'll make some passes over the comments too. Appreciate your 
input, boss.

I'll be focusing on your review points around "how things work now" to polish 
that up. Will make it easier to then shift into beefing up the parts on "what 
to change".

> Re-visit the WAL API
> 
>
> Key: HBASE-20952
> URL: https://issues.apache.org/jira/browse/HBASE-20952
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Josh Elser
>Priority: Major
> Attachments: 20952.v1.txt
>
>
> Take a step back from the current WAL implementations and think about what an 
> HBase WAL API should look like. What are the primitive calls that we require 
> to guarantee durability of writes with a high degree of performance?
> The API needs to take the current implementations into consideration. We 
> should also have a mind for what is happening in the Ratis LogService (but 
> the LogService should not dictate what HBase's WAL API looks like RATIS-272).
> Other "systems" inside of HBase that use WALs are replication and 
> backup Replication has the use-case for "tail"'ing the WAL which we 
> should provide via our new API. B doesn't do anything fancy (IIRC). We 
> should make sure all consumers are generally going to be OK with the API we 
> create.
> The API may be "OK" (or OK in a part). We need to also consider other methods 
> which were "bolted" on such as {{AbstractFSWAL}} and 
> {{WALFileLengthProvider}}. Other corners of "WAL use" (like the 
> {{WALSplitter}} should also be looked at to use WAL-APIs only).
> We also need to make sure that adequate interface audience and stability 
> annotations are chosen.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21207) Add client side sorting functionality in master web UI for table and region server details.

2018-09-28 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632118#comment-16632118
 ] 

Hadoop QA commented on HBASE-21207:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
13s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange}  
0m  0s{color} | {color:orange} The patch doesn't appear to include any new or 
modified tests. Please justify why no new tests are needed for this patch. Also 
please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} branch-2 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m  
4s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
 9s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 22m 
11s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
34s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m  
5s{color} | {color:green} branch-2 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 21m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 21m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
46s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
9m  2s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 
or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
51s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}169m 
20s{color} | {color:green} root in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
53s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}249m 11s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:42ca976 |
| JIRA Issue | HBASE-21207 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12941681/HBASE-21207-branch-2.v1.patch
 |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  shadedjars  
hadoopcheck  xml  compile  |
| uname | Linux 6ebc08b8115a 4.4.0-134-generic #160~14.04.1-Ubuntu SMP Fri Aug 
17 11:07:07 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | branch-2 / 0ef57439cc |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14529/testReport/ |
| Max. process+thread count | 4934 (vs. ulimit of 1) |
| modules | C: hbase-server . U: . |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14529/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Add client side sorting functionality in master web UI for table and region 
> server details.
> 

[jira] [Commented] (HBASE-20952) Re-visit the WAL API

2018-09-28 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632108#comment-16632108
 ] 

stack commented on HBASE-20952:
---

bq. From the chatter above on this issue, the request was that a document first 
be presented which outlines how HBase uses WALs and idiosyncrasies/gotchas 
around that usage.

Makes sense. Can't change something if you don't know what it entails. I'd 
think any "revisit" would have such a preface.

bq. and then made a proposal if there was something obvious that should be 
done instead. 

This aspect is thin to absent (IMO).

> Re-visit the WAL API
> 
>
> Key: HBASE-20952
> URL: https://issues.apache.org/jira/browse/HBASE-20952
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Josh Elser
>Priority: Major
> Attachments: 20952.v1.txt
>
>
> Take a step back from the current WAL implementations and think about what an 
> HBase WAL API should look like. What are the primitive calls that we require 
> to guarantee durability of writes with a high degree of performance?
> The API needs to take the current implementations into consideration. We 
> should also have a mind for what is happening in the Ratis LogService (but 
> the LogService should not dictate what HBase's WAL API looks like RATIS-272).
> Other "systems" inside of HBase that use WALs are replication and 
> backup Replication has the use-case for "tail"'ing the WAL which we 
> should provide via our new API. B doesn't do anything fancy (IIRC). We 
> should make sure all consumers are generally going to be OK with the API we 
> create.
> The API may be "OK" (or OK in a part). We need to also consider other methods 
> which were "bolted" on such as {{AbstractFSWAL}} and 
> {{WALFileLengthProvider}}. Other corners of "WAL use" (like the 
> {{WALSplitter}} should also be looked at to use WAL-APIs only).
> We also need to make sure that adequate interface audience and stability 
> annotations are chosen.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20952) Re-visit the WAL API

2018-09-28 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632097#comment-16632097
 ] 

Josh Elser commented on HBASE-20952:


{quote}Yeah, sorry, I'm lost on what the doc is supposed to be doing.
{quote}
From the chatter above on this issue, the request was that a document first be 
presented which outlines how HBase uses WALs and idiosyncrasies/gotchas around 
that usage. The premise being: if we don't understand how we use WALs, we 
can't make a proposal around what "ideal" is.

The structure of the document was that, for each section, we covered how it 
works now and then made a proposal if there was something obvious that should 
be done instead. Obviously, emphasis is more on the former than the latter 
(given the acknowledgements immediately above).

Thanks for reviewing.

> Re-visit the WAL API
> 
>
> Key: HBASE-20952
> URL: https://issues.apache.org/jira/browse/HBASE-20952
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Josh Elser
>Priority: Major
> Attachments: 20952.v1.txt
>
>
> Take a step back from the current WAL implementations and think about what an 
> HBase WAL API should look like. What are the primitive calls that we require 
> to guarantee durability of writes with a high degree of performance?
> The API needs to take the current implementations into consideration. We 
> should also have a mind for what is happening in the Ratis LogService (but 
> the LogService should not dictate what HBase's WAL API looks like RATIS-272).
> Other "systems" inside of HBase that use WALs are replication and 
> backup Replication has the use-case for "tail"'ing the WAL which we 
> should provide via our new API. B doesn't do anything fancy (IIRC). We 
> should make sure all consumers are generally going to be OK with the API we 
> create.
> The API may be "OK" (or OK in a part). We need to also consider other methods 
> which were "bolted" on such as {{AbstractFSWAL}} and 
> {{WALFileLengthProvider}}. Other corners of "WAL use" (like the 
> {{WALSplitter}} should also be looked at to use WAL-APIs only).
> We also need to make sure that adequate interface audience and stability 
> annotations are chosen.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21233) Allow the procedure implementation to skip persistence of the state after a execution

2018-09-28 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632093#comment-16632093
 ] 

stack commented on HBASE-21233:
---

+1

> Allow the procedure implementation to skip persistence of the state after a 
> execution
> -
>
> Key: HBASE-21233
> URL: https://issues.apache.org/jira/browse/HBASE-21233
> Project: HBase
>  Issue Type: Sub-task
>  Components: Performance, proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21233.patch, HBASE-21233.patch
>
>
> Discussed with [~stack] and [~allan163] on HBASE-21035: when retrying, we 
> do not need to persist the procedure state every time, as the retry timeout 
> is not critical. It is OK if we lose this information and start 
> from 0 after restarting.
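A conceptual sketch of that idea is below, using toy types rather than the committed 
Procedure API; the names (skipPersistence, ProcStoreSketch, etc.) are assumptions for 
illustration only:

{code}
// Conceptual sketch only -- toy types, not the committed Procedure API.
// The executor normally persists procedure state after every step; a step that
// merely re-schedules a retry can opt out, since losing the retry counter only
// means the backoff restarts from 0 after a master restart.
interface ProcStoreSketch {
  void update(long procId, byte[] state);
}

abstract class RetriableStepSketch {
  private boolean persistAfterStep = true;

  /** Called by a step that does not need its state written out this time. */
  protected final void skipPersistence() {
    persistAfterStep = false;
  }

  /** Runs one step, persisting state only if the step did not opt out. */
  final void runStep(long procId, ProcStoreSketch store) {
    persistAfterStep = true;
    execute();
    if (persistAfterStep) {
      store.update(procId, serializeState());
    }
  }

  protected abstract void execute();

  protected abstract byte[] serializeState();
}
{code}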



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20952) Re-visit the WAL API

2018-09-28 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632087#comment-16632087
 ] 

stack commented on HBASE-20952:
---

bq. The point of this doc was to make sure we had a clear picture of all 
"WAL-related" things in HBase and what their roles & responsibilities were.

Sorry. I did not get that that was the point when reading the doc. The preamble 
drops off just when it is about to make the point of what the doc is for. It is 
followed by three sections: "Components of WAL System", "Evolving individual 
WAL system components", and "Limitation". The first seems to be the 
'WAL-related' listing you suggest. The second section's title ('Evolving') 
implies a listing of how hbase will be changed, but it seems to be just a 
continuation of the listing ...

Yeah, sorry, I'm lost on what the doc is supposed to be doing.

I also made notes on imprecision and stuff I thought incorrect.

Thanks.

> Re-visit the WAL API
> 
>
> Key: HBASE-20952
> URL: https://issues.apache.org/jira/browse/HBASE-20952
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Josh Elser
>Priority: Major
> Attachments: 20952.v1.txt
>
>
> Take a step back from the current WAL implementations and think about what an 
> HBase WAL API should look like. What are the primitive calls that we require 
> to guarantee durability of writes with a high degree of performance?
> The API needs to take the current implementations into consideration. We 
> should also have a mind for what is happening in the Ratis LogService (but 
> the LogService should not dictate what HBase's WAL API looks like RATIS-272).
> Other "systems" inside of HBase that use WALs are replication and 
> backup Replication has the use-case for "tail"'ing the WAL which we 
> should provide via our new API. B doesn't do anything fancy (IIRC). We 
> should make sure all consumers are generally going to be OK with the API we 
> create.
> The API may be "OK" (or OK in a part). We need to also consider other methods 
> which were "bolted" on such as {{AbstractFSWAL}} and 
> {{WALFileLengthProvider}}. Other corners of "WAL use" (like the 
> {{WALSplitter}} should also be looked at to use WAL-APIs only).
> We also need to make sure that adequate interface audience and stability 
> annotations are chosen.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20952) Re-visit the WAL API

2018-09-28 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632075#comment-16632075
 ] 

Josh Elser commented on HBASE-20952:


{quote}IMO, it gives little to no inkling as to how hbase will be changed. I 
left some comments. Thanks.
{quote}
Yes, that was intentional. The point of this doc was to make sure we had a 
clear picture of all "WAL-related" things in HBase and what their roles & 
responsibilities were.

> Re-visit the WAL API
> 
>
> Key: HBASE-20952
> URL: https://issues.apache.org/jira/browse/HBASE-20952
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Josh Elser
>Priority: Major
> Attachments: 20952.v1.txt
>
>
> Take a step back from the current WAL implementations and think about what an 
> HBase WAL API should look like. What are the primitive calls that we require 
> to guarantee durability of writes with a high degree of performance?
> The API needs to take the current implementations into consideration. We 
> should also have a mind for what is happening in the Ratis LogService (but 
> the LogService should not dictate what HBase's WAL API looks like RATIS-272).
> Other "systems" inside of HBase that use WALs are replication and 
> backup Replication has the use-case for "tail"'ing the WAL which we 
> should provide via our new API. B doesn't do anything fancy (IIRC). We 
> should make sure all consumers are generally going to be OK with the API we 
> create.
> The API may be "OK" (or OK in a part). We need to also consider other methods 
> which were "bolted" on such as {{AbstractFSWAL}} and 
> {{WALFileLengthProvider}}. Other corners of "WAL use" (like the 
> {{WALSplitter}} should also be looked at to use WAL-APIs only).
> We also need to make sure that adequate interface audience and stability 
> annotations are chosen.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21186) Document hbase.regionserver.executor.openregion.threads in MTTR section

2018-09-28 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632020#comment-16632020
 ] 

Josh Elser commented on HBASE-21186:


How about adding that property to the existing site.xml configuration block 
under "Set the following in the RegionServer." instead of adding it where you 
have it? Further, what should a user set this property to if the default of 
"3" is bad? 10 to 20?

> Document hbase.regionserver.executor.openregion.threads in MTTR section
> ---
>
> Key: HBASE-21186
> URL: https://issues.apache.org/jira/browse/HBASE-21186
> Project: HBase
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Sahil Aggarwal
>Assignee: Sahil Aggarwal
>Priority: Minor
> Attachments: HBASE-21186.master.001.patch, 
> HBASE-21186.master.002.patch
>
>
> hbase.regionserver.executor.openregion.threads helps in improving MTTR by 
> increasing assign rpc processing rate at RS from HMaster but is not 
> documented.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21225) Having RPC & Space quota on a table/Namespace doesn't allow space quota to be removed using 'NONE'

2018-09-28 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632004#comment-16632004
 ] 

Josh Elser commented on HBASE-21225:


[~jatsakthi], please update the description with an explanation of the problem, 
not just the reproduction steps. In HBASE-20705, you seemed to have some notion 
of why this still doesn't work in all cases.

A couple of comments:

1. Why all of the modification/removal of existing tests?
{code:java}
 if (quotaToMerge.getRemove()) {
-  // Update the builder to propagate the removal
-  spaceBuilder.setRemove(true).clearSoftLimit().clearViolationPolicy();
+  // To prevent the "REMOVE => true" row in QuotaTableUtil.QUOTA_TABLE_NAME
+  spaceBuilder = null;
{code}
2. The purpose of the {{REMOVE => true}} entry is to denote that HBase should 
remove the SpaceQuota when it writes that out. With this change, it seems like 
you're leaving the {{REMOVE}} attribute in the protobuf message but completely 
ignoring it, which is confusing.

IMO, the bug is that GlobalQuotaSettingsImpl is not correctly removing the 
SpaceQuota when REMOVE is set to true. It seems the logic I had initially 
written around an RPC and Space quota on the same table/namespace is lacking. 
Does that make sense?
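
To make the intended behaviour concrete, here is a deliberately simplified, 
hypothetical sketch (it does not use the real protobuf builders or the actual 
GlobalQuotaSettingsImpl code, and the class/field names are made up): when the 
incoming space quota carries the remove flag, the merged result should simply 
contain no space quota at all, while any throttle quota on the same 
table/namespace stays untouched.

{code:java}
// Toy model only; names are invented for illustration.
final class SpaceQuotaSpec {
  final long softLimit;
  final String violationPolicy;
  final boolean remove;

  SpaceQuotaSpec(long softLimit, String violationPolicy, boolean remove) {
    this.softLimit = softLimit;
    this.violationPolicy = violationPolicy;
    this.remove = remove;
  }
}

final class QuotaMergeSketch {
  /**
   * Returns the space quota to persist after a merge, or null when the
   * incoming request asks for removal (LIMIT => NONE), so that no SpaceQuota
   * row (and no dangling "REMOVE => true" marker) is written. RPC throttles
   * on the same table/namespace are handled separately and are not affected.
   */
  static SpaceQuotaSpec mergeSpaceQuota(SpaceQuotaSpec existing, SpaceQuotaSpec incoming) {
    if (incoming == null) {
      return existing;   // nothing to change
    }
    if (incoming.remove) {
      return null;       // drop the space quota entirely
    }
    return incoming;     // replace with the new definition
  }

  public static void main(String[] args) {
    SpaceQuotaSpec existing = new SpaceQuotaSpec(1L << 30, "NO_WRITES", false);
    SpaceQuotaSpec removal = new SpaceQuotaSpec(0L, null, true);
    System.out.println(mergeSpaceQuota(existing, removal));   // prints "null"
  }
}
{code}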

> Having RPC & Space quota on a table/Namespace doesn't allow space quota to be 
> removed using 'NONE'
> --
>
> Key: HBASE-21225
> URL: https://issues.apache.org/jira/browse/HBASE-21225
> Project: HBase
>  Issue Type: Bug
>Reporter: Sakthi
>Assignee: Sakthi
>Priority: Major
> Attachments: hbase-21225.master.001.patch
>
>
> A part of HBASE-20705 is still unresolved
> {noformat}
> hbase(main):005:0> create 't2','cf'
> Created table t2
> Took 0.7619 seconds
> => Hbase::Table - t2
> hbase(main):006:0> set_quota TYPE => THROTTLE, TABLE => 't2', LIMIT => 
> '10M/sec'
> Took 0.0514 seconds
> hbase(main):007:0> set_quota TYPE => SPACE, TABLE => 't2', LIMIT => '1G', 
> POLICY => NO_WRITES
> Took 0.0162 seconds
> hbase(main):008:0> list_quotas
> OWNER  QUOTAS
>  TABLE => t2   TYPE => THROTTLE, THROTTLE_TYPE => REQUEST_SIZE, 
> LIMIT => 10M/sec, SCOPE =>
>MACHINE
>  TABLE => t2   TYPE => SPACE, TABLE => t2, LIMIT => 1073741824, 
> VIOLATION_POLICY => NO_WRIT
>ES
> 2 row(s)
> Took 0.0716 seconds
> hbase(main):009:0> set_quota TYPE => SPACE, TABLE => 't2', LIMIT => NONE
> Took 0.0082 seconds
> hbase(main):010:0> list_quotas
> OWNER   QUOTAS
>  TABLE => t2TYPE => THROTTLE, THROTTLE_TYPE => 
> REQUEST_SIZE, LIMIT => 10M/sec, SCOPE => MACHINE
>  TABLE => t2TYPE => SPACE, TABLE => t2, REMOVE => true
> 2 row(s)
> Took 0.0254 seconds
> hbase(main):011:0> set_quota TYPE => SPACE, TABLE => 't2', LIMIT => '1G', 
> POLICY => NO_WRITES
> Took 0.0082 seconds
> hbase(main):012:0> list_quotas
> OWNER   QUOTAS
>  TABLE => t2TYPE => THROTTLE, THROTTLE_TYPE => 
> REQUEST_SIZE, LIMIT => 10M/sec, SCOPE => MACHINE
>  TABLE => t2TYPE => SPACE, TABLE => t2, REMOVE => true
> 2 row(s)
> Took 0.0411 seconds
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-21253) Backport HBASE-21244 Skip persistence when retrying for assignment related procedures to branch-2.0 and branch-2.1

2018-09-28 Thread Allan Yang (JIRA)
Allan Yang created HBASE-21253:
--

 Summary: Backport HBASE-21244 Skip persistence when retrying for 
assignment related procedures to branch-2.0 and branch-2.1 
 Key: HBASE-21253
 URL: https://issues.apache.org/jira/browse/HBASE-21253
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.2, 2.1.0
Reporter: Allan Yang
Assignee: Allan Yang


See HBASE-21244



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21244) Skip persistence when retrying for assignment related procedures

2018-09-28 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631945#comment-16631945
 ] 

Allan Yang commented on HBASE-21244:


{quote}
Allan Yang Mind opening an issue for branch-2.1 and branch-2.0? The 
assignment-related procedures are different for these two branches.
{quote}
Sure, with pleasure.

> Skip persistence when retrying for assignment related procedures
> 
>
> Key: HBASE-21244
> URL: https://issues.apache.org/jira/browse/HBASE-21244
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2, Performance, proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21244.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21187) The HBase UTs are extremely slow on some jenkins node

2018-09-28 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631867#comment-16631867
 ] 

Duo Zhang commented on HBASE-21187:
---

This is build 993
{noformat}
07:41:03 up 81 days, 20 min,  0 users,  load average: 0.92, 0.51, 0.66
{noformat}
We passed.

994
{noformat}
10:09:37 up 81 days,  2:28,  0 users,  load average: 14.13, 12.51, 14.56
{noformat}
Lots of tests failed.

995
{noformat}
10:51:42 up 81 days,  3:15,  0 users,  load average: 3.84, 4.95, 7.31
{noformat}
Only TestRSGroups failed.

996
{noformat}
11:16:46 up 36 days, 13:35,  0 users,  load average: 13.16, 11.57, 11.42
{noformat}
Lots of tests failed.

997
{noformat}
11:46:21 up 81 days,  4:25,  0 users,  load average: 2.34, 3.17, 6.23
{noformat}
Only TestCompactingToCellFlatMapMemStore failed. And it is not because of 
timeout, just an assertion error, so this one is truly flaky...

So I think the problem is that, for TRSP, we have one more procedure, as we 
use a sub-procedure to schedule the remote procedure in order to simplify the 
logic. On an already loaded machine, if there are lots of regions to 
assign/unassign, things will be slower because of the extra context switches, 
which leads to the timeout...


> The HBase UTs are extremely slow on some jenkins node
> -
>
> Key: HBASE-21187
> URL: https://issues.apache.org/jira/browse/HBASE-21187
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Reporter: Duo Zhang
>Priority: Major
>
> Looking at the flaky dashboard for the master branch, the top several UTs are 
> likely to fail at the same time. One of the common things for the failed 
> flaky-tests jobs is that the execution time is more than one hour, while the 
> successful executions usually take only about half an hour.
> And I have compared the output for 
> TestRestoreSnapshotFromClientWithRegionReplicas: for a successful run, the 
> DisableTableProcedure can finish within one second, while for a failed run 
> it can take more than half a minute.
> Not sure what the real problem is, but it seems that for the failed runs 
> there are time holes in the output, i.e., there is no log output for 
> several seconds. Like this:
> {noformat}
> 2018-09-11 21:08:08,152 INFO  [PEWorker-4] 
> procedure2.ProcedureExecutor(1500): Finished pid=490, state=SUCCESS, 
> hasLock=false; CreateTableProcedure table=testRestoreSnapshotAfterTruncate in 
> 12.9380sec
> 2018-09-11 21:08:15,590 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=1,queue=0,port=33663] 
> master.MasterRpcServices(1174): Checking to see if procedure is done pid=490
> {noformat}
> No log output for about 7 seconds.
> And for a successful run, the same place:
> {noformat}
> 2018-09-12 07:47:32,488 INFO  [PEWorker-7] 
> procedure2.ProcedureExecutor(1500): Finished pid=490, state=SUCCESS, 
> hasLock=false; CreateTableProcedure table=testRestoreSnapshotAfterTruncate in 
> 1.2220sec
> 2018-09-12 07:47:32,881 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=3,queue=0,port=59079] 
> master.MasterRpcServices(1174): Checking to see if procedure is done pid=490
> {noformat}
> There is no such hole.
> Maybe there was a big GC?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (HBASE-21252) Collect loadavg when gathering machine environment

2018-09-28 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang resolved HBASE-21252.
---
Resolution: Not A Problem

OK, uptime already contains the load information. Let me check.

> Collect loadavg when gathering machine environment
> --
>
> Key: HBASE-21252
> URL: https://issues.apache.org/jira/browse/HBASE-21252
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Duo Zhang
>Priority: Major
>
> Skimmed several flaky test reports; it seems that, for a successful run, the 
> SystemLoadAverage is usually less than 500, and for a failed run, the 
> SystemLoadAverage is usually greater than 1000.
> Let's collect the loadavg in the script to see if the machine itself is 
> already overloaded before we actually execute the tests.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21249) Add jitter for ProcedureUtil.getBackoffTimeMs

2018-09-28 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21249:
--
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Pushed to branch-2.0+. Thanks [~Yi Mei] for contributing.
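
For readers following along, a hedged sketch of what jitter on an exponential 
backoff can look like; the base delay, cap, and 1% jitter factor below are 
illustrative assumptions, not the exact constants used by 
ProcedureUtil.getBackoffTimeMs.

{code:java}
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

public class BackoffWithJitterSketch {
  // Illustrative constants; the real implementation may differ.
  private static final long BASE_MS = 1000L;
  private static final long MAX_MS = TimeUnit.MINUTES.toMillis(10);

  static long getBackoffTimeMs(int attempts) {
    // Exponential growth, capped so the shift cannot overflow.
    long backoff = Math.min(MAX_MS, BASE_MS * (1L << Math.min(attempts, 30)));
    // Add a small random jitter (here up to 1%) so that many procedures
    // retrying at the same time do not wake up in lockstep.
    long jitter = (long) (ThreadLocalRandom.current().nextFloat() * backoff * 0.01f);
    return backoff + jitter;
  }

  public static void main(String[] args) {
    for (int i = 0; i < 5; i++) {
      System.out.println("attempt " + i + " -> " + getBackoffTimeMs(i) + " ms");
    }
  }
}
{code}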

> Add jitter for ProcedureUtil.getBackoffTimeMs
> -
>
> Key: HBASE-21249
> URL: https://issues.apache.org/jira/browse/HBASE-21249
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Yi Mei
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21249.master.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21244) Skip persistence when retrying for assignment related procedures

2018-09-28 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21244:
--
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Pushed to master and branch-2. Thanks [~zghaobac] for reviewing.

[~allan163] Mind opening an issue for branch-2.1 and branch-2.0? The 
assignment-related procedures are different for these two branches.

Thanks.

> Skip persistence when retrying for assignment related procedures
> 
>
> Key: HBASE-21244
> URL: https://issues.apache.org/jira/browse/HBASE-21244
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2, Performance, proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21244.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21249) Add jitter for ProcedureUtil.getBackoffTimeMs

2018-09-28 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631824#comment-16631824
 ] 

Hadoop QA commented on HBASE-21249:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
31s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
21s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
27s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
18s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
20s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
37s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 
49s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
11m 17s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
46s{color} | {color:green} hbase-procedure in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
14s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 42m 51s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21249 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12941660/HBASE-21249.master.001.patch
 |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 49f3101ea292 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 22ac655704 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14528/testReport/ |
| Max. process+thread count | 282 (vs. ulimit of 1) |
| modules | C: hbase-procedure U: hbase-procedure |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14528/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Add jitter 

[jira] [Commented] (HBASE-21225) Having RPC & Space quota on a table/Namespace doesn't allow space quota to be removed using 'NONE'

2018-09-28 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631821#comment-16631821
 ] 

Hadoop QA commented on HBASE-21225:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
53s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
14s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
26s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
53s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
33s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
46s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
55s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
12m 53s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}129m 36s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}182m 38s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hbase.regionserver.TestRegionMergeTransactionOnCluster |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21225 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12941663/hbase-21225.master.001.patch
 |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 7a4fc891e108 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 22ac655704 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| unit | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14527/artifact/patchprocess/patch-unit-hbase-server.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14527/testReport/ |
| Max. process+thread count | 4598 (vs. ulimit of 1) |
| modules | C: hbase-server U: 

[jira] [Created] (HBASE-21252) Collect loadavg when gathering machine environment

2018-09-28 Thread Duo Zhang (JIRA)
Duo Zhang created HBASE-21252:
-

 Summary: Collect loadavg when gathering machine environment
 Key: HBASE-21252
 URL: https://issues.apache.org/jira/browse/HBASE-21252
 Project: HBase
  Issue Type: Sub-task
Reporter: Duo Zhang


Skimmed several flaky test reports; it seems that, for a successful run, the 
SystemLoadAverage is usually less than 500, and for a failed run, the 
SystemLoadAverage is usually greater than 1000.

Let's collect the loadavg in the script to see if the machine itself is 
already overloaded before we actually execute the tests.
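
As an aside, the same load average is also visible from inside the JVM via the 
standard OperatingSystemMXBean, which may help when correlating it with 
individual test runs. A minimal, purely illustrative sketch:

{code:java}
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class LoadAvgProbe {
  public static void main(String[] args) {
    OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
    // 1-minute system load average, or a negative value if unavailable.
    double loadAvg = os.getSystemLoadAverage();
    System.out.println("availableProcessors=" + os.getAvailableProcessors()
        + " systemLoadAverage=" + loadAvg);
  }
}
{code}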



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21233) Allow the procedure implementation to skip persistence of the state after a execution

2018-09-28 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631812#comment-16631812
 ] 

Hudson commented on HBASE-21233:


Results for branch master
[build #515 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/515/]: (x) 
*{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/515//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/515//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/515//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Allow the procedure implementation to skip persistence of the state after a 
> execution
> -
>
> Key: HBASE-21233
> URL: https://issues.apache.org/jira/browse/HBASE-21233
> Project: HBase
>  Issue Type: Sub-task
>  Components: Performance, proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21233.patch, HBASE-21233.patch
>
>
> Discussed with [~stack] and [~allan163] on HBASE-21035: when retrying we 
> do not need to persist the procedure state every time, as the retry timeout 
> is not critical state. It is OK that we lose this information and start 
> from 0 after restarting.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21188) Print heap and gc informations in our junit ResourceChecker

2018-09-28 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21188:
--
Resolution: Later
Status: Resolved  (was: Patch Available)

Reverted for now as it is not the cause of the failing UTs, and GC time and 
count are not 'resources'.
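
For context (and not the reverted patch itself), heap usage and cumulative GC 
counts/times can be read via the standard JMX beans. A minimal illustrative 
sketch:

{code:java}
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapAndGcProbe {
  public static void main(String[] args) {
    MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
    MemoryUsage heap = mem.getHeapMemoryUsage();
    System.out.println("heap used=" + heap.getUsed() + " max=" + heap.getMax());

    for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
      // Cumulative collection count and time since JVM start.
      System.out.println(gc.getName()
          + " count=" + gc.getCollectionCount()
          + " timeMs=" + gc.getCollectionTime());
    }
  }
}
{code}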

> Print heap and gc informations in our junit ResourceChecker
> ---
>
> Key: HBASE-21188
> URL: https://issues.apache.org/jira/browse/HBASE-21188
> Project: HBase
>  Issue Type: Sub-task
>  Components: test
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21188.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21244) Skip persistence when retrying for assignment related procedures

2018-09-28 Thread Guanghao Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631797#comment-16631797
 ] 

Guanghao Zhang commented on HBASE-21244:


+1

> Skip persistence when retrying for assignment related procedures
> 
>
> Key: HBASE-21244
> URL: https://issues.apache.org/jira/browse/HBASE-21244
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2, Performance, proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21244.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21207) Add client side sorting functionality in master web UI for table and region server details.

2018-09-28 Thread Archana Katiyar (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631772#comment-16631772
 ] 

Archana Katiyar commented on HBASE-21207:
-

Thanks [~apurtell]; I have uploaded a patch for branch-2 as well (the patch for 
branch-1 was already uploaded).

> Add client side sorting functionality in master web UI for table and region 
> server details.
> ---
>
> Key: HBASE-21207
> URL: https://issues.apache.org/jira/browse/HBASE-21207
> Project: HBase
>  Issue Type: Improvement
>  Components: master, monitoring, UI, Usability
>Reporter: Archana Katiyar
>Assignee: Archana Katiyar
>Priority: Minor
> Attachments: 14926e82-b929-11e8-8bdd-4ce4621f1118.png, 
> 2724afd8-b929-11e8-8171-8b5b2ba3084e.png, HBASE-21207-branch-1.patch, 
> HBASE-21207-branch-1.v1.patch, HBASE-21207-branch-2.v1.patch, 
> HBASE-21207.patch, HBASE-21207.patch, HBASE-21207.v1.patch, 
> edc5c812-b928-11e8-87e2-ce6396629bbc.png
>
>
> In the Master UI, we can see region server details like requests per second 
> and the number of regions, etc. Similarly, for tables we can see online 
> regions and offline regions.
> It will help ops people in determining hot spotting if we can provide sort 
> functionality in the UI.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

