[jira] [Commented] (HBASE-21551) Memory leak when use scan with STREAM at server side

2018-12-05 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16711086#comment-16711086
 ] 

Hudson commented on HBASE-21551:


Results for branch branch-2.0
[build #1140 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1140/]: (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1140//General_Nightly_Build_Report/]

(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1140//JDK8_Nightly_Build_Report_(Hadoop2)/]

(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1140//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.


> Memory leak when use scan with STREAM at server side
> 
>
> Key: HBASE-21551
> URL: https://issues.apache.org/jira/browse/HBASE-21551
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Blocker
> Fix For: 3.0.0, 2.2.0, 2.1.2, 2.0.4
>
> Attachments: HBASE-21551.v1.patch, HBASE-21551.v2.patch, 
> HBASE-21551.v3.patch, heap-dump.jpg
>
>
> We open the region scanner with STREAM as follows:
> {code}
> RegionScannerImpl#initializeScanners
>   |---> HStore#getScanner
>           |---> StoreScanner()
>                   |---> StoreFileScanner#getScannersForStoreFiles
>                           |---> HStoreFile#getStreamScanner  #1
> {code}
> In #1, we put the StoreFileReader into a concurrent hash map, streamReaders,
> but do not remove the StreamReader from streamReaders until the store file is
> closed.
> So if we scan with STREAM many times, the streamReaders map keeps growing; we
> can see this in the attached heap-dump.jpg.
> I found this bug while benchmarking scan performance with YCSB on a cluster
> (RS heap size of 50g): the RS easily ran into long full GCs (~110 sec).
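The retention pattern above can be sketched in a few lines of Python (an illustrative stand-in for the Java code; the class and method names mirror HBase's but the bodies are hypothetical, not the actual API):

```python
# Sketch of the streamReaders leak: a reader is registered per stream scan
# but only released when the whole store file closes.

class StreamScanner:
    def __init__(self, store_file, scan_id):
        self.store_file = store_file
        self.scan_id = scan_id

    def close(self):
        # The fix: release the reader when the scanner closes, instead of
        # keeping it until the store file itself is closed.
        self.store_file.stream_readers.pop(self.scan_id, None)

class HStoreFileSketch:
    def __init__(self):
        self.stream_readers = {}  # a concurrent map in the real code

    def get_stream_scanner(self, scan_id):
        # Step #1 above: a new StoreFileReader per stream scan.
        self.stream_readers[scan_id] = object()  # stand-in reader
        return StreamScanner(self, scan_id)

sf = HStoreFileSketch()
scanners = [sf.get_stream_scanner(i) for i in range(1000)]
leaked = len(sf.stream_readers)      # 1000 readers retained: the "leak"
for s in scanners:
    s.close()
remaining = len(sf.stream_readers)   # 0 once readers are released on close
```

Without the per-scanner release, each stream scan adds one entry that survives until the store file closes, which is exactly what the heap dump shows accumulating.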



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21549) Add shell command for serial replication peer

2018-12-05 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16711075#comment-16711075
 ] 

Hadoop QA commented on HBASE-21549:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 25s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 13s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} refguide {color} | {color:blue} 5m 11s{color} | {color:blue} branch has no errors when building the reference guide. See footer for rendered docs, which you should manually inspect. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 42s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 57s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} rubocop {color} | {color:red} 0m 16s{color} | {color:red} The patch generated 18 new + 333 unchanged - 6 fixed = 351 total (was 339) {color} |
| {color:orange}-0{color} | {color:orange} ruby-lint {color} | {color:orange} 0m 13s{color} | {color:orange} The patch generated 16 new + 552 unchanged - 0 fixed = 568 total (was 552) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:blue}0{color} | {color:blue} refguide {color} | {color:blue} 4m 53s{color} | {color:blue} patch has no errors when building the reference guide. See footer for rendered docs, which you should manually inspect. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 40s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}201m 5s{color} | {color:green} root in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 1m 0s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}227m 30s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21549 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12950787/HBASE-21549.master.003.patch |
| Optional Tests | dupname asflicense javac javadoc unit rubocop ruby_lint refguide |
| uname | Linux df27c3b2706f 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 67ab8b888f |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| refguide | https://builds.apache.org/job/PreCommit-HBASE-Build/15204/artifact/patchprocess/branch-site/book.html |
| rubocop | v0.60.0 |
| rubocop | https://builds.apache.org/job/PreCommit-HBASE-Build/15204/artifact/patchprocess/diff-patch-rubocop.txt |
| ruby-lint | v2.3.1 |
| ruby-lint | https://builds.apache.org/job/PreCommit-HBASE-Build/15204/artifact/patchprocess/diff-patch-ruby-lint.txt |
| refguide | https://builds.apache.org/job/PreCommit-HBASE-Build/15204/artifact/patchprocess/patch-site/book.html |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/15204/testReport/ |
| Max. process+thread count | 5252 (vs. ulimit of 1) |
| modules | C: hbase-shell . U: . |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/15204/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Add shell command for serial replication peer
> 

[jira] [Commented] (HBASE-21549) Add shell command for serial replication peer

2018-12-05 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16711052#comment-16711052
 ] 

Hadoop QA commented on HBASE-21549:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 26s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 9s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} refguide {color} | {color:blue} 5m 1s{color} | {color:blue} branch has no errors when building the reference guide. See footer for rendered docs, which you should manually inspect. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 32s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 53s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} rubocop {color} | {color:red} 0m 16s{color} | {color:red} The patch generated 18 new + 333 unchanged - 6 fixed = 351 total (was 339) {color} |
| {color:orange}-0{color} | {color:orange} ruby-lint {color} | {color:orange} 0m 14s{color} | {color:orange} The patch generated 16 new + 552 unchanged - 0 fixed = 568 total (was 552) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:blue}0{color} | {color:blue} refguide {color} | {color:blue} 4m 56s{color} | {color:blue} patch has no errors when building the reference guide. See footer for rendered docs, which you should manually inspect. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 38s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}188m 38s{color} | {color:green} root in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 1m 8s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}214m 39s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21549 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12950787/HBASE-21549.master.003.patch |
| Optional Tests | dupname asflicense javac javadoc unit rubocop ruby_lint refguide |
| uname | Linux 8eeb1eb749ba 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 67ab8b888f |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| refguide | https://builds.apache.org/job/PreCommit-HBASE-Build/15203/artifact/patchprocess/branch-site/book.html |
| rubocop | v0.60.0 |
| rubocop | https://builds.apache.org/job/PreCommit-HBASE-Build/15203/artifact/patchprocess/diff-patch-rubocop.txt |
| ruby-lint | v2.3.1 |
| ruby-lint | https://builds.apache.org/job/PreCommit-HBASE-Build/15203/artifact/patchprocess/diff-patch-ruby-lint.txt |
| refguide | https://builds.apache.org/job/PreCommit-HBASE-Build/15203/artifact/patchprocess/patch-site/book.html |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/15203/testReport/ |
| Max. process+thread count | 5161 (vs. ulimit of 1) |
| modules | C: hbase-shell . U: . |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/15203/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |






[jira] [Resolved] (HBASE-21146) (2.0) Add ability for HBase Canary to ignore a configurable number of ZooKeeper down nodes

2018-12-05 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-21146.
---
Resolution: Fixed

Pushed to branch-2.0.

> (2.0) Add ability for HBase Canary to ignore a configurable number of 
> ZooKeeper down nodes
> --
>
> Key: HBASE-21146
> URL: https://issues.apache.org/jira/browse/HBASE-21146
> Project: HBase
>  Issue Type: Improvement
>  Components: canary, Zookeeper
>Affects Versions: 1.0.0, 3.0.0, 2.0.0
>Reporter: David Manning
>Assignee: David Manning
>Priority: Minor
> Fix For: 2.0.4
>
> Attachments: HBASE-21126.branch-1.001.patch, 
> HBASE-21126.master.001.patch, HBASE-21126.master.002.patch, 
> HBASE-21126.master.003.patch, HBASE-21146.branch-2.0.001.patch, 
> zookeeperCanaryLocalTestValidation.txt
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> When running org.apache.hadoop.hbase.tool.Canary with args -zookeeper 
> -treatFailureAsError, the Canary will try to get a znode from each ZooKeeper 
> server in the ensemble. If any server is unavailable or unresponsive, the 
> canary will exit with a failure code.
> If we use the Canary to gauge server health, and alert accordingly, this can 
> be too strict. For example, in a 5-node ZooKeeper cluster, having one node 
> down is safe and expected in rolling upgrades/patches.
> This is a request to allow the Canary to take another parameter
> {code:java}
> -permittedZookeeperFailures {code}
> If N=1, in the 5-node ZooKeeper ensemble example, then the Canary will still 
> pass if 4 ZooKeeper nodes are reachable, but fail if 3 or fewer are reachable.
> (This is my first Jira posting... sorry if I messed anything up.)
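The requested behavior reduces to a simple threshold check. A minimal Python sketch of that logic (the function name is hypothetical; the real Canary is Java):

```python
def canary_zk_passes(reachable, ensemble_size, permitted_failures=0):
    """Pass iff no more than `permitted_failures` ZooKeeper nodes are down.

    Models the proposed -permittedZookeeperFailures flag: with the default
    of 0, any unreachable node fails the Canary, matching today's behavior.
    """
    return (ensemble_size - reachable) <= permitted_failures
```

For the 5-node example with a permitted-failure count of 1, the Canary passes with 4 reachable nodes and fails with 3 or fewer.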





[jira] [Commented] (HBASE-21146) (2.0) Add ability for HBase Canary to ignore a configurable number of ZooKeeper down nodes

2018-12-05 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16711037#comment-16711037
 ] 

stack commented on HBASE-21146:
---

Ok. Pushed to branch-2.0. I also pushed some diff to branch-2.1, but afterward
realized it was just formatting changes and reverted the branch-2.1 push (the
2.1 patch was applied by the predecessor JIRA, HBASE-21126).

Thanks for the patch [~dmanning].






[jira] [Updated] (HBASE-21146) (2.0) Add ability for HBase Canary to ignore a configurable number of ZooKeeper down nodes

2018-12-05 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21146:
--
Fix Version/s: (was: 2.1.2)
   (was: 2.2.0)
   (was: 3.0.0)
   2.0.4






[jira] [Commented] (HBASE-21408) Add hbase.wal.dir clean operation also to hbase cleanup script.

2018-12-05 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16711002#comment-16711002
 ] 

stack commented on HBASE-21408:
---

What is this change about [~sreenivasulureddy] ?

execute_zk_command "deleteall ${zparent}";

and this one...

hwaldir=`$bin/hbase org.apache.hadoop.hbase.util.HBaseConfTool hbase.wal.dir`
if [ "$hwaldir" != "null" ]; then hrootdir="$hrootdir $hwaldir"; fi

Should there be a space up there between hrootdir and hwaldir?

Was hwaldir not being set?

Thanks.

> Add hbase.wal.dir clean operation also to hbase cleanup script.
> ---
>
> Key: HBASE-21408
> URL: https://issues.apache.org/jira/browse/HBASE-21408
> Project: HBase
>  Issue Type: Improvement
>  Components: scripts
>Affects Versions: 2.1.0
>Reporter: Y. SREENIVASULU REDDY
>Priority: Major
> Fix For: 2.1.3
>
> Attachments: HBASE-21408.001.patch
>
>
> If the user has configured hbase.wal.dir explicitly, the cleanup script
> should handle that directory too.
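The shell change quoted in the comments boils down to "clean the WAL dir as well, when it is set". A Python sketch of that decision (hypothetical helper; the real change is in the cleanup shell script):

```python
def dirs_to_clean(get_conf):
    """Return the HDFS directories the cleanup script should remove:
    always hbase.rootdir, plus hbase.wal.dir when it is explicitly
    configured (HBaseConfTool prints the string "null" when unset)."""
    dirs = [get_conf("hbase.rootdir")]
    waldir = get_conf("hbase.wal.dir")
    if waldir != "null":
        dirs.append(waldir)
    return dirs

configured = {"hbase.rootdir": "/hbase", "hbase.wal.dir": "/hbase-wal"}
both = dirs_to_clean(configured.get)          # rootdir and WAL dir

defaulted = {"hbase.rootdir": "/hbase", "hbase.wal.dir": "null"}
root_only = dirs_to_clean(defaulted.get)      # rootdir only
```

This also answers the space question from the review: the two paths must be joined with a separator so the cleanup loop sees them as distinct directories.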





[jira] [Commented] (HBASE-21296) [2.1] Upgrade Jetty dependencies to latest in major-line

2018-12-05 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16711006#comment-16711006
 ] 

stack commented on HBASE-21296:
---

Moved to 2.1.3. Should this be in the 2.1 line at all [~elserj]? Thanks.

> [2.1] Upgrade Jetty dependencies to latest in major-line
> 
>
> Key: HBASE-21296
> URL: https://issues.apache.org/jira/browse/HBASE-21296
> Project: HBase
>  Issue Type: Task
>  Components: dependencies
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Major
> Fix For: 2.1.3
>
> Attachments: HBASE-21282.001.branch-2.0.patch
>
>
> Looks like we have dependencies on both Jetty 9.2 and 9.3, but we're lagging
> pretty far behind in both. We can upgrade both of these to the latest (August
> 2018) releases.
>  
> I'll also have to take a look at why we're using two separate versions (maybe 
> we didn't want to switch from jetty-jsp to apache-jsp on 9.2->9.3?). Not sure 
> if there's a good reason for this.





[jira] [Assigned] (HBASE-21146) (2.0) Add ability for HBase Canary to ignore a configurable number of ZooKeeper down nodes

2018-12-05 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack reassigned HBASE-21146:
-

Assignee: David Manning






[jira] [Updated] (HBASE-21296) [2.1] Upgrade Jetty dependencies to latest in major-line

2018-12-05 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21296:
--
Fix Version/s: (was: 2.1.2)
   2.1.3






[jira] [Updated] (HBASE-21408) Add hbase.wal.dir clean operation also to hbase cleanup script.

2018-12-05 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21408:
--
Fix Version/s: (was: 2.1.2)
   2.1.3






[jira] [Commented] (HBASE-21413) Empty meta log doesn't get split when restart whole cluster

2018-12-05 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710999#comment-16710999
 ] 

stack commented on HBASE-21413:
---

+1 on commit for branch-2.0 and branch-2.1 [~allan163]

> Empty meta log doesn't get split when restart whole cluster
> ---
>
> Key: HBASE-21413
> URL: https://issues.apache.org/jira/browse/HBASE-21413
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.1.1, 2.0.2
>Reporter: Jingyun Tian
>Assignee: Allan Yang
>Priority: Major
> Fix For: 3.0.0, 2.1.2, 2.0.4
>
> Attachments: HBASE-21413.branch-2.1.001.patch, 
> HBASE-21413.branch-2.1.002.patch, Screenshot from 2018-10-31 18-11-02.png, 
> Screenshot from 2018-10-31 18-11-11.png
>
>
> After I restarted the whole cluster, a splitting directory still existed on
> HDFS. I found that it contains only an empty meta WAL file. I'll dig into
> this later.





[jira] [Updated] (HBASE-21401) Sanity check when constructing the KeyValue

2018-12-05 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21401:
--
Fix Version/s: (was: 2.0.4)
   (was: 2.1.2)
   2.0.5
   2.1.3

> Sanity check when constructing the KeyValue
> ---
>
> Key: HBASE-21401
> URL: https://issues.apache.org/jira/browse/HBASE-21401
> Project: HBase
>  Issue Type: Sub-task
>  Components: regionserver
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.3, 2.0.5
>
> Attachments: HBASE-21401.v1.patch, HBASE-21401.v2.patch, 
> HBASE-21401.v3.patch, HBASE-21401.v4.patch, HBASE-21401.v4.patch, 
> HBASE-21401.v5.patch, HBASE-21401.v6.patch, HBASE-21401.v7.patch
>
>
> In KeyValueDecoder & ByteBuffKeyValueDecoder, we pass a byte buffer to
> initialize the Cell without a sanity check (checking whether each field's
> offset exceeds the byte buffer or not), so an ArrayIndexOutOfBoundsException
> may happen when reading the cell's fields, as in HBASE-21379; this kind of
> bug is hard to debug.
> An earlier check will help to find such bugs.
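The proposed sanity check is essentially bounds validation before any field is dereferenced. A simplified Python sketch (the real KeyValue layout is more involved; the function and the simplified layout here are illustrative, not HBase's actual code):

```python
def check_key_value_bytes(buf, offset, length):
    """Validate a serialized KeyValue slice before constructing a cell.

    Simplified layout assumed here: 4-byte key length, 4-byte value
    length, then key and value bytes. Failing fast with a clear error
    beats a later ArrayIndexOutOfBoundsException deep in read paths.
    """
    if offset < 0 or length < 8 or offset + length > len(buf):
        raise ValueError("KeyValue slice out of bounds of backing buffer")
    key_len = int.from_bytes(buf[offset:offset + 4], "big")
    val_len = int.from_bytes(buf[offset + 4:offset + 8], "big")
    if 8 + key_len + val_len > length:
        raise ValueError("declared key/value lengths overflow the slice")
    return key_len, val_len
```

The point of the patch is exactly this ordering: reject a corrupt buffer at construction time, with a diagnosable message, rather than when a field is first read.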





[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache

2018-12-05 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710992#comment-16710992
 ] 

stack commented on HBASE-15560:
---

I put a petition for a volunteer on our dev list.

> TinyLFU-based BlockCache
> 
>
> Key: HBASE-15560
> URL: https://issues.apache.org/jira/browse/HBASE-15560
> Project: HBase
>  Issue Type: Improvement
>  Components: BlockCache
>Affects Versions: 2.0.0
>Reporter: Ben Manes
>Assignee: Ben Manes
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, 
> HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, 
> bc.hit.count, bc.miss.count, branch-1.tinylfu.txt, gets, run_ycsb_c.sh, 
> run_ycsb_loading.sh, tinylfu.patch
>
>
> LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and
> recency of the working set. It achieves concurrency by using an O(n)
> background thread to prioritize the entries and evict. Accessing an entry is
> O(1) by a hash table lookup, recording its logical access time, and setting a
> frequency flag. A write is performed in O(1) time by updating the hash table
> and triggering an async eviction thread. This provides ideal concurrency and
> minimizes the latencies by penalizing the thread instead of the caller.
> However the policy does not age the frequencies and may not be resilient to
> various workload patterns.
> W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the
> frequency in a counting sketch, ages periodically by halving the counters,
> and orders entries by SLRU. An entry is discarded by comparing the frequency
> of the new arrival (candidate) to the SLRU's victim, and keeping the one with
> the higher frequency. This allows the operations to be performed in O(1)
> time and, through the use of a compact sketch, a much larger history is
> retained beyond the current working set. In a variety of real-world traces
> the policy had [near optimal hit rates|https://github.com/ben-manes/caffeine/wiki/Efficiency].
> Concurrency is achieved by buffering and replaying the operations, similar to
> a write-ahead log. A read is recorded into a striped ring buffer and a write
> into a queue. The operations are applied in batches under a try-lock by an
> asynchronous thread, thereby tracking the usage pattern without incurring
> high latencies
> ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]).
> In YCSB benchmarks the results were inconclusive. For a large cache (99% hit
> rates) the two caches have near-identical throughput and latencies, with
> LruBlockCache narrowly winning. At medium and small cache sizes, TinyLFU had
> a 1-4% hit rate improvement and therefore lower latencies. The lackluster
> result is because a synthetic Zipfian distribution is used, on which SLRU
> performs optimally. In a more varied, real-world workload we'd expect to see
> improvements by being able to make smarter predictions.
> The provided patch implements BlockCache using the
> [Caffeine|https://github.com/ben-manes/caffeine] caching library (see
> HighScalability
> [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]).
> Edward Bortnikov and Eshcar Hillel have graciously provided guidance for
> evaluating this patch ([github
> branch|https://github.com/ben-manes/hbase/tree/tinylfu]).
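The candidate-vs-victim admission decision described above can be sketched very compactly. A toy Python model (a plain dict stands in for the count-min sketch that Caffeine and the paper actually use):

```python
class TinyLfuAdmittor:
    """Toy sketch of the W-TinyLFU admission decision: on eviction,
    keep whichever of the arriving candidate and the SLRU victim has the
    higher estimated access frequency. Real implementations estimate
    frequency with a compact count-min sketch; a dict is used here only
    to keep the sketch readable."""

    def __init__(self):
        self.freq = {}

    def record_access(self, key):
        self.freq[key] = self.freq.get(key, 0) + 1

    def age(self):
        # Periodic aging: halve every counter so stale popularity decays
        # and the policy adapts to shifts in the working set.
        self.freq = {k: v // 2 for k, v in self.freq.items()}

    def admit(self, candidate, victim):
        return self.freq.get(candidate, 0) > self.freq.get(victim, 0)
```

This is why one-hit-wonder scans cannot flush a hot working set: a never-seen candidate loses the frequency comparison against any entry that has been accessed more than once.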





[jira] [Updated] (HBASE-15560) TinyLFU-based BlockCache

2018-12-05 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-15560:
--
Fix Version/s: 2.2.0
   3.0.0

> TinyLFU-based BlockCache
> 
>
> Key: HBASE-15560
> URL: https://issues.apache.org/jira/browse/HBASE-15560
> Project: HBase
>  Issue Type: Improvement
>  Components: BlockCache
>Affects Versions: 2.0.0
>Reporter: Ben Manes
>Assignee: Ben Manes
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, 
> HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, 
> bc.hit.count, bc.miss.count, branch-1.tinylfu.txt, gets, run_ycsb_c.sh, 
> run_ycsb_loading.sh, tinylfu.patch
>
>
> LruBlockCache uses the Segmented LRU (SLRU) policy to capture the frequency and 
> recency of the working set. It achieves concurrency by using an O(n) 
> background thread to prioritize the entries and evict. Accessing an entry is 
> O(1): a hash table lookup, recording its logical access time, and setting a 
> frequency flag. A write is performed in O(1) time by updating the hash table 
> and triggering an async eviction thread. This provides ideal concurrency and 
> minimizes latencies by penalizing the background thread instead of the caller. 
> However, the policy does not age the frequencies and may not be resilient to 
> various workload patterns.
> W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the 
> frequency in a counting sketch, ages periodically by halving the counters, 
> and orders entries by SLRU. An entry is discarded by comparing the frequency 
> of the new arrival (candidate) to the SLRU's victim, and keeping the one with 
> the higher frequency. This allows the operations to be performed in O(1) 
> time and, through the use of a compact sketch, a much larger history is 
> retained beyond the current working set. In a variety of real-world traces 
> the policy had [near optimal hit 
> rates|https://github.com/ben-manes/caffeine/wiki/Efficiency].
> Concurrency is achieved by buffering and replaying the operations, similar to 
> a write-ahead log. A read is recorded into a striped ring buffer and a write 
> into a queue. The operations are applied in batches under a try-lock by an 
> asynchronous thread, thereby tracking the usage pattern without incurring high 
> latencies 
> ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]).
> In YCSB benchmarks the results were inconclusive. For a large cache (99% hit 
> rates) the two caches have near identical throughput and latencies, with 
> LruBlockCache narrowly winning. At medium and small cache sizes, TinyLFU had a 
> 1-4% hit rate improvement and therefore lower latencies. The lackluster 
> result is because a synthetic Zipfian distribution is used, on which SLRU 
> performs optimally. In a more varied, real-world workload we'd expect to see 
> improvements by being able to make smarter predictions.
> The provided patch implements BlockCache using the 
> [Caffeine|https://github.com/ben-manes/caffeine] caching library (see 
> HighScalability 
> [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]).
> Edward Bortnikov and Eshcar Hillel have graciously provided guidance for 
> evaluating this patch ([github 
> branch|https://github.com/ben-manes/hbase/tree/tinylfu]).
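The counting-sketch-with-aging idea described above can be sketched in a few lines. This is a toy illustration only, not Caffeine's or HBase's implementation: a real TinyLFU uses a CountMin sketch with 4-bit counters, whereas this sketch uses a plain map for clarity, and all class and method names here are invented.

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration of TinyLFU-style frequency aging (not real Caffeine code):
// counts are capped and periodically halved so that stale popularity decays.
class TinyLfuSketch {
    private static final int MAX = 15; // mimics the 4-bit counters of a CM sketch
    private final Map<String, Integer> counts = new HashMap<>();
    private int ops;

    void increment(String key) {
        counts.merge(key, 1, (a, b) -> Math.min(MAX, a + b));
        if (++ops >= 10 * MAX) { // sample period reached: age everything
            halve();
            ops = 0;
        }
    }

    int frequency(String key) {
        return counts.getOrDefault(key, 0);
    }

    // Aging step: halving all counters lets old frequencies fade away.
    void halve() {
        counts.replaceAll((k, v) -> v / 2);
    }

    // Admission: keep whichever of candidate/victim has the higher frequency.
    boolean admit(String candidate, String victim) {
        return frequency(candidate) > frequency(victim);
    }
}
```

The `admit` method is the W-TinyLFU filter in miniature: the eviction victim survives unless the arriving candidate has been seen more often in recent history.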





[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache

2018-12-05 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710991#comment-16710991
 ] 

stack commented on HBASE-15560:
---

bq. Sorry that this dropped off my radar.

Smile. Two years.

Weird that this came up today out of the blue.

bq. This can only be definitively answered by someone willing to canary an 
instance in a live environment.

Let's get a volunteer. Otherwise, I should be in a position to try this in a 
week or so.

Thanks for coming back, [~ben.manes].


> TinyLFU-based BlockCache
> 
>
> Key: HBASE-15560
> URL: https://issues.apache.org/jira/browse/HBASE-15560
> Project: HBase
>  Issue Type: Improvement
>  Components: BlockCache
>Affects Versions: 2.0.0
>Reporter: Ben Manes
>Assignee: Ben Manes
>Priority: Major
> Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, 
> HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, 
> bc.hit.count, bc.miss.count, branch-1.tinylfu.txt, gets, run_ycsb_c.sh, 
> run_ycsb_loading.sh, tinylfu.patch
>
>





[jira] [Commented] (HBASE-21554) Show replication endpoint classname for replication peer on master web UI

2018-12-05 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710986#comment-16710986
 ] 

Hadoop QA commented on HBASE-21554:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
10s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange}  
0m  0s{color} | {color:orange} The patch doesn't appear to include any new or 
modified tests. Please justify why no new tests are needed for this patch. Also 
please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
34s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}127m 
56s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}138m 44s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21554 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12950785/HBASE-21554.master.001.patch
 |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  |
| uname | Linux 7f81c4fd56fb 4.4.0-139-generic #165~14.04.1-Ubuntu SMP Wed Oct 
31 10:55:11 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 3b854859f6 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/15202/testReport/ |
| Max. process+thread count | 4583 (vs. ulimit of 1) |
| modules | C: hbase-server U: hbase-server |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/15202/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Show replication endpoint classname for replication peer on master web UI
> -
>
> Key: HBASE-21554
> URL: https://issues.apache.org/jira/browse/HBASE-21554
> Project: HBase
>  Issue Type: Improvement
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Minor
> Attachments: HBASE-21554.master.001.patch
>
>






[jira] [Resolved] (HBASE-21558) Set version to 2.1.2 on branch-2.1 so can cut an RC

2018-12-05 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-21558.
---
  Resolution: Fixed
Release Note: Set version to 2.1.2 from 2.1.2-SNAPSHOT

Pushed to branch-2.1

> Set version to 2.1.2 on branch-2.1 so can cut an RC
> ---
>
> Key: HBASE-21558
> URL: https://issues.apache.org/jira/browse/HBASE-21558
> Project: HBase
>  Issue Type: Sub-task
>  Components: release
>Reporter: stack
>Assignee: stack
>Priority: Major
>
> mvn clean org.codehaus.mojo:versions-maven-plugin:2.5:set -DnewVersion=2.1.2
> $ find . -name pom.xml -exec git add {} \;
> $ git commit ...





[jira] [Updated] (HBASE-21558) Set version to 2.1.2 on branch-2.1 so can cut an RC

2018-12-05 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21558:
--
Description: 
mvn clean org.codehaus.mojo:versions-maven-plugin:2.5:set -DnewVersion=2.1.2
$ find . -name pom.xml -exec git add {} \;
$ git commit ...

> Set version to 2.1.2 on branch-2.1 so can cut an RC
> ---
>
> Key: HBASE-21558
> URL: https://issues.apache.org/jira/browse/HBASE-21558
> Project: HBase
>  Issue Type: Sub-task
>  Components: release
>Reporter: stack
>Assignee: stack
>Priority: Major
>
> mvn clean org.codehaus.mojo:versions-maven-plugin:2.5:set -DnewVersion=2.1.2
> $ find . -name pom.xml -exec git add {} \;
> $ git commit ...





[jira] [Created] (HBASE-21558) Set version to 2.1.2 on branch-2.1 so can cut an RC

2018-12-05 Thread stack (JIRA)
stack created HBASE-21558:
-

 Summary: Set version to 2.1.2 on branch-2.1 so can cut an RC
 Key: HBASE-21558
 URL: https://issues.apache.org/jira/browse/HBASE-21558
 Project: HBase
  Issue Type: Sub-task
  Components: release
Reporter: stack
Assignee: stack








[jira] [Resolved] (HBASE-21557) Set version to 2.0.4 on branch-2.0 so can cut an RC

2018-12-05 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-21557.
---
   Resolution: Fixed
Fix Version/s: 2.0.4
 Release Note: Set project version to 2.0.4 from 2.0.4-SNAPSHOT

Pushed to branch-2.0.

> Set version to 2.0.4 on branch-2.0 so can cut an RC
> ---
>
> Key: HBASE-21557
> URL: https://issues.apache.org/jira/browse/HBASE-21557
> Project: HBase
>  Issue Type: Sub-task
>  Components: release
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 2.0.4
>
>
> $ mvn clean org.codehaus.mojo:versions-maven-plugin:2.5:set -DnewVersion=2.0.4
> $ find . -name pom.xml -exec git add {} \;
> $ git commit ...





[jira] [Created] (HBASE-21557) Set version to 2.0.4 on branch-2.0 so can cut an RC

2018-12-05 Thread stack (JIRA)
stack created HBASE-21557:
-

 Summary: Set version to 2.0.4 on branch-2.0 so can cut an RC
 Key: HBASE-21557
 URL: https://issues.apache.org/jira/browse/HBASE-21557
 Project: HBase
  Issue Type: Sub-task
  Components: release
Reporter: stack
Assignee: stack


$ mvn clean org.codehaus.mojo:versions-maven-plugin:2.5:set -DnewVersion=2.0.4
$ find . -name pom.xml -exec git add {} \;
$ git commit ...





[jira] [Updated] (HBASE-21556) Create 2.1.2 release

2018-12-05 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21556:
--
Attachment: Screen Shot 2018-12-05 at 8.38.32 PM.png

> Create 2.1.2 release
> 
>
> Key: HBASE-21556
> URL: https://issues.apache.org/jira/browse/HBASE-21556
> Project: HBase
>  Issue Type: Task
>Reporter: stack
>Assignee: stack
>Priority: Major
> Attachments: Screen Shot 2018-12-05 at 8.38.32 PM.png
>
>






[jira] [Updated] (HBASE-21556) Create 2.1.2 release

2018-12-05 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21556:
--
Description: 
Roll new 2.1 because of memory leak. See HBASE-21551

2.1 is not doing too badly: 3 of the last 5 passed.

 !Screen Shot 2018-12-05 at 8.38.32 PM.png! 

> Create 2.1.2 release
> 
>
> Key: HBASE-21556
> URL: https://issues.apache.org/jira/browse/HBASE-21556
> Project: HBase
>  Issue Type: Task
>Reporter: stack
>Assignee: stack
>Priority: Major
> Attachments: Screen Shot 2018-12-05 at 8.38.32 PM.png
>
>
> Roll new 2.1 because of memory leak. See HBASE-21551
> 2.1 is doing not too bad. 3 of last 5 passed.
>  !Screen Shot 2018-12-05 at 8.38.32 PM.png! 





[jira] [Updated] (HBASE-21555) Create 2.0.4 release

2018-12-05 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21555:
--
Description: 
Roll new 2.1 because of memory leak. See HBASE-21551

Branch-2.0 was doing nicely: 10 of the last 14 passed. Here is a run of 6 
back-to-back that all passed. !Screen Shot 2018-12-05 at 8.38.32 PM.png! 

  was:Branch-2.0 was doing nicely. 10 of the last 14 passed here is a run 
of 6 back-to-back that all passed. !Screen Shot 2018-12-05 at 8.38.32 PM.png! 


> Create 2.0.4 release
> 
>
> Key: HBASE-21555
> URL: https://issues.apache.org/jira/browse/HBASE-21555
> Project: HBase
>  Issue Type: Task
>Reporter: stack
>Assignee: stack
>Priority: Major
> Attachments: Screen Shot 2018-12-05 at 8.38.32 PM.png
>
>
> Roll new 2.1 because of memory leak. See HBASE-21551
> Branch-2.0 was doing nicely. 10 of the last 14 passed here is a run of 6 
> back-to-back that all passed. !Screen Shot 2018-12-05 at 8.38.32 PM.png! 





[jira] [Updated] (HBASE-21555) Create 2.0.4 release

2018-12-05 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21555:
--
Description: Branch-2.0 was doing nicely. 10 of the last 14 passed here 
is a run of 6 back-to-back that all passed. !Screen Shot 2018-12-05 at 8.38.32 
PM.png! 

> Create 2.0.4 release
> 
>
> Key: HBASE-21555
> URL: https://issues.apache.org/jira/browse/HBASE-21555
> Project: HBase
>  Issue Type: Task
>Reporter: stack
>Assignee: stack
>Priority: Major
> Attachments: Screen Shot 2018-12-05 at 8.38.32 PM.png
>
>
> Branch-2.0 was doing nicely. 10 of the last 14 passed here is a run of 6 
> back-to-back that all passed. !Screen Shot 2018-12-05 at 8.38.32 PM.png! 





[jira] [Commented] (HBASE-21553) schedLock not released in MasterProcedureScheduler

2018-12-05 Thread Karan Mehta (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710977#comment-16710977
 ] 

Karan Mehta commented on HBASE-21553:
-

Good finding, [~xucang]!

FYI [~sukumaddineni] [~swaroopa]

This is probably the root cause of stuck procedures in the cluster.

> schedLock not released in MasterProcedureScheduler
> --
>
> Key: HBASE-21553
> URL: https://issues.apache.org/jira/browse/HBASE-21553
> Project: HBase
>  Issue Type: Improvement
>Reporter: Xu Cang
>Priority: Major
>
> https://github.com/apache/hbase/blob/branch-1/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/MasterProcedureScheduler.java#L749
> As shown above, schedLock is not unlocked there, which can cause a deadlock.
> Besides this, there are other places in this class that handle schedLock.unlock 
> in a risky manner. I'd like to move them into finally blocks to improve the 
> robustness of the lock handling.





[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache

2018-12-05 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710976#comment-16710976
 ] 

Hadoop QA commented on HBASE-15560:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  5s{color} 
| {color:red} HBASE-15560 does not apply to master. Rebase required? Wrong 
Branch? See https://yetus.apache.org/documentation/0.8.0/precommit-patchnames 
for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HBASE-15560 |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/15205/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> TinyLFU-based BlockCache
> 
>
> Key: HBASE-15560
> URL: https://issues.apache.org/jira/browse/HBASE-15560
> Project: HBase
>  Issue Type: Improvement
>  Components: BlockCache
>Affects Versions: 2.0.0
>Reporter: Ben Manes
>Assignee: Ben Manes
>Priority: Major
> Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, 
> HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, 
> bc.hit.count, bc.miss.count, branch-1.tinylfu.txt, gets, run_ycsb_c.sh, 
> run_ycsb_loading.sh, tinylfu.patch
>
>





[jira] [Updated] (HBASE-21555) Create 2.0.4 release

2018-12-05 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21555:
--
Attachment: Screen Shot 2018-12-05 at 8.38.32 PM.png

> Create 2.0.4 release
> 
>
> Key: HBASE-21555
> URL: https://issues.apache.org/jira/browse/HBASE-21555
> Project: HBase
>  Issue Type: Task
>Reporter: stack
>Assignee: stack
>Priority: Major
> Attachments: Screen Shot 2018-12-05 at 8.38.32 PM.png
>
>






[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache

2018-12-05 Thread Ben Manes (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710972#comment-16710972
 ] 

Ben Manes commented on HBASE-15560:
---

Sorry that this dropped off my radar. I can summarize a few things stated 
before, with minor updates.
h5. Current (SLRU)
 * Design:
 ** Reads perform a ConcurrentHashMap lookup, increment a global AtomicLong 
counter for the access time, and mark the block type as frequent
 ** Writes perform a ConcurrentHashMap update and notify a thread if the 
cache overflows 
 ** The thread wakes up every 10s or when notified, performs an O(n lg n) sort, 
and evicts from the recency/frequency segments down to the watermarks
 * Benefits
 ** Provides scan resistance and captures simple frequency workloads. It is 
optimal for Zipf.
 ** Has minimal latencies at low/modest concurrency, as it does very little work 
on the requesting threads
 * Costs
 ** At high concurrency, the AtomicLong would be a synchronization bottleneck 
(~10M op/sec if I recall correctly). This probably does not matter here, since 
disk I/O, network I/O, etc. result in only modest thrashing on this counter.
 ** No back-pressure on writes if the cache cannot evict fast enough. However, 
the I/O involved may make this moot. 
 ** Expected lower hit rates in real-world traces, based on the variety of 
workload we have examined (non-HBase, various freq/recency mixtures)
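The eviction pass described above (background O(n lg n) sort, then evicting down to watermarks) can be sketched roughly as follows. This is a toy model, not the actual LruBlockCache code, and all names are invented.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy version of the O(n lg n) eviction pass (not the real LruBlockCache):
// sort all entries by (frequent flag, access time) and evict the coldest
// until the cache is back under its watermark.
class SlruEvictor {
    static class Entry {
        final String key;
        final long accessTime;  // logical time from a global counter
        final boolean frequent; // set when the block is re-accessed
        Entry(String key, long accessTime, boolean frequent) {
            this.key = key;
            this.accessTime = accessTime;
            this.frequent = frequent;
        }
    }

    // Returns the keys to evict so that only `watermark` entries remain.
    static List<String> evict(List<Entry> entries, int watermark) {
        List<Entry> sorted = new ArrayList<>(entries);
        // Coldest first: non-frequent before frequent, then oldest access time.
        sorted.sort(Comparator.comparing((Entry e) -> e.frequent)
                              .thenComparingLong(e -> e.accessTime));
        List<String> victims = new ArrayList<>();
        for (int i = 0; i < sorted.size() - watermark; i++) {
            victims.add(sorted.get(i).key);
        }
        return victims;
    }
}
```

The full sort over all entries is where the O(n lg n) cost comes from; the requesting threads only ever touch the hash table, the counter, and the flag.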

h5. Caffeine (Proposed, TinyLFU)
 * Design:
 ** Reads perform a ConcurrentHashMap lookup, hash to a ring buffer (a growable 
array of buffers), and try to add the item (up to 3 times, possibly rehashing). 
If the ring buffer is full or a state machine flag is marked, they tryLock to 
schedule a drain task on an executor.
 ** Writes perform a ConcurrentHashMap update, add to a ring buffer (blocking 
if full), update a state machine flag, and tryLock to schedule a drain task on 
an executor.
 ** The executor drains the ring buffers, replays the events on the eviction 
policy, and evicts if the cache has overflowed (default: ForkJoinPool.commonPool()).
 * Benefits
 ** Allows a higher degree of read concurrency by not having a single point of 
contention (striped ring buffers)
 ** Offers back-pressure on writes if the eviction thread cannot keep up 
(descheduling writers by having them take the global lock when the buffer is full)
 ** Spreads out small chunks of O(1) work
 ** Allows more advanced policies / data-structures (TinyLFU, Hierarchical 
TimerWheel) => higher hit rates & more features
 * Costs
 ** Slightly higher penalties on read / write (no free lunch)
 ** Is more biased towards frequency (a negative if a recency-skewed workload)
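The read path described above (record into a buffer, tryLock to drain) can be sketched as below. This is a simplified model, not Caffeine's implementation: the real cache uses striped lock-free ring buffers and an executor, while this sketch uses a synchronized queue and drains inline, and all names are invented.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.locks.ReentrantLock;

// Toy sketch of the buffer-and-replay idea (not Caffeine's implementation):
// reads enqueue an event and, instead of blocking on the policy, a tryLock
// lets exactly one thread replay the batch while the others simply move on.
class BufferedPolicy {
    private final Queue<String> readBuffer = new ArrayDeque<>();
    private final ReentrantLock evictionLock = new ReentrantLock();
    private final Map<String, Integer> accessCounts = new HashMap<>();

    void recordRead(String key) {
        synchronized (readBuffer) { // stand-in for a striped lock-free buffer
            readBuffer.add(key);
        }
        if (evictionLock.tryLock()) { // losers skip; no caller blocks here
            try {
                drain();
            } finally {
                evictionLock.unlock();
            }
        }
    }

    private void drain() {
        while (true) {
            String key;
            synchronized (readBuffer) {
                key = readBuffer.poll();
            }
            if (key == null) {
                return;
            }
            accessCounts.merge(key, 1, Integer::sum); // replay onto the policy
        }
    }

    int accesses(String key) {
        return accessCounts.getOrDefault(key, 0);
    }
}
```

The key property is that the eviction policy itself is only ever touched under the lock, yet a reader that loses the tryLock race pays nothing beyond the buffer append.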

h5. Synopsis

The SLRU is the cheapest (latency) and most optimal (hit rate) for synthetic 
Zipf testing. It was designed with those considerations in mind. Any other 
solution will trade higher latency for better hit rates and system behavior. 
The question is then if the latency difference is small enough (effectively 
noise) and the higher hit rate improves overall performance. *This can only be 
definitively answered by someone willing to canary an instance in a live 
environment.* My belief, from analyzing hit rates and their impacts on other 
applications, is that there will be a benefit.
h5. TinyLFU improvements

We have been exploring ways to improve TinyLFU-based policies in adversarial 
workloads (recency-biased). In those cases work is brought in, operated on 
repeatedly, and then never touched again. A good example of that is a 
transaction log or a distributed compilation cache (with local cache). In those 
workloads frequency is a negative signal, as by the time the score is high 
enough for retention the item is no longer worth retaining.

We have been working on adaptive schemes by sampling the workload and adjusting 
based on its characteristics 
([paper|https://drive.google.com/open?id=1CT2ASkfuG9qVya9Sn8ZUCZjrFSSyjRA_]). 
Both a naive hill climber and a statistics-based model correct the policy to 
the optimal hit rate. I hope to try [adaptive moment 
estimation|https://arxiv.org/abs/1412.6980], an advanced hill climber, which I 
believe will be the most robust and inexpensive mechanism (as proven by the ML 
community). This work will allow the cache to offer the best hit rate 
regardless of workload, which no other policy has been able to do so far.
h5. Next Steps

I don't think there is anything meaningful that I can offer to this ticket. If 
this were to go in, it would take either a leap of faith by making it an option, 
or someone in the community proving the benefit. Without an environment or trace, 
we can't do more than discuss minor details from synthetic testing.

> TinyLFU-based BlockCache
> 
>
> Key: HBASE-15560
> URL: https://issues.apache.org/jira/browse/HBASE-15560
> Project: HBase
>  Issue Type: Improvement
>  Components: BlockCache
>Affects Versions: 2.0.0
>

[jira] [Created] (HBASE-21556) Create 2.1.2 release

2018-12-05 Thread stack (JIRA)
stack created HBASE-21556:
-

 Summary: Create 2.1.2 release
 Key: HBASE-21556
 URL: https://issues.apache.org/jira/browse/HBASE-21556
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: stack








[jira] [Created] (HBASE-21555) Create 2.0.4 release

2018-12-05 Thread stack (JIRA)
stack created HBASE-21555:
-

 Summary: Create 2.0.4 release
 Key: HBASE-21555
 URL: https://issues.apache.org/jira/browse/HBASE-21555
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: stack








[jira] [Commented] (HBASE-21551) Memory leak when use scan with STREAM at server side

2018-12-05 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710965#comment-16710965
 ] 

stack commented on HBASE-21551:
---

Agree this is bad. I can roll 2.1.2 and 2.0.4 over the w/e (will start now).

> Memory leak when use scan with STREAM at server side
> 
>
> Key: HBASE-21551
> URL: https://issues.apache.org/jira/browse/HBASE-21551
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Blocker
> Fix For: 3.0.0, 2.2.0, 2.1.2, 2.0.4
>
> Attachments: HBASE-21551.v1.patch, HBASE-21551.v2.patch, 
> HBASE-21551.v3.patch, heap-dump.jpg
>
>
> We open the RegionServerScanner with STREAM as follows: 
> {code}
> RegionScannerImpl#initializeScanners
>   |---> HStore#getScanner
> |--> StoreScanner()
> |---> 
> StoreFileScanner#getScannersForStoreFiles
>   |--> 
> HStoreFile#getStreamScanner  #1
> {code}
> In #1, we put the StoreFileReader into a concurrent hash map, streamReaders, 
> but do not remove it from streamReaders until the store file is closed. 
> So if we scan with STREAM many times, the streamReaders hash map keeps 
> growing; see the attached heap-dump.jpg. 
> I found this bug while benchmarking scan performance with YCSB on a cluster 
> (RegionServer heap size of 50g): the RegionServers easily ran into long 
> full GC pauses (~110 sec).
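A minimal sketch of the leak pattern and its fix, with invented names (this is not the actual HStoreFile code): readers registered in the concurrent map must be deregistered when their scanner closes, not only when the store file itself is closed.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the leak described above (not real HStoreFile code):
// every STREAM scan registers a reader; without a matching remove on scanner
// close, the map grows until the store file is closed, which may be never.
class StreamReaderRegistry {
    private final ConcurrentHashMap<Long, Object> streamReaders = new ConcurrentHashMap<>();
    private long nextId;

    // Called for every STREAM scan: leaks if nothing ever removes the entry.
    synchronized long openStreamReader(Object reader) {
        long id = nextId++;
        streamReaders.put(id, reader);
        return id;
    }

    // The fix: remove the reader as soon as its scanner is done with it.
    void closeStreamReader(long id) {
        streamReaders.remove(id);
    }

    int openReaders() {
        return streamReaders.size();
    }
}
```

With the remove in place, the registry's size tracks only live scanners, so repeated STREAM scans no longer accumulate retained readers on the heap.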





[jira] [Updated] (HBASE-21551) Memory leak when use scan with STREAM at server side

2018-12-05 Thread Zheng Hu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu updated HBASE-21551:
-
Attachment: HBASE-21551.v3.patch

> Memory leak when use scan with STREAM at server side
> 
>
> Key: HBASE-21551
> URL: https://issues.apache.org/jira/browse/HBASE-21551
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Blocker
> Fix For: 3.0.0, 2.2.0, 2.1.2, 2.0.4
>
> Attachments: HBASE-21551.v1.patch, HBASE-21551.v2.patch, 
> HBASE-21551.v3.patch, heap-dump.jpg
>
>





[jira] [Updated] (HBASE-21551) Memory leak when use scan with STREAM at server side

2018-12-05 Thread Zheng Hu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu updated HBASE-21551:
-
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

> Memory leak when use scan with STREAM at server side
> 
>
> Key: HBASE-21551
> URL: https://issues.apache.org/jira/browse/HBASE-21551
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Blocker
> Fix For: 3.0.0, 2.2.0, 2.1.2, 2.0.4
>
> Attachments: HBASE-21551.v1.patch, HBASE-21551.v2.patch, 
> HBASE-21551.v3.patch, heap-dump.jpg
>
>
> We open the RegionServerScanner with STREAM as follows: 
> {code}
> RegionScannerImpl#initializeScanners
>   |---> HStore#getScanner
>           |---> StoreScanner()
>                   |---> StoreFileScanner#getScannersForStoreFiles
>                           |---> HStoreFile#getStreamScanner  #1
> {code}
> In #1, we put the StoreFileReader into a concurrent hash map, streamReaders, 
> but never remove it from streamReaders until the store file is closed. 
> So if we run stream scans many times, the streamReaders hash map grows without 
> bound; see the attached heap-dump.jpg. 
> I found this bug while benchmarking scan performance with YCSB on a cluster 
> (RegionServer heap size: 50g); the RegionServers were prone to long full GC 
> pauses (~110 sec)





[jira] [Commented] (HBASE-21551) Memory leak when use scan with STREAM at server side

2018-12-05 Thread Zheng Hu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710959#comment-16710959
 ] 

Zheng Hu commented on HBASE-21551:
--

Finished committing to all branch-2.x branches. Thanks [~Apache9], [~zghaobac], 
[~allan163] for reviewing. 

> Memory leak when use scan with STREAM at server side
> 
>
> Key: HBASE-21551
> URL: https://issues.apache.org/jira/browse/HBASE-21551
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Blocker
> Fix For: 3.0.0, 2.2.0, 2.1.2, 2.0.4
>
> Attachments: HBASE-21551.v1.patch, HBASE-21551.v2.patch, heap-dump.jpg
>
>
> We open the RegionServerScanner with STREAM as follows: 
> {code}
> RegionScannerImpl#initializeScanners
>   |---> HStore#getScanner
>           |---> StoreScanner()
>                   |---> StoreFileScanner#getScannersForStoreFiles
>                           |---> HStoreFile#getStreamScanner  #1
> {code}
> In #1, we put the StoreFileReader into a concurrent hash map, streamReaders, 
> but never remove it from streamReaders until the store file is closed. 
> So if we run stream scans many times, the streamReaders hash map grows without 
> bound; see the attached heap-dump.jpg. 
> I found this bug while benchmarking scan performance with YCSB on a cluster 
> (RegionServer heap size: 50g); the RegionServers were prone to long full GC 
> pauses (~110 sec)





[jira] [Commented] (HBASE-20734) Colocate recovered edits directory with hbase.wal.dir

2018-12-05 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710957#comment-16710957
 ] 

Hudson commented on HBASE-20734:


Results for branch branch-2.0
[build #1139 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1139/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1139//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1139//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1139//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


> Colocate recovered edits directory with hbase.wal.dir
> -
>
> Key: HBASE-20734
> URL: https://issues.apache.org/jira/browse/HBASE-20734
> Project: HBase
>  Issue Type: Improvement
>  Components: MTTR, Recovery, wal
>Reporter: Ted Yu
>Assignee: Zach York
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.8, 2.1.1
>
> Attachments: HBASE-20734.branch-1.001.patch, 
> HBASE-20734.branch-1.002.patch, HBASE-20734.branch-1.003.patch, 
> HBASE-20734.branch-1.004.patch, HBASE-20734.branch-1.005.patch, 
> HBASE-20734.master.001.patch, HBASE-20734.master.002.patch, 
> HBASE-20734.master.003.patch, HBASE-20734.master.004.patch, 
> HBASE-20734.master.005.patch, HBASE-20734.master.006.patch, 
> HBASE-20734.master.007.patch, HBASE-20734.master.008.patch, 
> HBASE-20734.master.009.patch, HBASE-20734.master.010.patch, 
> HBASE-20734.master.011.patch, HBASE-20734.master.012.patch
>
>
> During investigation of HBASE-20723, I realized that we wouldn't get the best 
> performance for recovered edits when hbase.wal.dir is configured to be on 
> different (fast) media than the hbase rootdir, since the recovered edits 
> directory currently lives under rootdir.
> Such a setup may not result in fast recovery when there is a region server 
> failover.
> This issue is to find a proper (hopefully backward-compatible) way to 
> colocate the recovered edits directory with hbase.wal.dir .





[jira] [Commented] (HBASE-21544) Backport HBASE-20734 Colocate recovered edits directory with hbase.wal.dir

2018-12-05 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710956#comment-16710956
 ] 

Hudson commented on HBASE-21544:


Results for branch branch-2.0
[build #1139 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1139/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1139//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1139//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1139//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


> Backport HBASE-20734 Colocate recovered edits directory with hbase.wal.dir
> --
>
> Key: HBASE-21544
> URL: https://issues.apache.org/jira/browse/HBASE-21544
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Major
> Fix For: 2.0.4
>
> Attachments: HBASE-20734.001.branch-2.0.patch, 
> HBASE-20734.002.branch-2.0.patch
>
>
> Been talking through this with a bunch of folks. [~enis] brought me back from 
> the cliff of despair though.
> Context: running HBase on top of a filesystem that doesn't have hflush for 
> hfiles. In our case, on top of Azure's Hadoop-compatible filesystems (WASB, 
> ABFS).
> When a RS fails and we have an SCP running for it, you'll see log splitting 
> get into an "infinite" loop where the master keeps resubmitting and the RS 
> which takes the action deterministically fails with the following:
> {noformat}
> 2018-11-26 20:59:18,415 ERROR 
> [RS_LOG_REPLAY_OPS-regionserver/wn2-b831f9:16020-0-Writer-2] 
> wal.FSHLogProvider: The RegionServer write ahead log provider for FileSystem 
> implementations relies on the ability to call hflush for proper operation 
> during component failures, but the current FileSystem does not support doing 
> so. Please check the config value of 'hbase.wal.dir' and ensure it points to 
> a FileSystem mount that has suitable capabilities for output streams.
> 2018-11-26 20:59:18,415 WARN  
> [RS_LOG_REPLAY_OPS-regionserver/wn2-b831f9:16020-0-Writer-2] 
> wal.AbstractProtobufLogWriter: WALTrailer is null. Continuing with default.
> 2018-11-26 20:59:18,467 ERROR 
> [RS_LOG_REPLAY_OPS-regionserver/wn2-b831f9:16020-0-Writer-2] wal.WALSplitter: 
> Got while writing log entry to log
> java.io.IOException: cannot get log writer
> at 
> org.apache.hadoop.hbase.wal.FSHLogProvider.createWriter(FSHLogProvider.java:96)
> at 
> org.apache.hadoop.hbase.wal.FSHLogProvider.createWriter(FSHLogProvider.java:61)
> at 
> org.apache.hadoop.hbase.wal.WALFactory.createRecoveredEditsWriter(WALFactory.java:370)
> at 
> org.apache.hadoop.hbase.wal.WALSplitter.createWriter(WALSplitter.java:804)
> at 
> org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.createWAP(WALSplitter.java:1530)
> at 
> org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.getWriterAndPath(WALSplitter.java:1501)
> at 
> org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.appendBuffer(WALSplitter.java:1584)
> at 
> org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.append(WALSplitter.java:1566)
> at 
> org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.writeBuffer(WALSplitter.java:1090)
> at 
> org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.doRun(WALSplitter.java:1082)
> at 
> org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.run(WALSplitter.java:1052)
> Caused by: 
> org.apache.hadoop.hbase.util.CommonFSUtils$StreamLacksCapabilityException: 
> hflush
> at 
> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.initOutput(ProtobufLogWriter.java:99)
> at 
> org.apache.hadoop.hbase.regionserver.wal.AbstractProtobufLogWriter.init(AbstractProtobufLogWriter.java:165)
> at 
> org.apache.hadoop.hbase.wal.FSHLogProvider.createWriter(FSHLogProvider.java:77)
> ... 10 more{noformat}
> This is the sanity check added by HBASE-18784, failing on creating the writer 
> for the recovered.edits file.
> The odd-ball here is that our recovered.edits writer is just a WAL writer 
> class. The WAL writer class thinks it always should have hflush support; 
> however, we don't _actually_ need that for 
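The sanity check that fails here boils down to probing the output stream for an hflush capability before using it as a WAL (or recovered.edits) writer. Below is a minimal sketch of that check; the CapabilityStream interface is a stand-in for Hadoop's StreamCapabilities probe, and all names are illustrative, not the actual HBase/Hadoop types:

```java
// Hedged sketch of the HBASE-18784-style capability check; the interface
// below is a stand-in for Hadoop's StreamCapabilities, not the real API.
class WalWriterCheck {
    static final String HFLUSH = "hflush";

    // Stand-in capability probe; real code would ask the FSDataOutputStream.
    interface CapabilityStream {
        boolean hasCapability(String capability);
    }

    static void checkHflush(CapabilityStream out) {
        if (!out.hasCapability(HFLUSH)) {
            // mirrors the StreamLacksCapabilityException in the stack trace above
            throw new IllegalStateException("stream lacks capability: " + HFLUSH);
        }
    }

    public static void main(String[] args) {
        CapabilityStream hdfsLike = cap -> HFLUSH.equals(cap); // supports hflush
        CapabilityStream blobLike = cap -> false;              // e.g. a WASB/ABFS-style store

        checkHflush(hdfsLike);                // passes: writer creation may proceed
        System.out.println("hdfs-like=ok");

        try {
            checkHflush(blobLike);            // fails: writer creation is refused
            System.out.println("blob-like=ok");
        } catch (IllegalStateException e) {
            System.out.println("blob-like=" + e.getMessage());
        }
    }
}
```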

[jira] [Commented] (HBASE-21551) Memory leak when use scan with STREAM at server side

2018-12-05 Thread Zheng Hu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710954#comment-16710954
 ] 

Zheng Hu commented on HBASE-21551:
--

Made a stupid mistake when committing: I added more asserts in the committed UT 
[1], where I close the region, and that caused the other cases in the same UT 
to fail. So I committed an addendum [2]. Sorry for the noise. (The patch for 
branch-2 does not need the addendum, because I merged them into one.)

1. 
https://github.com/apache/hbase/commit/3b854859f6fad44cbf31164374569a6ab23f3623#diff-801daeaf9f3c8ddb85e743c06d79c7edR142
2. 
https://github.com/apache/hbase/commit/67ab8b888f8b393979624a2bd7d527fefd9dd6d7

> Memory leak when use scan with STREAM at server side
> 
>
> Key: HBASE-21551
> URL: https://issues.apache.org/jira/browse/HBASE-21551
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Blocker
> Fix For: 3.0.0, 2.2.0, 2.1.2, 2.0.4
>
> Attachments: HBASE-21551.v1.patch, HBASE-21551.v2.patch, heap-dump.jpg
>
>
> We open the RegionServerScanner with STREAM as follows: 
> {code}
> RegionScannerImpl#initializeScanners
>   |---> HStore#getScanner
>           |---> StoreScanner()
>                   |---> StoreFileScanner#getScannersForStoreFiles
>                           |---> HStoreFile#getStreamScanner  #1
> {code}
> In #1, we put the StoreFileReader into a concurrent hash map, streamReaders, 
> but never remove it from streamReaders until the store file is closed. 
> So if we run stream scans many times, the streamReaders hash map grows without 
> bound; see the attached heap-dump.jpg. 
> I found this bug while benchmarking scan performance with YCSB on a cluster 
> (RegionServer heap size: 50g); the RegionServers were prone to long full GC 
> pauses (~110 sec)





[jira] [Updated] (HBASE-21549) Add shell command for serial replication peer

2018-12-05 Thread Guanghao Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-21549:
---
Attachment: HBASE-21549.master.003.patch

> Add shell command for serial replication peer
> -
>
> Key: HBASE-21549
> URL: https://issues.apache.org/jira/browse/HBASE-21549
> Project: HBase
>  Issue Type: Improvement
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Attachments: HBASE-21549.master.001.patch, 
> HBASE-21549.master.002.patch, HBASE-21549.master.003.patch
>
>
> add_peer supports adding a serial replication peer directly.
> set_peer_serial supports changing a replication peer's serial flag.





[jira] [Commented] (HBASE-21549) Add shell command for serial replication peer

2018-12-05 Thread Guanghao Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710934#comment-16710934
 ] 

Guanghao Zhang commented on HBASE-21549:


Added a 002 patch which addresses the review comments and adds more 
documentation to the reference guide and the shell help message.

> Add shell command for serial replication peer
> -
>
> Key: HBASE-21549
> URL: https://issues.apache.org/jira/browse/HBASE-21549
> Project: HBase
>  Issue Type: Improvement
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Attachments: HBASE-21549.master.001.patch, 
> HBASE-21549.master.002.patch
>
>
> add_peer supports adding a serial replication peer directly.
> set_peer_serial supports changing a replication peer's serial flag.





[jira] [Updated] (HBASE-21549) Add shell command for serial replication peer

2018-12-05 Thread Guanghao Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-21549:
---
Attachment: HBASE-21549.master.002.patch

> Add shell command for serial replication peer
> -
>
> Key: HBASE-21549
> URL: https://issues.apache.org/jira/browse/HBASE-21549
> Project: HBase
>  Issue Type: Improvement
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Attachments: HBASE-21549.master.001.patch, 
> HBASE-21549.master.002.patch
>
>
> add_peer supports adding a serial replication peer directly.
> set_peer_serial supports changing a replication peer's serial flag.





[jira] [Updated] (HBASE-21553) schedLock not released in MasterProcedureScheduler

2018-12-05 Thread Xu Cang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Cang updated HBASE-21553:

Description: 
https://github.com/apache/hbase/blob/branch-1/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/MasterProcedureScheduler.java#L749

As shown above, we don't unlock schedLock, which can cause a deadlock.

Besides this, there are other places in this class that handle schedLock.unlock 
in a risky manner. I'd like to move them into finally blocks to improve the 
robustness of lock handling.
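The proposed fix is the standard lock/try/finally idiom. A minimal self-contained sketch (illustrative names, not the actual MasterProcedureScheduler code) of why the unlock belongs in a finally block:

```java
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch, not the actual MasterProcedureScheduler code:
// an unlock outside finally is skipped whenever the guarded code throws.
class SchedLockSketch {
    private final ReentrantLock schedLock = new ReentrantLock();

    // Risky shape: any exception between lock() and unlock() leaks the lock.
    void riskyOp(boolean fail) {
        schedLock.lock();
        if (fail) {
            throw new IllegalStateException("queue error"); // unlock is skipped
        }
        schedLock.unlock();
    }

    // Safe shape: the finally block releases the lock on every exit path.
    void safeOp(boolean fail) {
        schedLock.lock();
        try {
            if (fail) {
                throw new IllegalStateException("queue error");
            }
        } finally {
            schedLock.unlock();
        }
    }

    public static void main(String[] args) {
        SchedLockSketch risky = new SchedLockSketch();
        try { risky.riskyOp(true); } catch (IllegalStateException expected) { }
        System.out.println("risky-still-locked=" + risky.schedLock.isLocked());

        SchedLockSketch safe = new SchedLockSketch();
        try { safe.safeOp(true); } catch (IllegalStateException expected) { }
        System.out.println("safe-still-locked=" + safe.schedLock.isLocked());
    }
}
```

Any waiter that later tries to acquire the leaked lock blocks forever, which is how this manifests as a deadlock in the scheduler.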

  
was:https://github.com/apache/hbase/blob/branch-1/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/MasterProcedureScheduler.java#L749


> schedLock not released in MasterProcedureScheduler
> --
>
> Key: HBASE-21553
> URL: https://issues.apache.org/jira/browse/HBASE-21553
> Project: HBase
>  Issue Type: Improvement
>Reporter: Xu Cang
>Priority: Major
>
> https://github.com/apache/hbase/blob/branch-1/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/MasterProcedureScheduler.java#L749
> As shown above, we don't unlock schedLock, which can cause a deadlock.
> Besides this, there are other places in this class that handle schedLock.unlock 
> in a risky manner. I'd like to move them into finally blocks to improve the 
> robustness of lock handling.





[jira] [Updated] (HBASE-21552) backport HBASE-16735(Procedure v2 - Fix yield while holding locks) to branch-1 .

2018-12-05 Thread Xu Cang (JIRA)


[jira] [Updated] (HBASE-21554) Show replication endpoint classname for replication peer on master web UI

2018-12-05 Thread Guanghao Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-21554:
---
Status: Patch Available  (was: Open)

> Show replication endpoint classname for replication peer on master web UI
> -
>
> Key: HBASE-21554
> URL: https://issues.apache.org/jira/browse/HBASE-21554
> Project: HBase
>  Issue Type: Improvement
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Minor
> Attachments: HBASE-21554.master.001.patch
>
>






[jira] [Assigned] (HBASE-21554) Show replication endpoint classname for replication peer on master web UI

2018-12-05 Thread Guanghao Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang reassigned HBASE-21554:
--

Assignee: Guanghao Zhang

> Show replication endpoint classname for replication peer on master web UI
> -
>
> Key: HBASE-21554
> URL: https://issues.apache.org/jira/browse/HBASE-21554
> Project: HBase
>  Issue Type: Improvement
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Minor
> Attachments: HBASE-21554.master.001.patch
>
>






[jira] [Updated] (HBASE-21554) Show replication endpoint classname for replication peer on master web UI

2018-12-05 Thread Guanghao Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-21554:
---
Attachment: HBASE-21554.master.001.patch

> Show replication endpoint classname for replication peer on master web UI
> -
>
> Key: HBASE-21554
> URL: https://issues.apache.org/jira/browse/HBASE-21554
> Project: HBase
>  Issue Type: Improvement
>Reporter: Guanghao Zhang
>Priority: Minor
> Attachments: HBASE-21554.master.001.patch
>
>






[jira] [Commented] (HBASE-21414) StoreFileSize growth rate metric

2018-12-05 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710912#comment-16710912
 ] 

Hadoop QA commented on HBASE-21414:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
13s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
18s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
13s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
21s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
27s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
47s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
26s{color} | {color:blue} hbase-hadoop2-compat in master has 18 extant Findbugs 
warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
56s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
22s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
23s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
12s{color} | {color:red} hbase-hadoop-compat: The patch generated 1 new + 0 
unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
12s{color} | {color:red} hbase-hadoop2-compat: The patch generated 1 new + 0 
unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 4s{color} | {color:green} hbase-server: The patch generated 0 new + 3 
unchanged - 2 fixed = 3 total (was 5) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
45s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
8m 14s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 
or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
24s{color} | {color:green} hbase-hadoop-compat in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
27s{color} | {color:green} hbase-hadoop2-compat in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}126m 
56s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  1m 
 5s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}169m 58s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21414 |
| JIRA 

[jira] [Commented] (HBASE-21551) Memory leak when use scan with STREAM at server side

2018-12-05 Thread Zheng Hu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710910#comment-16710910
 ] 

Zheng Hu commented on HBASE-21551:
--

OK, I checked branch-1's patch in HBASE-20704; it seems branch-1 is not 
affected. I will commit this patch to branch-2.*. Thanks.

> Memory leak when use scan with STREAM at server side
> 
>
> Key: HBASE-21551
> URL: https://issues.apache.org/jira/browse/HBASE-21551
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Blocker
> Fix For: 3.0.0, 2.2.0, 2.1.2, 2.0.4
>
> Attachments: HBASE-21551.v1.patch, HBASE-21551.v2.patch, heap-dump.jpg
>
>
> We open the RegionServerScanner with STREAM as follows: 
> {code}
> RegionScannerImpl#initializeScanners
>   |---> HStore#getScanner
>           |---> StoreScanner()
>                   |---> StoreFileScanner#getScannersForStoreFiles
>                           |---> HStoreFile#getStreamScanner  #1
> {code}
> In #1, we put the StoreFileReader into a concurrent hash map, streamReaders, 
> but never remove it from streamReaders until the store file is closed. 
> So if we run stream scans many times, the streamReaders hash map grows without 
> bound; see the attached heap-dump.jpg. 
> I found this bug while benchmarking scan performance with YCSB on a cluster 
> (RegionServer heap size: 50g); the RegionServers were prone to long full GC 
> pauses (~110 sec)





[jira] [Created] (HBASE-21554) Show replication endpoint classname for replication peer on master web UI

2018-12-05 Thread Guanghao Zhang (JIRA)
Guanghao Zhang created HBASE-21554:
--

 Summary: Show replication endpoint classname for replication peer 
on master web UI
 Key: HBASE-21554
 URL: https://issues.apache.org/jira/browse/HBASE-21554
 Project: HBase
  Issue Type: Improvement
Reporter: Guanghao Zhang








[jira] [Updated] (HBASE-21534) TestAssignmentManager is flakey

2018-12-05 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21534:
--
Component/s: test

> TestAssignmentManager is flakey
> ---
>
> Key: HBASE-21534
> URL: https://issues.apache.org/jira/browse/HBASE-21534
> Project: HBase
>  Issue Type: Task
>  Components: test
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21534-addendum-v1.patch, 
> HBASE-21534-addendum.patch, HBASE-21534.patch
>
>
> See this in the output, and then the test hangs:
> {noformat}
> 2018-11-29 20:47:50,061 WARN  [MockRSProcedureDispatcher-pool5-t10] 
> assignment.AssignmentManager(894): The region server localhost,102,1 is 
> already dead, skip reportRegionStateTransition call
> {noformat}





[jira] [Commented] (HBASE-21551) Memory leak when use scan with STREAM at server side

2018-12-05 Thread Zheng Hu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710895#comment-16710895
 ] 

Zheng Hu commented on HBASE-21551:
--

BTW, after applying patch v2 to our internal branch, I found that the full GC 
did not occur again. 

> Memory leak when use scan with STREAM at server side
> 
>
> Key: HBASE-21551
> URL: https://issues.apache.org/jira/browse/HBASE-21551
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Blocker
> Fix For: 3.0.0, 2.2.0, 2.1.2, 2.0.4
>
> Attachments: HBASE-21551.v1.patch, HBASE-21551.v2.patch, heap-dump.jpg
>
>
> We open the RegionServerScanner with STREAM as follows: 
> {code}
> RegionScannerImpl#initializeScanners
>   |---> HStore#getScanner
>           |---> StoreScanner()
>                   |---> StoreFileScanner#getScannersForStoreFiles
>                           |---> HStoreFile#getStreamScanner  #1
> {code}
> In #1, we put the StoreFileReader into a concurrent hash map, streamReaders, 
> but never remove it from streamReaders until the store file is closed. 
> So if we run stream scans many times, the streamReaders hash map grows without 
> bound; see the attached heap-dump.jpg. 
> I found this bug while benchmarking scan performance with YCSB on a cluster 
> (RegionServer heap size: 50g); the RegionServers were prone to long full GC 
> pauses (~110 sec)





[jira] [Commented] (HBASE-21551) Memory leak when use scan with STREAM at server side

2018-12-05 Thread Zheng Hu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710894#comment-16710894
 ] 

Zheng Hu commented on HBASE-21551:
--

The bug seems to have been introduced by HBASE-20704. All branch-2.* branches 
are affected; not sure about branch-1. Let me check.  

> Memory leak when use scan with STREAM at server side
> 
>
> Key: HBASE-21551
> URL: https://issues.apache.org/jira/browse/HBASE-21551
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Blocker
> Fix For: 3.0.0, 2.2.0, 2.1.2, 2.0.4
>
> Attachments: HBASE-21551.v1.patch, HBASE-21551.v2.patch, heap-dump.jpg
>
>
> We open the RegionServerScanner with STREAM as follows: 
> {code}
> RegionScannerImpl#initializeScanners
>   |---> HStore#getScanner
>           |---> StoreScanner()
>                   |---> StoreFileScanner#getScannersForStoreFiles
>                           |---> HStoreFile#getStreamScanner  #1
> {code}
> In #1, we put the StoreFileReader into a concurrent hash map, streamReaders, 
> but never remove it from streamReaders until the store file is closed. 
> So if we run stream scans many times, the streamReaders hash map grows without 
> bound; see the attached heap-dump.jpg. 
> I found this bug while benchmarking scan performance with YCSB on a cluster 
> (RegionServer heap size: 50g); the RegionServers were prone to long full GC 
> pauses (~110 sec)





[jira] [Commented] (HBASE-21551) Memory leak when use scan with STREAM at server side

2018-12-05 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710886#comment-16710886
 ] 

Allan Yang commented on HBASE-21551:


This is indeed a serious one. +1 for the patch.

> Memory leak when use scan with STREAM at server side
> 
>
> Key: HBASE-21551
> URL: https://issues.apache.org/jira/browse/HBASE-21551
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Blocker
> Fix For: 3.0.0, 2.2.0, 2.1.2, 2.0.4
>
> Attachments: HBASE-21551.v1.patch, HBASE-21551.v2.patch, heap-dump.jpg
>
>
> We open the RegionServerScanner with STREAM as follows: 
> {code}
> RegionScannerImpl#initializeScanners
>   |---> HStore#getScanner
>           |---> StoreScanner()
>                   |---> StoreFileScanner#getScannersForStoreFiles
>                           |---> HStoreFile#getStreamScanner  #1
> {code}
> In #1, we put the StoreFileReader into a concurrent hash map, streamReaders, 
> but never remove it from streamReaders until the store file is closed. 
> So if we run stream scans many times, the streamReaders hash map grows without 
> bound; see the attached heap-dump.jpg. 
> I found this bug while benchmarking scan performance with YCSB on a cluster 
> (RegionServer heap size: 50g); the RegionServers were prone to long full GC 
> pauses (~110 sec)





[jira] [Updated] (HBASE-21534) TestAssignmentManager is flakey

2018-12-05 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21534:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.2.0
   3.0.0
   Status: Resolved  (was: Patch Available)

Also pushed to branch-2.

> TestAssignmentManager is flakey
> ---
>
> Key: HBASE-21534
> URL: https://issues.apache.org/jira/browse/HBASE-21534
> Project: HBase
>  Issue Type: Task
>  Components: test
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21534-addendum-v1.patch, 
> HBASE-21534-addendum.patch, HBASE-21534.patch
>
>
> Saw this in the output, and then the test hung:
> {noformat}
> 2018-11-29 20:47:50,061 WARN  [MockRSProcedureDispatcher-pool5-t10] 
> assignment.AssignmentManager(894): The region server localhost,102,1 is 
> already dead, skip reportRegionStateTransition call
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HBASE-21553) schedLock not released in MasterProcedureScheduler

2018-12-05 Thread Karan Mehta (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karan Mehta reassigned HBASE-21553:
---

Assignee: (was: Karan Mehta)

> schedLock not released in MasterProcedureScheduler
> --
>
> Key: HBASE-21553
> URL: https://issues.apache.org/jira/browse/HBASE-21553
> Project: HBase
>  Issue Type: Improvement
>Reporter: Xu Cang
>Priority: Major
>
> https://github.com/apache/hbase/blob/branch-1/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/MasterProcedureScheduler.java#L749
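The linked code path suggests a lock that can be left held on an early return or exception path. A minimal sketch of the usual fix pattern (illustrative names, not the actual MasterProcedureScheduler code) wraps the critical section in try/finally so the lock is released on every path:

```java
import java.util.concurrent.locks.ReentrantLock;

// Hedged sketch of the general fix for a leaked lock (names are
// illustrative, not the real MasterProcedureScheduler code).
class Scheduler {
    private final ReentrantLock schedLock = new ReentrantLock();

    Object poll(boolean queueEmpty) {
        schedLock.lock();
        try {
            if (queueEmpty) {
                // Early return: without the finally block below, this
                // path would leave schedLock held forever.
                return null;
            }
            return new Object(); // stand-in for the dequeued procedure
        } finally {
            schedLock.unlock(); // released on every path
        }
    }

    boolean isLocked() {
        return schedLock.isLocked();
    }
}
```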



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21534) TestAssignmentManager is flakey

2018-12-05 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710882#comment-16710882
 ] 

Duo Zhang commented on HBASE-21534:
---

Seems to have worked. Let me push this to branch-2 as well and resolve the issue.

> TestAssignmentManager is flakey
> ---
>
> Key: HBASE-21534
> URL: https://issues.apache.org/jira/browse/HBASE-21534
> Project: HBase
>  Issue Type: Task
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Attachments: HBASE-21534-addendum-v1.patch, 
> HBASE-21534-addendum.patch, HBASE-21534.patch
>
>
> Saw this in the output, and then the test hung:
> {noformat}
> 2018-11-29 20:47:50,061 WARN  [MockRSProcedureDispatcher-pool5-t10] 
> assignment.AssignmentManager(894): The region server localhost,102,1 is 
> already dead, skip reportRegionStateTransition call
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21551) Memory leak when use scan with STREAM at server side

2018-12-05 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710871#comment-16710871
 ] 

Duo Zhang commented on HBASE-21551:
---

Oh, this is a big problem I'd say. How many branches are affected?

+1 on the patch.

Ping [~stack]. I think we have to release a 2.1.1.1 and also 2.0.4 ASAP?

> Memory leak when use scan with STREAM at server side
> 
>
> Key: HBASE-21551
> URL: https://issues.apache.org/jira/browse/HBASE-21551
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Blocker
> Fix For: 3.0.0, 2.2.0, 2.1.2, 2.0.4
>
> Attachments: HBASE-21551.v1.patch, HBASE-21551.v2.patch, heap-dump.jpg
>
>
> We open the RegionServerScanner with STREAM as following: 
> {code}
> RegionScannerImpl#initializeScanners
>   |---> HStore#getScanner
> |--> StoreScanner()
> |---> 
> StoreFileScanner#getScannersForStoreFiles
>   |--> 
> HStoreFile#getStreamScanner  #1
> {code}
> In #1, we put the StoreFileReader into a concurrent hash map, streamReaders, 
> but never remove it from streamReaders until the store file is closed. 
> So if we run stream scans many times, the streamReaders hash map keeps 
> growing; the attached heap-dump.jpg shows the resulting heap. 
> I found this bug while benchmarking scan performance with YCSB on a cluster 
> (RS heap size of 50 GB): the RS would easily run into long full GC pauses 
> (~110 sec).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HBASE-21553) schedLock not released in MasterProcedureScheduler

2018-12-05 Thread Karan Mehta (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karan Mehta reassigned HBASE-21553:
---

Assignee: Karan Mehta

> schedLock not released in MasterProcedureScheduler
> --
>
> Key: HBASE-21553
> URL: https://issues.apache.org/jira/browse/HBASE-21553
> Project: HBase
>  Issue Type: Improvement
>Reporter: Xu Cang
>Assignee: Karan Mehta
>Priority: Major
>
> https://github.com/apache/hbase/blob/branch-1/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/MasterProcedureScheduler.java#L749



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21553) schedLock not released in MasterProcedureScheduler

2018-12-05 Thread Karan Mehta (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karan Mehta updated HBASE-21553:

Summary: schedLock not released in MasterProcedureScheduler  (was: 
schedLock not released ni MasterProcedureScheduler)

> schedLock not released in MasterProcedureScheduler
> --
>
> Key: HBASE-21553
> URL: https://issues.apache.org/jira/browse/HBASE-21553
> Project: HBase
>  Issue Type: Improvement
>Reporter: Xu Cang
>Priority: Major
>
> https://github.com/apache/hbase/blob/branch-1/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/MasterProcedureScheduler.java#L749



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21530) Abort_Procedure should be able to take a list of proc IDs

2018-12-05 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710872#comment-16710872
 ] 

Allan Yang commented on HBASE-21530:


{quote}
Does this make sense to you? If so, I can start looking into adding some logic 
here to make abort createNamespaceProcedure
{quote}
It makes sense; you can try adding abort logic to the procedures you think may 
get stuck. As for aborting procedures still in the queue (i.e. not yet 
started), that may be a little tricky: the queue in the 
MasterProcedureScheduler (which procedures are added to) is a special queue 
where only poll() and peek() are available. Some changes would be needed there 
to make it possible to remove a specific procedure from it. 
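The capability described above can be illustrated with a hypothetical wrapper (plain JDK types, not the real scheduler queue): a poll/peek queue extended with removal of a specific queued procedure by id.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

// Hedged sketch: a poll/peek-only queue extended so a specific queued
// procedure can be removed by id before it starts. Illustrative only;
// the real MasterProcedureScheduler queues are more involved.
class AbortableQueue {
    private final Deque<Long> procIds = new ArrayDeque<>();

    void add(long procId) { procIds.add(procId); }
    Long peek() { return procIds.peek(); }
    Long poll() { return procIds.poll(); }

    // The capability the comment above says is missing today.
    boolean removeById(long procId) {
        for (Iterator<Long> it = procIds.iterator(); it.hasNext(); ) {
            if (it.next() == procId) {
                it.remove();
                return true;
            }
        }
        return false;
    }
}
```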

> Abort_Procedure should be able to take a list of proc IDs
> -
>
> Key: HBASE-21530
> URL: https://issues.apache.org/jira/browse/HBASE-21530
> Project: HBase
>  Issue Type: Improvement
>Reporter: Geoffrey Jacoby
>Priority: Minor
>
> As a convenience, it would be helpful if the HBase shell's abort_procedure 
> call had the option of taking in multiple procedure ids at the same time, 
> rather than relying on operators to use a loop in an external script. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-21553) schedLock not released in MasterProcedureScheduler

2018-12-05 Thread Xu Cang (JIRA)
Xu Cang created HBASE-21553:
---

 Summary: schedLock not released in MasterProcedureScheduler
 Key: HBASE-21553
 URL: https://issues.apache.org/jira/browse/HBASE-21553
 Project: HBase
  Issue Type: Improvement
Reporter: Xu Cang


https://github.com/apache/hbase/blob/branch-1/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/MasterProcedureScheduler.java#L749



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21552) backport HBASE-16735(Procedure v2 - Fix yield while holding locks) to branch-1 .

2018-12-05 Thread Xu Cang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Cang updated HBASE-21552:

Description: 
Please see screenshot for the stack trace. 
We met this issue in production: many createNamespaceProcedures cannot proceed.
After some debugging and JIRA digging, I found HBASE-16735. 

It might fix the stuck-procedure issue, so it is worth backporting. 


  was:
Please see screenshot for the stack trace. 
We met this issue in production: many createNamespaceProcedures cannot proceed.
After some debugging and JIRA digging, I think HBASE-16735 addressed this 
issue. It fixed the problem that a WAITING procedure fails to be added back to 
the runQueue. 
But that change wasn't ported to branch-1. I am creating this JIRA for 
backporting it to branch-1


> backport  HBASE-16735(Procedure v2 - Fix yield while holding locks)  to 
> branch-1 . 
> ---
>
> Key: HBASE-21552
> URL: https://issues.apache.org/jira/browse/HBASE-21552
> Project: HBase
>  Issue Type: Improvement
>  Components: proc-v2
>Affects Versions: 1.3.2
>Reporter: Xu Cang
>Assignee: Xu Cang
>Priority: Major
> Attachments: Screen Shot 2018-12-05 at 4.34.05 PM.png
>
>
> Please see screenshot for the stack trace. 
> We met this issue in production: many createNamespaceProcedures cannot 
> proceed.
> After some debugging and JIRA digging, I found HBASE-16735. 
> It might fix the stuck-procedure issue, so it is worth backporting. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21552) backport HBASE-16735(Procedure v2 - Fix yield while holding locks) to branch-1 .

2018-12-05 Thread Xu Cang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Cang updated HBASE-21552:

Description: 
Please see screenshot for the stack trace. 
We met this issue in production: many createNamespaceProcedures cannot proceed.
After some debugging and JIRA digging, I think HBASE-16735 addressed this 
issue. It fixed the problem that a WAITING procedure fails to be added back to 
the runQueue. 
But that change wasn't ported to branch-1. I am creating this JIRA for 
backporting it to branch-1

  was:
Please see screenshot for the stack trace. 
We met this issue in production: many createNamespaceProcedures cannot proceed.
After some debugging and JIRA digging, I think HBASE-16735 addressed this 
issue. It fixed the issue that WAITING procedure fails to be added back to the 
runQueue. 
But that change wasn't ported to branch-1. I am creating this JIRA for 
backporting it to branch-1


> backport  HBASE-16735(Procedure v2 - Fix yield while holding locks)  to 
> branch-1 . 
> ---
>
> Key: HBASE-21552
> URL: https://issues.apache.org/jira/browse/HBASE-21552
> Project: HBase
>  Issue Type: Improvement
>  Components: proc-v2
>Affects Versions: 1.3.2
>Reporter: Xu Cang
>Assignee: Xu Cang
>Priority: Major
> Attachments: Screen Shot 2018-12-05 at 4.34.05 PM.png
>
>
> Please see screenshot for the stack trace. 
> We met this issue in production: many createNamespaceProcedures cannot 
> proceed.
> After some debugging and JIRA digging, I think HBASE-16735 addressed this 
> issue. It fixed the problem that a WAITING procedure fails to be added back 
> to the runQueue. 
> But that change wasn't ported to branch-1. I am creating this JIRA for 
> backporting it to branch-1



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21550) Add a new method preCreateTableRegionInfos for MasterObserver which allows CPs to modify the TableDescriptor

2018-12-05 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21550:
--
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Pushed to master and branch-2. Will fill in the release note later.

Thanks [~stack] for reviewing.

> Add a new method preCreateTableRegionInfos for MasterObserver which allows 
> CPs to modify the TableDescriptor
> 
>
> Key: HBASE-21550
> URL: https://issues.apache.org/jira/browse/HBASE-21550
> Project: HBase
>  Issue Type: Bug
>  Components: Coprocessors
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21550.patch
>
>
> Before 2.0 we passed an HTableDescriptor, and CPs could modify the schema of 
> a table; now we pass a TableDescriptor, which is immutable. I think it is 
> correct to pass an immutable instance here, but the method should have a 
> return value so that CPs can return a new TableDescriptor.
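The shape of the proposed API can be modeled in plain Java (illustrative types; the real MasterObserver signature and TableDescriptor class differ): a hook receives an immutable descriptor and returns either the same instance, to leave it unchanged, or a modified copy.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hedged model of the API change: immutable descriptor in, possibly-new
// descriptor out (stand-in types, not the real HBase coprocessor API).
final class TableDesc {
    final Map<String, String> attrs;

    TableDesc(Map<String, String> attrs) {
        this.attrs = Collections.unmodifiableMap(new HashMap<>(attrs));
    }

    // Copy-on-write modification: the original instance is never mutated.
    TableDesc withAttr(String key, String value) {
        Map<String, String> copy = new HashMap<>(attrs);
        copy.put(key, value);
        return new TableDesc(copy);
    }
}

interface Observer {
    // Return the same descriptor to leave it unchanged, or a new one to
    // modify the table being created.
    TableDesc preCreateTableRegionInfos(TableDesc td);
}
```

The design choice mirrored here is the one the description argues for: immutability keeps the master's copy safe from in-place mutation, while the return value still lets coprocessors adjust the schema.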



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21552) backport HBASE-16735(Procedure v2 - Fix yield while holding locks) to branch-1 .

2018-12-05 Thread Xu Cang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Cang updated HBASE-21552:

Affects Version/s: 1.3.2

> backport  HBASE-16735(Procedure v2 - Fix yield while holding locks)  to 
> branch-1 . 
> ---
>
> Key: HBASE-21552
> URL: https://issues.apache.org/jira/browse/HBASE-21552
> Project: HBase
>  Issue Type: Improvement
>  Components: proc-v2
>Affects Versions: 1.3.2
>Reporter: Xu Cang
>Assignee: Xu Cang
>Priority: Major
> Attachments: Screen Shot 2018-12-05 at 4.34.05 PM.png
>
>
> Please see screenshot for the stack trace. 
> We met this issue in production: many createNamespaceProcedures cannot 
> proceed.
> After some debugging and JIRA digging, I think HBASE-16735 addressed this 
> issue. It fixed the issue that WAITING procedure fails to be added back to 
> the runQueue. 
> But that change wasn't ported to branch-1. I am creating this JIRA for 
> backporting it to branch-1



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-21552) backport HBASE-16735(Procedure v2 - Fix yield while holding locks) to branch-1 .

2018-12-05 Thread Xu Cang (JIRA)
Xu Cang created HBASE-21552:
---

 Summary: backport  HBASE-16735(Procedure v2 - Fix yield while 
holding locks)  to branch-1 . 
 Key: HBASE-21552
 URL: https://issues.apache.org/jira/browse/HBASE-21552
 Project: HBase
  Issue Type: Improvement
  Components: proc-v2
Reporter: Xu Cang
Assignee: Xu Cang
 Attachments: Screen Shot 2018-12-05 at 4.34.05 PM.png

Please see screenshot for the stack trace. 
We met this issue in production: many createNamespaceProcedures cannot proceed.
After some debugging and JIRA digging, I think HBASE-16735 addressed this 
issue. It fixed the issue that WAITING procedure fails to be added back to the 
runQueue. 
But that change wasn't ported to branch-1. I am creating this JIRA for 
backporting it to branch-1



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21550) Add a new method preCreateTableRegionInfos for MasterObserver which allows CPs to modify the TableDescriptor

2018-12-05 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710793#comment-16710793
 ] 

Duo Zhang commented on HBASE-21550:
---

OK, can backport later if other folks want it on branch-2.1 and branch-2.0.

> Add a new method preCreateTableRegionInfos for MasterObserver which allows 
> CPs to modify the TableDescriptor
> 
>
> Key: HBASE-21550
> URL: https://issues.apache.org/jira/browse/HBASE-21550
> Project: HBase
>  Issue Type: Bug
>  Components: Coprocessors
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21550.patch
>
>
> Before 2.0 we passed an HTableDescriptor, and CPs could modify the schema of 
> a table; now we pass a TableDescriptor, which is immutable. I think it is 
> correct to pass an immutable instance here, but the method should have a 
> return value so that CPs can return a new TableDescriptor.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21414) StoreFileSize growth rate metric

2018-12-05 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710759#comment-16710759
 ] 

Sergey Shelukhin commented on HBASE-21414:
--

+1 pending tests


> StoreFileSize growth rate metric
> 
>
> Key: HBASE-21414
> URL: https://issues.apache.org/jira/browse/HBASE-21414
> Project: HBase
>  Issue Type: Improvement
>  Components: metrics, monitoring
>Reporter: Tommy Li
>Priority: Minor
> Attachments: HBASE-21414.master.001.patch, 
> HBASE-21414.master.002.patch
>
>
> A metric on the growth rate of storefile sizes would be nice to have as a way 
> of monitoring traffic patterns. I know you can get the same insight from 
> graphing the delta on the storeFileSize metric, but not all metrics 
> visualization tools support that
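The delta computation such a metric would perform can be sketched as follows (a hedged, self-contained illustration of the idea, not the patch itself): sample the existing size gauge periodically and report the rate of change between samples.

```java
// Hedged sketch: derive a growth-rate metric from successive readings of
// a size gauge, i.e. the computation the comment notes some dashboard
// tools cannot do themselves (illustrative, not the actual patch).
class RateGauge {
    private long lastValue = -1;
    private long lastTimeMs = -1;

    // Returns the growth rate in bytes/second since the previous sample,
    // or 0.0 for the first sample (no baseline yet).
    double sample(long value, long nowMs) {
        double rate = 0.0;
        if (lastTimeMs >= 0 && nowMs > lastTimeMs) {
            rate = (value - lastValue) * 1000.0 / (nowMs - lastTimeMs);
        }
        lastValue = value;
        lastTimeMs = nowMs;
        return rate;
    }
}
```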



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HBASE-21414) StoreFileSize growth rate metric

2018-12-05 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HBASE-21414:


Assignee: Tommy Li

> StoreFileSize growth rate metric
> 
>
> Key: HBASE-21414
> URL: https://issues.apache.org/jira/browse/HBASE-21414
> Project: HBase
>  Issue Type: Improvement
>  Components: metrics, monitoring
>Reporter: Tommy Li
>Assignee: Tommy Li
>Priority: Minor
> Attachments: HBASE-21414.master.001.patch, 
> HBASE-21414.master.002.patch
>
>
> A metric on the growth rate of storefile sizes would be nice to have as a way 
> of monitoring traffic patterns. I know you can get the same insight from 
> graphing the delta on the storeFileSize metric, but not all metrics 
> visualization tools support that



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21414) StoreFileSize growth rate metric

2018-12-05 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HBASE-21414:
-
Status: Patch Available  (was: Open)

> StoreFileSize growth rate metric
> 
>
> Key: HBASE-21414
> URL: https://issues.apache.org/jira/browse/HBASE-21414
> Project: HBase
>  Issue Type: Improvement
>  Components: metrics, monitoring
>Reporter: Tommy Li
>Priority: Minor
> Attachments: HBASE-21414.master.001.patch, 
> HBASE-21414.master.002.patch
>
>
> A metric on the growth rate of storefile sizes would be nice to have as a way 
> of monitoring traffic patterns. I know you can get the same insight from 
> graphing the delta on the storeFileSize metric, but not all metrics 
> visualization tools support that



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21530) Abort_Procedure should be able to take a list of proc IDs

2018-12-05 Thread Xu Cang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710727#comment-16710727
 ] 

Xu Cang commented on HBASE-21530:
-

[~allan163]

bq. Aborting is not very useful to get out of stuck, *since some procedure is 
not abortable.* 

But can we make them abortable, at least for some procedures?
For example, *createNamespaceProcedure*. It's not abortable now, but I checked 
the code: it has 5 states, and I think it's fairly safe to abort from any of 
them. Aborting might leave some data in the namespace table or the namespace 
ZK directories, but those two operations are idempotent (they can be re-run 
multiple times without causing any issue), so there is no real problem with 
aborting it.
Does this make sense to you? If so, I can start looking into adding logic to 
make aborting createNamespaceProcedure possible for branch-1.

(Branch-1 is still crucial to us and having AMv2 only on branch-2 doesn't help 
resolving our issues.)

Thank you
 

> Abort_Procedure should be able to take a list of proc IDs
> -
>
> Key: HBASE-21530
> URL: https://issues.apache.org/jira/browse/HBASE-21530
> Project: HBase
>  Issue Type: Improvement
>Reporter: Geoffrey Jacoby
>Priority: Minor
>
> As a convenience, it would be helpful if the HBase shell's abort_procedure 
> call had the option of taking in multiple procedure ids at the same time, 
> rather than relying on operators to use a loop in an external script. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21545) NEW_VERSION_BEHAVIOR breaks Get/Scan with specified columns

2018-12-05 Thread Andrey Elenskiy (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710686#comment-16710686
 ] 

Andrey Elenskiy commented on HBASE-21545:
-

I've uploaded a patch where I modified HBaseTestingUtility to set the 
NEW_VERSION_BEHAVIOR attribute in integration tests when the 
"-Dhbase.tests.new.version.behavior=true" option is passed. This way we 
validate that all tests pass with this attribute. It would be great if you 
could trigger a build.

> NEW_VERSION_BEHAVIOR breaks Get/Scan with specified columns
> ---
>
> Key: HBASE-21545
> URL: https://issues.apache.org/jira/browse/HBASE-21545
> Project: HBase
>  Issue Type: Bug
>  Components: API
>Affects Versions: 2.0.0, 2.1.1
> Environment: HBase 2.1.1
> Hadoop 2.8.4
> Java 8
>Reporter: Andrey Elenskiy
>Assignee: Andrey Elenskiy
>Priority: Major
> Attachments: App.java, HBASE-21545.branch-2.1.0001.patch, 
> HBASE-21545.branch-2.1.0002.patch, HBASE-21545.branch-2.1.0003.patch, 
> HBASE-21545.branch-2.1.0004.patch
>
>
> Setting NEW_VERSION_BEHAVIOR => 'true' on a column family causes only one 
> column to be returned when columns are specified in a Scan or Get query. The 
> result is always the first column in sorted order. I've attached a code 
> snippet that reproduces the issue and can be converted into a test.
> I've also validated this with the hbase shell and the gohbase client, so it 
> must be a server-side issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21545) NEW_VERSION_BEHAVIOR breaks Get/Scan with specified columns

2018-12-05 Thread Andrey Elenskiy (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Elenskiy updated HBASE-21545:

Attachment: HBASE-21545.branch-2.1.0004.patch

> NEW_VERSION_BEHAVIOR breaks Get/Scan with specified columns
> ---
>
> Key: HBASE-21545
> URL: https://issues.apache.org/jira/browse/HBASE-21545
> Project: HBase
>  Issue Type: Bug
>  Components: API
>Affects Versions: 2.0.0, 2.1.1
> Environment: HBase 2.1.1
> Hadoop 2.8.4
> Java 8
>Reporter: Andrey Elenskiy
>Assignee: Andrey Elenskiy
>Priority: Major
> Attachments: App.java, HBASE-21545.branch-2.1.0001.patch, 
> HBASE-21545.branch-2.1.0002.patch, HBASE-21545.branch-2.1.0003.patch, 
> HBASE-21545.branch-2.1.0004.patch
>
>
> Setting NEW_VERSION_BEHAVIOR => 'true' on a column family causes only one 
> column to be returned when columns are specified in a Scan or Get query. The 
> result is always the first column in sorted order. I've attached a code 
> snippet that reproduces the issue and can be converted into a test.
> I've also validated this with the hbase shell and the gohbase client, so it 
> must be a server-side issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21544) Backport HBASE-20734 Colocate recovered edits directory with hbase.wal.dir

2018-12-05 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HBASE-21544:
---
  Resolution: Fixed
Hadoop Flags: Reviewed
Release Note: This change moves the recovered.edits files which are created 
by the WALSplitter from the default filesystem into the WAL filesystem. This 
better enables the separate filesystem for WAL and HFile deployment model, by 
avoiding a check which requires that the HFile filesystem provides the hflush 
capability.
  Status: Resolved  (was: Patch Available)

> Backport HBASE-20734 Colocate recovered edits directory with hbase.wal.dir
> --
>
> Key: HBASE-21544
> URL: https://issues.apache.org/jira/browse/HBASE-21544
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Major
> Fix For: 2.0.4
>
> Attachments: HBASE-20734.001.branch-2.0.patch, 
> HBASE-20734.002.branch-2.0.patch
>
>
> Been talking through this with a bunch of folks. [~enis] brought me back from 
> the cliff of despair though.
> Context: running HBase on top of a filesystem that doesn't have hflush for 
> hfiles. In our case, on top of Azure's Hadoop-compatible filesystems (WASB, 
> ABFS).
> When a RS fails and we have an SCP running for it, you'll see log splitting 
> get into an "infinite" loop where the master keeps resubmitting and the RS 
> which takes the action deterministically fails with the following:
> {noformat}
> 2018-11-26 20:59:18,415 ERROR 
> [RS_LOG_REPLAY_OPS-regionserver/wn2-b831f9:16020-0-Writer-2] 
> wal.FSHLogProvider: The RegionServer write ahead log provider for FileSystem 
> implementations relies on the ability to call hflush for proper operation 
> during component failures, but the current FileSystem does not support doing 
> so. Please check the config value of 'hbase.wal.dir' and ensure it points to 
> a FileSystem mount that has suitable capabilities for output streams.
> 2018-11-26 20:59:18,415 WARN  
> [RS_LOG_REPLAY_OPS-regionserver/wn2-b831f9:16020-0-Writer-2] 
> wal.AbstractProtobufLogWriter: WALTrailer is null. Continuing with default.
> 2018-11-26 20:59:18,467 ERROR 
> [RS_LOG_REPLAY_OPS-regionserver/wn2-b831f9:16020-0-Writer-2] wal.WALSplitter: 
> Got while writing log entry to log
> java.io.IOException: cannot get log writer
> at 
> org.apache.hadoop.hbase.wal.FSHLogProvider.createWriter(FSHLogProvider.java:96)
> at 
> org.apache.hadoop.hbase.wal.FSHLogProvider.createWriter(FSHLogProvider.java:61)
> at 
> org.apache.hadoop.hbase.wal.WALFactory.createRecoveredEditsWriter(WALFactory.java:370)
> at 
> org.apache.hadoop.hbase.wal.WALSplitter.createWriter(WALSplitter.java:804)
> at 
> org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.createWAP(WALSplitter.java:1530)
> at 
> org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.getWriterAndPath(WALSplitter.java:1501)
> at 
> org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.appendBuffer(WALSplitter.java:1584)
> at 
> org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.append(WALSplitter.java:1566)
> at 
> org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.writeBuffer(WALSplitter.java:1090)
> at 
> org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.doRun(WALSplitter.java:1082)
> at 
> org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.run(WALSplitter.java:1052)
> Caused by: 
> org.apache.hadoop.hbase.util.CommonFSUtils$StreamLacksCapabilityException: 
> hflush
> at 
> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.initOutput(ProtobufLogWriter.java:99)
> at 
> org.apache.hadoop.hbase.regionserver.wal.AbstractProtobufLogWriter.init(AbstractProtobufLogWriter.java:165)
> at 
> org.apache.hadoop.hbase.wal.FSHLogProvider.createWriter(FSHLogProvider.java:77)
> ... 10 more{noformat}
> This is the sanity check added by HBASE-18784, failing on creating the writer 
> for the recovered.edits file.
> The odd-ball here is that our recovered.edits writer is just a WAL writer 
> class. The WAL writer class thinks it should always have hflush support; 
> however, we don't _actually_ need that for writing out the recovered.edits 
> files. If {{close()}} on the recovered.edits file fails, we trash any 
> intermediate data in the filesystem and rerun the whole process.
> It's my understanding that this check is overbearing and we should not make 
> it when the ProtobufLogWriter is being used for the recovered.edits file.
> [~zyork], [~busbey] fyi
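The relaxation argued for above can be sketched with plain-JDK stand-ins (illustrative names such as `CapabilityStream` and `WriterFactory`; the real check lives around ProtobufLogWriter and CommonFSUtils): enforce the hflush capability only when the writer backs a live WAL, since a failed recovered.edits file is simply discarded and rewritten.

```java
// Hedged model of the capability check and its proposed relaxation
// (stand-in types, not the actual WALSplitter/ProtobufLogWriter code).
interface CapabilityStream {
    boolean hasCapability(String capability);
}

class WriterFactory {
    // Enforce hflush only for the live WAL. recovered.edits output is
    // rewritten from scratch on failure, so durability-on-flush is not
    // required there.
    static void initOutput(CapabilityStream out, boolean forWal) {
        if (forWal && !out.hasCapability("hflush")) {
            throw new IllegalStateException("stream lacks hflush capability");
        }
    }
}
```

Under this model, a filesystem without hflush (e.g. the WASB/ABFS case in the description) can still host recovered.edits files, while the WAL itself keeps the strict check.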



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21544) Backport HBASE-20734 Colocate recovered edits directory with hbase.wal.dir

2018-12-05 Thread Zach York (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710579#comment-16710579
 ] 

Zach York commented on HBASE-21544:
---

Good by me. Feel free to push.

> Backport HBASE-20734 Colocate recovered edits directory with hbase.wal.dir
> --
>
> Key: HBASE-21544
> URL: https://issues.apache.org/jira/browse/HBASE-21544
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Major
> Fix For: 2.0.4
>
> Attachments: HBASE-20734.001.branch-2.0.patch, 
> HBASE-20734.002.branch-2.0.patch
>
>
> Been talking through this with a bunch of folks. [~enis] brought me back from 
> the cliff of despair though.
> Context: running HBase on top of a filesystem that doesn't have hflush for 
> hfiles. In our case, on top of Azure's Hadoop-compatible filesystems (WASB, 
> ABFS).
> When a RS fails and we have an SCP running for it, you'll see log splitting 
> get into an "infinite" loop where the master keeps resubmitting and the RS 
> which takes the action deterministically fails with the following:
> {noformat}
> 2018-11-26 20:59:18,415 ERROR 
> [RS_LOG_REPLAY_OPS-regionserver/wn2-b831f9:16020-0-Writer-2] 
> wal.FSHLogProvider: The RegionServer write ahead log provider for FileSystem 
> implementations relies on the ability to call hflush for proper operation 
> during component failures, but the current FileSystem does not support doing 
> so. Please check the config value of 'hbase.wal.dir' and ensure it points to 
> a FileSystem mount that has suitable capabilities for output streams.
> 2018-11-26 20:59:18,415 WARN  
> [RS_LOG_REPLAY_OPS-regionserver/wn2-b831f9:16020-0-Writer-2] 
> wal.AbstractProtobufLogWriter: WALTrailer is null. Continuing with default.
> 2018-11-26 20:59:18,467 ERROR 
> [RS_LOG_REPLAY_OPS-regionserver/wn2-b831f9:16020-0-Writer-2] wal.WALSplitter: 
> Got while writing log entry to log
> java.io.IOException: cannot get log writer
> at 
> org.apache.hadoop.hbase.wal.FSHLogProvider.createWriter(FSHLogProvider.java:96)
> at 
> org.apache.hadoop.hbase.wal.FSHLogProvider.createWriter(FSHLogProvider.java:61)
> at 
> org.apache.hadoop.hbase.wal.WALFactory.createRecoveredEditsWriter(WALFactory.java:370)
> at 
> org.apache.hadoop.hbase.wal.WALSplitter.createWriter(WALSplitter.java:804)
> at 
> org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.createWAP(WALSplitter.java:1530)
> at 
> org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.getWriterAndPath(WALSplitter.java:1501)
> at 
> org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.appendBuffer(WALSplitter.java:1584)
> at 
> org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.append(WALSplitter.java:1566)
> at 
> org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.writeBuffer(WALSplitter.java:1090)
> at 
> org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.doRun(WALSplitter.java:1082)
> at 
> org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.run(WALSplitter.java:1052)
> Caused by: 
> org.apache.hadoop.hbase.util.CommonFSUtils$StreamLacksCapabilityException: 
> hflush
> at 
> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.initOutput(ProtobufLogWriter.java:99)
> at 
> org.apache.hadoop.hbase.regionserver.wal.AbstractProtobufLogWriter.init(AbstractProtobufLogWriter.java:165)
> at 
> org.apache.hadoop.hbase.wal.FSHLogProvider.createWriter(FSHLogProvider.java:77)
> ... 10 more{noformat}
> This is the sanity check added by HBASE-18784, failing on creating the writer 
> for the recovered.edits file.
> The odd-ball here is that our recovered.edits writer is just a WAL writer 
> class. The WAL writer class thinks it always should have hflush support; 
> however, we don't _actually_ need that for writing out the recovered.edits 
> files. If {{close()}} on the recovered.edits file fails, we just trash any 
> intermediate data in the filesystem and rerun the whole process.
> It's my understanding that this check is over-bearing and we should not make 
> the check when the ProtobufLogWriter is being used for the recovered.edits 
> file.
> [~zyork], [~busbey] fyi
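The relaxation argued for above can be sketched as a small predicate: the hflush capability is only a hard requirement for the live WAL, while a recovered.edits writer can tolerate its absence because a failed close() just means the split is retried from scratch. This is an illustrative sketch under that assumption; the method and flag names are hypothetical, not the actual ProtobufLogWriter/CommonFSUtils code.

```java
import java.io.IOException;

// Illustrative sketch (hypothetical names, not the real HBase API):
// enforce hflush only when the stream backs the live WAL, not a
// recovered.edits file whose intermediate data is discarded on failure.
public class CapabilityCheckSketch {

    static void checkStreamCapability(boolean streamSupportsHflush,
                                      boolean writingRecoveredEdits) throws IOException {
        if (streamSupportsHflush) {
            return; // capability present, nothing to enforce
        }
        if (writingRecoveredEdits) {
            return; // durability not required: a failed close() reruns the split
        }
        throw new IOException("stream lacks hflush capability required for the WAL");
    }

    public static void main(String[] args) throws IOException {
        checkStreamCapability(false, true);  // recovered.edits on WASB/ABFS: allowed
        checkStreamCapability(true, false);  // live WAL on hflush-capable FS: allowed
        try {
            checkStreamCapability(false, false); // live WAL without hflush: rejected
        } catch (IOException expected) {
            System.out.println("rejected: " + expected.getMessage());
        }
    }
}
```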





[jira] [Commented] (HBASE-21544) Backport HBASE-20734 Colocate recovered edits directory with hbase.wal.dir

2018-12-05 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710577#comment-16710577
 ] 

Josh Elser commented on HBASE-21544:


{quote} Nothing changed in patch, even the authors, but i think this one should 
be credited to you [~elserj], not only filing this backport, but also changing 
Sir Stack's mind. :D
{quote}
heh, thanks for the kind words. What I did was nothing compared to the original 
author's efforts :).

Going to push this one before I forget, but happy to hear from you, Zach, if 
you find the time.

> Backport HBASE-20734 Colocate recovered edits directory with hbase.wal.dir
> --
>
> Key: HBASE-21544
> URL: https://issues.apache.org/jira/browse/HBASE-21544
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Major
> Fix For: 2.0.4
>
> Attachments: HBASE-20734.001.branch-2.0.patch, 
> HBASE-20734.002.branch-2.0.patch
>
>
> Been talking through this with a bunch of folks. [~enis] brought me back from 
> the cliff of despair though.
> Context: running HBase on top of a filesystem that doesn't have hflush for 
> hfiles. In our case, on top of Azure's Hadoop-compatible filesystems (WASB, 
> ABFS).
> When a RS fails and we have an SCP running for it, you'll see log splitting 
> get into an "infinite" loop where the master keeps resubmitting and the RS 
> which takes the action deterministically fails with the following:
> {noformat}
> 2018-11-26 20:59:18,415 ERROR 
> [RS_LOG_REPLAY_OPS-regionserver/wn2-b831f9:16020-0-Writer-2] 
> wal.FSHLogProvider: The RegionServer write ahead log provider for FileSystem 
> implementations relies on the ability to call hflush for proper operation 
> during component failures, but the current FileSystem does not support doing 
> so. Please check the config value of 'hbase.wal.dir' and ensure it points to 
> a FileSystem mount that has suitable capabilities for output streams.
> 2018-11-26 20:59:18,415 WARN  
> [RS_LOG_REPLAY_OPS-regionserver/wn2-b831f9:16020-0-Writer-2] 
> wal.AbstractProtobufLogWriter: WALTrailer is null. Continuing with default.
> 2018-11-26 20:59:18,467 ERROR 
> [RS_LOG_REPLAY_OPS-regionserver/wn2-b831f9:16020-0-Writer-2] wal.WALSplitter: 
> Got while writing log entry to log
> java.io.IOException: cannot get log writer
> at 
> org.apache.hadoop.hbase.wal.FSHLogProvider.createWriter(FSHLogProvider.java:96)
> at 
> org.apache.hadoop.hbase.wal.FSHLogProvider.createWriter(FSHLogProvider.java:61)
> at 
> org.apache.hadoop.hbase.wal.WALFactory.createRecoveredEditsWriter(WALFactory.java:370)
> at 
> org.apache.hadoop.hbase.wal.WALSplitter.createWriter(WALSplitter.java:804)
> at 
> org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.createWAP(WALSplitter.java:1530)
> at 
> org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.getWriterAndPath(WALSplitter.java:1501)
> at 
> org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.appendBuffer(WALSplitter.java:1584)
> at 
> org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.append(WALSplitter.java:1566)
> at 
> org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.writeBuffer(WALSplitter.java:1090)
> at 
> org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.doRun(WALSplitter.java:1082)
> at 
> org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.run(WALSplitter.java:1052)
> Caused by: 
> org.apache.hadoop.hbase.util.CommonFSUtils$StreamLacksCapabilityException: 
> hflush
> at 
> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.initOutput(ProtobufLogWriter.java:99)
> at 
> org.apache.hadoop.hbase.regionserver.wal.AbstractProtobufLogWriter.init(AbstractProtobufLogWriter.java:165)
> at 
> org.apache.hadoop.hbase.wal.FSHLogProvider.createWriter(FSHLogProvider.java:77)
> ... 10 more{noformat}
> This is the sanity check added by HBASE-18784, failing on creating the writer 
> for the recovered.edits file.
> The odd-ball here is that our recovered.edits writer is just a WAL writer 
> class. The WAL writer class thinks it always should have hflush support; 
> however, we don't _actually_ need that for writing out the recovered.edits 
> files. If {{close()}} on the recovered.edits file fails, we just trash any 
> intermediate data in the filesystem and rerun the whole process.
> It's my understanding that this check is over-bearing and we should not make 
> the check when the ProtobufLogWriter is being used for the recovered.edits 
> file.
> [~zyork], [~busbey] fyi





[jira] [Commented] (HBASE-21550) Add a new method preCreateTableRegionInfos for MasterObserver which allows CPs to modify the TableDescriptor

2018-12-05 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710543#comment-16710543
 ] 

stack commented on HBASE-21550:
---

Do we need it in branch-2.1 or branch-2.0? Let's do without the backport if we can.

Otherwise, +1 on patch for branch-2+.

> Add a new method preCreateTableRegionInfos for MasterObserver which allows 
> CPs to modify the TableDescriptor
> 
>
> Key: HBASE-21550
> URL: https://issues.apache.org/jira/browse/HBASE-21550
> Project: HBase
>  Issue Type: Bug
>  Components: Coprocessors
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21550.patch
>
>
> Before 2.0, we would pass an HTableDescriptor and the CPs could modify the schema 
> of a table, but now we pass a TableDescriptor, which is immutable. I 
> think it is correct to pass an immutable instance here, but we should have a 
> return value for this method to allow CPs to return a new TableDescriptor.
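The proposed return-value pattern can be modeled with a toy immutable descriptor: a coprocessor "modifies" the table by returning a rebuilt copy rather than mutating the argument, as the pre-2.0 HTableDescriptor allowed. The types below are simplified stand-ins, not the real MasterObserver or TableDescriptor API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Toy immutable descriptor: "modification" means building a new instance.
final class ToyTableDescriptor {
    private final String name;
    private final Map<String, String> values;

    ToyTableDescriptor(String name, Map<String, String> values) {
        this.name = name;
        this.values = Collections.unmodifiableMap(new HashMap<>(values));
    }

    String getName() { return name; }
    String getValue(String key) { return values.get(key); }

    ToyTableDescriptor withValue(String key, String value) {
        Map<String, String> copy = new HashMap<>(values);
        copy.put(key, value);
        return new ToyTableDescriptor(name, copy);
    }
}

// Hypothetical hook shape: the master uses whatever descriptor is returned.
interface ToyMasterObserver {
    ToyTableDescriptor preCreateTableRegionInfos(ToyTableDescriptor desc);
}

public class ObserverSketch {
    public static void main(String[] args) {
        ToyMasterObserver cp = desc -> desc.withValue("OWNER", "cp-injected");
        ToyTableDescriptor original = new ToyTableDescriptor("t1", new HashMap<>());
        ToyTableDescriptor effective = cp.preCreateTableRegionInfos(original);
        System.out.println(effective.getValue("OWNER")); // prints cp-injected
        System.out.println(original.getValue("OWNER"));  // prints null: original untouched
    }
}
```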





[jira] [Updated] (HBASE-21464) Splitting blocked with meta NSRE during split transaction

2018-12-05 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-21464:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Resolved but not pushed yet. I have a 1.4.9 rc1 candidate tagged, also not 
pushed yet. Testing first.

> Splitting blocked with meta NSRE during split transaction
> -
>
> Key: HBASE-21464
> URL: https://issues.apache.org/jira/browse/HBASE-21464
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.5.0, 1.4.3, 1.4.4, 1.4.5, 1.4.6, 1.4.8, 1.4.7
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Blocker
> Fix For: 1.5.0, 1.4.9
>
> Attachments: HBASE-21464-branch-1.patch, HBASE-21464-branch-1.patch, 
> HBASE-21464-branch-1.patch, HBASE-21464-branch-1.patch
>
>
> Splitting is blocked during split transaction. The split worker is trying to 
> update meta but isn't able to relocate it after NSRE:
> {noformat}
> 2018-11-09 17:50:45,277 INFO  
> [regionserver/ip-172-31-5-92.us-west-2.compute.internal/172.31.5.92:8120-splits-1541785709434]
>  client.RpcRetryingCaller: Call exception, tries=13, retries=350, 
> started=88590 ms ago, cancelled=false, 
> msg=org.apache.hadoop.hbase.NotServingRegionException: Region hbase:meta,,1 
> is not online on ip-172-31-13-83.us-west-2.compute.internal,8120,1541785618832
>      at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3088)
>         at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1271)
>         at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:2198)
>         at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36617)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2396)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
>         at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:297)
>         at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:277)row 
> 'test,,1541785709452.5ba6596f0050c2dab969d152829227c6.44' on table 
> 'hbase:meta' at region=hbase:meta,1.1588230740, 
> hostname=ip-172-31-15-225.us-west-2.compute.internal,8120,1541785640586, 
> seqNum=0{noformat}
> Clients, in this case YCSB, are hung with part of the keyspace missing:
> {noformat}
> 2018-11-09 17:51:06,033 DEBUG [hconnection-0x5739e567-shared--pool1-t165] 
> client.ConnectionManager$HConnectionImplementation: locateRegionInMeta 
> parentTable=hbase:meta, metaLocation=, attempt=14 of 35 failed; retrying 
> after sleep of 20158 because: No server address listed in hbase:meta for 
> region 
> test,user307326104267982763,1541785754600.ef90030b05cb02305b75e9bfbc3ee081. 
> containing row user3301635648728421323{noformat}
> Balancing cannot run indefinitely because the split transaction is stuck
> {noformat}
> 2018-11-09 17:49:55,478 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=8100] master.HMaster: 
> Not running balancer because 3 region(s) in transition: 
> [{ef90030b05cb02305b75e9bfbc3ee081 state=SPLITTING_NEW, ts=1541785754606, 
> server=ip-172-31-5-92.us-west-2.compute.internal,8120,1541785626417}, 
> {5ba6596f0050c2dab969d152829227c6 state=SPLITTING, ts=1541785754606, 
> server=ip-172-31-5-92.us-west-2.compute{noformat}
>  





[jira] [Commented] (HBASE-21541) Move MetaTableLocator.verifyRegionLocation to hbase-rsgroup module

2018-12-05 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710450#comment-16710450
 ] 

stack commented on HBASE-21541:
---

Not for branch-2.1 nor branch-2.0 [~Apache9]. Thanks.

> Move MetaTableLocator.verifyRegionLocation to hbase-rsgroup module
> --
>
> Key: HBASE-21541
> URL: https://issues.apache.org/jira/browse/HBASE-21541
> Project: HBase
>  Issue Type: Task
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21541-v1.patch, HBASE-21541-v2.patch, 
> HBASE-21541-v3.patch, HBASE-21541.patch
>
>
> As it is only used there, and it is the only method which needs a 
> ClusterConnection in MetaTableLocator.





[jira] [Commented] (HBASE-21514) Refactor CacheConfig

2018-12-05 Thread Anoop Sam John (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710447#comment-16710447
 ] 

Anoop Sam John commented on HBASE-21514:


Added a few comments in RB, but I'm only halfway through the patch. Give me a day 
more to complete.

> Refactor CacheConfig
> 
>
> Key: HBASE-21514
> URL: https://issues.apache.org/jira/browse/HBASE-21514
> Project: HBase
>  Issue Type: Improvement
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-21514.master.001.patch, 
> HBASE-21514.master.002.patch, HBASE-21514.master.003.patch, 
> HBASE-21514.master.004.patch, HBASE-21514.master.005.patch, 
> HBASE-21514.master.006.patch, HBASE-21514.master.007.patch
>
>
> # move the global cache instances from CacheConfig to BlockCacheFactory. Only 
> keep config stuff in CacheConfig.
>  # Move block cache to HRegionServer's member variable. One rs has one block 
> cache.
>  # Still keep GLOBAL_BLOCK_CACHE_INSTANCE in BlockCacheFactory, as there are 
> some unit tests which don't start a mini cluster but still want to use the 
> block cache.





[jira] [Commented] (HBASE-21453) Convert ReadOnlyZKClient to DEBUG instead of INFO

2018-12-05 Thread Sakthi (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710445#comment-16710445
 ] 

Sakthi commented on HBASE-21453:


Sure [~psomogyi], I just wanted to keep the scope of this Jira limited to the 
ReadOnlyZKClient logs, as mentioned in the title/description. Do we need 
to change the title, or are we okay just making the changes under this one?

> Convert ReadOnlyZKClient to DEBUG instead of INFO
> -
>
> Key: HBASE-21453
> URL: https://issues.apache.org/jira/browse/HBASE-21453
> Project: HBase
>  Issue Type: Bug
>  Components: logging, Zookeeper
>Reporter: stack
>Assignee: Sakthi
>Priority: Major
> Attachments: hbase-21453.master.001.patch
>
>
> Running commands in spark-shell, this is what it looks like on each 
> invocation:
> {code}
> scala> val count = rdd.count()
> 2018-11-07 21:01:46,026 INFO  [Executor task launch worker for task 1] 
> zookeeper.ReadOnlyZKClient: Connect 0x18f3d868 to localhost:2181 with session 
> timeout=9ms, retries 30, retry interval 1000ms, keepAlive=6ms
> 2018-11-07 21:01:46,027 INFO  [ReadOnlyZKClient-localhost:2181@0x18f3d868] 
> zookeeper.ZooKeeper: Initiating client connection, 
> connectString=localhost:2181 sessionTimeout=9 
> watcher=org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$$Lambda$20/1362339879@743dab9f
> 2018-11-07 21:01:46,030 INFO  
> [ReadOnlyZKClient-localhost:2181@0x18f3d868-SendThread(localhost:2181)] 
> zookeeper.ClientCnxn: Opening socket connection to server 
> localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL 
> (unknown error)
> 2018-11-07 21:01:46,031 INFO  
> [ReadOnlyZKClient-localhost:2181@0x18f3d868-SendThread(localhost:2181)] 
> zookeeper.ClientCnxn: Socket connection established to 
> localhost/127.0.0.1:2181, initiating session
> 2018-11-07 21:01:46,033 INFO  
> [ReadOnlyZKClient-localhost:2181@0x18f3d868-SendThread(localhost:2181)] 
> zookeeper.ClientCnxn: Session establishment complete on server 
> localhost/127.0.0.1:2181, sessionid = 0x166f1b283080005, negotiated timeout = 
> 4
> 2018-11-07 21:01:46,035 INFO  [Executor task launch worker for task 1] 
> mapreduce.TableInputFormatBase: Input split length: 0 bytes.
> [Stage 1:>  (0 + 1) / 
> 1]2018-11-07 21:01:48,074 INFO  [Executor task launch worker for task 1] 
> zookeeper.ReadOnlyZKClient: Close zookeeper connection 0x18f3d868 to 
> localhost:2181
> 2018-11-07 21:01:48,075 INFO  [ReadOnlyZKClient-localhost:2181@0x18f3d868] 
> zookeeper.ZooKeeper: Session: 0x166f1b283080005 closed
> 2018-11-07 21:01:48,076 INFO  [ReadOnlyZKClient 
> -localhost:2181@0x18f3d868-EventThread] zookeeper.ClientCnxn: EventThread 
> shut down for session: 0x166f1b283080005
> count: Long = 10
> {code}
> Let me shut down the ReadOnlyZKClient log level.
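Until the code-side change lands, a deployment-side workaround (assuming the log4j 1.x properties configuration that HBase ships with) is to lower just the noisy loggers seen above:

```properties
# Quiet the per-scan connection chatter from the read-only ZK client
log4j.logger.org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient=WARN
# The underlying ZooKeeper client classes (ZooKeeper, ClientCnxn) log at INFO too
log4j.logger.org.apache.zookeeper=WARN
```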





[jira] [Commented] (HBASE-21551) Memory leak when use scan with STREAM at server side

2018-12-05 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710429#comment-16710429
 ] 

Hadoop QA commented on HBASE-21551:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
10s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
59s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
46s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 4s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
50s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
56s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
42s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
8m 25s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 
or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}128m 
31s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}164m 18s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21551 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12950711/HBASE-21551.v2.patch |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 905af43cecc5 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 8bf966c8e9 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/15200/testReport/ |
| Max. process+thread count | 4909 (vs. ulimit of 1) |
| modules | C: hbase-server U: hbase-server |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/15200/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Memory leak when use scan with 

[jira] [Commented] (HBASE-21551) Memory leak when use scan with STREAM at server side

2018-12-05 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710368#comment-16710368
 ] 

Hadoop QA commented on HBASE-21551:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
11s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange}  
0m  0s{color} | {color:orange} The patch doesn't appear to include any new or 
modified tests. Please justify why no new tests are needed for this patch. Also 
please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
55s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m  
4s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
14s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
10s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
12s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
20s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
9m 31s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 
or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}136m 21s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}177m  2s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.regionserver.TestMultiColumnScanner |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21551 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12950706/HBASE-21551.v1.patch |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux b4d9647c0f18 4.4.0-139-generic #165~14.04.1-Ubuntu SMP Wed Oct 
31 10:55:11 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh
 |
| git revision | master / 8bf966c8e9 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| unit | 
https://builds.apache.org/job/PreCommit-HBASE-Build/15199/artifact/patchprocess/patch-unit-hbase-server.txt
 |
|  Test Results | 

[jira] [Commented] (HBASE-21505) Several inconsistencies on information reported for Replication Sources by hbase shell status 'replication' command.

2018-12-05 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710239#comment-16710239
 ] 

Hadoop QA commented on HBASE-21505:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
11s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
27s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
42s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  5m 
23s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
42s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
19s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
27s{color} | {color:blue} hbase-hadoop2-compat in master has 18 extant Findbugs 
warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
45s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  1m 
24s{color} | {color:red} root in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
18s{color} | {color:red} hbase-client in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  1m 
42s{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:red}-1{color} | {color:red} cc {color} | {color:red}  0m 18s{color} | 
{color:red} hbase-client in the patch failed. {color} |
| {color:red}-1{color} | {color:red} cc {color} | {color:red}  1m 42s{color} | 
{color:red} hbase-server in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 18s{color} 
| {color:red} hbase-client in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  1m 42s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
11s{color} | {color:red} hbase-hadoop2-compat: The patch generated 6 new + 0 
unchanged - 0 fixed = 6 total (was 0) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
34s{color} | {color:red} hbase-client: The patch generated 5 new + 185 
unchanged - 0 fixed = 190 total (was 185) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 
15s{color} | {color:red} hbase-server: The patch generated 19 new + 86 
unchanged - 2 fixed = 105 total (was 88) {color} |
| {color:red}-1{color} | {color:red} rubocop {color} | {color:red}  0m  
8s{color} | {color:red} The patch generated 55 new + 405 unchanged - 9 fixed = 
460 total (was 414) {color} |
| {color:orange}-0{color} | {color:orange} ruby-lint {color} | {color:orange}  
0m  4s{color} | {color:orange} The patch generated 3 new + 748 unchanged - 1 
fixed = 751 total (was 749) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} shadedjars {color} | {color:red}  2m  
5s{color} | {color:red} patch has 14 errors when building our shaded downstream 
artifacts. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red}  1m 
21s{color} | {color:red} The patch causes 14 errors with Hadoop v2.7.4. {color} 
|
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red}  2m 
38s{color} | {color:red} The patch causes 14 errors with Hadoop v3.0.0. {color} 
|
| {color:red}-1{color} | {color:red} hbaseprotoc {color} | {color:red}  0m 
19s{color} | {color:red} hbase-client 

[jira] [Updated] (HBASE-21551) Memory leak when use scan with STREAM at server side

2018-12-05 Thread Zheng Hu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu updated HBASE-21551:
-
Attachment: HBASE-21551.v2.patch

> Memory leak when use scan with STREAM at server side
> 
>
> Key: HBASE-21551
> URL: https://issues.apache.org/jira/browse/HBASE-21551
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Blocker
> Fix For: 3.0.0, 2.2.0, 2.1.2, 2.0.4
>
> Attachments: HBASE-21551.v1.patch, HBASE-21551.v2.patch, heap-dump.jpg
>
>
> We open the RegionServerScanner with STREAM as follows: 
> {code}
> RegionScannerImpl#initializeScanners
>   |---> HStore#getScanner
> |--> StoreScanner()
> |---> 
> StoreFileScanner#getScannersForStoreFiles
>   |--> 
> HStoreFile#getStreamScanner  #1
> {code}
> In #1, we put the StoreFileReader into a concurrent hash map, streamReaders, 
> but never remove it from streamReaders until the store file is closed. 
> So if we run stream scans many times, the streamReaders map grows without 
> bound; see the heap dump in the attached heap-dump.jpg. 
> I found this bug while benchmarking scan performance with YCSB on a cluster 
> (RS heap size is 50g): the RS frequently hit long full GC pauses (~110 sec).
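As a rough sketch of the leak pattern described above (hypothetical names — ReaderCache, open, release — not the actual HBase API): the fix direction is to evict the reader from the map when the scanner is done with it, instead of waiting for the store file to close.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ReaderCache {
    static final class StreamReader {
        void close() { /* release file handles, buffers, ... */ }
    }

    // Readers cached per open stream scan, keyed by a handle id.
    private final Map<Long, StreamReader> streamReaders = new ConcurrentHashMap<>();
    private long nextId = 0;

    // Open a stream reader and cache it (the map insertion is the leak
    // source if nothing ever removes the entry).
    synchronized long open() {
        streamReaders.put(nextId, new StreamReader());
        return nextId++;
    }

    // The fix: when the scan finishes, close the reader AND evict it from
    // the map rather than waiting for the store file itself to be closed.
    void release(long id) {
        StreamReader r = streamReaders.remove(id);
        if (r != null) {
            r.close();
        }
    }

    int cachedReaders() {
        return streamReaders.size();
    }

    public static void main(String[] args) {
        ReaderCache cache = new ReaderCache();
        for (int i = 0; i < 10_000; i++) {
            long id = cache.open();
            cache.release(id);  // without this, 10_000 entries would remain
        }
        System.out.println(cache.cachedReaders()); // prints 0
    }
}
```

Without the release() call, every stream scan leaves one more entry behind, matching the unbounded growth visible in the heap dump.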



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21512) Introduce an AsyncClusterConnection and replace the usage of ClusterConnection

2018-12-05 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710160#comment-16710160
 ] 

Hudson commented on HBASE-21512:


Results for branch HBASE-21512
[build #7 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21512/7/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21512/7//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21512/7//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21512/7//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Introduce an AsyncClusterConnection and replace the usage of ClusterConnection
> --
>
> Key: HBASE-21512
> URL: https://issues.apache.org/jira/browse/HBASE-21512
> Project: HBase
>  Issue Type: Umbrella
>Reporter: Duo Zhang
>Priority: Major
> Fix For: 3.0.0
>
>
> At least for the RSProcedureDispatcher, with CompletableFuture we no longer 
> need to set a delay and use a thread pool, which could reduce both resource 
> usage and latency.
> Once this is done, I think we can remove ClusterConnection completely and 
> start rewriting the old sync client on top of the async client, which could 
> shrink our client code base a lot.
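A minimal illustration of the idea, assuming nothing about the real RSProcedureDispatcher code: with CompletableFuture, a retry delay can live in a delayed executor, so no worker thread sits blocked in a dedicated retry pool. rpc() below is a stand-in that fails twice and then succeeds.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

class AsyncRetry {
    // Stand-in for the real async RPC: fails on the first two attempts.
    static CompletableFuture<String> rpc(int attempt) {
        return attempt < 2
            ? CompletableFuture.failedFuture(new RuntimeException("transient"))
            : CompletableFuture.completedFuture("ok");
    }

    // Retry without blocking any thread: the delay is handled by a delayed
    // executor, so no dedicated retry thread pool is needed.
    static CompletableFuture<String> callWithRetry(int attempt, int maxAttempts) {
        return rpc(attempt).handle((value, err) -> {
            if (err == null) {
                return CompletableFuture.completedFuture(value);
            }
            if (attempt + 1 >= maxAttempts) {
                return CompletableFuture.<String>failedFuture(err);
            }
            return CompletableFuture.runAsync(() -> { },
                    CompletableFuture.delayedExecutor(10, TimeUnit.MILLISECONDS))
                .thenCompose(ignored -> callWithRetry(attempt + 1, maxAttempts));
        }).thenCompose(f -> f);  // flatten the nested future
    }

    public static void main(String[] args) {
        System.out.println(callWithRetry(0, 5).join()); // prints ok
    }
}
```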





[jira] [Commented] (HBASE-21550) Add a new method preCreateTableRegionInfos for MasterObserver which allows CPs to modify the TableDescriptor

2018-12-05 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710159#comment-16710159
 ] 

Hadoop QA commented on HBASE-21550:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
24s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
10s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
49s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 7s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
49s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
56s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 8s{color} | {color:green} hbase-server: The patch generated 0 new + 180 
unchanged - 12 fixed = 180 total (was 192) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
52s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
8m 38s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 
or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}239m  4s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}275m 54s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hbase.client.TestSnapshotTemporaryDirectoryWithRegionReplicas |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21550 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12950677/HBASE-21550.patch |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux df690e11fd01 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 8bf966c8e9 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| unit | 
https://builds.apache.org/job/PreCommit-HBASE-Build/15197/artifact/patchprocess/patch-unit-hbase-server.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/15197/testReport/ |
| Max. process+thread count | 

[jira] [Commented] (HBASE-21514) Refactor CacheConfig

2018-12-05 Thread Guanghao Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710123#comment-16710123
 ] 

Guanghao Zhang commented on HBASE-21514:


Ping [~stack] [~Apache9] [~anoop.hbase] for review.

> Refactor CacheConfig
> 
>
> Key: HBASE-21514
> URL: https://issues.apache.org/jira/browse/HBASE-21514
> Project: HBase
>  Issue Type: Improvement
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-21514.master.001.patch, 
> HBASE-21514.master.002.patch, HBASE-21514.master.003.patch, 
> HBASE-21514.master.004.patch, HBASE-21514.master.005.patch, 
> HBASE-21514.master.006.patch, HBASE-21514.master.007.patch
>
>
> # Move the global cache instances from CacheConfig to BlockCacheFactory; keep 
> only configuration in CacheConfig.
>  # Make the block cache a member variable of HRegionServer: one RS has one 
> block cache.
>  # Still keep GLOBAL_BLOCK_CACHE_INSTANCE in BlockCacheFactory, since some 
> unit tests don't start a mini cluster but still want to use a block cache.
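The ownership change in the list above can be sketched as follows (illustrative names only, not the patch itself): the factory constructs caches, and each region server holds exactly one instance as a member variable instead of every component reading a global from CacheConfig.

```java
class BlockCacheOwnershipSketch {
    static final class BlockCache { }

    // Factory builds caches; configuration concerns would stay in CacheConfig.
    static BlockCache createBlockCache() {
        return new BlockCache();
    }

    // Each region server owns exactly one block cache as a member variable,
    // rather than all servers sharing a global static instance.
    static final class RegionServer {
        final BlockCache blockCache = createBlockCache();
    }

    public static void main(String[] args) {
        RegionServer rs1 = new RegionServer();
        RegionServer rs2 = new RegionServer();
        // Two servers, two independent caches.
        System.out.println(rs1.blockCache != rs2.blockCache); // prints true
    }
}
```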





[jira] [Commented] (HBASE-21551) Memory leak when use scan with STREAM at server side

2018-12-05 Thread Zheng Hu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710128#comment-16710128
 ] 

Zheng Hu commented on HBASE-21551:
--

bq. Add a ut for this case? 
Yeah, will do; patch v1 was just meant to show the bug.

> Memory leak when use scan with STREAM at server side
> 
>
> Key: HBASE-21551
> URL: https://issues.apache.org/jira/browse/HBASE-21551
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Blocker
> Fix For: 3.0.0, 2.2.0, 2.1.2, 2.0.4
>
> Attachments: HBASE-21551.v1.patch, heap-dump.jpg
>
>
> We open the RegionServerScanner with STREAM as follows: 
> {code}
> RegionScannerImpl#initializeScanners
>   |---> HStore#getScanner
> |--> StoreScanner()
> |---> 
> StoreFileScanner#getScannersForStoreFiles
>   |--> 
> HStoreFile#getStreamScanner  #1
> {code}
> In #1, we put the StoreFileReader into a concurrent hash map, streamReaders, 
> but never remove it from streamReaders until the store file is closed. 
> So if we run stream scans many times, the streamReaders map grows without 
> bound; see the heap dump in the attached heap-dump.jpg. 
> I found this bug while benchmarking scan performance with YCSB on a cluster 
> (RS heap size is 50g): the RS frequently hit long full GC pauses (~110 sec).





[jira] [Work started] (HBASE-21406) "status 'replication'" should not show SINK if the cluster does not act as sink

2018-12-05 Thread Wellington Chevreuil (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-21406 started by Wellington Chevreuil.

> "status 'replication'" should not show SINK if the cluster does not act as 
> sink
> ---
>
> Key: HBASE-21406
> URL: https://issues.apache.org/jira/browse/HBASE-21406
> Project: HBase
>  Issue Type: Improvement
>Reporter: Daisuke Kobayashi
>Assignee: Wellington Chevreuil
>Priority: Minor
> Attachments: HBASE-21406-branch-1.001.patch, Screen Shot 2018-10-31 
> at 18.12.54.png
>
>
> When replicating one way, from source to target, {{status 'replication'}} on 
> the source always dumps a SINK section with meaningless metrics. That section 
> only makes sense when running the command on the target cluster.
> For {{status 'replication'}} on the source, for example, {{AgeOfLastAppliedOp}} is 
> always zero and {{TimeStampsOfLastAppliedOp}} never moves past the RS start 
> time, since the RS is not acting as a sink.
> {noformat}
> source-1.com
>SOURCE: PeerID=1, AgeOfLastShippedOp=0, SizeOfLogQueue=0, 
> TimeStampsOfLastShippedOp=Mon Oct 29 23:44:14 PDT 2018, Replication Lag=0
>SINK  : AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Thu Oct 25 
> 23:56:53 PDT 2018
> {noformat}
> {{status 'replication'}} on target works as expected. SOURCE is empty as it's 
> not acting as source:
> {noformat}
> target-1.com
>SOURCE:
>SINK  : AgeOfLastAppliedOp=70, TimeStampsOfLastAppliedOp=Mon Oct 29 
> 23:44:08 PDT 2018
> {noformat}
> This is because {{getReplicationLoadSink}}, called in {{admin.rb}}, always 
> returns a value (not null).
> 1.X
> https://github.com/apache/hbase/blob/rel/1.4.0/hbase-client/src/main/java/org/apache/hadoop/hbase/ServerLoad.java#L194-L204
> 2.X
> https://github.com/apache/hbase/blob/rel/2.0.0/hbase-client/src/main/java/org/apache/hadoop/hbase/ServerLoad.java#L392-L399
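A hedged sketch of the proposed gating (hypothetical helper; the real change would live in admin.rb / ServerLoad): only render the SINK line when the sink metrics show the server has actually applied something since startup.

```java
class ReplicationStatusSketch {
    // Hypothetical helper: skip the SINK section when the server has never
    // acted as a sink, i.e. its metrics are still at their startup defaults.
    static String formatSink(long ageOfLastAppliedOp,
                             long timestampOfLastAppliedOp,
                             long serverStartTimeMs) {
        boolean neverActedAsSink =
            ageOfLastAppliedOp == 0 && timestampOfLastAppliedOp <= serverStartTimeMs;
        if (neverActedAsSink) {
            return "";  // nothing meaningful to report
        }
        return "SINK  : AgeOfLastAppliedOp=" + ageOfLastAppliedOp
             + ", TimeStampsOfLastAppliedOp=" + timestampOfLastAppliedOp;
    }

    public static void main(String[] args) {
        // Sink never applied anything since startup: no SINK line.
        System.out.println(formatSink(0, 1_000, 1_000).isEmpty()); // prints true
        // Sink applied an op after startup: SINK line is rendered.
        System.out.println(formatSink(70, 2_000, 1_000));
    }
}
```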





[jira] [Commented] (HBASE-21551) Memory leak when use scan with STREAM at server side

2018-12-05 Thread Guanghao Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710125#comment-16710125
 ] 

Guanghao Zhang commented on HBASE-21551:


Add a ut for this case?

> Memory leak when use scan with STREAM at server side
> 
>
> Key: HBASE-21551
> URL: https://issues.apache.org/jira/browse/HBASE-21551
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Blocker
> Fix For: 3.0.0, 2.2.0, 2.1.2, 2.0.4
>
> Attachments: HBASE-21551.v1.patch, heap-dump.jpg
>
>
> We open the RegionServerScanner with STREAM as follows: 
> {code}
> RegionScannerImpl#initializeScanners
>   |---> HStore#getScanner
> |--> StoreScanner()
> |---> 
> StoreFileScanner#getScannersForStoreFiles
>   |--> 
> HStoreFile#getStreamScanner  #1
> {code}
> In #1, we put the StoreFileReader into a concurrent hash map, streamReaders, 
> but never remove it from streamReaders until the store file is closed. 
> So if we run stream scans many times, the streamReaders map grows without 
> bound; see the heap dump in the attached heap-dump.jpg. 
> I found this bug while benchmarking scan performance with YCSB on a cluster 
> (RS heap size is 50g): the RS frequently hit long full GC pauses (~110 sec).





[jira] [Commented] (HBASE-21514) Refactor CacheConfig

2018-12-05 Thread Guanghao Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710122#comment-16710122
 ] 

Guanghao Zhang commented on HBASE-21514:


Is there an annotation we can use to fix the checkstyle warning? Also, the 
javac warnings are not introduced by this patch... 

> Refactor CacheConfig
> 
>
> Key: HBASE-21514
> URL: https://issues.apache.org/jira/browse/HBASE-21514
> Project: HBase
>  Issue Type: Improvement
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-21514.master.001.patch, 
> HBASE-21514.master.002.patch, HBASE-21514.master.003.patch, 
> HBASE-21514.master.004.patch, HBASE-21514.master.005.patch, 
> HBASE-21514.master.006.patch, HBASE-21514.master.007.patch
>
>
> # Move the global cache instances from CacheConfig to BlockCacheFactory; keep 
> only configuration in CacheConfig.
>  # Make the block cache a member variable of HRegionServer: one RS has one 
> block cache.
>  # Still keep GLOBAL_BLOCK_CACHE_INSTANCE in BlockCacheFactory, since some 
> unit tests don't start a mini cluster but still want to use a block cache.





[jira] [Updated] (HBASE-21551) Memory leak when use scan with STREAM at server side

2018-12-05 Thread Zheng Hu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu updated HBASE-21551:
-
Status: Patch Available  (was: Open)

> Memory leak when use scan with STREAM at server side
> 
>
> Key: HBASE-21551
> URL: https://issues.apache.org/jira/browse/HBASE-21551
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Blocker
> Fix For: 3.0.0, 2.2.0, 2.1.2, 2.0.4
>
> Attachments: HBASE-21551.v1.patch, heap-dump.jpg
>
>
> We open the RegionServerScanner with STREAM as follows: 
> {code}
> RegionScannerImpl#initializeScanners
>   |---> HStore#getScanner
> |--> StoreScanner()
> |---> 
> StoreFileScanner#getScannersForStoreFiles
>   |--> 
> HStoreFile#getStreamScanner  #1
> {code}
> In #1, we put the StoreFileReader into a concurrent hash map, streamReaders, 
> but never remove it from streamReaders until the store file is closed. 
> So if we run stream scans many times, the streamReaders map grows without 
> bound; see the heap dump in the attached heap-dump.jpg. 
> I found this bug while benchmarking scan performance with YCSB on a cluster 
> (RS heap size is 50g): the RS frequently hit long full GC pauses (~110 sec).





[jira] [Updated] (HBASE-21551) Memory leak when use scan with STREAM at server side

2018-12-05 Thread Zheng Hu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu updated HBASE-21551:
-
Attachment: HBASE-21551.v1.patch

> Memory leak when use scan with STREAM at server side
> 
>
> Key: HBASE-21551
> URL: https://issues.apache.org/jira/browse/HBASE-21551
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Blocker
> Fix For: 3.0.0, 2.2.0, 2.1.2, 2.0.4
>
> Attachments: HBASE-21551.v1.patch, heap-dump.jpg
>
>
> We open the RegionServerScanner with STREAM as follows: 
> {code}
> RegionScannerImpl#initializeScanners
>   |---> HStore#getScanner
> |--> StoreScanner()
> |---> 
> StoreFileScanner#getScannersForStoreFiles
>   |--> 
> HStoreFile#getStreamScanner  #1
> {code}
> In #1, we put the StoreFileReader into a concurrent hash map, streamReaders, 
> but never remove it from streamReaders until the store file is closed. 
> So if we run stream scans many times, the streamReaders map grows without 
> bound; see the heap dump in the attached heap-dump.jpg. 
> I found this bug while benchmarking scan performance with YCSB on a cluster 
> (RS heap size is 50g): the RS frequently hit long full GC pauses (~110 sec).





[jira] [Updated] (HBASE-21505) Several inconsistencies on information reported for Replication Sources by hbase shell status 'replication' command.

2018-12-05 Thread Wellington Chevreuil (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil updated HBASE-21505:
-
Status: Patch Available  (was: In Progress)

Took me longer than expected to do a more thorough review and analyse some test 
failures, so I'm finally submitting a patch as an initial proposal. There are a 
few changes from the initial "draft" patch in the info reported by the command:

1) The *getTimeStampOfLastAttemtped* and *TimeStampOfLastArrivedInSource* 
metrics previously reported were dropped; we now show 
*TimeStampOfNextToReplicate*, the source inception time of the next edit in 
the queue to be replicated. This is used to calculate *Replication Lag*.

2) New metrics *EditsReadFromLogQueue* and *OpsShippedToTarget* were added to 
help determine the delta (in # of OPs) between source and target.

3) *Replication Lag* is now calculated as (*CurrentTime - 
TimeStampOfNextToReplicate*) if *AgeOfLastShippedOp* < 
*TimeStampOfNextToReplicate*.
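Taken at face value, the lag rule in item 3 can be sketched as follows (a simplified, hypothetical reading, not the patch itself): when an edit is waiting in the queue, lag is measured from that edit's source inception time; otherwise the last shipped op's age is used.

```java
class ReplicationLagSketch {
    // Hypothetical simplification of the rule above: prefer the inception
    // time of the next queued edit; fall back to the last shipped op's age.
    static long replicationLag(long currentTimeMs,
                               long timeStampOfNextToReplicate,
                               long ageOfLastShippedOpMs) {
        if (timeStampOfNextToReplicate > 0) {
            // An edit is queued: lag is measured from its source inception time.
            return currentTimeMs - timeStampOfNextToReplicate;
        }
        // Nothing queued: report the age of the last shipped op.
        return ageOfLastShippedOpMs;
    }

    public static void main(String[] args) {
        System.out.println(replicationLag(1_000, 400, 50)); // prints 600
        System.out.println(replicationLag(1_000, 0, 50));   // prints 50
    }
}
```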

> Several inconsistencies on information reported for Replication Sources by 
> hbase shell status 'replication' command.
> 
>
> Key: HBASE-21505
> URL: https://issues.apache.org/jira/browse/HBASE-21505
> Project: HBase
>  Issue Type: Bug
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Attachments: 
> 0001-HBASE-21505-initial-version-for-more-detailed-report.patch, 
> HBASE-21505-master.001.patch
>
>
> While reviewing the hbase shell status 'replication' command, I noticed the 
> following issues in the replication source section:
> 1) TimeStampsOfLastShippedOp keeps getting updated and increasing even when 
> no new edits were added to the source, so nothing was really shipped. Test 
> steps performed:
> 1.1) Source cluster with only one table targeted to replication;
> 1.2) Added a new row, confirmed the row appeared in Target cluster;
> 1.3) Issued status 'replication' command in source, TimeStampsOfLastShippedOp 
> shows current timestamp T1.
> 1.4) Waited 30 seconds, no new data added to source. Issued status 
> 'replication' command, now shows timestamp T2.
> 2) When replication is stuck due to connectivity issues or target 
> unavailability, if new edits are added on the source, the reported 
> AgeOfLastShippedOp wrongly shows the same value as "Replication Lag". This is 
> incorrect: AgeOfLastShippedOp should not change until another edit is indeed 
> shipped to the target. Test steps performed:
> 2.1) Source cluster with only one table targeted to replication;
> 2.2) Stopped target cluster RS;
> 2.3) Put a new row on source. Running status 'replication' command does show 
> lag increasing. TimeStampsOfLastShippedOp seems correct also, no further 
> updates as described on bullet #1 above.
> 2.4) AgeOfLastShippedOp keeps increasing together with Replication Lag, even 
> though there's no new edit shipped to target:
> {noformat}
> ...
>  SOURCE: PeerID=1, AgeOfLastShippedOp=5581, SizeOfLogQueue=1, 
> TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=5581
> ...
> ...
> SOURCE: PeerID=1, AgeOfLastShippedOp=8586, SizeOfLogQueue=1, 
> TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=8586
> ...
> {noformat}
> 3) AgeOfLastShippedOp gets set to 0 even when a given edit took some 
> time before it was finally shipped to the target. Test steps performed:
> 3.1) Source cluster with only one table targeted to replication;
> 3.2) Stopped target cluster RS;
> 3.3) Put a new row on source. 
> 3.4) AgeOfLastShippedOp keeps increasing together with Replication Lag, even 
> though there's no new edit shipped to target:
> {noformat}
> T1:
> ...
>  SOURCE: PeerID=1, AgeOfLastShippedOp=5581, SizeOfLogQueue=1, 
> TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=5581
> ...
> T2:
> ...
> SOURCE: PeerID=1, AgeOfLastShippedOp=8586, SizeOfLogQueue=1, 
> TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=8586
> ...
> {noformat}
> 3.5) Restart target cluster RS and verified the new row appeared there. No 
> new edit added, but status 'replication' command reports AgeOfLastShippedOp 
> as 0, while it should be the diff between the time it concluded shipping at 
> target and the time it was added in source:
> {noformat}
> SOURCE: PeerID=1, AgeOfLastShippedOp=0, SizeOfLogQueue=1, 
> TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=0
> {noformat}
> 4) When replication is stuck due to connectivity issues or target 
> unavailability, if the RS is restarted, once the recovered queue source starts, 
> TimeStampsOfLastShippedOp is set to initial java date (Thu Jan 01 01:00:00 
> GMT 1970, for example), thus "Replication Lag" also gives a 

[jira] [Updated] (HBASE-21505) Several inconsistencies on information reported for Replication Sources by hbase shell status 'replication' command.

2018-12-05 Thread Wellington Chevreuil (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil updated HBASE-21505:
-
Attachment: HBASE-21505-master.001.patch

> Several inconsistencies on information reported for Replication Sources by 
> hbase shell status 'replication' command.
> 
>
> Key: HBASE-21505
> URL: https://issues.apache.org/jira/browse/HBASE-21505
> Project: HBase
>  Issue Type: Bug
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Attachments: 
> 0001-HBASE-21505-initial-version-for-more-detailed-report.patch, 
> HBASE-21505-master.001.patch
>
>
> While reviewing the hbase shell status 'replication' command, I noticed the 
> following issues in the replication source section:
> 1) TimeStampsOfLastShippedOp keeps getting updated and increasing even when 
> no new edits were added to the source, so nothing was really shipped. Test 
> steps performed:
> 1.1) Source cluster with only one table targeted to replication;
> 1.2) Added a new row, confirmed the row appeared in Target cluster;
> 1.3) Issued status 'replication' command in source, TimeStampsOfLastShippedOp 
> shows current timestamp T1.
> 1.4) Waited 30 seconds, no new data added to source. Issued status 
> 'replication' command, now shows timestamp T2.
> 2) When replication is stuck due to connectivity issues or target 
> unavailability, if new edits are added on the source, the reported 
> AgeOfLastShippedOp wrongly shows the same value as "Replication Lag". This is 
> incorrect: AgeOfLastShippedOp should not change until another edit is indeed 
> shipped to the target. Test steps performed:
> 2.1) Source cluster with only one table targeted to replication;
> 2.2) Stopped target cluster RS;
> 2.3) Put a new row on source. Running status 'replication' command does show 
> lag increasing. TimeStampsOfLastShippedOp seems correct also, no further 
> updates as described on bullet #1 above.
> 2.4) AgeOfLastShippedOp keeps increasing together with Replication Lag, even 
> though there's no new edit shipped to target:
> {noformat}
> ...
>  SOURCE: PeerID=1, AgeOfLastShippedOp=5581, SizeOfLogQueue=1, 
> TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=5581
> ...
> ...
> SOURCE: PeerID=1, AgeOfLastShippedOp=8586, SizeOfLogQueue=1, 
> TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=8586
> ...
> {noformat}
> 3) AgeOfLastShippedOp gets set to 0 even when a given edit took some 
> time before it was finally shipped to the target. Test steps performed:
> 3.1) Source cluster with only one table targeted to replication;
> 3.2) Stopped target cluster RS;
> 3.3) Put a new row on source. 
> 3.4) AgeOfLastShippedOp keeps increasing together with Replication Lag, even 
> though there's no new edit shipped to target:
> {noformat}
> T1:
> ...
>  SOURCE: PeerID=1, AgeOfLastShippedOp=5581, SizeOfLogQueue=1, 
> TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=5581
> ...
> T2:
> ...
> SOURCE: PeerID=1, AgeOfLastShippedOp=8586, SizeOfLogQueue=1, 
> TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=8586
> ...
> {noformat}
> 3.5) Restart target cluster RS and verified the new row appeared there. No 
> new edit added, but status 'replication' command reports AgeOfLastShippedOp 
> as 0, while it should be the diff between the time it concluded shipping at 
> target and the time it was added in source:
> {noformat}
> SOURCE: PeerID=1, AgeOfLastShippedOp=0, SizeOfLogQueue=1, 
> TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=0
> {noformat}
> 4) When replication is stuck due to connectivity issues or target 
> unavailability, if the RS is restarted, then once the recovered queue source 
> starts, TimeStampsOfLastShippedOp is set to the initial java date (Thu Jan 01 
> 01:00:00 GMT 1970, for example), so "Replication Lag" also gives a completely 
> inaccurate value. 
> Tests performed:
> 4.1) Source cluster with only one table targeted to replication;
> 4.2) Stopped target cluster RS;
> 4.3) Put a new row on source, restart RS on source, waited a few seconds for 
> recovery queue source to startup, then it gives:
> {noformat}
> SOURCE: PeerID=1, AgeOfLastShippedOp=0, SizeOfLogQueue=1, 
> TimeStampsOfLastShippedOp=Thu Jan 01 01:00:00 GMT 1970, Replication 
> Lag=9223372036854775807
> {noformat}
> Also, we should report status for all running sources; the current output 
> format gives the impression there is only one, even when there are recovery 
> queues, for instance. 
> Here is a list of ideas on how the command should report under different 
> states of replication:
> a) Source started, target stopped, no edits arrived on source yet: 
> 

[jira] [Updated] (HBASE-21551) Memory leak when use scan with STREAM at server side

2018-12-05 Thread Zheng Hu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu updated HBASE-21551:
-
Description: 
We open the RegionServerScanner with STREAM as follows: 

{code}
RegionScannerImpl#initializeScanners
  |---> HStore#getScanner
|--> StoreScanner()
|---> 
StoreFileScanner#getScannersForStoreFiles
  |--> 
HStoreFile#getStreamScanner  #1
{code}

In #1, we put the StoreFileReader into a concurrent hash map, streamReaders, 
but never remove it from streamReaders until the store file is closed. 

So if we run stream scans many times, the streamReaders map grows without 
bound; see the heap dump in the attached heap-dump.jpg. 

I found this bug while benchmarking scan performance with YCSB on a cluster 
(RS heap size is 50g): the RS frequently hit long full GC pauses (~110 sec).

  was:
We open the RegionServerScanner with STREAM as following: 

{code}
RegionScannerImpl#initializeScanners
  |---> HStore#getScanner
|--> StoreScanner()
|---> 
StoreFileScanner#getScannersForStoreFiles
  |--> 
HStoreFile#getStreamScanner  #1
{code}

In #1,  we put the StoreFileReader into  a concurrent hash map streamReaders, 
but not remove the StreamReader from streamReaders until closing the store 
file. 

So if we  scan with stream with  so many times, the streamReaders hash map will 
be exploded.   we can see the heap dump in the attached heap-dump.jpg. 

I found this bug, because when i benchmark the scan performance by using YCSB 
in a cluster (heap size of RS is 50g),  the Rs will be easy to happen a long 
time full gc ( ~ 110 sec)


> Memory leak when use scan with STREAM at server side
> 
>
> Key: HBASE-21551
> URL: https://issues.apache.org/jira/browse/HBASE-21551
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Blocker
> Fix For: 3.0.0, 2.2.0, 2.1.2, 2.0.4
>
> Attachments: heap-dump.jpg
>
>
> We open the RegionServerScanner with STREAM as follows: 
> {code}
> RegionScannerImpl#initializeScanners
>   |---> HStore#getScanner
> |--> StoreScanner()
> |---> 
> StoreFileScanner#getScannersForStoreFiles
>   |--> 
> HStoreFile#getStreamScanner  #1
> {code}
> In #1, we put the StoreFileReader into a concurrent hash map, streamReaders, 
> but never remove it from streamReaders until the store file is closed. 
> So if we run stream scans many times, the streamReaders map grows without 
> bound; see the heap dump in the attached heap-dump.jpg. 
> I found this bug while benchmarking scan performance with YCSB on a cluster 
> (RS heap size is 50g): the RS frequently hit long full GC pauses (~110 sec).





  1   2   >