[jira] [Commented] (HBASE-20165) Shell command to make a normal peer to be a serial replication peer

2018-03-11 Thread Guanghao Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394825#comment-16394825
 ] 

Guanghao Zhang commented on HBASE-20165:


Do we also need to show the serial state in the list_peers result?
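
For illustration, a minimal client-side sketch (not the actual shell implementation) of how the serial flag could be surfaced when listing peers, assuming the peer config exposes an isSerial() accessor:

{code:java}
// Sketch only: print each peer with its serial flag; assumes
// ReplicationPeerConfig exposes isSerial().
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.replication.ReplicationPeerConfig;
import org.apache.hadoop.hbase.replication.ReplicationPeerDescription;

public class ListPeersWithSerialState {
  static void printPeers(Admin admin) throws IOException {
    List<ReplicationPeerDescription> peers = admin.listReplicationPeers();
    for (ReplicationPeerDescription desc : peers) {
      ReplicationPeerConfig config = desc.getPeerConfig();
      System.out.println(desc.getPeerId() + " clusterKey=" + config.getClusterKey()
          + " serial=" + config.isSerial());
    }
  }
}
{code}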

> Shell command to make a normal peer to be a serial replication peer
> ---
>
> Key: HBASE-20165
> URL: https://issues.apache.org/jira/browse/HBASE-20165
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Attachments: HBASE-20165.v1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20167) Optimize the implementation of ReplicationSourceWALReader

2018-03-11 Thread Guanghao Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394822#comment-16394822
 ] 

Guanghao Zhang commented on HBASE-20167:


+1.

> Optimize the implementation of ReplicationSourceWALReader
> -
>
> Key: HBASE-20167
> URL: https://issues.apache.org/jira/browse/HBASE-20167
> Project: HBase
>  Issue Type: Sub-task
>  Components: Replication
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-20167-v1.patch, HBASE-20167-v2.patch, 
> HBASE-20167.patch
>
>
> After HBASE-20148, serial replication will be an option for a peer. Since an 
> instance of ReplicationSourceWALReader can only belong to one peer, we do 
> not need so many 'if' checks in the implementation of readWALEntries to 
> decide whether we should consider serial replication. We can just make a 
> subclass or something similar for serial replication to keep the code clean.
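
To make the refactoring idea concrete, here is a minimal standalone sketch (all names hypothetical, not the actual patch) of how a subclass hook removes the serial-replication branching from the common read loop:

{code:java}
// Sketch only: the serial-specific logic lives in a subclass instead of
// 'if (serial)' branches inside readWALEntries.
abstract class WALReaderSketch {
  // The common read loop: no serial-replication branching here.
  final void readWALEntries(java.util.List<String> entries) {
    for (String entry : entries) {
      if (acceptEntry(entry)) {
        ship(entry);
      }
    }
  }

  // Hook point; the base class accepts everything.
  boolean acceptEntry(String entry) {
    return true;
  }

  void ship(String entry) {
    System.out.println("shipping " + entry);
  }
}

// Serial variant: only this subclass knows about the serial-order check.
class SerialWALReaderSketch extends WALReaderSketch {
  @Override
  boolean acceptEntry(String entry) {
    // e.g. wait until the previous range for this region has been pushed
    return canPushInSerialOrder(entry);
  }

  private boolean canPushInSerialOrder(String entry) {
    return true; // placeholder for the real barrier check
  }
}
{code}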



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-19389) Limit concurrency of put with dense (hundreds) columns to prevent write handler exhausted

2018-03-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394814#comment-16394814
 ] 

Hadoop QA commented on HBASE-19389:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
21s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
19s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
59s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
41s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  6m 
34s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
26s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
23s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
23s{color} | {color:green} The patch hbase-common passed checkstyle {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
20s{color} | {color:green} hbase-server: The patch generated 0 new + 388 
unchanged - 1 fixed = 388 total (was 389) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
54s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
18m 47s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
16s{color} | {color:green} hbase-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}110m  
3s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
39s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}159m 46s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-19389 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12913955/HBASE-19389.master.v2.patch
 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 6bde3f6ecac2 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 

[jira] [Commented] (HBASE-20045) When running compaction, cache recent blocks.

2018-03-11 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394811#comment-16394811
 ] 

ramkrishna.s.vasudevan commented on HBASE-20045:


bq. I see the argument above about not bothering to cache a block if all its 
cells are weeks old. In our case, the data is advertising identifiers and can 
come in unpredictably, and like I said we have a big enough bucket cache 
anyway, so why not just cache everything? The old blocks from the compacted 
away files are going to be evicted anyway, so we should never run out of bucket 
cache if we have sized it much larger than our entire data size.
[~saadmufti]
Thanks for chiming in here. I like your argument. But one thing to note: even 
if your compacted files' blocks (the old files) are evicted when the new file 
is created after compaction, then (assuming there are no deletes) almost the 
same number of blocks will be created again for the new file, unless the 
column family has a TTL. I also thought we could do this, but the discussion 
here helped me understand that it may not always be possible; maybe we cache 
recent data alone, like what JMS says here. 
We should also try out your suggestion, maybe behind a config, but warn the 
user that only a big enough bucket cache can help here. So, roughly, can you 
say what your bucket cache size is? And I think it is file mode and that file 
is in S3.
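
As a concrete sketch of the heuristic under discussion (purely hypothetical, not an existing HBase API): cache a block written during compaction only when its newest cell falls inside a configurable window, with a very large window effectively meaning "cache everything" for deployments like Saad's:

{code:java}
// Hypothetical sketch of the discussed policy, not an existing HBase API:
// during a compaction write, decide per block whether to cache it based on
// the newest cell timestamp seen while writing that block.
class CompactionCachePolicySketch {
  private final long maxAgeMs; // e.g. 24h; Long.MAX_VALUE ~ "cache everything"

  CompactionCachePolicySketch(long maxAgeMs) {
    this.maxAgeMs = maxAgeMs;
  }

  /** @param newestCellTs max cell timestamp observed while writing the block */
  boolean shouldCacheBlockOnCompaction(long newestCellTs) {
    return System.currentTimeMillis() - newestCellTs <= maxAgeMs;
  }
}
{code}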

> When running compaction, cache recent blocks.
> -
>
> Key: HBASE-20045
> URL: https://issues.apache.org/jira/browse/HBASE-20045
> Project: HBase
>  Issue Type: New Feature
>  Components: BlockCache, Compaction
>Affects Versions: 2.0.0-beta-1
>Reporter: Jean-Marc Spaggiari
>Priority: Major
>
> HBase already allows caching blocks on flush. This is very useful for 
> usecases where most queries are against recent data. However, as soon as 
> there is a compaction, those blocks are evicted. It would be interesting to 
> have a table-level parameter to say "When compacting, cache blocks less than 
> 24 hours old". That way, when running compaction, all blocks where some data 
> is less than 24h old will be automatically cached. 
>  
> Very useful for table design where there is TS in the key but a long history 
> (Like a year of sensor data).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20167) Optimize the implementation of ReplicationSourceWALReader

2018-03-11 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394801#comment-16394801
 ] 

Duo Zhang commented on HBASE-20167:
---

Changed the fields mentioned above to be private.

> Optimize the implementation of ReplicationSourceWALReader
> -
>
> Key: HBASE-20167
> URL: https://issues.apache.org/jira/browse/HBASE-20167
> Project: HBase
>  Issue Type: Sub-task
>  Components: Replication
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-20167-v1.patch, HBASE-20167-v2.patch, 
> HBASE-20167.patch
>
>
> After HBASE-20148, serial replication will be an option for a peer. Since an 
> instance of ReplicationSourceWALReader can only belong to one peer, we do 
> not need so many 'if' checks in the implementation of readWALEntries to 
> decide whether we should consider serial replication. We can just make a 
> subclass or something similar for serial replication to keep the code clean.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20167) Optimize the implementation of ReplicationSourceWALReader

2018-03-11 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-20167:
--
Attachment: HBASE-20167-v2.patch

> Optimize the implementation of ReplicationSourceWALReader
> -
>
> Key: HBASE-20167
> URL: https://issues.apache.org/jira/browse/HBASE-20167
> Project: HBase
>  Issue Type: Sub-task
>  Components: Replication
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-20167-v1.patch, HBASE-20167-v2.patch, 
> HBASE-20167.patch
>
>
> After HBASE-20148, serial replication will be an option for a peer. Since an 
> instance of ReplicationSourceWALReader can only belong to one peer, we do 
> not need so many 'if' checks in the implementation of readWALEntries to 
> decide whether we should consider serial replication. We can just make a 
> subclass or something similar for serial replication to keep the code clean.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20124) Make hbase-spark module work with hadoop3

2018-03-11 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394799#comment-16394799
 ] 

Ted Yu commented on HBASE-20124:


[~mdrob]:
Can you take another look?

> Make hbase-spark module work with hadoop3
> -
>
> Key: HBASE-20124
> URL: https://issues.apache.org/jira/browse/HBASE-20124
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Attachments: 20124.v1.txt, 20124.v2.txt, 20124.v3.txt
>
>
> The following error can be observed when running tests in hbase-spark module 
> against hadoop3:
> {code}
> HBaseDStreamFunctionsSuite:
> *** RUN ABORTED ***
>   java.lang.NoClassDefFoundError: org/apache/hadoop/ipc/ExternalCall
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getStorageDirs(FSNamesystem.java:1464)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNamespaceDirs(FSNamesystem.java:1444)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:939)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:815)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:746)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniDFSCluster(HBaseTestingUtility.java:668)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniDFSCluster(HBaseTestingUtility.java:640)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:979)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:859)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:853)
>   ...
>   Cause: java.lang.ClassNotFoundException: org.apache.hadoop.ipc.ExternalCall
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getStorageDirs(FSNamesystem.java:1464)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNamespaceDirs(FSNamesystem.java:1444)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:939)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:815)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:746)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniDFSCluster(HBaseTestingUtility.java:668)
> {code}
> The dependency tree shows a mixture of hadoop 2.7.4 and hadoop3 for the 
> hbase-spark module.
> This should be addressed by adding a proper profile in pom.xml.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20167) Optimize the implementation of ReplicationSourceWALReader

2018-03-11 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394797#comment-16394797
 ] 

Duo Zhang commented on HBASE-20167:
---

{quote}
Why this block was removed?
{quote}

You can see the code in ReplicationSource.tryStartNewShipper; I changed the 
order. Now we set the WALReader first and then start the shipper, so it will 
never be null.
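
A simplified illustration of the ordering described above (structure only; the real code lives in ReplicationSource.tryStartNewShipper):

{code:java}
// Simplified sketch: assign the reader before starting the shipper thread,
// so the shipper can never observe a null reader and the old
// "sleep until initialized" retry loop becomes unnecessary.
class ShipperStartOrderSketch {
  private volatile Runnable entryReader;

  void tryStartNewShipper() {
    entryReader = () -> { /* read WAL entries */ }; // 1. set the reader first
    Thread shipper = new Thread(() -> {
      // 2. entryReader is guaranteed non-null here
      entryReader.run();
    });
    shipper.start();
  }
}
{code}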

> Optimize the implementation of ReplicationSourceWALReader
> -
>
> Key: HBASE-20167
> URL: https://issues.apache.org/jira/browse/HBASE-20167
> Project: HBase
>  Issue Type: Sub-task
>  Components: Replication
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-20167-v1.patch, HBASE-20167.patch
>
>
> After HBASE-20148, serial replication will be an option for a peer. Since an 
> instance of ReplicationSourceWALReader can only belong to one peer, we do 
> not need so many 'if' checks in the implementation of readWALEntries to 
> decide whether we should consider serial replication. We can just make a 
> subclass or something similar for serial replication to keep the code clean.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20167) Optimize the implementation of ReplicationSourceWALReader

2018-03-11 Thread Guanghao Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394780#comment-16394780
 ] 

Guanghao Zhang commented on HBASE-20167:


bq. protected final ReplicationSource source;
bq. protected final long replicationBatchSizeCapacity;
Can these be private?


{code:java}
while (entryReader == null) {
  if (sleepForRetries("Replication WAL entry reader thread not initialized",
      sleepMultiplier)) {
    sleepMultiplier++;
  }
}
{code}
Why this block was removed?


> Optimize the implementation of ReplicationSourceWALReader
> -
>
> Key: HBASE-20167
> URL: https://issues.apache.org/jira/browse/HBASE-20167
> Project: HBase
>  Issue Type: Sub-task
>  Components: Replication
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-20167-v1.patch, HBASE-20167.patch
>
>
> After HBASE-20148, serial replication will be an option for a peer. Since an 
> instance of ReplicationSourceWALReader can only belong to one peer, we do 
> not need so many 'if' checks in the implementation of readWALEntries to 
> decide whether we should consider serial replication. We can just make a 
> subclass or something similar for serial replication to keep the code clean.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-19389) Limit concurrency of put with dense (hundreds) columns to prevent write handler exhausted

2018-03-11 Thread Chance Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394738#comment-16394738
 ] 

Chance Li commented on HBASE-19389:
---

Fixed checkstyle; running the UTs again.

> Limit concurrency of put with dense (hundreds) columns to prevent write 
> handler exhausted
> -
>
> Key: HBASE-19389
> URL: https://issues.apache.org/jira/browse/HBASE-19389
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance
>Affects Versions: 2.0.0
> Environment: 2000+ Region Servers
> PCI-E ssd
>Reporter: Chance Li
>Assignee: Chance Li
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: CSLM-concurrent-write.png, 
> HBASE-19389-branch-2-V2.patch, HBASE-19389-branch-2-V3.patch, 
> HBASE-19389-branch-2-V4.patch, HBASE-19389-branch-2-V5.patch, 
> HBASE-19389-branch-2-V6.patch, HBASE-19389-branch-2-V7.patch, 
> HBASE-19389-branch-2-V8.patch, HBASE-19389-branch-2-V9.patch, 
> HBASE-19389-branch-2.patch, HBASE-19389.master.patch, 
> HBASE-19389.master.v2.patch, metrics-1.png, ycsb-result.png
>
>
> In a large cluster with a large number of clients, we found the RS's 
> handlers were sometimes all busy. After investigation we found the root 
> cause to be the CSLM, e.g. heavy load on its compare function. We reviewed 
> the related WALs and found that many columns (more than 1000 columns) 
> were being written at that time.
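
For context, a back-of-the-envelope illustration (numbers illustrative only) of why dense puts are expensive: each of the N columns in a put becomes a separate skip-list insertion, each paying O(log M) cell comparisons, so the compare work multiplies across all busy handlers:

{code:java}
// Illustration only: a put with N columns becomes N separate
// ConcurrentSkipListMap insertions, each paying an O(log M) chain of
// comparisons, which is where the handler CPU goes with dense puts.
import java.util.concurrent.ConcurrentSkipListMap;

public class DenseColumnCost {
  public static void main(String[] args) {
    ConcurrentSkipListMap<String, byte[]> memstore = new ConcurrentSkipListMap<>();
    int columnsPerPut = 1000; // "dense" put, as observed in the WALs
    for (int row = 0; row < 100; row++) {
      for (int col = 0; col < columnsPerPut; col++) {
        // every cell is one skip-list insertion => one compare chain
        memstore.put("row" + row + "/col" + col, new byte[0]);
      }
    }
    System.out.println("cells inserted: " + memstore.size());
  }
}
{code}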



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-19389) Limit concurrency of put with dense (hundreds) columns to prevent write handler exhausted

2018-03-11 Thread Chance Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chance Li updated HBASE-19389:
--
Attachment: HBASE-19389.master.v2.patch

> Limit concurrency of put with dense (hundreds) columns to prevent write 
> handler exhausted
> -
>
> Key: HBASE-19389
> URL: https://issues.apache.org/jira/browse/HBASE-19389
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance
>Affects Versions: 2.0.0
> Environment: 2000+ Region Servers
> PCI-E ssd
>Reporter: Chance Li
>Assignee: Chance Li
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: CSLM-concurrent-write.png, 
> HBASE-19389-branch-2-V2.patch, HBASE-19389-branch-2-V3.patch, 
> HBASE-19389-branch-2-V4.patch, HBASE-19389-branch-2-V5.patch, 
> HBASE-19389-branch-2-V6.patch, HBASE-19389-branch-2-V7.patch, 
> HBASE-19389-branch-2-V8.patch, HBASE-19389-branch-2-V9.patch, 
> HBASE-19389-branch-2.patch, HBASE-19389.master.patch, 
> HBASE-19389.master.v2.patch, metrics-1.png, ycsb-result.png
>
>
> In a large cluster with a large number of clients, we found the RS's 
> handlers were sometimes all busy. After investigation we found the root 
> cause to be the CSLM, e.g. heavy load on its compare function. We reviewed 
> the related WALs and found that many columns (more than 1000 columns) 
> were being written at that time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20105) Allow flushes to target SSD storage

2018-03-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394733#comment-16394733
 ] 

Hadoop QA commented on HBASE-20105:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
17s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
23s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
39s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  6m 
29s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
21s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
45s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 
11s{color} | {color:red} hbase-server: The patch generated 1 new + 98 unchanged 
- 0 fixed = 99 total (was 98) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
45s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
18m 35s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
18s{color} | {color:green} hbase-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}106m 
57s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
40s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}156m 10s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-20105 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12913951/HBASE-20105-v5.patch |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 48a8ae329a4c 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh
 |
| 

[jira] [Commented] (HBASE-20152) [AMv2] DisableTableProcedure versus ServerCrashProcedure

2018-03-11 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394721#comment-16394721
 ] 

Duo Zhang commented on HBASE-20152:
---

{quote}
Can't schedule an SCP. Fails because server is going down already not 
present as online. Procedure suspended.
{quote}

So the SCP is suspended? This is a little confusing to me... I think SCP is for 
a crashed server? Why is it suspended?

> [AMv2] DisableTableProcedure versus ServerCrashProcedure
> 
>
> Key: HBASE-20152
> URL: https://issues.apache.org/jira/browse/HBASE-20152
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Reporter: stack
>Assignee: stack
>Priority: Major
>
> Seeing a small spate of issues where disabled tables/regions are being 
> assigned. Usually they happen when a DisableTableProcedure is running 
> concurrently with a ServerCrashProcedure. See below. See the associated 
> HBASE-20131. This is the umbrella issue for fixing these.
> h3. Deadlock
> From HBASE-20137, 'TestRSGroups is Flakey', 
> https://issues.apache.org/jira/browse/HBASE-20137?focusedCommentId=16390325=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16390325
> {code}
>  * SCP is running because a server was aborted in test.
>  * SCP starts AssignProcedure of region X from crashed server.
>  * DisableTable Procedure runs because test has finished and we're doing 
> table delete. Queues UnassignProcedure for region X.
>  * Disable Unassign gets Lock on region X first.
>  * SCP AssignProcedure tries to get lock, waits on lock.
>  * DisableTable Procedure UnassignProcedure RPC fails because server is down 
> (That's why the SCP).
>  * Tries to expire the server it failed the RPC against. Fails (currently 
> being SCP'd).
>  * DisableTable Procedure Unassign is suspended. It is a suspend with lock on 
> region X held.
>  * SCP can't run because lock on X is held.
>  * Test times out.
> {code}
> Here is the actual log from around the deadlock. pid=308 is the SCP. pid=309 
> is the disable table:
> {code}
> 2018-03-05 11:29:21,224 DEBUG [PEWorker-7] 
> procedure.ServerCrashProcedure(225): Done splitting WALs pid=308, 
> state=RUNNABLE:SERVER_CRASH_SPLIT_LOGS; ServerCrashProcedure 
> server=1cfd208ff882,40584,1520249102524, splitWal=true, meta=false
> 2018-03-05 11:29:21,300 INFO  
> [RpcServer.default.FPBQ.Fifo.handler=2,queue=0,port=38498] 
> rsgroup.RSGroupAdminServer(371): Move server done: default=>appInfo
> 2018-03-05 11:29:21,307 INFO  
> [RpcServer.default.FPBQ.Fifo.handler=2,queue=0,port=38498] 
> rsgroup.RSGroupAdminEndpoint$RSGroupAdminServiceImpl(279): 
> Client=jenkins//172.17.0.2 list rsgroup
> 2018-03-05 11:29:21,312 INFO  [Time-limited test] client.HBaseAdmin$15(901): 
> Started disable of Group_ns:testKillRS
> 2018-03-05 11:29:21,313 INFO  
> [RpcServer.default.FPBQ.Fifo.handler=2,queue=0,port=38498] 
> master.HMaster$7(2278): Client=jenkins//172.17.0.2 disable Group_ns:testKillRS
> 2018-03-05 11:29:21,384 INFO  [PEWorker-9] 
> procedure2.ProcedureExecutor(1495): Initialized subprocedures=[{pid=310, 
> ppid=308, state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure 
> table=Group_ns:testKillRS, region=de7534c208a06502537cd95c248b3043}]
> 2018-03-05 11:29:21,534 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=2,queue=0,port=38498] 
> procedure2.ProcedureExecutor(865): Stored pid=309, 
> state=RUNNABLE:DISABLE_TABLE_PREPARE; DisableTableProcedure 
> table=Group_ns:testKillRS
> 2018-03-05 11:29:21,542 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=2,queue=0,port=38498] 
> master.MasterRpcServices(1134): Checking to see if procedure is done pid=309
> 2018-03-05 11:29:21,644 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=2,queue=0,port=38498] 
> master.MasterRpcServices(1134): Checking to see if procedure is done pid=309
> 2018-03-05 11:29:21,847 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=2,queue=0,port=38498] 
> master.MasterRpcServices(1134): Checking to see if procedure is done pid=309
> 2018-03-05 11:29:22,118 DEBUG [PEWorker-5] hbase.MetaTableAccessor(1944): Put 
> {"totalColumns":1,"row":"Group_ns:testKillRS","families":{"table":[{"qualifier":"state","vlen":2,"tag":[],"timestamp":1520249362117}]},"ts":1520249362117}
> 2018-03-05 11:29:22,123 INFO  [PEWorker-5] hbase.MetaTableAccessor(1646): 
> Updated table Group_ns:testKillRS state to DISABLING in META
> 2018-03-05 11:29:22,148 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=2,queue=0,port=38498] 
> master.MasterRpcServices(1134): Checking to see if procedure is done pid=309
> 2018-03-05 11:29:22,345 INFO  [PEWorker-5] 
> procedure2.ProcedureExecutor(1495): Initialized subprocedures=[{pid=311, 
> ppid=309, state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure 
> table=Group_ns:testKillRS, region=de7534c208a06502537cd95c248b3043, 
> 

[jira] [Commented] (HBASE-20152) [AMv2] DisableTableProcedure versus ServerCrashProcedure

2018-03-11 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394719#comment-16394719
 ] 

Duo Zhang commented on HBASE-20152:
---

{quote}
Trying to think through abandon... Would work in the small but hard part is how 
to fail compound procedures like Move and Disable Table... Split, Merge.. ..
{quote}
I do not think we should abandon it. In this case suspending is OK, but the 
SCP should be finished first, and then we wake up the unassign procedure and 
it can go on. Failing an unassign procedure may have lots of side effects; as 
you describe above, how do we process the parent procedures...

Thanks.

> [AMv2] DisableTableProcedure versus ServerCrashProcedure
> 
>
> Key: HBASE-20152
> URL: https://issues.apache.org/jira/browse/HBASE-20152
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Reporter: stack
>Assignee: stack
>Priority: Major
>
> Seeing a small spate of issues where disabled tables/regions are being 
> assigned. Usually they happen when a DisableTableProcedure is running 
> concurrently with a ServerCrashProcedure. See below. See the associated 
> HBASE-20131. This is the umbrella issue for fixing these.
> h3. Deadlock
> From HBASE-20137, 'TestRSGroups is Flakey', 
> https://issues.apache.org/jira/browse/HBASE-20137?focusedCommentId=16390325=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16390325
> {code}
>  * SCP is running because a server was aborted in test.
>  * SCP starts AssignProcedure of region X from crashed server.
>  * DisableTable Procedure runs because test has finished and we're doing 
> table delete. Queues UnassignProcedure for region X.
>  * Disable Unassign gets Lock on region X first.
>  * SCP AssignProcedure tries to get lock, waits on lock.
>  * DisableTable Procedure UnassignProcedure RPC fails because server is down 
> (That's why the SCP).
>  * Tries to expire the server it failed the RPC against. Fails (currently 
> being SCP'd).
>  * DisableTable Procedure Unassign is suspended. It is a suspend with lock on 
> region X held.
>  * SCP can't run because lock on X is held.
>  * Test times out.
> {code}
> Here is the actual log from around the deadlock. pid=308 is the SCP. pid=309 
> is the disable table:
> {code}
> 2018-03-05 11:29:21,224 DEBUG [PEWorker-7] 
> procedure.ServerCrashProcedure(225): Done splitting WALs pid=308, 
> state=RUNNABLE:SERVER_CRASH_SPLIT_LOGS; ServerCrashProcedure 
> server=1cfd208ff882,40584,1520249102524, splitWal=true, meta=false
> 2018-03-05 11:29:21,300 INFO  
> [RpcServer.default.FPBQ.Fifo.handler=2,queue=0,port=38498] 
> rsgroup.RSGroupAdminServer(371): Move server done: default=>appInfo
> 2018-03-05 11:29:21,307 INFO  
> [RpcServer.default.FPBQ.Fifo.handler=2,queue=0,port=38498] 
> rsgroup.RSGroupAdminEndpoint$RSGroupAdminServiceImpl(279): 
> Client=jenkins//172.17.0.2 list rsgroup
> 2018-03-05 11:29:21,312 INFO  [Time-limited test] client.HBaseAdmin$15(901): 
> Started disable of Group_ns:testKillRS
> 2018-03-05 11:29:21,313 INFO  
> [RpcServer.default.FPBQ.Fifo.handler=2,queue=0,port=38498] 
> master.HMaster$7(2278): Client=jenkins//172.17.0.2 disable Group_ns:testKillRS
> 2018-03-05 11:29:21,384 INFO  [PEWorker-9] 
> procedure2.ProcedureExecutor(1495): Initialized subprocedures=[{pid=310, 
> ppid=308, state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure 
> table=Group_ns:testKillRS, region=de7534c208a06502537cd95c248b3043}]
> 2018-03-05 11:29:21,534 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=2,queue=0,port=38498] 
> procedure2.ProcedureExecutor(865): Stored pid=309, 
> state=RUNNABLE:DISABLE_TABLE_PREPARE; DisableTableProcedure 
> table=Group_ns:testKillRS
> 2018-03-05 11:29:21,542 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=2,queue=0,port=38498] 
> master.MasterRpcServices(1134): Checking to see if procedure is done pid=309
> 2018-03-05 11:29:21,644 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=2,queue=0,port=38498] 
> master.MasterRpcServices(1134): Checking to see if procedure is done pid=309
> 2018-03-05 11:29:21,847 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=2,queue=0,port=38498] 
> master.MasterRpcServices(1134): Checking to see if procedure is done pid=309
> 2018-03-05 11:29:22,118 DEBUG [PEWorker-5] hbase.MetaTableAccessor(1944): Put 
> {"totalColumns":1,"row":"Group_ns:testKillRS","families":{"table":[{"qualifier":"state","vlen":2,"tag":[],"timestamp":1520249362117}]},"ts":1520249362117}
> 2018-03-05 11:29:22,123 INFO  [PEWorker-5] hbase.MetaTableAccessor(1646): 
> Updated table Group_ns:testKillRS state to DISABLING in META
> 2018-03-05 11:29:22,148 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=2,queue=0,port=38498] 
> master.MasterRpcServices(1134): Checking to see if procedure is done pid=309
> 2018-03-05 11:29:22,345 INFO  [PEWorker-5] 
> 

[jira] [Commented] (HBASE-20152) [AMv2] DisableTableProcedure versus ServerCrashProcedure

2018-03-11 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394716#comment-16394716
 ] 

Duo Zhang commented on HBASE-20152:
---

{quote}
Cluster shutdown is set. This means cluster down flag is set, we expire 
servers, but no ServerCrashProcedure gets scheduled.
{quote}

Just asking: after restarting, we will re-schedule an SCP, right?

> [AMv2] DisableTableProcedure versus ServerCrashProcedure
> 
>
> Key: HBASE-20152
> URL: https://issues.apache.org/jira/browse/HBASE-20152
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Reporter: stack
>Assignee: stack
>Priority: Major
>
> Seeing a small spate of issues where disabled tables/regions are being 
> assigned. Usually they happen when a DisableTableProcedure is running 
> concurrently with a ServerCrashProcedure. See below. See the associated 
> HBASE-20131. This is the umbrella issue for fixing these.
> h3. Deadlock
> From HBASE-20137, 'TestRSGroups is Flakey', 
> https://issues.apache.org/jira/browse/HBASE-20137?focusedCommentId=16390325=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16390325
> {code}
>  * SCP is running because a server was aborted in test.
>  * SCP starts AssignProcedure of region X from crashed server.
>  * DisableTable Procedure runs because test has finished and we're doing 
> table delete. Queues UnassignProcedure for region X.
>  * Disable Unassign gets Lock on region X first.
>  * SCP AssignProcedure tries to get lock, waits on lock.
>  * DisableTable Procedure UnassignProcedure RPC fails because server is down 
> (That's why the SCP).
>  * Tries to expire the server it failed the RPC against. Fails (currently 
> being SCP'd).
>  * DisableTable Procedure Unassign is suspended. It is a suspend with lock on 
> region X held.
>  * SCP can't run because lock on X is held.
>  * Test times out.
> {code}
> Here is the actual log from around the deadlock. pid=308 is the SCP. pid=309 
> is the disable table:
> {code}
> 2018-03-05 11:29:21,224 DEBUG [PEWorker-7] 
> procedure.ServerCrashProcedure(225): Done splitting WALs pid=308, 
> state=RUNNABLE:SERVER_CRASH_SPLIT_LOGS; ServerCrashProcedure 
> server=1cfd208ff882,40584,1520249102524, splitWal=true, meta=false
> 2018-03-05 11:29:21,300 INFO  
> [RpcServer.default.FPBQ.Fifo.handler=2,queue=0,port=38498] 
> rsgroup.RSGroupAdminServer(371): Move server done: default=>appInfo
> 2018-03-05 11:29:21,307 INFO  
> [RpcServer.default.FPBQ.Fifo.handler=2,queue=0,port=38498] 
> rsgroup.RSGroupAdminEndpoint$RSGroupAdminServiceImpl(279): 
> Client=jenkins//172.17.0.2 list rsgroup
> 2018-03-05 11:29:21,312 INFO  [Time-limited test] client.HBaseAdmin$15(901): 
> Started disable of Group_ns:testKillRS
> 2018-03-05 11:29:21,313 INFO  
> [RpcServer.default.FPBQ.Fifo.handler=2,queue=0,port=38498] 
> master.HMaster$7(2278): Client=jenkins//172.17.0.2 disable Group_ns:testKillRS
> 2018-03-05 11:29:21,384 INFO  [PEWorker-9] 
> procedure2.ProcedureExecutor(1495): Initialized subprocedures=[{pid=310, 
> ppid=308, state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure 
> table=Group_ns:testKillRS, region=de7534c208a06502537cd95c248b3043}]
> 2018-03-05 11:29:21,534 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=2,queue=0,port=38498] 
> procedure2.ProcedureExecutor(865): Stored pid=309, 
> state=RUNNABLE:DISABLE_TABLE_PREPARE; DisableTableProcedure 
> table=Group_ns:testKillRS
> 2018-03-05 11:29:21,542 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=2,queue=0,port=38498] 
> master.MasterRpcServices(1134): Checking to see if procedure is done pid=309
> 2018-03-05 11:29:21,644 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=2,queue=0,port=38498] 
> master.MasterRpcServices(1134): Checking to see if procedure is done pid=309
> 2018-03-05 11:29:21,847 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=2,queue=0,port=38498] 
> master.MasterRpcServices(1134): Checking to see if procedure is done pid=309
> 2018-03-05 11:29:22,118 DEBUG [PEWorker-5] hbase.MetaTableAccessor(1944): Put 
> {"totalColumns":1,"row":"Group_ns:testKillRS","families":{"table":[{"qualifier":"state","vlen":2,"tag":[],"timestamp":1520249362117}]},"ts":1520249362117}
> 2018-03-05 11:29:22,123 INFO  [PEWorker-5] hbase.MetaTableAccessor(1646): 
> Updated table Group_ns:testKillRS state to DISABLING in META
> 2018-03-05 11:29:22,148 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=2,queue=0,port=38498] 
> master.MasterRpcServices(1134): Checking to see if procedure is done pid=309
> 2018-03-05 11:29:22,345 INFO  [PEWorker-5] 
> procedure2.ProcedureExecutor(1495): Initialized subprocedures=[{pid=311, 
> ppid=309, state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure 
> table=Group_ns:testKillRS, region=de7534c208a06502537cd95c248b3043, 
> server=1cfd208ff882,40584,1520249102524}]
> 2018-03-05 

[jira] [Commented] (HBASE-20173) [AMv2] DisableTableProcedure concurrent to ServerCrashProcedure can deadlock

2018-03-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394695#comment-16394695
 ] 

Hadoop QA commented on HBASE-20173:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} branch-2 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
35s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
29s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
38s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 
40s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
26s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
58s{color} | {color:green} branch-2 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  1m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
44s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 
13s{color} | {color:red} hbase-server: The patch generated 9 new + 85 unchanged 
- 6 fixed = 94 total (was 91) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
 3s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
14m 58s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green}  
1m 22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
29s{color} | {color:green} hbase-protocol-shaded in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}120m 
29s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m  
2s{color} | {color:green} hbase-it in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
58s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}169m 24s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:9f2f2db |
| JIRA Issue | HBASE-20173 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12913944/HBASE-20173.branch-2.001.patch
 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  

[jira] [Updated] (HBASE-20105) Allow flushes to target SSD storage

2018-03-11 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-20105:

Status: Patch Available  (was: Open)

Tested a backport on 1.2.0 and it works well. Passed the tests locally too. 
Looks done to me.

> Allow flushes to target SSD storage
> ---
>
> Key: HBASE-20105
> URL: https://issues.apache.org/jira/browse/HBASE-20105
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance, regionserver
>Affects Versions: hbase-2.0.0-alpha-4
>Reporter: Jean-Marc Spaggiari
>Assignee: Jean-Marc Spaggiari
>Priority: Major
> Attachments: HBASE-20105-v0.patch, HBASE-20105-v1.patch, 
> HBASE-20105-v2.patch, HBASE-20105-v3.patch, HBASE-20105-v4.patch, 
> HBASE-20105-v5.patch
>
>
> On heavy-write usecases, flushes are compacted together pretty quickly. 
> Allowing flushes to go to SSD allows faster flushes and faster first 
> compactions, with subsequent compactions going to regular storage.
>  
> It would be interesting to have an option to target SSD for flushes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20105) Allow flushes to target SSD storage

2018-03-11 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-20105:

Attachment: HBASE-20105-v5.patch

> Allow flushes to target SSD storage
> ---
>
> Key: HBASE-20105
> URL: https://issues.apache.org/jira/browse/HBASE-20105
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance, regionserver
>Affects Versions: hbase-2.0.0-alpha-4
>Reporter: Jean-Marc Spaggiari
>Assignee: Jean-Marc Spaggiari
>Priority: Major
> Attachments: HBASE-20105-v0.patch, HBASE-20105-v1.patch, 
> HBASE-20105-v2.patch, HBASE-20105-v3.patch, HBASE-20105-v4.patch, 
> HBASE-20105-v5.patch
>
>
> On heavy-write usecases, flushes are compacted together pretty quickly. 
> Allowing flushes to go to SSD allows faster flushes and faster first 
> compactions, with subsequent compactions going to regular storage.
>  
> It would be interesting to have an option to target SSD for flushes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20105) Allow flushes to target SSD storage

2018-03-11 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-20105:

Status: Open  (was: Patch Available)

> Allow flushes to target SSD storage
> ---
>
> Key: HBASE-20105
> URL: https://issues.apache.org/jira/browse/HBASE-20105
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance, regionserver
>Affects Versions: hbase-2.0.0-alpha-4
>Reporter: Jean-Marc Spaggiari
>Assignee: Jean-Marc Spaggiari
>Priority: Major
> Attachments: HBASE-20105-v0.patch, HBASE-20105-v1.patch, 
> HBASE-20105-v2.patch, HBASE-20105-v3.patch, HBASE-20105-v4.patch, 
> HBASE-20105-v5.patch
>
>
> On heavy-write usecases, flushes are compacted together pretty quickly. 
> Allowing flushes to go to SSD allows faster flushes and faster first 
> compactions, with subsequent compactions going to regular storage.
>  
> It would be interesting to have an option to target SSD for flushes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20105) Allow flushes to target SSD storage

2018-03-11 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394684#comment-16394684
 ] 

Jean-Marc Spaggiari commented on HBASE-20105:
-

Made some small changes and tested it locally:
{code:java}
root@hbasetest1:~# hdfs fsck 
/hbase/data/default/t1/1c925d870fcf663dd3f48d31bf2b98d8/f1/f22416c954df4e24b499e5fc707cb029
 -files -blocks -locations

Connecting to namenode via 
http://hbasetest1.distparser.com:50070/fsck?ugi=root=1=1=1=%2Fhbase%2Fdata%2Fdefault%2Ft1%2F1c925d870fcf663dd3f48d31bf2b98d8%2Ff1%2Ff22416c954df4e24b499e5fc707cb029
FSCK started by root (auth:SIMPLE) from /192.168.23.51 for path 
/hbase/data/default/t1/1c925d870fcf663dd3f48d31bf2b98d8/f1/f22416c954df4e24b499e5fc707cb029
 at Sun Mar 11 18:44:49 EDT 2018
/hbase/data/default/t1/1c925d870fcf663dd3f48d31bf2b98d8/f1/f22416c954df4e24b499e5fc707cb029
 4908 bytes, 1 block(s): OK
0. BP-2069742952-192.168.23.51-1431229364576:blk_1074774898_1034473 len=4908 
Live_repl=3 
[DatanodeInfoWithStorage[192.168.23.54:50010,DS-6c810995-115c-42cd-af32-c34f5095e45c,SSD],
 
DatanodeInfoWithStorage[192.168.23.52:50010,DS-d4e1790e-b7d3-492f-bb4f-7fb11b7ceff4,SSD],
 
DatanodeInfoWithStorage[192.168.23.53:50010,DS-04dac874-1eaf-43b5-ac1d-2768572b7a36,SSD]]

Status: HEALTHY
Total size: 4908 B
Total dirs: 0
Total files:1
Total symlinks: 0
Total blocks (validated):   1 (avg. block size 4908 B)
Minimally replicated blocks:1 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks:0 (0.0 %)
Mis-replicated blocks:  0 (0.0 %)
Default replication factor: 3
Average block replication:  3.0
Corrupt blocks: 0
Missing replicas:   0 (0.0 %)
Number of data-nodes:   3
Number of racks:1
FSCK ended at Sun Mar 11 18:44:49 EDT 2018 in 0 milliseconds


The filesystem under path 
'/hbase/data/default/t1/1c925d870fcf663dd3f48d31bf2b98d8/f1/f22416c954df4e24b499e5fc707cb029'
 is HEALTHY
{code}

Sounds like it works. Updated patch coming soon.
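
For context, a minimal sketch of the underlying HDFS mechanism this fsck run verifies (direct HDFS calls shown for illustration only; HBase itself applies the policy through its own filesystem utilities):

{code:java}
// Minimal sketch: pin the column family directory to SSD so new flush files
// get SSD replicas. "ALL_SSD" matches the three SSD replicas shown in the
// fsck output above.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class FlushDirStoragePolicy {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path cfDir =
        new Path("/hbase/data/default/t1/1c925d870fcf663dd3f48d31bf2b98d8/f1");
    DistributedFileSystem dfs = (DistributedFileSystem) cfDir.getFileSystem(conf);
    dfs.setStoragePolicy(cfDir, "ALL_SSD");
  }
}
{code}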

> Allow flushes to target SSD storage
> ---
>
> Key: HBASE-20105
> URL: https://issues.apache.org/jira/browse/HBASE-20105
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance, regionserver
>Affects Versions: hbase-2.0.0-alpha-4
>Reporter: Jean-Marc Spaggiari
>Assignee: Jean-Marc Spaggiari
>Priority: Major
> Attachments: HBASE-20105-v0.patch, HBASE-20105-v1.patch, 
> HBASE-20105-v2.patch, HBASE-20105-v3.patch, HBASE-20105-v4.patch
>
>
> On heavy-write usecases, flushes are compacted together pretty quickly. 
> Allowing flushes to go to SSD allows faster flushes and faster first 
> compactions, with subsequent compactions going to regular storage.
>  
> It would be interesting to have an option to target SSD for flushes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20173) [AMv2] DisableTableProcedure concurrent to ServerCrashProcedure can deadlock

2018-03-11 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-20173:
--
Status: Patch Available  (was: Open)

Trying hadoopqa. This one is hard to write a test for since it is dependent on 
aligning two macro procedure steps exactly. My best bet, I think, is the test 
IntegrationTestDDLMasterFailover on a cluster. Will try it concurrently with 
this hadoopqa run.

> [AMv2] DisableTableProcedure concurrent to ServerCrashProcedure can deadlock
> 
>
> Key: HBASE-20173
> URL: https://issues.apache.org/jira/browse/HBASE-20173
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Reporter: stack
>Assignee: stack
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HBASE-20173.branch-2.001.patch
>
>
> See the 'Deadlock' scenario in the parent issue. Doing as a focused subtask 
> since the parent has a few things going on in it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20173) [AMv2] DisableTableProcedure concurrent to ServerCrashProcedure can deadlock

2018-03-11 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-20173:
--
Priority: Critical  (was: Major)

> [AMv2] DisableTableProcedure concurrent to ServerCrashProcedure can deadlock
> 
>
> Key: HBASE-20173
> URL: https://issues.apache.org/jira/browse/HBASE-20173
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Reporter: stack
>Assignee: stack
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HBASE-20173.branch-2.001.patch
>
>
> See the 'Deadlock' scenario in the parent issue. Doing as a focused subtask 
> since the parent has a few things going on in it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20173) [AMv2] DisableTableProcedure concurrent to ServerCrashProcedure can deadlock

2018-03-11 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-20173:
--
Attachment: HBASE-20173.branch-2.001.patch

> [AMv2] DisableTableProcedure concurrent to ServerCrashProcedure can deadlock
> 
>
> Key: HBASE-20173
> URL: https://issues.apache.org/jira/browse/HBASE-20173
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: HBASE-20173.branch-2.001.patch
>
>
> See the 'Deadlock' scenario in the parent issue. Doing as a focused subtask 
> since the parent has a few things going on in it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-20173) [AMv2] DisableTableProcedure concurrent to ServerCrashProcedure can deadlock

2018-03-11 Thread stack (JIRA)
stack created HBASE-20173:
-

 Summary: [AMv2] DisableTableProcedure concurrent to 
ServerCrashProcedure can deadlock
 Key: HBASE-20173
 URL: https://issues.apache.org/jira/browse/HBASE-20173
 Project: HBase
  Issue Type: Sub-task
  Components: amv2
Reporter: stack
Assignee: stack
 Fix For: 2.0.0


See the 'Deadlock' scenario in the parent issue. Doing as a focused subtask 
since the parent has a few things going on in it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20045) When running compaction, cache recent blocks.

2018-03-11 Thread Saad Mufti (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394597#comment-16394597
 ] 

Saad Mufti commented on HBASE-20045:


Also, if such a setting is ever contemplated, it would be great to backport it 
to 1.4.0 or earlier. Thanks.

> When running compaction, cache recent blocks.
> -
>
> Key: HBASE-20045
> URL: https://issues.apache.org/jira/browse/HBASE-20045
> Project: HBase
>  Issue Type: New Feature
>  Components: BlockCache, Compaction
>Affects Versions: 2.0.0-beta-1
>Reporter: Jean-Marc Spaggiari
>Priority: Major
>
> HBase already allows caching blocks on flush. This is very useful for 
> usecases where most queries are against recent data. However, as soon as 
> there is a compaction, those blocks are evicted. It would be interesting to 
> have a table-level parameter to say "When compacting, cache blocks less than 
> 24 hours old". That way, when running compaction, all blocks where some data 
> is less than 24h old will be automatically cached. 
>  
> Very useful for table design where there is TS in the key but a long history 
> (Like a year of sensor data).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HBASE-20045) When running compaction, cache recent blocks.

2018-03-11 Thread Saad Mufti (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394587#comment-16394587
 ] 

Saad Mufti edited comment on HBASE-20045 at 3/11/18 6:15 PM:
-

I hope I'm not interrupting the whole discussion, but I would like to describe 
a current use case for which this would be incredibly useful in my current line 
of work. We are using HBase on AWS EMR where the actual storage is all on S3 
using AWS's proprietary EMRFS filesystem. We have configured our bucket cache 
to be more than large enough to store all our data and more, but S3 buys us 
failure recoverability and AWS EMR does not support more than a single master 
yet, so loss of the master means loss of the entire cluster and having our data 
in S3 lets us survive cluster failure cleanly. 

We have a heavy read and write load, and performance is more than good enough 
when everything is coming from the bucket cache (we have set the prefetch on 
open flag to true in the relevant column families' schema, except for one where 
we do heavy write but never read from it in the HBase cluster).

Now we come to compaction, both minor and major compaction. We have tuned minor 
compaction min and max filesize very low to avoid as much minor compaction as 
practical, and run a homegrown tool that does major compaction across the whole 
cluster in batches, with each batch being one region per region server. In past 
HBase clusters where storage was in HDFS, this served our needs well and we'd 
like to keep this tool for its operational flexibility and not having to take 
any downtime to compact. The problem, of course, is that compaction evicts 
from the bucket cache the blocks of the files being compacted away, and also 
refuses to put blocks from the newly compacted file into the bucket cache. In the face of 
ongoing traffic that does a lot of checkAndPut, this causes the reads to go to 
S3 and be slow, causing the write lock for checkAndPut to be held for a long 
time, causing timeouts in other operations trying to get the same row lock. 
Also our overall performance suffers while the batch is going on due to 
requests building up in IPC queues. Response time means and other percentiles 
look like a sawtooth pattern. In our case each batch of compactions lasts 
roughly 10 minutes or so and the response time sawtooth pattern has the same 
periodicity.

For now we have worked around this by a) using the new setting in HBase 1.4.0 
in our client to prevent one slow region server from blocking all client ops to 
other region servers, b) accepting some timeouts as the cost of business and 
requeuing them in a special Kafka retry topic for our upstream system to 
reprocess. This also limits traffic on the slow region server, letting it clean 
out its backed up IPC queues instead of being hammered with traffic that 
doesn't let it recover. Also we run the tool once a day and it finishes in 8-10 
hours, so performance is great the rest of the day.
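
The 1.4.0 client-side setting referred to above appears to be 
hbase.client.perserver.requests.threshold from HBASE-16388; a hedged sketch of 
enabling it on the client:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Cap the number of concurrent requests the client may have outstanding
// against any single region server, so one slow server fails fast instead of
// tying up every client thread. The value 2048 is only an example.
public final class ClientConfExample {
  public static Configuration create() {
    Configuration conf = HBaseConfiguration.create();
    conf.setInt("hbase.client.perserver.requests.threshold", 2048);
    return conf;
  }
}
{code}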

But if we got a setting that would let us tell HBase to always cache even newly 
compacted files, our performance hit would totally go away. I see the argument 
above about not bothering to cache a block if all its cells are weeks old. In 
our case, the data is advertising identifiers and can come in unpredictably, 
and like I said we have a big enough bucket cache anyway, so why not just cache 
everything? The old blocks from the compacted away files are going to be 
evicted anyway, so we should never run out of bucket cache if we have sized it 
much larger than our entire data size.

Also, of course, the default behavior would continue; any new configuration 
around this can come with heavy warnings of the potential consequences and 
only be used by advanced users, etc.

Again, I apologize for this long description if it distracts from the current 
state of the discussion. Even a setting to only cache newly compacted blocks 
if they had "new" cells would still be hugely beneficial to us.

Cheers.

 


was (Author: saadmufti):
I hope I'm not interrupting the whole discussion, but I would like to describe 
a current use case for which this would be incredibly useful in my current line 
of work. We are using HBase on AWS EMR where the actual storage is all on S3 
using AWS's proprietary EMRFS filesystem. We have configured our bucket cache 
to be more than large enough to store all our data and more, but S3 buys us 
failure recoverability and AWS EMR does not support more than a single master 
yet, so loss of the master means loss of the entire cluster and having our data 
in S3 lets us survive cluster failure cleanly. 

We have a heavy read and write load, and performance is more than good enough 
when everything is coming from the bucket cache (we have set the prefetch on 
open flag to true in the relevant column families' schema, except for one where 
we do heavy write but never read 

[jira] [Comment Edited] (HBASE-20045) When running compaction, cache recent blocks.

2018-03-11 Thread Saad Mufti (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394587#comment-16394587
 ] 

Saad Mufti edited comment on HBASE-20045 at 3/11/18 6:14 PM:
-

I hope I'm not interrupting the whole discussion, but I would like to describe 
a current use case for which this would be incredibly useful in my current line 
of work. We are using HBase on AWS EMR where the actual storage is all on S3 
using AWS's proprietary EMRFS filesystem. We have configured our bucket cache 
to be more than large enough to store all our data and more, but S3 buys us 
failure recoverability and AWS EMR does not support more than a single master 
yet, so loss of the master means loss of the entire cluster and having our data 
in S3 lets us survive cluster failure cleanly. 

We have a heavy read and write load, and performance is more than good enough 
when everything is coming from the bucket cache (we have set the prefetch on 
open flag to true in the relevant column families' schema, except for one where 
we do heavy write but never read from it in the HBase cluster).

Now we come to compaction, both minor and major compaction. We have tuned minor 
compaction min and max filesize very low to avoid as much minor compaction as 
practical, and run a homegrown tool that does major compaction across the whole 
cluster in batches, with each batch being one region per region server. In past 
HBase clusters where storage was in HDFS, this served our needs well and we'd 
like to keep this tool for its operational flexibility and not having to take 
any downtime to compact. The problem, of course, is that compaction evicts 
from the bucket cache the blocks of the files being compacted away. In the face of 
ongoing traffic that does a lot of checkAndPut, this causes the reads to go to 
S3 and be slow, causing the write lock for checkAndPut to be held for a long 
time, causing timeouts in other operations trying to get the same row lock. 
Also our overall performance suffers while the batch is going on due to 
requests building up in IPC queues. Response time means and other percentiles 
look like a sawtooth pattern. In our case each batch of compactions lasts 
roughly 10 minutes or so and the response time sawtooth pattern has the same 
periodicity.

For now we have worked around this by a) using the new setting in HBase 1.4.0 
in our client to prevent one slow region server from blocking all client ops to 
other region servers, b) accepting some timeouts as the cost of business and 
requeuing them in a special Kafka retry topic for our upstream system to 
reprocess. This also limits traffic on the slow region server, letting it clean 
out its backed up IPC queues instead of being hammered with traffic that 
doesn't let it recover. Also we run the tool once a day and it finishes in 8-10 
hours, so performance is great the rest of the day.

But if we got a setting that would let us tell HBase to always cache even newly 
compacted files, our performance hit would totally go away. I see the argument 
above about not bothering to cache a block if all its cells are weeks old. In 
our case, the data is advertising identifiers and can come in unpredictably, 
and like I said we have a big enough bucket cache anyway, so why not just cache 
everything? The old blocks from the compacted away files are going to be 
evicted anyway, so we should never run out of bucket cache if we have sized it 
much larger than our entire data size.

Also, of course, the default behavior would continue; any new configuration 
around this can come with heavy warnings of the potential consequences and 
only be used by advanced users, etc.

Again, I apologize for this long description if it distracts from the current 
state of the discussion. Even a setting to only cache newly compacted blocks 
if they had "new" cells would still be hugely beneficial to us.

Cheers.

 


was (Author: saadmufti):
I hope I'm not interrupting the whole discussion, but I would like to describe 
a current use case for which this would be incredibly useful in my current line 
of work. We are using HBase on AWS EMR where the actual storage is all on S3 
using AWS's proprietary EMRFS filesystem. We have configured our bucket cache 
to be more than large enough to store all our data and more, but S3 buys us 
failure recoverability and AWS EMR does not support more than a single master 
yet, so loss of the master means loss of the entire cluster and having our data 
in S3 lets us survive cluster failure cleanly. 

We have a heavy read and write load, and performance is more than good enough 
when everything is coming from the bucket cache (we have set the prefetch on 
open flag to true in the relevant column families' schema, except for one where 
we do heavy write but never read from it in the HBase cluster).

Now we come to compaction, both minor and major 

[jira] [Comment Edited] (HBASE-20045) When running compaction, cache recent blocks.

2018-03-11 Thread Saad Mufti (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394587#comment-16394587
 ] 

Saad Mufti edited comment on HBASE-20045 at 3/11/18 5:51 PM:
-

I hope I'm not interrupting the whole discussion, but I would like to describe 
a current use case for which this would be incredibly useful in my current line 
of work. We are using HBase on AWS EMR where the actual storage is all on S3 
using AWS's proprietary EMRFS filesystem. We have configured our bucket cache 
to be more than large enough to store all our data and more, but S3 buys us 
failure recoverability and AWS EMR does not support more than a single master 
yet, so loss of the master means loss of the entire cluster and having our data 
in S3 lets us survive cluster failure cleanly. 

We have a heavy read and write load, and performance is more than good enough 
when everything is coming from the bucket cache (we have set the prefetch on 
open flag to true in the relevant column families' schema, except for one where 
we do heavy write but never read from it in the HBase cluster).

Now we come to compaction, both minor and major compaction. We have tuned minor 
compaction min and max filesize very low to avoid as much minor compaction as 
practical, and run a homegrown tool that does major compaction across the whole 
cluster in batches, with each batch being one region per region server. In past 
HBase clusters where storage was in HDFS, this served our needs well and we'd 
like to keep this tool for its operational flexibility and not having to take 
any downtime to compact. The problem, of course, is that compaction evicts 
from the bucket cache the blocks of the files being compacted away. In the face of 
ongoing traffic that does a lot of checkAndPut, this causes the reads to go to 
S3 and be slow, causing the write lock for checkAndPut to be held for a long 
time, causing timeouts in other operations trying to get the same row lock. 
Also our overall performance suffers while the batch is going on due to 
requests building up in IPC queues. Response time means and other percentiles 
look like a sawtooth pattern. In our case each batch of compactions lasts 
roughly 10 minutes or so and the response time sawtooth pattern has the same 
periodicity.

For now we have worked around this by a) using the new setting in HBase 1.4.0 
in our client to prevent one slow region server from blocking all client ops to 
other region servers, b) accepting some timeouts as the cost of business and 
requeuing them in a special Kafka retry topic for our upstream system to 
reprocess. This also limits traffic on the slow region server, letting it clean 
out its backed up IPC queues instead of being hammered with traffic that 
doesn't let it recover. Also we run the tool once a day and it finishes in 8-10 
hours, so performance is great the rest of the day.

But if we got a setting that would let us tell HBase to always cache even newly 
compacted files, our performance hit would totally go away. I see the argument 
above about not bothering to cache a block if all its cells are weeks old. In 
our case, the data is advertising identifiers and can come in unpredictably, 
and like I said we have a big enough bucket cache anyway, so why not just cache 
everything? The old blocks from the compacted away files are going to be 
evicted anyway, so we should never run out of bucket cache if we have sized it 
much larger than our entire data size.

Also, of course, the default behavior would continue; any new configuration 
around this can come with heavy warnings of the potential consequences and 
only be used by advanced users, etc.

Again, I apologize for this long description if it distracts from the current 
state of the discussion. Even a setting to only cache newly compacted blocks 
if they had "new" cells would still be hugely beneficial to us.

Cheers.

 


was (Author: saadmufti):
I hope I'm not interrupting the whole discussion, but I would like to describe 
a current use case for which this would be incredibly useful in my current line 
of work. We are using HBase on AWS EMR where the actual storage is all on S3 
using AWS's proprietary EMRFS filesystem. We have configured our bucket cache 
to be more than large enough to store all our data and more, but S3 buys us 
failure recoverability and AWS EMR does not support more than a single master 
yet, so loss of the master means loss of the entire cluster and having our data 
in S3 lets us survive cluster failure cleanly. 

We have a heavy read and write load, and performance is more than good enough 
when everything is coming from the bucket cache (we have set the prefetch on 
open flag to true in the relevant column families' schema, except for one where 
we do heavy write but never read from it in the HBase cluster).

Now we come to compaction, both minor and major 

[jira] [Commented] (HBASE-20045) When running compaction, cache recent blocks.

2018-03-11 Thread Saad Mufti (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394587#comment-16394587
 ] 

Saad Mufti commented on HBASE-20045:


I hope I'm not interrupting the whole discussion, but I would like to describe 
a current use case for which this would be incredibly useful in my current line 
of work. We are using HBase on AWS EMR where the actual storage is all on S3 
using AWS's proprietary EMRFS filesystem. We have configured our bucket cache 
to be more than large enough to store all our data and more, but S3 buys us 
failure recoverability and AWS EMR does not support more than a single master 
yet, so loss of the master means loss of the entire cluster and having our data 
in S3 lets us survive cluster failure cleanly. 

We have a heavy read and write load, and performance is more than good enough 
when everything is coming from the bucket cache (we have set the prefetch on 
open flag to true in the relevant column families' schema, except for one where 
we do heavy write but never read from it in the HBase cluster).
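
The prefetch-on-open flag is a per-column-family attribute; a small sketch in 
the HBase 1.x API (the table and family names are made up):

{code:java}
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;

// Mark a column family so its blocks are prefetched into the block cache
// (here, the bucket cache) whenever its store files are opened.
public final class PrefetchExample {
  public static HTableDescriptor describe() {
    HTableDescriptor table = new HTableDescriptor(TableName.valueOf("ad_ids"));
    HColumnDescriptor family = new HColumnDescriptor("d");
    family.setPrefetchBlocksOnOpen(true);
    table.addFamily(family);
    return table;
  }
}
{code}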

Now we come to compaction, both minor and major compaction. We have tuned minor 
compaction min and max filesize very low to avoid as much minor compaction as 
practical, and run a homegrown tool that does major compaction across the whole 
cluster in batches, with each batch being one region per region server. In past 
HBase clusters where storage was in HDFS, this served our needs well and we'd 
like to keep this tool for its operational flexibility and not having to take 
any downtime to compact. The problem, of course, is that compaction evicts 
from the bucket cache the blocks of the files being compacted away. In the face of 
ongoing traffic that does a lot of checkAndPut, this causes the reads to go to 
S3 and be slow, causing the write lock for checkAndPut to be held for a long 
time, causing timeouts in other operations trying to get the same row lock. 
Also our overall performance suffers while the batch is going on due to 
requests building up in IPC queues. Response time means and other percentiles 
look like a sawtooth pattern. In our case each batch of compactions lasts 
roughly 10 minutes or so and the response time sawtooth pattern has the same 
periodicity.

For now we have worked around this by a) using the new setting in HBase 1.4.0 
in our client to prevent one slow region server from blocking all client ops to 
other region servers, b) accepting some timeouts as the cost of business and 
requeuing them in a special Kafka retry topic for our upstream system to 
reprocess. This also limits traffic on the slow region server, letting it clean 
out its backed up IPC queues instead of being hammered with traffic that 
doesn't let it recover. Also we run the tool once a day and it finishes in 8-10 
hours, so performance is great the rest of the day.

But if we got a setting that would let us tell HBase to always cache even newly 
compacted files, our performance hit would totally go away. I see the argument 
above about not bothering to cache a block if all its cells are weeks old. In 
our case, the data is advertising identifiers and can come in unpredictably, 
and like I said we have a big enough bucket cache anyway, so why not just cache 
everything? The old blocks from the compacted away files are going to be 
evicted anyway, so we should never run out of bucket cache if we have sized it 
much larger than our entire data size.

Again, I apologize for this long description if it distracts from the current 
state of the discussion. Even a setting to only cache newly compacted blocks 
if they had "new" cells would still be hugely beneficial to us.

Cheers.

 

> When running compaction, cache recent blocks.
> -
>
> Key: HBASE-20045
> URL: https://issues.apache.org/jira/browse/HBASE-20045
> Project: HBase
>  Issue Type: New Feature
>  Components: BlockCache, Compaction
>Affects Versions: 2.0.0-beta-1
>Reporter: Jean-Marc Spaggiari
>Priority: Major
>
> HBase already allows caching blocks on flush. This is very useful for 
> use cases where most queries are against recent data. However, as soon as 
> there is a compaction, those blocks are evicted. It would be interesting to 
> have a table-level parameter to say "When compacting, cache blocks less than 
> 24 hours old". That way, when running compaction, all blocks where some data 
> is less than 24h old will be automatically cached. 
>  
> Very useful for table designs where there is a TS in the key but a long 
> history (like a year of sensor data).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20172) During coprocessor load, switch classloader only if it's a custom CP.

2018-03-11 Thread Ankit Singhal (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394575#comment-16394575
 ] 

Ankit Singhal commented on HBASE-20172:
---

bq. Is there a reason to do this on Java 8?
I believe setting the current classloader to null and relying on Java to fall 
back to the system classloader is not a good idea, because not every Java API 
handles a null context classloader consistently.
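
A minimal sketch of the pattern under discussion, assuming the check is done 
against HBase's CoprocessorClassLoader (the helper and its names are 
illustrative): swap the context classloader only for a custom coprocessor 
loader, and always restore the previous one rather than leaving it null.

{code:java}
import org.apache.hadoop.hbase.util.CoprocessorClassLoader;

public final class CpLoading {
  public static <T> T instantiate(Class<T> implClass) throws Exception {
    ClassLoader cpLoader = implClass.getClassLoader();
    Thread t = Thread.currentThread();
    ClassLoader previous = t.getContextClassLoader();
    boolean custom = cpLoader instanceof CoprocessorClassLoader;
    if (custom) {
      t.setContextClassLoader(cpLoader); // only for custom CP loaders
    }
    try {
      return implClass.getDeclaredConstructor().newInstance();
    } finally {
      if (custom) {
        t.setContextClassLoader(previous); // never leave the TCCL null
      }
    }
  }
}
{code}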

> During coprocessor load, switch classloader only if it's a custom CP.
> -
>
> Key: HBASE-20172
> URL: https://issues.apache.org/jira/browse/HBASE-20172
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: HBASE-20172.patch
>
>
> Current impact: 
> Metric registries will not be able to load their implementations through the 
> service loader, among other things.
> We are not observing this with Java 8 because ServiceLoader uses the system 
> class loader if the provided class loader is null, but it gets exposed easily 
> with Java 7 (TEPHRA-285).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20172) During coprocessor load, switch classloader only if it's a custom CP.

2018-03-11 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394522#comment-16394522
 ] 

Sean Busbey commented on HBASE-20172:
-

We only do Java 7 on 1.y.z releases. Could you make a patch for branch-1?

Is there a reason to do this on Java 8?

> During coprocessor load, switch classloader only if it's a custom CP.
> -
>
> Key: HBASE-20172
> URL: https://issues.apache.org/jira/browse/HBASE-20172
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: HBASE-20172.patch
>
>
> Current impact: 
> Metric registries will not be able to load their implementations through the 
> service loader, among other things.
> We are not observing this with Java 8 because ServiceLoader uses the system 
> class loader if the provided class loader is null, but it gets exposed easily 
> with Java 7 (TEPHRA-285).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20171) Remove o.a.h.h.ProcedureState

2018-03-11 Thread Chia-Ping Tsai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chia-Ping Tsai updated HBASE-20171:
---
Description: It was introduced by HBASE-15609, and HBASE-18106 made it an 
orphan

> Remove o.a.h.h.ProcedureState
> -
>
> Key: HBASE-20171
> URL: https://issues.apache.org/jira/browse/HBASE-20171
> Project: HBase
>  Issue Type: Task
>Reporter: Chia-Ping Tsai
>Priority: Minor
>  Labels: beginner, beginners
> Fix For: 2.0.0, 3.0.0, 2.1.0
>
>
> It was introduced by HBASE-15609, and HBASE-18106 made it an orphan



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20171) Remove o.a.h.h.ProcedureState

2018-03-11 Thread Chia-Ping Tsai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chia-Ping Tsai updated HBASE-20171:
---
Fix Version/s: 2.0.0

> Remove o.a.h.h.ProcedureState
> -
>
> Key: HBASE-20171
> URL: https://issues.apache.org/jira/browse/HBASE-20171
> Project: HBase
>  Issue Type: Task
> Environment: It was introduced by HBASE-15609, and HBASE-18106 made 
> it an orphan
>Reporter: Chia-Ping Tsai
>Priority: Minor
>  Labels: beginner, beginners
> Fix For: 2.0.0, 3.0.0, 2.1.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20171) Remove o.a.h.h.ProcedureState

2018-03-11 Thread Chia-Ping Tsai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chia-Ping Tsai updated HBASE-20171:
---
Environment: (was: It was introduced by HBASE-15609, and HBASE-18106 made it 
an orphan)

> Remove o.a.h.h.ProcedureState
> -
>
> Key: HBASE-20171
> URL: https://issues.apache.org/jira/browse/HBASE-20171
> Project: HBase
>  Issue Type: Task
>Reporter: Chia-Ping Tsai
>Priority: Minor
>  Labels: beginner, beginners
> Fix For: 2.0.0, 3.0.0, 2.1.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20171) Remove o.a.h.h.ProcedureState

2018-03-11 Thread Chia-Ping Tsai (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394509#comment-16394509
 ] 

Chia-Ping Tsai commented on HBASE-20171:


The class is annotated as Public. If we want to remove it from 2.x, we must 
apply the patch to branch-2.0 as well.
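
For context, a sketch of what the Public audience annotation looks like on 
such a class (annotation package as on current master; the enum body is 
elided):

{code:java}
import org.apache.yetus.audience.InterfaceAudience;

// Anything annotated Public is part of the compatibility surface, so its
// removal has to land in every affected release line, not just master.
@InterfaceAudience.Public
public enum ProcedureState {
  // constants elided
}
{code}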

> Remove o.a.h.h.ProcedureState
> -
>
> Key: HBASE-20171
> URL: https://issues.apache.org/jira/browse/HBASE-20171
> Project: HBase
>  Issue Type: Task
> Environment: It was introduced by HBASE-15609, and HBASE-18106 made 
> it an orphan
>Reporter: Chia-Ping Tsai
>Priority: Minor
>  Labels: beginner, beginners
> Fix For: 2.0.0, 3.0.0, 2.1.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20172) During coprocessor load, switch classloader only if it's a custom CP.

2018-03-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394506#comment-16394506
 ] 

Hadoop QA commented on HBASE-20172:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
18s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
23s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
41s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
10s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 
53s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
47s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
38s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
18m 51s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}110m 27s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}152m 26s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-20172 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12913929/HBASE-20172.patch |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux bb2b73614e32 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / d5aaeee88b |
| maven | version: Apache Maven 3.5.3 
(3383c37e1f9e9b3bc3df5050c29c8aff9f295297; 2018-02-24T19:49:05Z) |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC3 |
| unit | 
https://builds.apache.org/job/PreCommit-HBASE-Build/11902/artifact/patchprocess/patch-unit-hbase-server.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/11902/testReport/ |
| Max. process+thread count | 4925 (vs. ulimit of 1) |
| modules | C: hbase-server U: hbase-server 

[jira] [Commented] (HBASE-20133) Calculate correct assignment and build region movement plans for mis-placed regions in one pass

2018-03-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394493#comment-16394493
 ] 

Hudson commented on HBASE-20133:


Results for branch master
[build #259 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/259/]: (x) 
*{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/259//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/259//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/259//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


> Calculate correct assignment and build region movement plans for mis-placed 
> regions in one pass
> ---
>
> Key: HBASE-20133
> URL: https://issues.apache.org/jira/browse/HBASE-20133
> Project: HBase
>  Issue Type: Improvement
>  Components: rsgroup
>Reporter: Xiang Li
>Assignee: Xiang Li
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HBASE-20133.master.000.patch, 
> HBASE-20133.master.001.patch, HBASE-20133.master.002.patch
>
>
> In RSGroupBasedLoadBalancer#balanceCluster(clusterState), the logic could be 
> improved:
> correctAssignment() builds a map for mis-placed and placed regions. For 
> mis-placed regions, the key (ServerName) is BOGUS_SERVER_NAME. Then the logic 
> gets those mis-placed regions out and calls findServerForRegion() several 
> times to find the current host server, in order to build a RegionPlan for 
> the movement.
> Some logic in correctAssignment() and findServerForRegion() could be merged 
> so as to build both the corrected assignment and the RegionPlans for 
> mis-placed regions in one pass. As a result, findServerForRegion() could be 
> removed, if I get it correctly.
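
An illustrative one-pass sketch of the idea, with plain String and String[] 
standing in for ServerName/RegionInfo and RegionPlan: while correcting each 
region's placement we already know its current host, so the movement plan can 
be built on the spot and findServerForRegion() becomes unnecessary.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public final class OnePassCorrection {
  public static void correct(Map<String, List<String>> clusterState,   // server -> regions
                             Map<String, String> regionToTargetServer, // group placement
                             Map<String, List<String>> correctedOut,
                             List<String[]> plansOut) {                // {region, from, to}
    for (Map.Entry<String, List<String>> e : clusterState.entrySet()) {
      String currentServer = e.getKey();
      for (String region : e.getValue()) {
        String target = regionToTargetServer.getOrDefault(region, currentServer);
        correctedOut.computeIfAbsent(target, k -> new ArrayList<>()).add(region);
        if (!target.equals(currentServer)) {
          // the current host is known right here, no second lookup needed
          plansOut.add(new String[] {region, currentServer, target});
        }
      }
    }
  }
}
{code}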



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-19969) Improve fault tolerance in backup merge operation

2018-03-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394494#comment-16394494
 ] 

Hudson commented on HBASE-19969:


Results for branch master
[build #259 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/259/]: (x) 
*{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/259//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/259//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/259//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


> Improve fault tolerance in backup merge operation
> -
>
> Key: HBASE-19969
> URL: https://issues.apache.org/jira/browse/HBASE-19969
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: 19969-v4.patch, HBASE-19969-v1.patch, 
> HBASE-19969-v2.patch, HBASE-19969-v3.patch
>
>
> Some file system operations are not fault tolerant during merge. We delete 
> backup data in the backup file system, then copy new data over to the backup 
> destination. Deletes can be partial, and the copy can fail as well.
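
One standard way to make the publish step fault tolerant, sketched under the 
assumption of rename-capable storage (this shows the general technique, not 
necessarily what the patch does): build the merged output in a temporary 
directory, then swap it in with renames so a crash never leaves the 
destination partially deleted and partially copied.

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class SafePublish {
  public static void publish(FileSystem fs, Path mergedTmp, Path dest)
      throws IOException {
    Path old = new Path(dest.getParent(), dest.getName() + ".old");
    if (fs.exists(old)) {
      fs.delete(old, true); // clear leftovers of a failed earlier attempt
    }
    if (fs.exists(dest) && !fs.rename(dest, old)) {
      throw new IOException("could not stage the old backup aside");
    }
    if (!fs.rename(mergedTmp, dest)) {
      fs.rename(old, dest); // best-effort rollback
      throw new IOException("could not publish the merged backup");
    }
    fs.delete(old, true); // past the commit point; drop the old copy
  }
}
{code}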



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-19389) Limit concurrency of put with dense (hundreds) columns to prevent write handler exhausted

2018-03-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394492#comment-16394492
 ] 

Hadoop QA commented on HBASE-19389:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
30s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
59s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
3s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
48s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  6m 
54s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
34s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
48s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 
18s{color} | {color:red} hbase-server: The patch generated 5 new + 388 
unchanged - 1 fixed = 393 total (was 389) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
49s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
21m  1s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.5 2.7.4 or 3.0.0. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  2m 
46s{color} | {color:red} hbase-server generated 2 new + 0 unchanged - 0 fixed = 
2 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
5s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
35s{color} | {color:green} hbase-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 22m 28s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 76m 40s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hbase-server |
|  |  org.apache.hadoop.hbase.regionserver.HStore.add(Iterable, MemStoreSizing) 
does not release lock on all exception paths  At HStore.java:lock on all 
exception paths  At HStore.java:[line 728] |
|  |  org.apache.hadoop.hbase.regionserver.HStore.add(Cell, MemStoreSizing) 
does not release lock on all exception paths  At HStore.java:lock on all 
exception paths  At HStore.java:[line 709] |
| Failed junit tests | 
hadoop.hbase.regionserver.throttle.TestStoreHotnessProtector |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-19389 |
| JIRA Patch URL | 

[jira] [Commented] (HBASE-19389) Limit concurrency of put with dense (hundreds) columns to prevent write handler exhausted

2018-03-11 Thread Chance Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394484#comment-16394484
 ] 

Chance Li commented on HBASE-19389:
---

Updated patch HBASE-19389-branch-2-V8.patch and added HBASE-19389.master.patch 
for master.

Fixed an issue when handling the STORE_TOO_BUSY code or an atomic request.
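
A rough sketch of the mechanism; the test name in the QA report above suggests 
the real class is StoreHotnessProtector, but the limiter below is illustrative 
only: bound how many handlers may concurrently write to one store and fail 
fast rather than letting every handler pile up on the CSLM.

{code:java}
import java.util.concurrent.Semaphore;

public final class StoreConcurrencyLimiter {
  private final Semaphore permits;

  public StoreConcurrencyLimiter(int maxConcurrentPutsPerStore) {
    this.permits = new Semaphore(maxConcurrentPutsPerStore);
  }

  public void start() {
    if (!permits.tryAcquire()) {
      // surfaces to the client as a retriable "store too busy" condition
      throw new IllegalStateException("STORE_TOO_BUSY: too many concurrent puts");
    }
  }

  public void finish() {
    permits.release();
  }
}
{code}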

> Limit concurrency of put with dense (hundreds) columns to prevent write 
> handler exhausted
> -
>
> Key: HBASE-19389
> URL: https://issues.apache.org/jira/browse/HBASE-19389
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance
>Affects Versions: 2.0.0
> Environment: 2000+ Region Servers
> PCI-E ssd
>Reporter: Chance Li
>Assignee: Chance Li
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: CSLM-concurrent-write.png, 
> HBASE-19389-branch-2-V2.patch, HBASE-19389-branch-2-V3.patch, 
> HBASE-19389-branch-2-V4.patch, HBASE-19389-branch-2-V5.patch, 
> HBASE-19389-branch-2-V6.patch, HBASE-19389-branch-2-V7.patch, 
> HBASE-19389-branch-2-V8.patch, HBASE-19389-branch-2-V9.patch, 
> HBASE-19389-branch-2.patch, HBASE-19389.master.patch, metrics-1.png, 
> ycsb-result.png
>
>
> In a large cluster with a large number of clients, we found that the RS's 
> handlers were sometimes all busy. After investigation, we found the root 
> cause was CSLM overhead, such as heavy load on its compare function. We 
> reviewed the related WALs and found that many columns (more than 1000) were 
> being written at that time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-19389) Limit concurrency of put with dense (hundreds) columns to prevent write handler exhausted

2018-03-11 Thread Chance Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chance Li updated HBASE-19389:
--
Attachment: HBASE-19389.master.patch

> Limit concurrency of put with dense (hundreds) columns to prevent write 
> handler exhausted
> -
>
> Key: HBASE-19389
> URL: https://issues.apache.org/jira/browse/HBASE-19389
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance
>Affects Versions: 2.0.0
> Environment: 2000+ Region Servers
> PCI-E ssd
>Reporter: Chance Li
>Assignee: Chance Li
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: CSLM-concurrent-write.png, 
> HBASE-19389-branch-2-V2.patch, HBASE-19389-branch-2-V3.patch, 
> HBASE-19389-branch-2-V4.patch, HBASE-19389-branch-2-V5.patch, 
> HBASE-19389-branch-2-V6.patch, HBASE-19389-branch-2-V7.patch, 
> HBASE-19389-branch-2-V8.patch, HBASE-19389-branch-2-V9.patch, 
> HBASE-19389-branch-2.patch, HBASE-19389.master.patch, metrics-1.png, 
> ycsb-result.png
>
>
> In a large cluster with a large number of clients, we found that the RS's 
> handlers were sometimes all busy. After investigation, we found the root 
> cause was CSLM overhead, such as heavy load on its compare function. We 
> reviewed the related WALs and found that many columns (more than 1000) were 
> being written at that time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-19389) Limit concurrency of put with dense (hundreds) columns to prevent write handler exhausted

2018-03-11 Thread Chance Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chance Li updated HBASE-19389:
--
Attachment: HBASE-19389-branch-2-V9.patch

> Limit concurrency of put with dense (hundreds) columns to prevent write 
> handler exhausted
> -
>
> Key: HBASE-19389
> URL: https://issues.apache.org/jira/browse/HBASE-19389
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance
>Affects Versions: 2.0.0
> Environment: 2000+ Region Servers
> PCI-E ssd
>Reporter: Chance Li
>Assignee: Chance Li
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: CSLM-concurrent-write.png, 
> HBASE-19389-branch-2-V2.patch, HBASE-19389-branch-2-V3.patch, 
> HBASE-19389-branch-2-V4.patch, HBASE-19389-branch-2-V5.patch, 
> HBASE-19389-branch-2-V6.patch, HBASE-19389-branch-2-V7.patch, 
> HBASE-19389-branch-2-V8.patch, HBASE-19389-branch-2-V9.patch, 
> HBASE-19389-branch-2.patch, HBASE-19389.master.patch, metrics-1.png, 
> ycsb-result.png
>
>
> In a large cluster with a large number of clients, we found that the RS's 
> handlers were sometimes all busy. After investigation, we found the root 
> cause was CSLM overhead, such as heavy load on its compare function. We 
> reviewed the related WALs and found that many columns (more than 1000) were 
> being written at that time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20172) During coprocessor load, switch classloader only if it's a custom CP.

2018-03-11 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-20172:
---
Status: Patch Available  (was: Open)

> During coprocessor load, switch classloader only if it's a custom CP.
> -
>
> Key: HBASE-20172
> URL: https://issues.apache.org/jira/browse/HBASE-20172
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: HBASE-20172.patch
>
>
> Current impact: 
> Metric registries will not be able to load their implementations through the 
> service loader, among other things.
> We are not observing this with Java 8 because ServiceLoader uses the system 
> class loader if the provided class loader is null, but it gets exposed easily 
> with Java 7 (TEPHRA-285).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20172) During coprocessor load, switch classloader only if it's a custom CP.

2018-03-11 Thread Ankit Singhal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Singhal updated HBASE-20172:
--
Attachment: HBASE-20172.patch

> During coprocessor load, switch classloader only if it's a custom CP.
> -
>
> Key: HBASE-20172
> URL: https://issues.apache.org/jira/browse/HBASE-20172
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: HBASE-20172.patch
>
>
> Current impact: 
> Metric registries will not be able to load their implementations through the 
> service loader, among other things.
> We are not observing this with Java 8 because ServiceLoader uses the system 
> class loader if the provided class loader is null, but it gets exposed easily 
> with Java 7 (TEPHRA-285).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-20172) During coprocessor load, switch classloader only if it's a custom CP.

2018-03-11 Thread Ankit Singhal (JIRA)
Ankit Singhal created HBASE-20172:
-

 Summary: During coprocessor load, switch classloader only if it's 
a custom CP.
 Key: HBASE-20172
 URL: https://issues.apache.org/jira/browse/HBASE-20172
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.4.0
Reporter: Ankit Singhal
Assignee: Ankit Singhal
 Fix For: 2.0.0


Current impact: 
Metric registries will not be able to load their implementations through the 
service loader, among other things.

We are not observing this with Java 8 because ServiceLoader uses the system 
class loader if the provided class loader is null, but it gets exposed easily 
with Java 7 (TEPHRA-285).
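
A tiny demo of the Java 8 behavior described above: with a null thread context 
classloader, ServiceLoader.load(service) still resolves providers because a 
null loader falls back to the system class loader, while other APIs that read 
the context classloader directly may not be as forgiving.

{code:java}
import java.nio.file.spi.FileSystemProvider;
import java.util.ServiceLoader;

public final class NullTcclDemo {
  public static void main(String[] args) {
    Thread.currentThread().setContextClassLoader(null);
    // On Java 8 this still works: a null loader means "system class loader".
    ServiceLoader<FileSystemProvider> loader =
        ServiceLoader.load(FileSystemProvider.class);
    for (FileSystemProvider p : loader) {
      System.out.println(p.getClass().getName());
    }
  }
}
{code}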



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20165) Shell command to make a normal peer to be a serial replication peer

2018-03-11 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394457#comment-16394457
 ] 

Duo Zhang commented on HBASE-20165:
---

+1. To be honest, I have never understood the violations reported by rubocop 
and ruby-lint...

> Shell command to make a normal peer to be a serial replication peer
> ---
>
> Key: HBASE-20165
> URL: https://issues.apache.org/jira/browse/HBASE-20165
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Attachments: HBASE-20165.v1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20167) Optimize the implementation of ReplicationSourceWALReader

2018-03-11 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394447#comment-16394447
 ] 

Duo Zhang commented on HBASE-20167:
---

OK, this is a refactoring, so there are no new tests; everything else is 
green. Any concerns, [~zghaobac] [~openinx]? I think this patch can make our 
code much cleaner.

Thanks.

> Optimize the implementation of ReplicationSourceWALReader
> -
>
> Key: HBASE-20167
> URL: https://issues.apache.org/jira/browse/HBASE-20167
> Project: HBase
>  Issue Type: Sub-task
>  Components: Replication
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-20167-v1.patch, HBASE-20167.patch
>
>
> After HBASE-20148, serial replication will be a per-peer option. Since an 
> instance of ReplicationSourceWALReader can only belong to one peer, we do 
> not need so many 'if' checks in the implementation of readWALEntries to 
> determine whether we should consider serial replication. We can just make a 
> subclass or something similar for serial replication to keep the code clean.
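
A skeletal sketch of the shape described here (class names are from the issue; 
the hook name, entry type, and check are placeholders):

{code:java}
// Shared read loop calls a hook instead of branching on the peer type.
class ReplicationSourceWALReader {
  protected boolean canPush(Object entry) {
    return true; // normal peers: nothing extra to wait for
  }
}

// Serial variant overrides only the hook; the read loop stays untouched.
class SerialReplicationSourceWALReader extends ReplicationSourceWALReader {
  @Override
  protected boolean canPush(Object entry) {
    return precedingRangesFinished(entry); // push only when ordering is safe
  }

  private boolean precedingRangesFinished(Object entry) {
    return false; // placeholder: would consult replication barriers here
  }
}
{code}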



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20167) Optimize the implementation of ReplicationSourceWALReader

2018-03-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394445#comment-16394445
 ] 

Hadoop QA commented on HBASE-20167:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
18s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
19s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
41s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
11s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 
57s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
47s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
10s{color} | {color:green} hbase-server: The patch generated 0 new + 0 
unchanged - 1 fixed = 0 total (was 1) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
46s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
18m 37s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}106m 
10s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}147m 44s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-20167 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12913922/HBASE-20167-v1.patch |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 7c2b35ec7db0 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / d5aaeee88b |
| maven | version: Apache Maven 3.5.3 
(3383c37e1f9e9b3bc3df5050c29c8aff9f295297; 2018-02-24T19:49:05Z) |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC3 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/11901/testReport/ |
| Max. process+thread count | 4542 (vs. ulimit of 1) |
| modules | C: hbase-server U: hbase-server |
| Console output |