[jira] [Commented] (HBASE-17018) Spooling BufferedMutator

2016-11-04 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638727#comment-15638727
 ] 

Joep Rottinghuis commented on HBASE-17018:
--

Thanks for the comments.

My thoughts around using MR were because of ease of implementation, and 
stemmed from my use case where Yarn is present and therefore MR is trivially 
available. It is a fair point that as a standalone feature in HBase this 
doesn't have to be true. Using MR isn't a requirement, but was merely a 
(naive) suggestion.

I don't think that atomicity is a requirement, nor are we asking for 
"guarantees".
If you want to be guaranteed to write something to HBase you probably shouldn't 
use a BufferedMutator in the first place.

Please see the attached PDF, where I try to sketch out our use case and the 
behavior we're hoping to see.
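The spooling behavior under discussion can be sketched as a thin wrapper that falls back to a file when a write fails. This is a hypothetical illustration only: `MiniMutator` is an invented stand-in for HBase's `BufferedMutator` interface (which takes `Mutation` objects, not strings), and the class and method names are not from any patch on this issue.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Stand-in for org.apache.hadoop.hbase.client.BufferedMutator (hypothetical).
interface MiniMutator {
    void mutate(String serializedMutation) throws IOException;
    void flush() throws IOException;
}

// On HBase failure, spool the mutation to a file so it can be re-played
// later, e.g. by a MapReduce job reading the spool.
class SpoolingMutator implements MiniMutator {
    private final MiniMutator delegate;
    private final Path spoolFile;

    SpoolingMutator(MiniMutator delegate, Path spoolFile) {
        this.delegate = delegate;
        this.spoolFile = spoolFile;
    }

    @Override
    public void mutate(String m) throws IOException {
        try {
            delegate.mutate(m);
        } catch (IOException hbaseDown) {
            // Best-effort, not atomic: append to the spool instead of failing.
            Files.write(spoolFile, List.of(m),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
    }

    @Override
    public void flush() throws IOException {
        try {
            delegate.flush();
        } catch (IOException hbaseDown) {
            // Mutations that failed were already spooled in mutate().
        }
    }
}
```

With the Hadoop FileSystem interface in place of java.nio, the spool target could equally be HDFS, gcs, or s3, as the issue description suggests.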



> Spooling BufferedMutator
> 
>
> Key: HBASE-17018
> URL: https://issues.apache.org/jira/browse/HBASE-17018
> Project: HBase
>  Issue Type: New Feature
>Reporter: Joep Rottinghuis
> Attachments: YARN-4061 HBase requirements for fault tolerant 
> writer.pdf
>
>
> For Yarn Timeline Service v2 we use HBase as a backing store.
> A big concern we would like to address is what to do if HBase is 
> (temporarily) down, for example in case of an HBase upgrade.
> Most of the high-volume writes will be on a best-effort basis, but 
> occasionally we do a flush, mainly during application lifecycle events, when 
> clients call a flush on the timeline service API. In order to handle the 
> volume of writes we use a BufferedMutator; when flush gets called on our API, 
> we in turn call flush on the BufferedMutator.
> We would like our interface to HBase to be able to spool the mutations to a 
> filesystem in case of HBase errors. If we use the Hadoop filesystem 
> interface, this can then be HDFS, gcs, s3, or any other distributed storage. 
> The mutations can then later be re-played, for example through a MapReduce 
> job.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same

2016-11-04 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638675#comment-15638675
 ] 

stack commented on HBASE-16890:
---

Smile. No worries. I'll try. 

> Analyze the performance of AsyncWAL and fix the same
> 
>
> Key: HBASE-16890
> URL: https://issues.apache.org/jira/browse/HBASE-16890
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: AsyncWAL_disruptor.patch, AsyncWAL_disruptor_1 
> (2).patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_3.patch, 
> AsyncWAL_disruptor_4.patch, AsyncWAL_disruptor_6.patch, 
> HBASE-16890-rc-v2.patch, HBASE-16890-rc-v3.patch, 
> HBASE-16890-remove-contention-v1.patch, HBASE-16890-remove-contention.patch, 
> Screen Shot 2016-10-25 at 7.34.47 PM.png, Screen Shot 2016-10-25 at 7.39.07 
> PM.png, Screen Shot 2016-10-25 at 7.39.48 PM.png, Screen Shot 2016-11-04 at 
> 5.21.27 PM.png, Screen Shot 2016-11-04 at 5.30.18 PM.png, async.svg, 
> classic.svg, contention.png, contention_defaultWAL.png
>
>
> Tests reveal that AsyncWAL under load in single node cluster performs slower 
> than the Default WAL. This task is to analyze and see if we could fix it.
> See some discussions in the tail of JIRA HBASE-15536.





[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same

2016-11-04 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638647#comment-15638647
 ] 

Duo Zhang commented on HBASE-16890:
---

Honestly I do not know... I have never changed it before. You can try 25 and 
75 to see if there is any difference.

> Analyze the performance of AsyncWAL and fix the same
> 
>
> Key: HBASE-16890
> URL: https://issues.apache.org/jira/browse/HBASE-16890
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: AsyncWAL_disruptor.patch, AsyncWAL_disruptor_1 
> (2).patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_3.patch, 
> AsyncWAL_disruptor_4.patch, AsyncWAL_disruptor_6.patch, 
> HBASE-16890-rc-v2.patch, HBASE-16890-rc-v3.patch, 
> HBASE-16890-remove-contention-v1.patch, HBASE-16890-remove-contention.patch, 
> Screen Shot 2016-10-25 at 7.34.47 PM.png, Screen Shot 2016-10-25 at 7.39.07 
> PM.png, Screen Shot 2016-10-25 at 7.39.48 PM.png, Screen Shot 2016-11-04 at 
> 5.21.27 PM.png, Screen Shot 2016-11-04 at 5.30.18 PM.png, async.svg, 
> classic.svg, contention.png, contention_defaultWAL.png
>
>
> Tests reveal that AsyncWAL under load in single node cluster performs slower 
> than the Default WAL. This task is to analyze and see if we could fix it.
> See some discussions in the tail of JIRA HBASE-15536.





[jira] [Commented] (HBASE-17033) LogRoller makes a lot of allocations unnecessarily

2016-11-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638648#comment-15638648
 ] 

Hadoop QA commented on HBASE-17033:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
28s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
44s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
40s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 34s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
27m 38s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha1. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
54s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 155m 2s {color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
16s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 195m 28s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.regionserver.TestHRegion |
| Timed out junit tests | org.apache.hadoop.hbase.TestGlobalMemStoreSize |
|   | 
org.apache.hadoop.hbase.security.visibility.TestVisibilityLabelsWithDeletes |
|   | 
org.apache.hadoop.hbase.security.visibility.TestVisibilityLabelReplicationWithExpAsString
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:7bda515 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12837335/hbase-17033_v1.patch |
| JIRA Issue | HBASE-17033 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux 812ec33d27d5 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 
17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 7e05d0f |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| unit | 
https://builds.apache.org/job/PreCommit-HBASE-Build/4345/artifact/patchprocess/patch-unit-hbase-server.txt
 |
| unit test logs |  
https://builds.apache.org/job/PreCommit-HBASE-Build/4345/artifact/patchprocess/patch-unit-hbase-server.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/4345/testReport/ |
| modules | 

[jira] [Commented] (HBASE-17021) Use RingBuffer to reduce the contention in AsyncFSWAL

2016-11-04 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638616#comment-15638616
 ] 

stack commented on HBASE-17021:
---

I did a part-pass. Will do another later. [~ram_krish] You looking at this? 
What's the diff between this and your approach, boss? Thanks.

> Use RingBuffer to reduce the contention in AsyncFSWAL
> -
>
> Key: HBASE-17021
> URL: https://issues.apache.org/jira/browse/HBASE-17021
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Affects Versions: 2.0.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0
>
> Attachments: HBASE-17021.patch
>
>
> The WALPE result in HBASE-16890 shows that with disruptor's RingBuffer we can 
> get a better performance.
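The contention reduction the description refers to can be illustrated with a toy single-producer/single-consumer ring: each side advances its own sequence counter instead of contending on a shared lock. This is an invented miniature for illustration, not the LMAX Disruptor `RingBuffer` the patch actually uses.

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy SPSC ring buffer: producer owns `head`, consumer owns `tail`, so
// neither side blocks the other on a lock. Capacity must be a power of two
// so that (sequence & mask) maps onto a slot index.
class MiniRing {
    private final String[] slots;
    private final int mask;
    private final AtomicLong head = new AtomicLong(); // next slot to write
    private final AtomicLong tail = new AtomicLong(); // next slot to read

    MiniRing(int capacityPowerOfTwo) {
        slots = new String[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
    }

    boolean offer(String e) {
        long h = head.get();
        if (h - tail.get() == slots.length) return false; // ring is full
        slots[(int) (h & mask)] = e;
        head.set(h + 1); // publish; single writer, so one plain set suffices
        return true;
    }

    String poll() {
        long t = tail.get();
        if (t == head.get()) return null; // ring is empty
        String e = slots[(int) (t & mask)];
        tail.set(t + 1);
        return e;
    }
}
```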





[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same

2016-11-04 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638594#comment-15638594
 ] 

stack commented on HBASE-16890:
---

It takes an int and defaults to 50. You want it at 100? [~Apache9]

> Analyze the performance of AsyncWAL and fix the same
> 
>
> Key: HBASE-16890
> URL: https://issues.apache.org/jira/browse/HBASE-16890
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: AsyncWAL_disruptor.patch, AsyncWAL_disruptor_1 
> (2).patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_3.patch, 
> AsyncWAL_disruptor_4.patch, AsyncWAL_disruptor_6.patch, 
> HBASE-16890-rc-v2.patch, HBASE-16890-rc-v3.patch, 
> HBASE-16890-remove-contention-v1.patch, HBASE-16890-remove-contention.patch, 
> Screen Shot 2016-10-25 at 7.34.47 PM.png, Screen Shot 2016-10-25 at 7.39.07 
> PM.png, Screen Shot 2016-10-25 at 7.39.48 PM.png, Screen Shot 2016-11-04 at 
> 5.21.27 PM.png, Screen Shot 2016-11-04 at 5.30.18 PM.png, async.svg, 
> classic.svg, contention.png, contention_defaultWAL.png
>
>
> Tests reveal that AsyncWAL under load in single node cluster performs slower 
> than the Default WAL. This task is to analyze and see if we could fix it.
> See some discussions in the tail of JIRA HBASE-15536.





[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same

2016-11-04 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638548#comment-15638548
 ] 

stack commented on HBASE-16890:
---

Let me try. Will report back in morning.

> Analyze the performance of AsyncWAL and fix the same
> 
>
> Key: HBASE-16890
> URL: https://issues.apache.org/jira/browse/HBASE-16890
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: AsyncWAL_disruptor.patch, AsyncWAL_disruptor_1 
> (2).patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_3.patch, 
> AsyncWAL_disruptor_4.patch, AsyncWAL_disruptor_6.patch, 
> HBASE-16890-rc-v2.patch, HBASE-16890-rc-v3.patch, 
> HBASE-16890-remove-contention-v1.patch, HBASE-16890-remove-contention.patch, 
> Screen Shot 2016-10-25 at 7.34.47 PM.png, Screen Shot 2016-10-25 at 7.39.07 
> PM.png, Screen Shot 2016-10-25 at 7.39.48 PM.png, Screen Shot 2016-11-04 at 
> 5.21.27 PM.png, Screen Shot 2016-11-04 at 5.30.18 PM.png, async.svg, 
> classic.svg, contention.png, contention_defaultWAL.png
>
>
> Tests reveal that AsyncWAL under load in single node cluster performs slower 
> than the Default WAL. This task is to analyze and see if we could fix it.
> See some discussions in the tail of JIRA HBASE-15536.





[jira] [Commented] (HBASE-17032) CallQueueTooBigException and CallDroppedException should not be triggering PFFE

2016-11-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638495#comment-15638495
 ] 

Hadoop QA commented on HBASE-17032:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 8m 46s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 16s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
58s {color} | {color:green} branch-1.3 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s 
{color} | {color:green} branch-1.3 passed with JDK v1.8.0_111 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s 
{color} | {color:green} branch-1.3 passed with JDK v1.7.0_80 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
52s {color} | {color:green} branch-1.3 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
38s {color} | {color:green} branch-1.3 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
47s {color} | {color:green} branch-1.3 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 40s 
{color} | {color:green} branch-1.3 passed with JDK v1.8.0_111 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 45s 
{color} | {color:green} branch-1.3 passed with JDK v1.7.0_80 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
58s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s 
{color} | {color:green} the patch passed with JDK v1.8.0_111 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 42s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s 
{color} | {color:green} the patch passed with JDK v1.7.0_80 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 44s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
42s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
25s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
16m 0s {color} | {color:green} The patch does not cause any errors with Hadoop 
2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 0m 
24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 40s 
{color} | {color:green} the patch passed with JDK v1.8.0_111 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 49s 
{color} | {color:green} the patch passed with JDK v1.7.0_80 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 32s 
{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 92m 3s {color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
37s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 144m 13s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Timed out junit tests | 
org.apache.hadoop.hbase.security.access.TestAccessController |
|   | org.apache.hadoop.hbase.tool.TestCanaryTool |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.1 Server=1.12.1 Image:yetus/hbase:463e832 |
| JIRA Patch 

[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same

2016-11-04 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638480#comment-15638480
 ] 

Duo Zhang commented on HBASE-16890:
---

And for ioRatio, you need to cast the EventLoopGroup in AsyncFSWALProvider to 
NioEventLoopGroup and call its setIoRatio method. [~stack]

It should be in (0, 100]; it is the percentage of time the EventLoop will 
spend doing I/O.

Thanks.
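Netty is not on the JDK classpath, so the sketch below demonstrates the same cast-to-implementation-class pattern with a JDK executor; the actual `NioEventLoopGroup` call appears only as a comment. The class and method names here are invented for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// The interface type hides the tuning knob; the implementation class exposes
// it. The analogous Netty tweak being discussed (requires Netty on the
// classpath) would be:
//   ((NioEventLoopGroup) eventLoopGroup).setIoRatio(75);  // valid range (0, 100]
class EventLoopTuning {
    /** Sets the keep-alive via a cast to the impl class; returns what was set. */
    static long tuneKeepAlive(ExecutorService pool, long seconds) {
        if (pool instanceof ThreadPoolExecutor) {
            ThreadPoolExecutor tpe = (ThreadPoolExecutor) pool;
            tpe.setKeepAliveTime(seconds, TimeUnit.SECONDS);
            return tpe.getKeepAliveTime(TimeUnit.SECONDS);
        }
        return -1; // not the expected implementation; nothing tuned
    }
}
```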

> Analyze the performance of AsyncWAL and fix the same
> 
>
> Key: HBASE-16890
> URL: https://issues.apache.org/jira/browse/HBASE-16890
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: AsyncWAL_disruptor.patch, AsyncWAL_disruptor_1 
> (2).patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_3.patch, 
> AsyncWAL_disruptor_4.patch, AsyncWAL_disruptor_6.patch, 
> HBASE-16890-rc-v2.patch, HBASE-16890-rc-v3.patch, 
> HBASE-16890-remove-contention-v1.patch, HBASE-16890-remove-contention.patch, 
> Screen Shot 2016-10-25 at 7.34.47 PM.png, Screen Shot 2016-10-25 at 7.39.07 
> PM.png, Screen Shot 2016-10-25 at 7.39.48 PM.png, Screen Shot 2016-11-04 at 
> 5.21.27 PM.png, Screen Shot 2016-11-04 at 5.30.18 PM.png, async.svg, 
> classic.svg, contention.png, contention_defaultWAL.png
>
>
> Tests reveal that AsyncWAL under load in single node cluster performs slower 
> than the Default WAL. This task is to analyze and see if we could fix it.
> See some discussions in the tail of JIRA HBASE-15536.





[jira] [Comment Edited] (HBASE-17021) Use RingBuffer to reduce the contention in AsyncFSWAL

2016-11-04 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638455#comment-15638455
 ] 

Duo Zhang edited comment on HBASE-17021 at 11/5/16 3:13 AM:


[~stack] If you can also make sure that this patch helps, then let's commit it 
first?

Then I could work on the following parts, such as limiting the concurrent 
sync requests. I do not want to put everything in a single big patch, as we do 
not know if the newly added code works...

Thanks.


was (Author: apache9):
[~stack] If you can also make sure that this patch helps, then let's commit it 
first?

Then I could work on the following part such as limit the concurrent sync 
requests. I do not want to put everything in a single big patch...

Thanks.

> Use RingBuffer to reduce the contention in AsyncFSWAL
> -
>
> Key: HBASE-17021
> URL: https://issues.apache.org/jira/browse/HBASE-17021
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Affects Versions: 2.0.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0
>
> Attachments: HBASE-17021.patch
>
>
> The WALPE result in HBASE-16890 shows that with disruptor's RingBuffer we can 
> get a better performance.





[jira] [Commented] (HBASE-17021) Use RingBuffer to reduce the contention in AsyncFSWAL

2016-11-04 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638455#comment-15638455
 ] 

Duo Zhang commented on HBASE-17021:
---

[~stack] If you can also make sure that this patch helps, then let's commit it 
first?

Then I could work on the following parts, such as limiting the concurrent 
sync requests. I do not want to put everything in a single big patch...

Thanks.

> Use RingBuffer to reduce the contention in AsyncFSWAL
> -
>
> Key: HBASE-17021
> URL: https://issues.apache.org/jira/browse/HBASE-17021
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Affects Versions: 2.0.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0
>
> Attachments: HBASE-17021.patch
>
>
> The WALPE result in HBASE-16890 shows that with disruptor's RingBuffer we can 
> get a better performance.





[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same

2016-11-04 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638414#comment-15638414
 ] 

Duo Zhang commented on HBASE-16890:
---

The sync request of AsyncFSWAL is asynchronous, so theoretically we could 
issue a sync for every append if the consumer task runs quickly enough...

Anyway, let me try to limit the pending sync count to see if it helps for you, 
as I cannot observe the same result...

Thanks.

> Analyze the performance of AsyncWAL and fix the same
> 
>
> Key: HBASE-16890
> URL: https://issues.apache.org/jira/browse/HBASE-16890
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: AsyncWAL_disruptor.patch, AsyncWAL_disruptor_1 
> (2).patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_3.patch, 
> AsyncWAL_disruptor_4.patch, AsyncWAL_disruptor_6.patch, 
> HBASE-16890-rc-v2.patch, HBASE-16890-rc-v3.patch, 
> HBASE-16890-remove-contention-v1.patch, HBASE-16890-remove-contention.patch, 
> Screen Shot 2016-10-25 at 7.34.47 PM.png, Screen Shot 2016-10-25 at 7.39.07 
> PM.png, Screen Shot 2016-10-25 at 7.39.48 PM.png, Screen Shot 2016-11-04 at 
> 5.21.27 PM.png, Screen Shot 2016-11-04 at 5.30.18 PM.png, async.svg, 
> classic.svg, contention.png, contention_defaultWAL.png
>
>
> Tests reveal that AsyncWAL under load in single node cluster performs slower 
> than the Default WAL. This task is to analyze and see if we could fix it.
> See some discussions in the tail of JIRA HBASE-15536.





[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same

2016-11-04 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638395#comment-15638395
 ] 

stack commented on HBASE-16890:
---

I'd think that AsyncWAL would aggregate more than the five threads FSHLog has 
running? I'd think the five threads would keep stamping on each other, making 
smaller Packets than AsyncWAL is capable of, and therefore would aggregate 
less than it does.

> Analyze the performance of AsyncWAL and fix the same
> 
>
> Key: HBASE-16890
> URL: https://issues.apache.org/jira/browse/HBASE-16890
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: AsyncWAL_disruptor.patch, AsyncWAL_disruptor_1 
> (2).patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_3.patch, 
> AsyncWAL_disruptor_4.patch, AsyncWAL_disruptor_6.patch, 
> HBASE-16890-rc-v2.patch, HBASE-16890-rc-v3.patch, 
> HBASE-16890-remove-contention-v1.patch, HBASE-16890-remove-contention.patch, 
> Screen Shot 2016-10-25 at 7.34.47 PM.png, Screen Shot 2016-10-25 at 7.39.07 
> PM.png, Screen Shot 2016-10-25 at 7.39.48 PM.png, Screen Shot 2016-11-04 at 
> 5.21.27 PM.png, Screen Shot 2016-11-04 at 5.30.18 PM.png, async.svg, 
> classic.svg, contention.png, contention_defaultWAL.png
>
>
> Tests reveal that AsyncWAL under load in single node cluster performs slower 
> than the Default WAL. This task is to analyze and see if we could fix it.
> See some discussions in the tail of JIRA HBASE-15536.





[jira] [Commented] (HBASE-17021) Use RingBuffer to reduce the contention in AsyncFSWAL

2016-11-04 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638379#comment-15638379
 ] 

Duo Zhang commented on HBASE-17021:
---

At least one of the problems...

> Use RingBuffer to reduce the contention in AsyncFSWAL
> -
>
> Key: HBASE-17021
> URL: https://issues.apache.org/jira/browse/HBASE-17021
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Affects Versions: 2.0.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0
>
> Attachments: HBASE-17021.patch
>
>
> The WALPE result in HBASE-16890 shows that with disruptor's RingBuffer we can 
> get a better performance.





[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same

2016-11-04 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638369#comment-15638369
 ] 

Duo Zhang commented on HBASE-16890:
---

If we have more sync requests for AsyncFSWAL then no doubt FSHLog does better 
at aggregating, and I think that is possible.

We have five threads doing syncing for FSHLog, so the maximum number of 
pending sync requests will be five. If we reach that number then we are forced 
to aggregate. But for AsyncFSWAL there is no such limitation. Maybe we could 
also introduce a limit for AsyncFSWAL. Let me have a try.
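The cap being proposed can be sketched with a five-permit semaphore: once five syncs are in flight, later requests wait, which gives the WAL a chance to aggregate them into one larger sync. The names below are illustrative, not the actual AsyncFSWAL code, and in the real (asynchronous) WAL the release would happen in the sync-completion callback rather than a finally block.

```java
import java.util.concurrent.Semaphore;

// Hypothetical sketch: mirror FSHLog's implicit five-syncer limit by capping
// the number of in-flight sync requests with a semaphore.
class BoundedSyncer {
    private static final int MAX_PENDING_SYNCS = 5; // matches FSHLog's 5 syncer threads
    private final Semaphore pending = new Semaphore(MAX_PENDING_SYNCS);

    void requestSync(Runnable doSync) throws InterruptedException {
        pending.acquire();        // block while five syncs are already in flight
        try {
            doSync.run();         // stand-in for issuing the (async) sync
        } finally {
            pending.release();    // real code would release in the completion callback
        }
    }

    int availableSlots() {
        return pending.availablePermits();
    }
}
```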

> Analyze the performance of AsyncWAL and fix the same
> 
>
> Key: HBASE-16890
> URL: https://issues.apache.org/jira/browse/HBASE-16890
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: AsyncWAL_disruptor.patch, AsyncWAL_disruptor_1 
> (2).patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_3.patch, 
> AsyncWAL_disruptor_4.patch, AsyncWAL_disruptor_6.patch, 
> HBASE-16890-rc-v2.patch, HBASE-16890-rc-v3.patch, 
> HBASE-16890-remove-contention-v1.patch, HBASE-16890-remove-contention.patch, 
> Screen Shot 2016-10-25 at 7.34.47 PM.png, Screen Shot 2016-10-25 at 7.39.07 
> PM.png, Screen Shot 2016-10-25 at 7.39.48 PM.png, Screen Shot 2016-11-04 at 
> 5.21.27 PM.png, Screen Shot 2016-11-04 at 5.30.18 PM.png, async.svg, 
> classic.svg, contention.png, contention_defaultWAL.png
>
>
> Tests reveal that AsyncWAL under load in single node cluster performs slower 
> than the Default WAL. This task is to analyze and see if we could fix it.
> See some discussions in the tail of JIRA HBASE-15536.





[jira] [Commented] (HBASE-17004) Refactor IntegrationTestManyRegions to use @ClassRule for timing out

2016-11-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638347#comment-15638347
 ] 

Hudson commented on HBASE-17004:


SUCCESS: Integrated in Jenkins build HBase-1.3-JDK8 #69 (See 
[https://builds.apache.org/job/HBase-1.3-JDK8/69/])
HBASE-17004  IntegrationTestManyRegions verifies that many regions get (appy: 
rev b1c17f0ef98c1c6674004f044b3160b1be37ca64)
* (edit) hbase-it/pom.xml
* (edit) 
hbase-it/src/test/java/org/apache/hadoop/hbase/IntegrationTestManyRegions.java


> Refactor IntegrationTestManyRegions to use @ClassRule for timing out
> 
>
> Key: HBASE-17004
> URL: https://issues.apache.org/jira/browse/HBASE-17004
> Project: HBase
>  Issue Type: Improvement
>Reporter: Appy
>Assignee: Appy
>Priority: Minor
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.4, 1.1.8
>
> Attachments: HBASE-17004.master.001.patch, 
> HBASE-17004.master.002.patch
>
>
> IntegrationTestManyRegions verifies that many regions get assigned within a 
> given time. To do so, it spawns a new thread and uses CountDownLatch.await() 
> to time out. This change replaces that mechanism with a JUnit @ClassRule to 
> time out the test.
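The mechanism being removed, and its replacement, can be sketched as follows. `runWithDeadline` is an invented helper showing the CountDownLatch.await(timeout) pattern the test used, and the `@ClassRule` form in the comment assumes JUnit 4 on the classpath.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Old mechanism: run the work in a helper thread and bound the wait with
// CountDownLatch.await(timeout). The JUnit 4 replacement lets the framework
// own the timeout instead, roughly:
//   @ClassRule
//   public static final Timeout timeout =
//       Timeout.builder().withTimeout(10, TimeUnit.MINUTES).build();
class LatchTimeoutDemo {
    /** Returns true if the work finished within the deadline, false on timeout. */
    static boolean runWithDeadline(Runnable work, long millis) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(1);
        Thread t = new Thread(() -> { work.run(); done.countDown(); });
        t.setDaemon(true); // don't keep the JVM alive if the work hangs
        t.start();
        return done.await(millis, TimeUnit.MILLISECONDS);
    }
}
```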





[jira] [Commented] (HBASE-16982) Better integrate Apache CLI in AbstractHBaseTool

2016-11-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638360#comment-15638360
 ] 

Hadoop QA commented on HBASE-16982:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 30m 44s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 3m 40s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
55s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 21s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
32s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 2m 
27s {color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s 
{color} | {color:blue} Skipped patched modules with no Java source: . {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
54s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 27s 
{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 8s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 
50s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 1s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 4m 1s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
2s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
41s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
33m 49s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha1. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s 
{color} | {color:blue} Skipped patched modules with no Java source: . {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
42s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 57s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 47s 
{color} | {color:green} hbase-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 116m 53s 
{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 128m 42s 
{color} | {color:red} root in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
48s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 358m 5s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Timed out junit tests | 
org.apache.hadoop.hbase.regionserver.wal.TestAsyncWALReplay |
|   | org.apache.hadoop.hbase.regionserver.wal.TestAsyncLogRolling |
|   | org.apache.hadoop.hbase.regionserver.wal.TestLogRollingNoCluster |
|   | org.apache.hadoop.hbase.regionserver.wal.TestAsyncWALReplayCompressed |
|   | org.apache.hadoop.hbase.regionserver.TestRegionReplicaFailover |
|   | org.apache.hadoop.hbase.TestHBaseOnOtherDfsCluster |
|   | 

[jira] [Commented] (HBASE-17030) Procedure v2 - A couple of tweaks to the SplitTableRegionProcedure

2016-11-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638355#comment-15638355
 ] 

Hadoop QA commented on HBASE-17030:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
19s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 15s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 7m 
54s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
25s {color} | {color:green} master passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 17s 
{color} | {color:red} hbase-protocol-shaded in master has 24 extant Findbugs 
warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 38s 
{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
29s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 12s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 12s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 12s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 7m 
35s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
26s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
32m 14s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha1. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 0m 
38s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 7s 
{color} | {color:red} hbase-protocol-shaded in the patch failed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 43s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 8s {color} | 
{color:red} hbase-protocol-shaded in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 80m 22s {color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
25s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 146m 34s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Timed out junit tests | 
org.apache.hadoop.hbase.master.procedure.TestModifyTableProcedure |
|   | org.apache.hadoop.hbase.master.procedure.TestServerCrashProcedure |
|   | org.apache.hadoop.hbase.master.procedure.TestRestoreSnapshotProcedure |
|   | org.apache.hadoop.hbase.master.procedure.TestTruncateTableProcedure |
|   | org.apache.hadoop.hbase.master.procedure.TestMasterProcedureWalLease |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.11.2 Server=1.11.2 Image:yetus/hbase:7bda515 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12837322/HBASE-17030-v0.patch |
| JIRA Issue | HBASE-17030 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  cc  

[jira] [Updated] (HBASE-17033) LogRoller makes a lot of allocations unnecessarily

2016-11-04 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-17033:
--
Attachment: hbase-17033_v1.patch

Simple patch that reduces these kinds of allocations. 

> LogRoller makes a lot of allocations unnecessarily
> --
>
> Key: HBASE-17033
> URL: https://issues.apache.org/jira/browse/HBASE-17033
> Project: HBase
>  Issue Type: Bug
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Attachments: Screen Shot 2016-11-04 at 6.39.00 PM.png, 
> hbase-17033_v1.patch
>
>
> I was looking at the other allocations for HBASE-17017. Seems that the log 
> roller thread allocates 200MB, ~7% of the TLAB space. That is a lot of 
> allocation. 
> I think the reason is this: 
> {code}
> while (true) {
>   if (this.safePointAttainedLatch.await(1, TimeUnit.NANOSECONDS)) {
>     break;
>   }
>   if (syncFuture.isThrowable()) {
>     throw new FailedSyncBeforeLogCloseException(syncFuture.getThrowable());
>   }
> }
> {code}
> This busy wait is causing a lot of allocations because the thread is added to 
> the latch's waiting list on every iteration. 
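The loop quoted above polls the latch with a 1 ns timeout, and every timed await() registers the caller on the latch's wait queue before timing out, which is where the allocation pressure comes from. A self-contained sketch of the pattern and of a coarser-interval alternative follows; `syncError` and the exception handling are stand-ins for HBase's SyncFuture machinery, not the real classes:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class SafePointWait {
    // Stand-in for syncFuture.isThrowable(); the real check lives in HBase's SyncFuture.
    static volatile Throwable syncError = null;

    // Pattern from the quoted snippet: a 1 ns timed await spins, re-registering
    // the waiter on the latch's queue on every iteration.
    static void busyWait(CountDownLatch latch) throws Exception {
        while (true) {
            if (latch.await(1, TimeUnit.NANOSECONDS)) break;
            if (syncError != null) throw new RuntimeException(syncError);
        }
    }

    // Alternative sketch: a coarser poll interval keeps the error check
    // responsive while cutting wakeups (and allocations) by orders of magnitude.
    static void coarseWait(CountDownLatch latch) throws Exception {
        while (true) {
            if (latch.await(5, TimeUnit.MILLISECONDS)) break;
            if (syncError != null) throw new RuntimeException(syncError);
        }
    }

    public static void main(String[] args) throws Exception {
        CountDownLatch latch = new CountDownLatch(1);
        new Thread(() -> {
            try { Thread.sleep(20); } catch (InterruptedException ignored) {}
            latch.countDown();     // simulate the safe point being attained
        }).start();
        coarseWait(latch);
        System.out.println("safe point attained");
    }
}
```

The trade-off in the coarser variant is error-detection latency (up to one poll interval), which is bounded and tunable; the attached patch may take a different approach.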





[jira] [Updated] (HBASE-17033) LogRoller makes a lot of allocations unnecessarily

2016-11-04 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-17033:
--
Status: Patch Available  (was: Open)

> LogRoller makes a lot of allocations unnecessarily
> --
>
> Key: HBASE-17033
> URL: https://issues.apache.org/jira/browse/HBASE-17033
> Project: HBase
>  Issue Type: Bug
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Attachments: Screen Shot 2016-11-04 at 6.39.00 PM.png, 
> hbase-17033_v1.patch
>
>
> I was looking at the other allocations for HBASE-17017. Seems that the log 
> roller thread allocates 200MB, ~7% of the TLAB space. That is a lot of 
> allocation. 
> I think the reason is this: 
> {code}
> while (true) {
>   if (this.safePointAttainedLatch.await(1, TimeUnit.NANOSECONDS)) {
>     break;
>   }
>   if (syncFuture.isThrowable()) {
>     throw new FailedSyncBeforeLogCloseException(syncFuture.getThrowable());
>   }
> }
> {code}
> This busy wait is causing a lot of allocations because the thread is added to 
> the latch's waiting list on every iteration. 





[jira] [Commented] (HBASE-17017) Remove the current per-region latency histogram metrics

2016-11-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638326#comment-15638326
 ] 

Hadoop QA commented on HBASE-17017:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 51s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
25s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 52s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
24s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
30s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
15s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 46s 
{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
7s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 56s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 56s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
26s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
31m 16s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha1. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 51s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 17s 
{color} | {color:green} hbase-hadoop-compat in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s 
{color} | {color:green} hbase-hadoop2-compat in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 103m 8s {color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
36s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 153m 1s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.regionserver.TestRegionServerMetrics |
|   | hadoop.hbase.regionserver.TestHRegionWithInMemoryFlush |
| Timed out junit tests | 
org.apache.hadoop.hbase.security.access.TestAccessController2 |
|   | org.apache.hadoop.hbase.TestMovedRegionsCleaner |
|   | org.apache.hadoop.hbase.security.access.TestCellACLWithMultipleVersions |
|   | org.apache.hadoop.hbase.security.access.TestWithDisabledAuthorization |
|   | org.apache.hadoop.hbase.security.access.TestAccessController |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.11.2 Server=1.11.2 Image:yetus/hbase:7bda515 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12837314/hbase-17017_v1.patch |
| JIRA Issue | HBASE-17017 |
| Optional Tests |  asflicense  javac  

[jira] [Updated] (HBASE-17033) LogRoller makes a lot of allocations unnecessarily

2016-11-04 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-17033:
--
Attachment: Screen Shot 2016-11-04 at 6.39.00 PM.png

Screenshot. 

> LogRoller makes a lot of allocations unnecessarily
> --
>
> Key: HBASE-17033
> URL: https://issues.apache.org/jira/browse/HBASE-17033
> Project: HBase
>  Issue Type: Bug
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Attachments: Screen Shot 2016-11-04 at 6.39.00 PM.png
>
>
> I was looking at the other allocations for HBASE-17017. Seems that the log 
> roller thread allocates 200MB, ~7% of the TLAB space. That is a lot of 
> allocation. 
> I think the reason is this: 
> {code}
> while (true) {
>   if (this.safePointAttainedLatch.await(1, TimeUnit.NANOSECONDS)) {
>     break;
>   }
>   if (syncFuture.isThrowable()) {
>     throw new FailedSyncBeforeLogCloseException(syncFuture.getThrowable());
>   }
> }
> {code}
> This busy wait is causing a lot of allocations because the thread is added to 
> the latch's waiting list on every iteration. 





[jira] [Commented] (HBASE-17004) Refactor IntegrationTestManyRegions to use @ClassRule for timing out

2016-11-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638307#comment-15638307
 ] 

Hudson commented on HBASE-17004:


SUCCESS: Integrated in Jenkins build HBase-1.2-JDK8 #57 (See 
[https://builds.apache.org/job/HBase-1.2-JDK8/57/])
HBASE-17004  IntegrationTestManyRegions verifies that many regions get (appy: 
rev 804ce850030f607acf855876223d5fa7b3825d0a)
* (edit) 
hbase-it/src/test/java/org/apache/hadoop/hbase/IntegrationTestManyRegions.java
* (edit) hbase-it/pom.xml


> Refactor IntegrationTestManyRegions to use @ClassRule for timing out
> 
>
> Key: HBASE-17004
> URL: https://issues.apache.org/jira/browse/HBASE-17004
> Project: HBase
>  Issue Type: Improvement
>Reporter: Appy
>Assignee: Appy
>Priority: Minor
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.4, 1.1.8
>
> Attachments: HBASE-17004.master.001.patch, 
> HBASE-17004.master.002.patch
>
>
> IntegrationTestManyRegions verifies that many regions get assigned within a 
> given time. To do so, it spawns a new thread and uses CountDownLatch.await() 
> to time out. This change replaces that mechanism with a JUnit @ClassRule to 
> time out the test.





[jira] [Created] (HBASE-17033) LogRoller makes a lot of allocations unnecessarily

2016-11-04 Thread Enis Soztutar (JIRA)
Enis Soztutar created HBASE-17033:
-

 Summary: LogRoller makes a lot of allocations unnecessarily
 Key: HBASE-17033
 URL: https://issues.apache.org/jira/browse/HBASE-17033
 Project: HBase
  Issue Type: Bug
Reporter: Enis Soztutar
Assignee: Enis Soztutar


I was looking at the other allocations for HBASE-17017. Seems that the log 
roller thread allocates 200MB, ~7% of the TLAB space. That is a lot of 
allocation. 

I think the reason is this: 
{code}
while (true) {
  if (this.safePointAttainedLatch.await(1, TimeUnit.NANOSECONDS)) {
    break;
  }
  if (syncFuture.isThrowable()) {
    throw new FailedSyncBeforeLogCloseException(syncFuture.getThrowable());
  }
}
{code}

This busy wait is causing a lot of allocations because the thread is added to 
the latch's waiting list on every iteration. 





[jira] [Updated] (HBASE-17032) CallQueueTooBigException and CallDroppedException should not be triggering PFFE

2016-11-04 Thread Mikhail Antonov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Antonov updated HBASE-17032:

Status: Patch Available  (was: Open)

> CallQueueTooBigException and CallDroppedException should not be triggering 
> PFFE
> ---
>
> Key: HBASE-17032
> URL: https://issues.apache.org/jira/browse/HBASE-17032
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 1.3.0
>Reporter: Mikhail Antonov
>Assignee: Mikhail Antonov
> Fix For: 2.0.0, 1.3.0
>
> Attachments: HBASE-17032.branch-1.3.v1.patch, 
> HBASE-17032.branch-1.3.v2.patch
>
>
> Back in HBASE-15137 we made it so that CQTBE causes a preemptive fast fail 
> exception on the client. 
> It seems those two load-control mechanisms don't exactly align here. The 
> server throws CallQueueTooBigException or CallDroppedException (from the 
> deadline scheduler) when it feels overloaded. The client should accept that 
> behavior and retry. When the server sheds load and the client also bails out, 
> the load shedding bubbles up too high, and the impact on client applications 
> seems worse with PFFE turned on than without.
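A minimal sketch of the behavior the issue asks for: treat server-side load shedding as retryable and keep it out of the preemptive-fast-fail (PFFE) accounting. The exception classes below are local stand-ins for the real org.apache.hadoop.hbase types, and `countsTowardFastFail` is a hypothetical helper, not the actual client code in the attached patches:

```java
public class FailureClassifier {
    // Stand-ins for the HBase exception types named in the issue; the real
    // client deals with org.apache.hadoop.hbase.CallQueueTooBigException etc.
    static class CallQueueTooBigException extends Exception {}
    static class CallDroppedException extends Exception {}
    static class ConnectException extends Exception {}

    // Server-side load shedding (CQTBE / CallDroppedException) means
    // "back off and retry"; only genuine connectivity failures should
    // feed the fast-fail tracker that triggers PFFE.
    static boolean countsTowardFastFail(Throwable t) {
        return !(t instanceof CallQueueTooBigException
              || t instanceof CallDroppedException);
    }

    public static void main(String[] args) {
        System.out.println(countsTowardFastFail(new CallQueueTooBigException())); // false
        System.out.println(countsTowardFastFail(new ConnectException()));         // true
    }
}
```

With this split, an overloaded server slows clients down via ordinary retry backoff instead of tripping the fast-fail path for the whole server.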





[jira] [Updated] (HBASE-17032) CallQueueTooBigException and CallDroppedException should not be triggering PFFE

2016-11-04 Thread Mikhail Antonov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Antonov updated HBASE-17032:

Attachment: HBASE-17032.branch-1.3.v2.patch

v2 patch with fixed test

> CallQueueTooBigException and CallDroppedException should not be triggering 
> PFFE
> ---
>
> Key: HBASE-17032
> URL: https://issues.apache.org/jira/browse/HBASE-17032
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 1.3.0
>Reporter: Mikhail Antonov
>Assignee: Mikhail Antonov
> Fix For: 2.0.0, 1.3.0
>
> Attachments: HBASE-17032.branch-1.3.v1.patch, 
> HBASE-17032.branch-1.3.v2.patch
>
>
> Back in HBASE-15137 we made it so that CQTBE causes a preemptive fast fail 
> exception on the client. 
> It seems those two load-control mechanisms don't exactly align here. The 
> server throws CallQueueTooBigException or CallDroppedException (from the 
> deadline scheduler) when it feels overloaded. The client should accept that 
> behavior and retry. When the server sheds load and the client also bails out, 
> the load shedding bubbles up too high, and the impact on client applications 
> seems worse with PFFE turned on than without.





[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same

2016-11-04 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638213#comment-15638213
 ] 

stack commented on HBASE-16890:
---

48 cores.

bq. Seems the problem is AsyncFSWAL can not use more CPUs even if there is no 
contention

How is this bottlenecking us? The ring buffer consumer is a single thread in 
both cases, and then in DFSClient it goes into a queue consumed by one thread. 
AsyncWAL should still be blowing FSHLog away.

Yeah, tell me about ioRatio. I'm gone for an hour but will be back on; can run 
anything you like. See above for stats on FSHLog doing a better job of 
aggregating syncs.

> Analyze the performance of AsyncWAL and fix the same
> 
>
> Key: HBASE-16890
> URL: https://issues.apache.org/jira/browse/HBASE-16890
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: AsyncWAL_disruptor.patch, AsyncWAL_disruptor_1 
> (2).patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_3.patch, 
> AsyncWAL_disruptor_4.patch, AsyncWAL_disruptor_6.patch, 
> HBASE-16890-rc-v2.patch, HBASE-16890-rc-v3.patch, 
> HBASE-16890-remove-contention-v1.patch, HBASE-16890-remove-contention.patch, 
> Screen Shot 2016-10-25 at 7.34.47 PM.png, Screen Shot 2016-10-25 at 7.39.07 
> PM.png, Screen Shot 2016-10-25 at 7.39.48 PM.png, Screen Shot 2016-11-04 at 
> 5.21.27 PM.png, Screen Shot 2016-11-04 at 5.30.18 PM.png, async.svg, 
> classic.svg, contention.png, contention_defaultWAL.png
>
>
> Tests reveal that AsyncWAL under load in a single-node cluster performs more 
> slowly than the default WAL. This task is to analyze and see if we could fix it.
> See some discussions in the tail of JIRA HBASE-15536.





[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same

2016-11-04 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638208#comment-15638208
 ] 

stack commented on HBASE-16890:
---

We should try to get metrics on packet sizes. Is FSHLog making fatter packets?

> Analyze the performance of AsyncWAL and fix the same
> 
>
> Key: HBASE-16890
> URL: https://issues.apache.org/jira/browse/HBASE-16890
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: AsyncWAL_disruptor.patch, AsyncWAL_disruptor_1 
> (2).patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_3.patch, 
> AsyncWAL_disruptor_4.patch, AsyncWAL_disruptor_6.patch, 
> HBASE-16890-rc-v2.patch, HBASE-16890-rc-v3.patch, 
> HBASE-16890-remove-contention-v1.patch, HBASE-16890-remove-contention.patch, 
> Screen Shot 2016-10-25 at 7.34.47 PM.png, Screen Shot 2016-10-25 at 7.39.07 
> PM.png, Screen Shot 2016-10-25 at 7.39.48 PM.png, Screen Shot 2016-11-04 at 
> 5.21.27 PM.png, Screen Shot 2016-11-04 at 5.30.18 PM.png, async.svg, 
> classic.svg, contention.png, contention_defaultWAL.png
>
>
> Tests reveal that AsyncWAL under load in a single-node cluster performs more 
> slowly than the default WAL. This task is to analyze and see if we could fix it.
> See some discussions in the tail of JIRA HBASE-15536.





[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same

2016-11-04 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638205#comment-15638205
 ] 

Duo Zhang commented on HBASE-16890:
---

So what's the hardware of your machine, [~stack]? Seems the problem is that 
AsyncFSWAL can not use more CPUs even if there is no contention? Maybe you 
could try increasing/decreasing the ioRatio of Netty to see if the result 
changes? Let me find the way of changing the ioRatio for Netty.

> Analyze the performance of AsyncWAL and fix the same
> 
>
> Key: HBASE-16890
> URL: https://issues.apache.org/jira/browse/HBASE-16890
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: AsyncWAL_disruptor.patch, AsyncWAL_disruptor_1 
> (2).patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_3.patch, 
> AsyncWAL_disruptor_4.patch, AsyncWAL_disruptor_6.patch, 
> HBASE-16890-rc-v2.patch, HBASE-16890-rc-v3.patch, 
> HBASE-16890-remove-contention-v1.patch, HBASE-16890-remove-contention.patch, 
> Screen Shot 2016-10-25 at 7.34.47 PM.png, Screen Shot 2016-10-25 at 7.39.07 
> PM.png, Screen Shot 2016-10-25 at 7.39.48 PM.png, Screen Shot 2016-11-04 at 
> 5.21.27 PM.png, Screen Shot 2016-11-04 at 5.30.18 PM.png, async.svg, 
> classic.svg, contention.png, contention_defaultWAL.png
>
>
> Tests reveal that AsyncWAL under load in a single-node cluster performs more 
> slowly than the default WAL. This task is to analyze and see if we could fix it.
> See some discussions in the tail of JIRA HBASE-15536.





[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same

2016-11-04 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638206#comment-15638206
 ] 

stack commented on HBASE-16890:
---

{code}

-- Histograms --
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.latencyHistogram.nanos
 count = 2101170
   min = 460988
   max = 179884304
  mean = 5793567.54
stddev = 19665482.79
median = 2129343.00
  75% <= 2639978.00
  95% <= 7591455.00
  98% <= 106766212.00
  99% <= 120363544.00
99.9% <= 179884304.00
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncCountHistogram.countPerSync
 count = 283081
   min = 0
   max = 16
  mean = 6.46
stddev = 4.28
median = 8.00
  75% <= 10.00
  95% <= 12.00
  98% <= 13.00
  99% <= 14.00
99.9% <= 16.00
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncHistogram.nanos-between-syncs
 count = 283083
   min = 747201
   max = 179577999
  mean = 5808497.64
stddev = 20550849.80
median = 1919769.00
  75% <= 2594564.00
  95% <= 6725774.00
  98% <= 104668538.00
  99% <= 126351306.00
99.9% <= 179577999.00

-- Meters --
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.appendMeter.bytes
 count = 14690526510
 mean rate = 69331875.36 events/second
 1-minute rate = 40171021.67 events/second
 5-minute rate = 73866875.49 events/second
15-minute rate = 83653584.79 events/second
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncMeter.syncs
 count = 283104
 mean rate = 1336.08 events/second
 1-minute rate = 780.51 events/second
 5-minute rate = 1324.92 events/second
15-minute rate = 1452.57 events/second
{code}

Looks like FSHLog is aggregating more syncs per actual sync: 21 vs 6.5.

> Analyze the performance of AsyncWAL and fix the same
> 
>
> Key: HBASE-16890
> URL: https://issues.apache.org/jira/browse/HBASE-16890
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: AsyncWAL_disruptor.patch, AsyncWAL_disruptor_1 
> (2).patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_3.patch, 
> AsyncWAL_disruptor_4.patch, AsyncWAL_disruptor_6.patch, 
> HBASE-16890-rc-v2.patch, HBASE-16890-rc-v3.patch, 
> HBASE-16890-remove-contention-v1.patch, HBASE-16890-remove-contention.patch, 
> Screen Shot 2016-10-25 at 7.34.47 PM.png, Screen Shot 2016-10-25 at 7.39.07 
> PM.png, Screen Shot 2016-10-25 at 7.39.48 PM.png, Screen Shot 2016-11-04 at 
> 5.21.27 PM.png, Screen Shot 2016-11-04 at 5.30.18 PM.png, async.svg, 
> classic.svg, contention.png, contention_defaultWAL.png
>
>
> Tests reveal that AsyncWAL under load in a single-node cluster performs more 
> slowly than the default WAL. This task is to analyze and see if we could fix it.
> See some discussions in the tail of JIRA HBASE-15536.





[jira] [Commented] (HBASE-17032) CallQueueTooBigException and CallDroppedException should not be triggering PFFE

2016-11-04 Thread Mikhail Antonov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638188#comment-15638188
 ] 

Mikhail Antonov commented on HBASE-17032:
-

seems like it'd break TestFastFail. Will update the patch soon.

> CallQueueTooBigException and CallDroppedException should not be triggering 
> PFFE
> ---
>
> Key: HBASE-17032
> URL: https://issues.apache.org/jira/browse/HBASE-17032
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 1.3.0
>Reporter: Mikhail Antonov
>Assignee: Mikhail Antonov
> Fix For: 2.0.0, 1.3.0
>
> Attachments: HBASE-17032.branch-1.3.v1.patch
>
>
> Back in HBASE-15137 we made it so that CQTBE causes a preemptive fast fail 
> exception on the client. 
> It seems those two load control mechanisms don't exactly align here. The server 
> throws CallQueueTooBigException or CallDroppedException (from the deadline 
> scheduler) when it feels overloaded. The client should accept that behavior and 
> retry. When the server sheds load and the client also bails out, the load 
> shedding bubbles up too high, and the high-level impact on client 
> applications seems worse with PFFE turned on than without.





[jira] [Updated] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same

2016-11-04 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-16890:
--
Attachment: Screen Shot 2016-11-04 at 5.30.18 PM.png
Screen Shot 2016-11-04 at 5.21.27 PM.png

The methods that consume .5% or greater

> Analyze the performance of AsyncWAL and fix the same
> 
>
> Key: HBASE-16890
> URL: https://issues.apache.org/jira/browse/HBASE-16890
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: AsyncWAL_disruptor.patch, AsyncWAL_disruptor_1 
> (2).patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_3.patch, 
> AsyncWAL_disruptor_4.patch, AsyncWAL_disruptor_6.patch, 
> HBASE-16890-rc-v2.patch, HBASE-16890-rc-v3.patch, 
> HBASE-16890-remove-contention-v1.patch, HBASE-16890-remove-contention.patch, 
> Screen Shot 2016-10-25 at 7.34.47 PM.png, Screen Shot 2016-10-25 at 7.39.07 
> PM.png, Screen Shot 2016-10-25 at 7.39.48 PM.png, Screen Shot 2016-11-04 at 
> 5.21.27 PM.png, Screen Shot 2016-11-04 at 5.30.18 PM.png, async.svg, 
> classic.svg, contention.png, contention_defaultWAL.png
>
>
> Tests reveal that AsyncWAL under load in a single-node cluster performs slower 
> than the Default WAL. This task is to analyze and see if we can fix it.
> See some discussions in the tail of JIRA HBASE-15536.





[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same

2016-11-04 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638184#comment-15638184
 ] 

stack commented on HBASE-16890:
---

I ran the tests a few times and the results are consistent. Looking at the FSHLog 
run w/ JFR, I see more points of contention reported -- inside DFSClient. It uses 
maybe 25% more CPU, probably because of the upped throughput. Otherwise, looking 
w/ JFR, nothing jumps out. Let me put up pictures of the 'hot methods'. It is 
almost as though FSHLog is doing more work (the top consumers are the WALPE 
random generation... we should fix that).

The FSHLog must have a better 'flow' going on. Here are the histograms for FSHLog:

{code}

-- Histograms --
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.latencyHistogram.nanos
 count = 8461245
   min = 838241
   max = 115799121
  mean = 2696785.63
stddev = 6486391.73
median = 2199081.00
  75% <= 2571547.00
  95% <= 3237948.00
  98% <= 3621166.00
  99% <= 5216818.00
99.9% <= 115799121.00
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncCountHistogram.countPerSync
 count = 412764
   min = 1
   max = 86
  mean = 21.04
stddev = 16.98
median = 17.00
  75% <= 34.00
  95% <= 53.00
  98% <= 58.00
  99% <= 62.00
99.9% <= 86.00
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncHistogram.nanos-between-syncs
 count = 412764
   min = 405379
   max = 129879546
  mean = 1680258.91
stddev = 7343616.88
median = 1127074.00
  75% <= 1448611.00
  95% <= 1812916.00
  98% <= 1978098.00
  99% <= 2150048.00
99.9% <= 122766311.00

-- Meters --
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.appendMeter.bytes
 count = 59144801550
 mean rate = 244727411.22 events/second
 1-minute rate = 245882558.80 events/second
 5-minute rate = 199668915.99 events/second
15-minute rate = 166822622.37 events/second
org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncMeter.syncs
 count = 412764
 mean rate = 1707.90 events/second
 1-minute rate = 1715.17 events/second
 5-minute rate = 1342.77 events/second
15-minute rate = 1077.71 events/second
{code}

Let me get them for asyncwal...

> Analyze the performance of AsyncWAL and fix the same
> 
>
> Key: HBASE-16890
> URL: https://issues.apache.org/jira/browse/HBASE-16890
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: AsyncWAL_disruptor.patch, AsyncWAL_disruptor_1 
> (2).patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_3.patch, 
> AsyncWAL_disruptor_4.patch, AsyncWAL_disruptor_6.patch, 
> HBASE-16890-rc-v2.patch, HBASE-16890-rc-v3.patch, 
> HBASE-16890-remove-contention-v1.patch, HBASE-16890-remove-contention.patch, 
> Screen Shot 2016-10-25 at 7.34.47 PM.png, Screen Shot 2016-10-25 at 7.39.07 
> PM.png, Screen Shot 2016-10-25 at 7.39.48 PM.png, async.svg, classic.svg, 
> contention.png, contention_defaultWAL.png
>
>
> Tests reveal that AsyncWAL under load in a single-node cluster performs slower 
> than the Default WAL. This task is to analyze and see if we can fix it.
> See some discussions in the tail of JIRA HBASE-15536.





[jira] [Updated] (HBASE-17032) CallQueueTooBigException and CallDroppedException should not be triggering PFFE

2016-11-04 Thread Mikhail Antonov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Antonov updated HBASE-17032:

Attachment: HBASE-17032.branch-1.3.v1.patch

trivial patch

> CallQueueTooBigException and CallDroppedException should not be triggering 
> PFFE
> ---
>
> Key: HBASE-17032
> URL: https://issues.apache.org/jira/browse/HBASE-17032
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 1.3.0
>Reporter: Mikhail Antonov
>Assignee: Mikhail Antonov
> Fix For: 2.0.0, 1.3.0
>
> Attachments: HBASE-17032.branch-1.3.v1.patch
>
>
> Back in HBASE-15137 we made it so that CQTBE causes a preemptive fast fail 
> exception on the client. 
> It seems those two load control mechanisms don't exactly align here. The server 
> throws CallQueueTooBigException or CallDroppedException (from the deadline 
> scheduler) when it feels overloaded. The client should accept that behavior and 
> retry. When the server sheds load and the client also bails out, the load 
> shedding bubbles up too high, and the high-level impact on client 
> applications seems worse with PFFE turned on than without.





[jira] [Assigned] (HBASE-17032) CallQueueTooBigException and CallDroppedException should not be triggering PFFE

2016-11-04 Thread Mikhail Antonov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Antonov reassigned HBASE-17032:
---

Assignee: Mikhail Antonov

> CallQueueTooBigException and CallDroppedException should not be triggering 
> PFFE
> ---
>
> Key: HBASE-17032
> URL: https://issues.apache.org/jira/browse/HBASE-17032
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 1.3.0
>Reporter: Mikhail Antonov
>Assignee: Mikhail Antonov
> Fix For: 2.0.0, 1.3.0
>
>
> Back in HBASE-15137 we made it so that CQTBE causes a preemptive fast fail 
> exception on the client. 
> It seems those two load control mechanisms don't exactly align here. The server 
> throws CallQueueTooBigException or CallDroppedException (from the deadline 
> scheduler) when it feels overloaded. The client should accept that behavior and 
> retry. When the server sheds load and the client also bails out, the load 
> shedding bubbles up too high, and the high-level impact on client 
> applications seems worse with PFFE turned on than without.





[jira] [Updated] (HBASE-17032) CallQueueTooBigException and CallDroppedException should not be triggering PFFE

2016-11-04 Thread Mikhail Antonov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Antonov updated HBASE-17032:

Fix Version/s: 1.3.0
   2.0.0

> CallQueueTooBigException and CallDroppedException should not be triggering 
> PFFE
> ---
>
> Key: HBASE-17032
> URL: https://issues.apache.org/jira/browse/HBASE-17032
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 1.3.0
>Reporter: Mikhail Antonov
> Fix For: 2.0.0, 1.3.0
>
>
> Back in HBASE-15137 we made it so that CQTBE causes a preemptive fast fail 
> exception on the client. 
> It seems those two load control mechanisms don't exactly align here. The server 
> throws CallQueueTooBigException or CallDroppedException (from the deadline 
> scheduler) when it feels overloaded. The client should accept that behavior and 
> retry. When the server sheds load and the client also bails out, the load 
> shedding bubbles up too high, and the high-level impact on client 
> applications seems worse with PFFE turned on than without.





[jira] [Updated] (HBASE-17032) CallQueueTooBigException and CallDroppedException should not be triggering PFFE

2016-11-04 Thread Mikhail Antonov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Antonov updated HBASE-17032:

Affects Version/s: 1.3.0

> CallQueueTooBigException and CallDroppedException should not be triggering 
> PFFE
> ---
>
> Key: HBASE-17032
> URL: https://issues.apache.org/jira/browse/HBASE-17032
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 1.3.0
>Reporter: Mikhail Antonov
> Fix For: 2.0.0, 1.3.0
>
>
> Back in HBASE-15137 we made it so that CQTBE causes a preemptive fast fail 
> exception on the client. 
> It seems those two load control mechanisms don't exactly align here. The server 
> throws CallQueueTooBigException or CallDroppedException (from the deadline 
> scheduler) when it feels overloaded. The client should accept that behavior and 
> retry. When the server sheds load and the client also bails out, the load 
> shedding bubbles up too high, and the high-level impact on client 
> applications seems worse with PFFE turned on than without.





[jira] [Updated] (HBASE-17032) CallQueueTooBigException and CallDroppedException should not be triggering PFFE

2016-11-04 Thread Mikhail Antonov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Antonov updated HBASE-17032:

Component/s: Client

> CallQueueTooBigException and CallDroppedException should not be triggering 
> PFFE
> ---
>
> Key: HBASE-17032
> URL: https://issues.apache.org/jira/browse/HBASE-17032
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Reporter: Mikhail Antonov
>
> Back in HBASE-15137 we made it so that CQTBE causes a preemptive fast fail 
> exception on the client. 
> It seems those two load control mechanisms don't exactly align here. The server 
> throws CallQueueTooBigException or CallDroppedException (from the deadline 
> scheduler) when it feels overloaded. The client should accept that behavior and 
> retry. When the server sheds load and the client also bails out, the load 
> shedding bubbles up too high, and the high-level impact on client 
> applications seems worse with PFFE turned on than without.





[jira] [Created] (HBASE-17032) CallQueueTooBigException and CallDroppedException should not be triggering PFFE

2016-11-04 Thread Mikhail Antonov (JIRA)
Mikhail Antonov created HBASE-17032:
---

 Summary: CallQueueTooBigException and CallDroppedException should 
not be triggering PFFE
 Key: HBASE-17032
 URL: https://issues.apache.org/jira/browse/HBASE-17032
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Antonov


Back in HBASE-15137 we made it so that CQTBE causes a preemptive fast fail 
exception on the client. 

It seems those two load control mechanisms don't exactly align here. The server 
throws CallQueueTooBigException or CallDroppedException (from the deadline scheduler) 
when it feels overloaded. The client should accept that behavior and retry. When 
the server sheds load and the client also bails out, the load shedding bubbles 
up too high, and the high-level impact on client applications seems worse with 
PFFE turned on than without.





[jira] [Commented] (HBASE-17017) Remove the current per-region latency histogram metrics

2016-11-04 Thread Gary Helmling (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638149#comment-15638149
 ] 

Gary Helmling commented on HBASE-17017:
---

The Counter metrics are much less expensive (1 Counter instance vs 260 
instances per histogram).  And they're useful for identifying hot regions, so I 
think we should keep those around.  In theory the size histograms could also be 
useful for that, but I can't say I've used them much.  So dumping the time and 
size histograms seems okay to me.

> Remove the current per-region latency histogram metrics
> ---
>
> Key: HBASE-17017
> URL: https://issues.apache.org/jira/browse/HBASE-17017
> Project: HBase
>  Issue Type: Sub-task
>  Components: metrics
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Duo Zhang
>Assignee: Enis Soztutar
> Fix For: 2.0.0, 1.4.0
>
> Attachments: Screen Shot 2016-11-04 at 3.00.21 PM.png, Screen Shot 
> 2016-11-04 at 3.38.42 PM.png, hbase-17017_v1.patch
>
>






[jira] [Updated] (HBASE-17022) TestMasterFailoverWithProcedures#testTruncateWithFailover fails constantly in branch-1.1

2016-11-04 Thread Matteo Bertozzi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-17022:

   Resolution: Fixed
Fix Version/s: 1.1.8
   Status: Resolved  (was: Patch Available)

> TestMasterFailoverWithProcedures#testTruncateWithFailover fails constantly in 
> branch-1.1
> 
>
> Key: HBASE-17022
> URL: https://issues.apache.org/jira/browse/HBASE-17022
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.1.7
>Reporter: Yu Li
>Assignee: Matteo Bertozzi
> Fix For: 1.1.8
>
> Attachments: HBASE-17022-v0.branch-1.1.patch, 
> HBASE-17022-v0_branch-1.1.patch
>
>
> As titled, checking recent pre-commit UT runs on branch-1.1 we can see that 
> {{TestMasterFailoverWithProcedures#testTruncateWithFailover}} keeps failing.





[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same

2016-11-04 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638140#comment-15638140
 ] 

stack commented on HBASE-16890:
---

I ran WALPE w/ log roll disabled against a single, remote DN. I see that FSHLog 
is 2x AsyncWAL even w/ the HBASE-17021 patch in place.

FSHLog, Default Master Branch
{code}
2016-11-04 16:09:05,210 INFO  [main] wal.WALPerformanceEvaluation: Summary: 
threads=100, iterations=10, syncInterval=0 took 269.595s 37092.676ops/s
 Performance counter stats for './hbase/bin/hbase --config 
/home/stack/conf_hbase org.apache.hadoop.hbase.wal.WALPerformanceEvaluation 
-threads 100 -iterations 10 -qualifiers 25 -keySize 50 -valueSize 200':

2970796.406680 task-clock (msec) #   10.831 CPUs utilized
19,589,972 context-switches  #0.007 M/sec
 2,862,328 cpu-migrations#0.963 K/sec
 7,026,111 page-faults   #0.002 M/sec
 5,189,096,974,913 cycles#1.747 GHz
stalled-cycles-frontend
stalled-cycles-backend
 2,899,414,852,894 instructions  #0.56  insns per cycle
   472,244,057,677 branches  #  158.962 M/sec
 4,717,852,912 branch-misses #1.00% of all branches

 274.288161881 seconds time elapsed
{code}

Current State of AsyncFSWAL in master branch
{code}
2016-11-04 16:19:01,247 INFO  [main] wal.WALPerformanceEvaluation: Summary: 
threads=100, iterations=10, syncInterval=0 took 541.682s 18461.016ops/s
 Performance counter stats for './hbase/bin/hbase --config 
/home/stack/conf_hbase org.apache.hadoop.hbase.wal.WALPerformanceEvaluation 
-threads 100 -iterations 10 -qualifiers 25 -keySize 50 -valueSize 200':

3032840.986653 task-clock (msec) #5.484 CPUs utilized
15,400,858 context-switches  #0.005 M/sec
 3,205,052 cpu-migrations#0.001 M/sec
12,901,416 page-faults   #0.004 M/sec
 5,212,559,898,743 cycles#1.719 GHz
stalled-cycles-frontend
stalled-cycles-backend
 2,676,707,056,681 instructions  #0.51  insns per cycle
   445,557,848,140 branches  #  146.911 M/sec
 6,372,744,336 branch-misses #1.43% of all branches

 553.074446643 seconds time elapsed
{code}

Patched AsyncWAL
{code}
2016-11-04 16:36:12,872 INFO  [main] wal.WALPerformanceEvaluation: Summary: 
threads=100, iterations=10, syncInterval=0 took 449.542s 22244.863ops/s

 Performance counter stats for './hbase/bin/hbase --config 
/home/stack/conf_hbase org.apache.hadoop.hbase.wal.WALPerformanceEvaluation 
-threads 100 -iterations 10 -qualifiers 25 -keySize 50 -valueSize 200':

2847554.990457 task-clock (msec) #6.151 CPUs utilized
11,158,364 context-switches  #0.004 M/sec
 1,697,560 cpu-migrations#0.596 K/sec
 8,239,210 page-faults   #0.003 M/sec
 5,082,916,581,506 cycles#1.785 GHz
stalled-cycles-frontend
stalled-cycles-backend
 2,443,254,158,990 instructions  #0.48  insns per cycle
   392,726,539,853 branches  #  137.917 M/sec
 5,782,766,858 branch-misses #1.47% of all branches

 462.937995983 seconds time elapsed
{code}

Looking in Flight Recorder, I don't see any contention reported anymore w/ the 
patched asyncwal, so that is good.
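For a quick cross-check of the "2x" figure, the ops/s summaries quoted above can be compared directly:

```java
public class WalpeThroughput {
    public static void main(String[] args) {
        // ops/s figures from the three WALPE summaries quoted above
        double fshlog = 37092.676;
        double asyncStock = 18461.016;
        double asyncPatched = 22244.863;
        // FSHLog vs stock AsyncFSWAL is very close to 2x; the patch
        // recovers roughly a 20% improvement over stock.
        System.out.printf("FSHLog / stock async:  %.2fx%n", fshlog / asyncStock);
        System.out.printf("patched / stock async: %.2fx%n", asyncPatched / asyncStock);
    }
}
```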

> Analyze the performance of AsyncWAL and fix the same
> 
>
> Key: HBASE-16890
> URL: https://issues.apache.org/jira/browse/HBASE-16890
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: AsyncWAL_disruptor.patch, AsyncWAL_disruptor_1 
> (2).patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_3.patch, 
> AsyncWAL_disruptor_4.patch, AsyncWAL_disruptor_6.patch, 
> HBASE-16890-rc-v2.patch, HBASE-16890-rc-v3.patch, 
> HBASE-16890-remove-contention-v1.patch, HBASE-16890-remove-contention.patch, 
> Screen Shot 2016-10-25 at 7.34.47 PM.png, Screen Shot 2016-10-25 at 7.39.07 
> PM.png, Screen Shot 2016-10-25 at 7.39.48 PM.png, async.svg, classic.svg, 
> contention.png, contention_defaultWAL.png
>
>
> Tests reveal that AsyncWAL under load in single node cluster performs slower 
> than the Default WAL. This task is to analyze and see if we could fix it.
> See some discussions in the tail of JIRA HBASE-15536.





[jira] [Commented] (HBASE-17017) Remove the current per-region latency histogram metrics

2016-11-04 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638131#comment-15638131
 ] 

Andrew Purtell commented on HBASE-17017:


/cc [~mantonov] [~ghelmling] [~eclark] 

> Remove the current per-region latency histogram metrics
> ---
>
> Key: HBASE-17017
> URL: https://issues.apache.org/jira/browse/HBASE-17017
> Project: HBase
>  Issue Type: Sub-task
>  Components: metrics
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Duo Zhang
>Assignee: Enis Soztutar
> Fix For: 2.0.0, 1.4.0
>
> Attachments: Screen Shot 2016-11-04 at 3.00.21 PM.png, Screen Shot 
> 2016-11-04 at 3.38.42 PM.png, hbase-17017_v1.patch
>
>






[jira] [Commented] (HBASE-17016) Reimplement per-region latency histogram metrics

2016-11-04 Thread Mikhail Antonov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638133#comment-15638133
 ] 

Mikhail Antonov commented on HBASE-17016:
-

[~enis] yeah, +1 to that approach.

> Reimplement per-region latency histogram metrics
> 
>
> Key: HBASE-17016
> URL: https://issues.apache.org/jira/browse/HBASE-17016
> Project: HBase
>  Issue Type: Task
>  Components: metrics
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Andrew Purtell
> Fix For: 2.0.0, 1.4.0
>
>
> Follow up from HBASE-10656, where [~enis] says:
> {quote}
> the main problem is that we have A LOT of per-region metrics that are latency 
> histograms. These latency histograms create many many Counter / LongAdder 
> objects. We should get rid of per-region latencies and maybe look at reducing 
> the per-region metric overhead.
> {quote}
> And [~ghelmling] gives us a good candidate for implementing per-region latency 
> histograms: [HdrHistogram|https://github.com/HdrHistogram/HdrHistogram].
> Let's consider removing the per-region latency histograms and reimplementing 
> them using HdrHistogram.
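For reference, recording into the suggested HdrHistogram looks roughly like the sketch below (the bounds, precision, and sample values are illustrative, not from any HBase patch):

```java
import org.HdrHistogram.Histogram;

public class PerRegionLatencySketch {
    public static void main(String[] args) {
        // One fixed-size bucket array per histogram instead of many
        // Counter/LongAdder objects: track values up to 1 hour in
        // nanoseconds with 3 significant digits of precision.
        Histogram latencies = new Histogram(3_600_000_000_000L, 3);
        latencies.recordValue(2_199_081L);  // illustrative append latencies, ns
        latencies.recordValue(3_237_948L);
        latencies.recordValue(5_216_818L);
        System.out.println("p99 = " + latencies.getValueAtPercentile(99.0) + " ns");
    }
}
```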





[jira] [Created] (HBASE-17031) Scanners should check for null start and end rows

2016-11-04 Thread Ashu Pachauri (JIRA)
Ashu Pachauri created HBASE-17031:
-

 Summary: Scanners should check for null start and end rows
 Key: HBASE-17031
 URL: https://issues.apache.org/jira/browse/HBASE-17031
 Project: HBase
  Issue Type: Bug
  Components: Scanners
Reporter: Ashu Pachauri
Priority: Minor


If a scan is passed a null start row, it fails very deep in the call 
stack. We should validate that start and end rows are not null before launching 
the scan.
Here is the associated jstack:

{code}
java.lang.RuntimeException: java.lang.NullPointerException
at 
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:219)
at 
org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:326)
at 
org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:301)
at 
org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:166)
at 
org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:161)
at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:798)

Caused by: java.lang.NullPointerException
at org.apache.hadoop.hbase.util.Bytes.compareTo(Bytes.java:1225)
at 
org.apache.hadoop.hbase.util.Bytes$ByteArrayComparator.compare(Bytes.java:158)
at 
org.apache.hadoop.hbase.util.Bytes$ByteArrayComparator.compare(Bytes.java:147)
at 
org.apache.hadoop.hbase.types.CopyOnWriteArrayMap$ArrayHolder.find(CopyOnWriteArrayMap.java:892)
at 
org.apache.hadoop.hbase.types.CopyOnWriteArrayMap.floorEntry(CopyOnWriteArrayMap.java:169)
at 
org.apache.hadoop.hbase.client.MetaCache.getCachedLocation(MetaCache.java:79)
at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getCachedLocation(ConnectionManager.java:1391)
at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1231)
at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1183)
at 
org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:305)
at 
org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:156)
at 
org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:60)
at 
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:211)
... 30 more
{code}
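A guard of the sort being proposed might look like the following sketch (the helper name and the plain byte[] arguments are hypothetical; the real fix would live in the Scan/ClientScanner path):

```java
public class ScanRowValidation {
    /** Hypothetical pre-flight check; HBase's actual patch may differ. */
    static void validateScanRows(byte[] startRow, byte[] stopRow) {
        if (startRow == null || stopRow == null) {
            throw new IllegalArgumentException(
                "start/stop row must not be null; use an empty byte[] "
                + "(HConstants.EMPTY_BYTE_ARRAY) for an unbounded scan");
        }
    }

    public static void main(String[] args) {
        validateScanRows(new byte[0], new byte[0]); // empty means unbounded: OK
        try {
            validateScanRows(null, new byte[0]);    // fails fast, not deep in MetaCache
        } catch (IllegalArgumentException e) {
            System.out.println("rejected early: " + e.getMessage());
        }
    }
}
```

Failing fast this way keeps the NullPointerException from surfacing deep inside Bytes.compareTo as in the jstack above.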





[jira] [Commented] (HBASE-17016) Reimplement per-region latency histogram metrics

2016-11-04 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638127#comment-15638127
 ] 

Enis Soztutar commented on HBASE-17016:
---

bq. If we can bring them back, and they are cheap - sure, why not.
Fair enough. 

bq. If we find out that any latency histograms are relatively expensive (in 
some visible form) I'd be in favor or removing them, unless someone has the 
usecase when they are actually useful.
I think the findings at HBASE-17017 justify the removal: beyond the object 
allocation, there is a 17% perf boost with basic testing. We should only bring 
them back if we repeat the same test with a new patch and see no impact (both 
object allocation and perf).

> Reimplement per-region latency histogram metrics
> 
>
> Key: HBASE-17016
> URL: https://issues.apache.org/jira/browse/HBASE-17016
> Project: HBase
>  Issue Type: Task
>  Components: metrics
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Andrew Purtell
> Fix For: 2.0.0, 1.4.0
>
>
> Follow up from HBASE-10656, where [~enis] says:
> {quote}
> the main problem is that we have A LOT of per-region metrics that are latency 
> histograms. These latency histograms create many many Counter / LongAdder 
> objects. We should get rid of per-region latencies and maybe look at reducing 
> the per-region metric overhead.
> {quote}
> And [~ghelmling] gives us a good candidate for implementing per-region latency 
> histograms: [HdrHistogram|https://github.com/HdrHistogram/HdrHistogram].
> Let's consider removing the per-region latency histograms and reimplementing 
> them using HdrHistogram.





[jira] [Commented] (HBASE-17017) Remove the current per-region latency histogram metrics

2016-11-04 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638116#comment-15638116
 ] 

Enis Soztutar commented on HBASE-17017:
---

Thousands of counters are not that bad compared to millions, at least. However, 
agreed that we can think about purging these altogether. We now have per-table 
metrics, which should be the way to expose this information rather than 
per-region.

In our deployments, we always disable per-region metrics because customers end 
up with tens of thousands of regions in total, and there is no way to look at 
per-region metrics without proper tooling. If you have more than 100 regions, 
the information is not that useful unless, again, there is some good tooling, 
which most users lack. FB was using per-region metrics, so we can see whether 
they are fine with that.

> Remove the current per-region latency histogram metrics
> ---
>
> Key: HBASE-17017
> URL: https://issues.apache.org/jira/browse/HBASE-17017
> Project: HBase
>  Issue Type: Sub-task
>  Components: metrics
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Duo Zhang
>Assignee: Enis Soztutar
> Fix For: 2.0.0, 1.4.0
>
> Attachments: Screen Shot 2016-11-04 at 3.00.21 PM.png, Screen Shot 
> 2016-11-04 at 3.38.42 PM.png, hbase-17017_v1.patch
>
>






[jira] [Commented] (HBASE-17016) Reimplement per-region latency histogram metrics

2016-11-04 Thread Mikhail Antonov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638102#comment-15638102
 ] 

Mikhail Antonov commented on HBASE-17016:
-

[~enis] not necessarily close as won't fix; I meant to say that I think the unit 
of request-rate outliers is often a single hot region, while the unit of latency 
outliers is mostly (almost always?) the RS -- a GC stall, a WAL append failing 
due to the dfsclient hitting an error, that kind of thing -- which makes 
per-region latency not super useful imo. If we remove them and see any 
improvement in terms of "fewer latency outliers since fewer Counters etc" - 
great, let's remove them. If we can bring them back, and they are cheap - sure, 
why not. If we find out that any latency histograms are relatively expensive (in 
some visible form), I'd be in favor of removing them, unless someone has a use 
case where they are actually useful.

> Reimplement per-region latency histogram metrics
> 
>
> Key: HBASE-17016
> URL: https://issues.apache.org/jira/browse/HBASE-17016
> Project: HBase
>  Issue Type: Task
>  Components: metrics
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Andrew Purtell
> Fix For: 2.0.0, 1.4.0
>
>
> Follow up from HBASE-10656, where [~enis] says:
> {quote}
> the main problem is that we have A LOT of per-region metrics that are latency 
> histograms. These latency histograms create many many Counter / LongAdder 
> objects. We should get rid of per-region latencies and maybe look at reducing 
> the per-region metric overhead.
> {quote}
> And [~ghelmling] gives us a good candidate for implementing per-region latency 
> histograms: [HdrHistogram|https://github.com/HdrHistogram/HdrHistogram].
> Let's consider removing the per-region latency histograms and reimplementing 
> them using HdrHistogram.





[jira] [Commented] (HBASE-17004) Refactor IntegrationTestManyRegions to use @ClassRule for timing out

2016-11-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638098#comment-15638098
 ] 

Hudson commented on HBASE-17004:


FAILURE: Integrated in Jenkins build HBase-1.1-JDK7 #1811 (See 
[https://builds.apache.org/job/HBase-1.1-JDK7/1811/])
HBASE-17004  IntegrationTestManyRegions verifies that many regions get (appy: 
rev 71a2e1f225879d68e69fcedcd4ddfa281eae6030)
* (edit) hbase-it/pom.xml
* (edit) 
hbase-it/src/test/java/org/apache/hadoop/hbase/IntegrationTestManyRegions.java


> Refactor IntegrationTestManyRegions to use @ClassRule for timing out
> 
>
> Key: HBASE-17004
> URL: https://issues.apache.org/jira/browse/HBASE-17004
> Project: HBase
>  Issue Type: Improvement
>Reporter: Appy
>Assignee: Appy
>Priority: Minor
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.4, 1.1.8
>
> Attachments: HBASE-17004.master.001.patch, 
> HBASE-17004.master.002.patch
>
>
> IntegrationTestManyRegions verifies that many regions get assigned within a 
> given time. To do so, it spawns a new thread and uses CountDownLatch.await() 
> to time out. This change replaces that mechanism with a junit @ClassRule to 
> time out the test.
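The swap described in the issue, sketched with JUnit 4 (class name and timeout value are illustrative, not from the actual patch):

```java
import org.junit.ClassRule;
import org.junit.Test;
import org.junit.rules.Timeout;

public class ManyRegionsTimeoutSketch {
    // JUnit enforces the deadline for every test in the class, replacing
    // the hand-rolled spawned thread + CountDownLatch.await() timeout.
    @ClassRule
    public static final Timeout CLASS_TIMEOUT = Timeout.millis(15 * 60 * 1000);

    @Test
    public void manyRegionsGetAssigned() {
        // ... assert all regions are assigned before the rule trips ...
    }
}
```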





[jira] [Commented] (HBASE-16960) RegionServer hang when aborting

2016-11-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638097#comment-15638097
 ] 

Hudson commented on HBASE-16960:


FAILURE: Integrated in Jenkins build HBase-1.1-JDK7 #1811 (See 
[https://builds.apache.org/job/HBase-1.1-JDK7/1811/])
HBASE-16960 RegionServer hang when aborting (liyu: rev 
f42f6fa2443f0aee76962e22d5233a124a18d49a)
* (add) 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestSyncFuture.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALActionsListener.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SyncFuture.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestWALLockup.java


> RegionServer hang when aborting
> ---
>
> Key: HBASE-16960
> URL: https://issues.apache.org/jira/browse/HBASE-16960
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.2.3, 1.1.7
>Reporter: binlijin
>Assignee: binlijin
>Priority: Critical
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.4, 1.1.8
>
> Attachments: 16960.ut.missing.final.piece.txt, 
> HBASE-16960.branch-1.1.v1.patch, HBASE-16960.branch-1.2.v1.patch, 
> HBASE-16960.branch-1.v1.patch, HBASE-16960.patch, 
> HBASE-16960_master_v2.patch, HBASE-16960_master_v3.patch, 
> HBASE-16960_master_v4.patch, RingBufferEventHandler.png, 
> RingBufferEventHandler_exception.png, SyncFuture.png, 
> SyncFuture_exception.png, rs1081.jstack
>
>
> We have seen the regionserver hang while aborting several times, which takes 
> all regions on the regionserver out of service, and then all affected 
> applications stop working.





[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading

2016-11-04 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638081#comment-15638081
 ] 

Devaraj Das commented on HBASE-14417:
-

A summary of some internal discussions on the high-level flow that doesn't use 
ZK...
1. The client updates the hbase:backup table with the set of paths that are to 
be bulkloaded (if the tables in question have been fully backed up at least 
once in the past).
2. The client performs the bulkload of the data. If the client fails before the 
bulkload fully completes, the cleaner chore in (5) takes care of cleaning up 
the unneeded entries from hbase:backup.
3. An HFileCleaner makes sure that paths that came about due to (1) are held 
until the next incremental backup.
4. As part of the incremental backup, the hbase:backup table is updated to 
reflect the location where the earlier bulkloaded file got copied to.
5. A chore runs periodically (in the BackupController) that eliminates entries 
from the hbase:backup table whose corresponding paths no longer exist in the 
filesystem and that are older than a configured time period (default, say, 24 
hours; the bulkload timeout is assumed to be much smaller than this, so all 
bulkloads that are meant to complete successfully will have done so).
Thoughts?
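The cleaner chore in step 5 can be sketched as follows. This is an illustrative stand-in, not the actual BackupController code: the hbase:backup entries are modeled as a map from path to registration time, and filesystem existence is abstracted behind a predicate.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.function.Predicate;

public class BackupEntryCleaner {
    // Remove entries whose path no longer exists in the filesystem AND whose
    // age exceeds 'ttlMillis' (default, say, 24 hours). Returns the number of
    // entries removed.
    static int clean(Map<String, Long> entries, Predicate<String> pathExists,
                     long nowMillis, long ttlMillis) {
        int removed = 0;
        Iterator<Map.Entry<String, Long>> it = entries.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            boolean expired = nowMillis - e.getValue() > ttlMillis;
            // Paths still present are held for the next incremental backup;
            // young entries may belong to an in-flight bulkload.
            if (expired && !pathExists.test(e.getKey())) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Map<String, Long> entries = new HashMap<>();
        entries.put("/bulk/present", 0L);   // old, but path still exists
        entries.put("/bulk/stale", 0L);     // old, path gone -> removed
        entries.put("/bulk/fresh", 90L);    // path gone, but still young
        int removed = clean(entries, p -> p.equals("/bulk/present"), 100L, 50L);
        if (removed != 1 || entries.containsKey("/bulk/stale"))
            throw new AssertionError();
        System.out.println("ok");
    }
}
```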

> Incremental backup and bulk loading
> ---
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.0.0
>Reporter: Vladimir Rodionov
>Assignee: Ted Yu
>Priority: Critical
>  Labels: backup
> Fix For: 2.0.0
>
> Attachments: 14417.v1.txt, 14417.v11.txt, 14417.v13.txt, 
> 14417.v2.txt, 14417.v21.txt, 14417.v23.txt, 14417.v24.txt, 14417.v25.txt, 
> 14417.v6.txt
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading 
> bypasses WALs for obvious reasons, breaking incremental backups. The only way 
> to continue backups after bulk loading is to create a new full backup of the 
> table. This may not be feasible for customers who do bulk loading regularly 
> (say, every day).
> Google doc for design:
> https://docs.google.com/document/d/1ACCLsecHDvzVSasORgqqRNrloGx4mNYIbvAU7lq5lJE





[jira] [Updated] (HBASE-17030) Procedure v2 - A couple of tweaks to the SplitTableRegionProcedure

2016-11-04 Thread Matteo Bertozzi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-17030:

Status: Patch Available  (was: Open)

> Procedure v2 - A couple of tweaks to the SplitTableRegionProcedure
> --
>
> Key: HBASE-17030
> URL: https://issues.apache.org/jira/browse/HBASE-17030
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Affects Versions: 2.0.0
>Reporter: Matteo Bertozzi
>Assignee: Matteo Bertozzi
>Priority: Trivial
> Fix For: 2.0.0
>
> Attachments: HBASE-17030-v0.patch
>
>
> Make a couple of tweaks to the HBASE-14551 split procedure:
>  - remove tableName from the SplitTableRegionProcedure ctor, since we already 
> have the RegionInfo, which contains the name
>  - move checkRow into the constructor of SplitTableRegionProcedure, since the 
> splitRow never changes and we can avoid starting the proc if we have a bad 
> splitRow
>  - use the base AbstractStateMachineTableProcedure for the "user" field
>  - remove protobuf fields that can be derived from other info (table_name, 
> split_row)
>  - avoid the htd lookup on every family iteration of splitStoreFiles()





[jira] [Updated] (HBASE-17030) Procedure v2 - A couple of tweaks to the SplitTableRegionProcedure

2016-11-04 Thread Matteo Bertozzi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-17030:

Attachment: HBASE-17030-v0.patch

> Procedure v2 - A couple of tweaks to the SplitTableRegionProcedure
> --
>
> Key: HBASE-17030
> URL: https://issues.apache.org/jira/browse/HBASE-17030
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Affects Versions: 2.0.0
>Reporter: Matteo Bertozzi
>Assignee: Matteo Bertozzi
>Priority: Trivial
> Fix For: 2.0.0
>
> Attachments: HBASE-17030-v0.patch
>
>
> Make a couple of tweaks to the HBASE-14551 split procedure:
>  - remove tableName from the SplitTableRegionProcedure ctor, since we already 
> have the RegionInfo, which contains the name
>  - move checkRow into the constructor of SplitTableRegionProcedure, since the 
> splitRow never changes and we can avoid starting the proc if we have a bad 
> splitRow
>  - use the base AbstractStateMachineTableProcedure for the "user" field
>  - remove protobuf fields that can be derived from other info (table_name, 
> split_row)
>  - avoid the htd lookup on every family iteration of splitStoreFiles()





[jira] [Comment Edited] (HBASE-17017) Remove the current per-region latency histogram metrics

2016-11-04 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638039#comment-15638039
 ] 

Andrew Purtell edited comment on HBASE-17017 at 11/4/16 11:32 PM:
--

So still get and scan counters per region? Can these go too? And the other per 
region counters? Can still amount to thousands of counters given thousands of 
regions. 


was (Author: apurtell):
So still get and scan counters per region? Can these go too? 

> Remove the current per-region latency histogram metrics
> ---
>
> Key: HBASE-17017
> URL: https://issues.apache.org/jira/browse/HBASE-17017
> Project: HBase
>  Issue Type: Sub-task
>  Components: metrics
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Duo Zhang
>Assignee: Enis Soztutar
> Fix For: 2.0.0, 1.4.0
>
> Attachments: Screen Shot 2016-11-04 at 3.00.21 PM.png, Screen Shot 
> 2016-11-04 at 3.38.42 PM.png, hbase-17017_v1.patch
>
>






[jira] [Resolved] (HBASE-17029) Procedure v2 - A couple of tweaks to the SplitTableRegionProcedure

2016-11-04 Thread Matteo Bertozzi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi resolved HBASE-17029.
-
Resolution: Duplicate

A double click created two issues, HBASE-17029 and HBASE-17030; closing this 
one as a duplicate.

> Procedure v2 - A couple of tweaks to the SplitTableRegionProcedure
> --
>
> Key: HBASE-17029
> URL: https://issues.apache.org/jira/browse/HBASE-17029
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Affects Versions: 2.0.0
>Reporter: Matteo Bertozzi
>Assignee: Matteo Bertozzi
>Priority: Trivial
> Fix For: 2.0.0
>
>
> Make a couple of tweaks to the HBASE-14551 split procedure:
>  - remove tableName from the SplitTableRegionProcedure ctor, since we already 
> have the RegionInfo, which contains the name
>  - move checkRow into the constructor of SplitTableRegionProcedure, since the 
> splitRow never changes and we can avoid starting the proc if we have a bad 
> splitRow
>  - use the base AbstractStateMachineTableProcedure for the "user" field
>  - remove protobuf fields that can be derived from other info (table_name, 
> split_row)
>  - avoid the htd lookup on every family iteration of splitStoreFiles()





[jira] [Commented] (HBASE-17017) Remove the current per-region latency histogram metrics

2016-11-04 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638039#comment-15638039
 ] 

Andrew Purtell commented on HBASE-17017:


So still get and scan counters per region? Can these go too? 

> Remove the current per-region latency histogram metrics
> ---
>
> Key: HBASE-17017
> URL: https://issues.apache.org/jira/browse/HBASE-17017
> Project: HBase
>  Issue Type: Sub-task
>  Components: metrics
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Duo Zhang
>Assignee: Enis Soztutar
> Fix For: 2.0.0, 1.4.0
>
> Attachments: Screen Shot 2016-11-04 at 3.00.21 PM.png, Screen Shot 
> 2016-11-04 at 3.38.42 PM.png, hbase-17017_v1.patch
>
>






[jira] [Commented] (HBASE-17004) Refactor IntegrationTestManyRegions to use @ClassRule for timing out

2016-11-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638042#comment-15638042
 ] 

Hudson commented on HBASE-17004:


SUCCESS: Integrated in Jenkins build HBase-Trunk_matrix #1916 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/1916/])
HBASE-17004  IntegrationTestManyRegions verifies that many regions get (appy: 
rev 9564849ba181391d9716acb0172d241675ff25f2)
* (edit) hbase-it/pom.xml
* (edit) 
hbase-it/src/test/java/org/apache/hadoop/hbase/IntegrationTestManyRegions.java


> Refactor IntegrationTestManyRegions to use @ClassRule for timing out
> 
>
> Key: HBASE-17004
> URL: https://issues.apache.org/jira/browse/HBASE-17004
> Project: HBase
>  Issue Type: Improvement
>Reporter: Appy
>Assignee: Appy
>Priority: Minor
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.4, 1.1.8
>
> Attachments: HBASE-17004.master.001.patch, 
> HBASE-17004.master.002.patch
>
>
> IntegrationTestManyRegions verifies that many regions get assigned within a 
> given time. To do so, it spawns a new thread and uses CountDownLatch.await() 
> to time out. Replace this mechanism with a JUnit @ClassRule to time out the 
> test.





[jira] [Commented] (HBASE-16892) Use TableName instead of String in SnapshotDescription

2016-11-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638044#comment-15638044
 ] 

Hudson commented on HBASE-16892:


SUCCESS: Integrated in Jenkins build HBase-Trunk_matrix #1916 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/1916/])
HBASE-16892 Use TableName instead of String in SnapshotDescription 
(matteo.bertozzi: rev 00ea7aeafe6f0070dedf86a296eefd5d3c453077)
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestSnapshotFromClient.java
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/shaded/protobuf/ProtobufUtil.java
* (edit) 
hbase-client/src/test/java/org/apache/hadoop/hbase/client/TestSnapshotFromAdmin.java
* (edit) hbase-server/src/main/resources/hbase-webapps/master/snapshot.jsp
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/SnapshotDescription.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/SnapshotInfo.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/snapshot/TestFlushSnapshotFromClient.java
* (edit) 
hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/master/MasterStatusTmpl.jamon
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/CreateSnapshot.java
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/cleaner/TestSnapshotFromMaster.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/snapshot/SnapshotTestingUtils.java


> Use TableName instead of String in SnapshotDescription
> --
>
> Key: HBASE-16892
> URL: https://issues.apache.org/jira/browse/HBASE-16892
> Project: HBase
>  Issue Type: Sub-task
>  Components: snapshots
>Affects Versions: 2.0.0
>Reporter: Matteo Bertozzi
>Assignee: Matteo Bertozzi
>Priority: Trivial
> Fix For: 2.0.0
>
> Attachments: HBASE-16892-v0.patch, HBASE-16892-v1.patch, 
> HBASE-16892-v2.patch
>
>
> Mostly find & replace work:
> deprecate the SnapshotDescription constructors that take a String argument in 
> favor of the TableName ones; 
> replace the scattered TableName.valueOf() calls with the new getTableName(); 
> replace the TableName.getNameAsString() calls by just passing the TableName.





[jira] [Commented] (HBASE-16865) Procedure v2 - Inherit lock from root proc

2016-11-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638043#comment-15638043
 ] 

Hudson commented on HBASE-16865:


SUCCESS: Integrated in Jenkins build HBase-Trunk_matrix #1916 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/1916/])
HBASE-16865 Procedure v2 - Inherit lock from root proc (matteo.bertozzi: rev 
efe0a0eeadac14c2804a3d1590761502e5f247ee)
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestMasterProcedureScheduler.java
* (edit) 
hbase-procedure/src/test/java/org/apache/hadoop/hbase/procedure2/ProcedureTestingUtility.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/MasterProcedureScheduler.java


> Procedure v2 - Inherit lock from root proc
> --
>
> Key: HBASE-16865
> URL: https://issues.apache.org/jira/browse/HBASE-16865
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Affects Versions: 2.0.0
>Reporter: Matteo Bertozzi
>Assignee: Matteo Bertozzi
> Fix For: 2.0.0
>
> Attachments: HBASE-16865-v0.patch
>
>
> At the moment we support inheriting locks from the parent procedure for 
> 2-level procedures, but in the case of reopening table regions we have a 
> 3-level procedure (ModifyTable -> ReOpen -> [Unassign/Assign]), and reopen 
> does not hold any locks of its own.





[jira] [Commented] (HBASE-16937) Replace SnapshotType protobuf conversion when we can directly use the pojo object

2016-11-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638041#comment-15638041
 ] 

Hudson commented on HBASE-16937:


SUCCESS: Integrated in Jenkins build HBase-Trunk_matrix #1916 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/1916/])
HBASE-16937 Replace SnapshotType protobuf conversion when we can 
(matteo.bertozzi: rev 7e05d0f161baef581d06f0dd978cd2e9b28e)
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/snapshot/TestFlushSnapshotFromClient.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/snapshot/SnapshotTestingUtils.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/CreateSnapshot.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/snapshot/TestRestoreFlushSnapshotFromClient.java


> Replace SnapshotType protobuf conversion when we can directly use the pojo 
> object
> -
>
> Key: HBASE-16937
> URL: https://issues.apache.org/jira/browse/HBASE-16937
> Project: HBase
>  Issue Type: Sub-task
>  Components: snapshots
>Reporter: Matteo Bertozzi
>Assignee: Matteo Bertozzi
>Priority: Trivial
> Fix For: 2.0.0
>
> Attachments: HBASE-16937-v0.patch, HBASE-16937-v1.patch
>
>
> mostly find & replace work:
> replace the back and forth protobuf conversion when we can just use the 
> client SnapshotType enum.





[jira] [Created] (HBASE-17030) Procedure v2 - A couple of tweaks to the SplitTableRegionProcedure

2016-11-04 Thread Matteo Bertozzi (JIRA)
Matteo Bertozzi created HBASE-17030:
---

 Summary: Procedure v2 - A couple of tweaks to the 
SplitTableRegionProcedure
 Key: HBASE-17030
 URL: https://issues.apache.org/jira/browse/HBASE-17030
 Project: HBase
  Issue Type: Sub-task
  Components: proc-v2
Affects Versions: 2.0.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Trivial
 Fix For: 2.0.0


Make a couple of tweaks to the HBASE-14551 split procedure:
 - remove tableName from the SplitTableRegionProcedure ctor, since we already 
have the RegionInfo, which contains the name
 - move checkRow into the constructor of SplitTableRegionProcedure, since the 
splitRow never changes and we can avoid starting the proc if we have a bad 
splitRow
 - use the base AbstractStateMachineTableProcedure for the "user" field
 - remove protobuf fields that can be derived from other info (table_name, 
split_row)
 - avoid the htd lookup on every family iteration of splitStoreFiles()





[jira] [Created] (HBASE-17029) Procedure v2 - A couple of tweaks to the SplitTableRegionProcedure

2016-11-04 Thread Matteo Bertozzi (JIRA)
Matteo Bertozzi created HBASE-17029:
---

 Summary: Procedure v2 - A couple of tweaks to the 
SplitTableRegionProcedure
 Key: HBASE-17029
 URL: https://issues.apache.org/jira/browse/HBASE-17029
 Project: HBase
  Issue Type: Sub-task
  Components: proc-v2
Affects Versions: 2.0.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Trivial
 Fix For: 2.0.0


Make a couple of tweaks to HBASE-14551 split procedure
 - remove tableName from SplitTableRegionProcedure ctor since we have the 
RegionInfo that contains the name already
 - move the checkRow in the constructor of the SplitTableRegionProcedure, since 
the splitRow will never change and we can avoid to start the proc if we have a 
bad splitRow.
 - use the base AbstractStateMachineTableProcedure for the "user" field
 - remove protobuf fields that can be extrapolated from other info (table_name, 
split_row)
 - avoid htd lookup every family iteration of splitStoreFiles()





[jira] [Commented] (HBASE-16960) RegionServer hang when aborting

2016-11-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638033#comment-15638033
 ] 

Hudson commented on HBASE-16960:


FAILURE: Integrated in Jenkins build HBase-1.1-JDK8 #1895 (See 
[https://builds.apache.org/job/HBase-1.1-JDK8/1895/])
HBASE-16960 RegionServer hang when aborting (liyu: rev 
f42f6fa2443f0aee76962e22d5233a124a18d49a)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALActionsListener.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SyncFuture.java
* (add) 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestSyncFuture.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestWALLockup.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java


> RegionServer hang when aborting
> ---
>
> Key: HBASE-16960
> URL: https://issues.apache.org/jira/browse/HBASE-16960
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.2.3, 1.1.7
>Reporter: binlijin
>Assignee: binlijin
>Priority: Critical
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.4, 1.1.8
>
> Attachments: 16960.ut.missing.final.piece.txt, 
> HBASE-16960.branch-1.1.v1.patch, HBASE-16960.branch-1.2.v1.patch, 
> HBASE-16960.branch-1.v1.patch, HBASE-16960.patch, 
> HBASE-16960_master_v2.patch, HBASE-16960_master_v3.patch, 
> HBASE-16960_master_v4.patch, RingBufferEventHandler.png, 
> RingBufferEventHandler_exception.png, SyncFuture.png, 
> SyncFuture_exception.png, rs1081.jstack
>
>
> We have seen the regionserver hang while aborting several times, which takes 
> all regions on the regionserver out of service, and then all affected 
> applications stop working.





[jira] [Commented] (HBASE-17004) Refactor IntegrationTestManyRegions to use @ClassRule for timing out

2016-11-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638034#comment-15638034
 ] 

Hudson commented on HBASE-17004:


FAILURE: Integrated in Jenkins build HBase-1.1-JDK8 #1895 (See 
[https://builds.apache.org/job/HBase-1.1-JDK8/1895/])
HBASE-17004  IntegrationTestManyRegions verifies that many regions get (appy: 
rev 71a2e1f225879d68e69fcedcd4ddfa281eae6030)
* (edit) hbase-it/pom.xml
* (edit) 
hbase-it/src/test/java/org/apache/hadoop/hbase/IntegrationTestManyRegions.java


> Refactor IntegrationTestManyRegions to use @ClassRule for timing out
> 
>
> Key: HBASE-17004
> URL: https://issues.apache.org/jira/browse/HBASE-17004
> Project: HBase
>  Issue Type: Improvement
>Reporter: Appy
>Assignee: Appy
>Priority: Minor
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.4, 1.1.8
>
> Attachments: HBASE-17004.master.001.patch, 
> HBASE-17004.master.002.patch
>
>
> IntegrationTestManyRegions verifies that many regions get assigned within a 
> given time. To do so, it spawns a new thread and uses CountDownLatch.await() 
> to time out. Replace this mechanism with a JUnit @ClassRule to time out the 
> test.





[jira] [Commented] (HBASE-17026) VerifyReplication log should distinguish whether good row key is result of revalidation

2016-11-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638031#comment-15638031
 ] 

Hadoop QA commented on HBASE-17026:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 41s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 4s 
{color} | {color:blue} The patch file was not named according to hbase's naming 
conventions. Please see 
https://yetus.apache.org/documentation/0.3.0/precommit-patchnames for 
instructions. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
17s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
0s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
22s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
21s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 47s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 59s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
56s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
45m 19s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha1. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
43s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 44s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 140m 4s {color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
38s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 208m 29s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Timed out junit tests | 
org.apache.hadoop.hbase.security.visibility.TestVisibilityLabelsWithDeletes |
|   | 
org.apache.hadoop.hbase.replication.regionserver.TestReplicationWALReaderManager
 |
|   | org.apache.hadoop.hbase.replication.TestMasterReplication |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:7bda515 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12837265/17026.v1.txt |
| JIRA Issue | HBASE-17026 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux be4c61546d76 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 
20:42:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 9564849 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| unit | 
https://builds.apache.org/job/PreCommit-HBASE-Build/4337/artifact/patchprocess/patch-unit-hbase-server.txt
 |
| unit test logs |  

[jira] [Commented] (HBASE-16838) Implement basic scan

2016-11-04 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637995#comment-15637995
 ] 

Duo Zhang commented on HBASE-16838:
---

There is another reason for smallScan: the limit. Maybe we could add it to 
Scan? Otherwise the RS cannot know when the scan is exhausted. We would also 
need to modify the RS logic to support it, and the small flag would then be 
deprecated?

And in general, the scan method introduced here is only for experts; we do not 
want every user to call it directly. But a small scan just returns a 
CompletableFuture, so it is much easier to use.

Thanks.
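A minimal self-contained sketch of the observer-style contract under discussion (the names here are hypothetical; the real AsyncTable API differs): the expert-level scan pushes results to an observer callback, and a small scan can be layered on top of it by collecting everything into a CompletableFuture.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class BasicScanSketch {
    // Hypothetical observer: onNext must be cheap, since in the real design
    // it runs directly on the rpc framework threads.
    interface ScanObserver {
        void onNext(String row);
        void onComplete();
    }

    // Stand-in for the expert-level "basic scan": pushes rows to the observer.
    static void basicScan(List<String> rows, ScanObserver observer) {
        for (String row : rows) observer.onNext(row);
        observer.onComplete();
    }

    // A small scan built on top of it: collects everything into a future,
    // which is the easier interface for ordinary users.
    static CompletableFuture<List<String>> smallScan(List<String> rows) {
        CompletableFuture<List<String>> f = new CompletableFuture<>();
        List<String> collected = new ArrayList<>();
        basicScan(rows, new ScanObserver() {
            public void onNext(String row) { collected.add(row); }
            public void onComplete() { f.complete(collected); }
        });
        return f;
    }

    public static void main(String[] args) {
        List<String> out = smallScan(List.of("r1", "r2")).join();
        if (!out.equals(List.of("r1", "r2"))) throw new AssertionError();
        System.out.println("ok");
    }
}
```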

> Implement basic scan
> 
>
> Key: HBASE-16838
> URL: https://issues.apache.org/jira/browse/HBASE-16838
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0
>
> Attachments: HBASE-16838-v1.patch, HBASE-16838-v2.patch, 
> HBASE-16838.patch
>
>
> Implement a scan that works like a gRPC streaming call, where all returned 
> results are passed to a ScanObserver. The methods of the observer are called 
> directly on the rpc framework threads, so it is not allowed to do 
> time-consuming work in them. In general, only experts or the implementations 
> of other methods in AsyncTable should call this method directly; that's why I 
> call it a 'basic scan'.





[jira] [Commented] (HBASE-17016) Reimplement per-region latency histogram metrics

2016-11-04 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637975#comment-15637975
 ] 

Enis Soztutar commented on HBASE-17016:
---

Attached a patch to subtask. Mikhail, are you saying that we should close this 
as won't fix after the subtask? I think it should be fine. 

> Reimplement per-region latency histogram metrics
> 
>
> Key: HBASE-17016
> URL: https://issues.apache.org/jira/browse/HBASE-17016
> Project: HBase
>  Issue Type: Task
>  Components: metrics
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Andrew Purtell
> Fix For: 2.0.0, 1.4.0
>
>
> Follow up from HBASE-10656, where [~enis] says:
> {quote}
> the main problem is that we have A LOT of per-region metrics that are latency 
> histograms. These latency histograms create many many Counter / LongAdder 
> objects. We should get rid of per-region latencies and maybe look at reducing 
> the per-region metric overhead.
> {quote}
> And [~ghelmling] gives us a good candidate for implementing per-region 
> latency histograms: [HdrHistogram|https://github.com/HdrHistogram/HdrHistogram].
> Let's consider removing the per-region latency histograms and reimplement 
> using HdrHistogram.





[jira] [Updated] (HBASE-17017) Remove the current per-region latency histogram metrics

2016-11-04 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-17017:
--
Hadoop Flags: Incompatible change
Release Note: Removes the per-region level (get size, get time, scan size and 
scan time histogram) metrics that were exposed before. Per-region histogram 
metrics with 1000+ regions cause millions of objects to be allocated on the 
heap. The patch introduces getCount and scanCount as counters rather than 
histograms. Other per-region level metrics are kept as they are. 
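The difference the release note describes can be sketched with stdlib types (a simplified illustration, not HBase's actual metrics implementation): a per-region counter is a single accumulator, whereas a latency histogram must maintain a whole distribution of buckets per region.

```java
import java.util.concurrent.atomic.LongAdder;

public class CounterVsHistogramSketch {
    public static void main(String[] args) {
        // A counter: one accumulator per region, cheap to keep even for
        // thousands of regions.
        LongAdder getCount = new LongAdder();
        getCount.increment();
        getCount.increment();
        if (getCount.sum() != 2) throw new AssertionError();

        // A histogram must bucket every observation to answer quantile
        // queries -- many accumulators per region instead of one, which is
        // why per-region histograms blow up the heap at 1000+ regions.
        long[] latencyBuckets = new long[64];   // one bucket per power of two
        long observedNanos = 1_500_000;
        int bucket = 63 - Long.numberOfLeadingZeros(observedNanos);
        latencyBuckets[bucket]++;
        if (latencyBuckets[bucket] != 1) throw new AssertionError();
        System.out.println("ok");
    }
}
```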

> Remove the current per-region latency histogram metrics
> ---
>
> Key: HBASE-17017
> URL: https://issues.apache.org/jira/browse/HBASE-17017
> Project: HBase
>  Issue Type: Sub-task
>  Components: metrics
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Duo Zhang
>Assignee: Enis Soztutar
> Fix For: 2.0.0, 1.4.0
>
> Attachments: Screen Shot 2016-11-04 at 3.00.21 PM.png, Screen Shot 
> 2016-11-04 at 3.38.42 PM.png, hbase-17017_v1.patch
>
>






[jira] [Updated] (HBASE-17017) Remove the current per-region latency histogram metrics

2016-11-04 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-17017:
--
Attachment: hbase-17017_v1.patch

Attaching v1 patch. 

> Remove the current per-region latency histogram metrics
> ---
>
> Key: HBASE-17017
> URL: https://issues.apache.org/jira/browse/HBASE-17017
> Project: HBase
>  Issue Type: Sub-task
>  Components: metrics
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Duo Zhang
>Assignee: Enis Soztutar
> Fix For: 2.0.0, 1.4.0
>
> Attachments: Screen Shot 2016-11-04 at 3.00.21 PM.png, Screen Shot 
> 2016-11-04 at 3.38.42 PM.png, hbase-17017_v1.patch
>
>






[jira] [Updated] (HBASE-17017) Remove the current per-region latency histogram metrics

2016-11-04 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-17017:
--
Status: Patch Available  (was: Open)

> Remove the current per-region latency histogram metrics
> ---
>
> Key: HBASE-17017
> URL: https://issues.apache.org/jira/browse/HBASE-17017
> Project: HBase
>  Issue Type: Sub-task
>  Components: metrics
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Duo Zhang
>Assignee: Enis Soztutar
> Fix For: 2.0.0, 1.4.0
>
> Attachments: Screen Shot 2016-11-04 at 3.00.21 PM.png, Screen Shot 
> 2016-11-04 at 3.38.42 PM.png, hbase-17017_v1.patch
>
>






[jira] [Commented] (HBASE-16996) Implement storage/retrieval of filesystem-use quotas into quota table

2016-11-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637965#comment-15637965
 ] 

Hadoop QA commented on HBASE-16996:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 31s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
28s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
50s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
28s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 2s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 56s 
{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 17s 
{color} | {color:red} hbase-client in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 26s 
{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 17s 
{color} | {color:red} hbase-client in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 26s 
{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 17s {color} 
| {color:red} hbase-client in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 26s {color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
49s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
25s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 1m 3s 
{color} | {color:red} The patch causes 16 errors with Hadoop v2.6.1. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 2m 11s 
{color} | {color:red} The patch causes 16 errors with Hadoop v2.6.2. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 3m 9s 
{color} | {color:red} The patch causes 16 errors with Hadoop v2.6.3. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 4m 6s 
{color} | {color:red} The patch causes 16 errors with Hadoop v2.6.4. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 5m 7s 
{color} | {color:red} The patch causes 16 errors with Hadoop v2.6.5. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 6m 4s 
{color} | {color:red} The patch causes 16 errors with Hadoop v2.7.1. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 7m 3s 
{color} | {color:red} The patch causes 16 errors with Hadoop v2.7.2. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 8m 0s 
{color} | {color:red} The patch causes 16 errors with Hadoop v2.7.3. {color} |
| {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 8m 56s 
{color} | {color:red} The patch causes 16 errors with Hadoop v3.0.0-alpha1. 
{color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 11s 
{color} | {color:red} hbase-client in the patch failed. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 24s 
{color} | {color:red} hbase-server in the patch failed. {color} |

[jira] [Commented] (HBASE-17017) Remove the current per-region latency histogram metrics

2016-11-04 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637930#comment-15637930
 ] 

Enis Soztutar commented on HBASE-17017:
---

Runtimes: 
With patch: 
{code}
2016-11-04 15:19:44,017 INFO  [TestClient-20] hbase.PerformanceEvaluation: 
Finished TestClient-20 in 117708ms over 10 rows
{code}

w/o patch:
{code}
2016-11-04 14:53:33,082 INFO  [TestClient-20] hbase.PerformanceEvaluation: 
Finished TestClient-20 in 140958ms over 10 rows
{code}

> Remove the current per-region latency histogram metrics
> ---
>
> Key: HBASE-17017
> URL: https://issues.apache.org/jira/browse/HBASE-17017
> Project: HBase
>  Issue Type: Sub-task
>  Components: metrics
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Duo Zhang
>Assignee: Enis Soztutar
> Fix For: 2.0.0, 1.4.0
>
> Attachments: Screen Shot 2016-11-04 at 3.00.21 PM.png, Screen Shot 
> 2016-11-04 at 3.38.42 PM.png
>
>






[jira] [Updated] (HBASE-17017) Remove the current per-region latency histogram metrics

2016-11-04 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-17017:
--
Attachment: Screen Shot 2016-11-04 at 3.38.42 PM.png
Screen Shot 2016-11-04 at 3.00.21 PM.png

> Remove the current per-region latency histogram metrics
> ---
>
> Key: HBASE-17017
> URL: https://issues.apache.org/jira/browse/HBASE-17017
> Project: HBase
>  Issue Type: Sub-task
>  Components: metrics
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Duo Zhang
>Assignee: Enis Soztutar
> Fix For: 2.0.0, 1.4.0
>
> Attachments: Screen Shot 2016-11-04 at 3.00.21 PM.png, Screen Shot 
> 2016-11-04 at 3.38.42 PM.png
>
>






[jira] [Commented] (HBASE-17017) Remove the current per-region latency histogram metrics

2016-11-04 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637925#comment-15637925
 ] 

Enis Soztutar commented on HBASE-17017:
---

I've run PE with 1000 regions in a single server:
{code}
bin/hbase pe --latency --nomapred --presplit=1000 --valueSize=1000 
--rows=10 sequentialWrite 30
{code}

We are allocating ~1M LongAdder (former Counter) objects, which is crazy. With 
a simple patch, the allocations go down to less than 0.5% of heap, so JFR no 
longer shows them. The runtime for PE improves by 17% because we are no longer 
spending time on this code path:
{code}
private LongAdder[] createCounters(int numBins) {
  // One LongAdder is allocated per bin (plus three extra slots), per histogram.
  return Stream.generate(LongAdder::new)
      .limit(numBins + 3)
      .toArray(LongAdder[]::new);
}
{code}

See attached screenshots. 
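To see where the ~1M figure comes from, here is a back-of-the-envelope sketch. The per-region histogram count and bin count used below are illustrative assumptions chosen to reproduce the order of magnitude, not exact HBase internals:

```java
// Rough estimate of LongAdder allocations from per-region histograms.
// The histogram and bin counts are illustrative assumptions.
public class HistogramAllocationEstimate {

    // One LongAdder per bin, per histogram, per region.
    static long estimateLongAdders(int regions, int histogramsPerRegion,
                                   int binsPerHistogram) {
        return (long) regions * histogramsPerRegion * binsPerHistogram;
    }

    public static void main(String[] args) {
        // ~1000 regions x ~4 latency histograms x ~250 bins each
        long total = estimateLongAdders(1000, 4, 250);
        System.out.println(total); // prints 1000000
    }
}
```

The multiplicative scaling is the point: the object count grows linearly in the region count, which is why a 1000-region server ends up with on the order of a million allocations while a counter-based replacement needs only a handful per region.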

> Remove the current per-region latency histogram metrics
> ---
>
> Key: HBASE-17017
> URL: https://issues.apache.org/jira/browse/HBASE-17017
> Project: HBase
>  Issue Type: Sub-task
>  Components: metrics
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Duo Zhang
>Assignee: Enis Soztutar
> Fix For: 2.0.0, 1.4.0
>
>






[jira] [Commented] (HBASE-17007) Move ZooKeeper logging to its own log file

2016-11-04 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637912#comment-15637912
 ] 

stack commented on HBASE-17007:
---

NP. I just thought the occasional pain of correlating between two logs rather 
than one, in the rare case of a ZK issue, was a small price to pay for some 
general cleanup. Let's see if there are any other opinions.

On session ids: they will still be in the logs reported by our 
RecoverableZooKeeper.

We can't remove the duplication. ZooKeeper spews at INFO level, logging its 
properties and CLASSPATH and duplicating our own emissions of the same. It 
can't be turned off. 

Then there are also occasional complaints from the client like the one below 
(here it is timing around shutdown):

2016-11-03 12:39:49,832 INFO  [M:0;172.21.1.131:61739] 
zookeeper.MiniZooKeeperCluster: Shutdown MiniZK cluster with all ZK servers
2016-11-03 12:39:49,982 INFO  
[172.21.1.131:61739.activeMasterManager-SendThread(localhost:2181)] 
zookeeper.ClientCnxn: Opening socket connection to server 
localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL 
(unknown error)
2016-11-03 12:39:49,983 WARN  
[172.21.1.131:61739.activeMasterManager-SendThread(localhost:2181)] 
zookeeper.ClientCnxn: Session 0x1582ba35a740006 for server null, unexpected 
error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)

They are harmless, but to a newbie operator they probably look worrisome.

Thanks.

> Move ZooKeeper logging to its own log file
> --
>
> Key: HBASE-17007
> URL: https://issues.apache.org/jira/browse/HBASE-17007
> Project: HBase
>  Issue Type: Bug
>  Components: Zookeeper
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Trivial
> Attachments: 
> 0001-HBASE-17007-Move-ZooKeeper-logging-to-its-own-log-fi.patch
>
>
> ZooKeeper logging can be too verbose. Let's move ZooKeeper logging to a 
> different log file.





[jira] [Commented] (HBASE-17007) Move ZooKeeper logging to its own log file

2016-11-04 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637883#comment-15637883
 ] 

Esteban Gutierrez commented on HBASE-17007:
---

We thought about removing only the classpath initially, but that would require 
patching ZooKeeper to change its client logging level. Also, ZooKeeper is used 
by some coprocessors like Tephra and Phoenix, and logs get polluted quite 
easily by other tasks done by those CPs. There is another alternative: removing 
the duplicated classpath from the logs by adding CLASSPATH to the list of 
skipwords in ServerCommandLine. However, the CLASSPATH environment string is 
usually shorter than java.class.path as reported by the JVM, which is what ZK 
is dumping. In a quick test, the whole line with java.class.path was 63076 
bytes long vs 14293 bytes for the string that contains the CLASSPATH.

> Move ZooKeeper logging to its own log file
> --
>
> Key: HBASE-17007
> URL: https://issues.apache.org/jira/browse/HBASE-17007
> Project: HBase
>  Issue Type: Bug
>  Components: Zookeeper
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Trivial
> Attachments: 
> 0001-HBASE-17007-Move-ZooKeeper-logging-to-its-own-log-fi.patch
>
>
> ZooKeeper logging can be too verbose. Let's move ZooKeeper logging to a 
> different log file.





[jira] [Commented] (HBASE-17004) Refactor IntegrationTestManyRegions to use @ClassRule for timing out

2016-11-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637859#comment-15637859
 ] 

Hudson commented on HBASE-17004:


SUCCESS: Integrated in Jenkins build HBase-1.2-JDK7 #63 (See 
[https://builds.apache.org/job/HBase-1.2-JDK7/63/])
HBASE-17004  IntegrationTestManyRegions verifies that many regions get (appy: 
rev 804ce850030f607acf855876223d5fa7b3825d0a)
* (edit) 
hbase-it/src/test/java/org/apache/hadoop/hbase/IntegrationTestManyRegions.java
* (edit) hbase-it/pom.xml


> Refactor IntegrationTestManyRegions to use @ClassRule for timing out
> 
>
> Key: HBASE-17004
> URL: https://issues.apache.org/jira/browse/HBASE-17004
> Project: HBase
>  Issue Type: Improvement
>Reporter: Appy
>Assignee: Appy
>Priority: Minor
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.4, 1.1.8
>
> Attachments: HBASE-17004.master.001.patch, 
> HBASE-17004.master.002.patch
>
>
> IntegrationTestManyRegions verifies that many regions get assigned within a 
> given time. To do so, it spawns a new thread and uses CountDownLatch.await() 
> to time out. This replaces that mechanism with a JUnit @ClassRule to time out 
> the test.





[jira] [Assigned] (HBASE-17017) Remove the current per-region latency histogram metrics

2016-11-04 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar reassigned HBASE-17017:
-

Assignee: Enis Soztutar

> Remove the current per-region latency histogram metrics
> ---
>
> Key: HBASE-17017
> URL: https://issues.apache.org/jira/browse/HBASE-17017
> Project: HBase
>  Issue Type: Sub-task
>  Components: metrics
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Duo Zhang
>Assignee: Enis Soztutar
> Fix For: 2.0.0, 1.4.0
>
>






[jira] [Updated] (HBASE-16996) Implement storage/retrieval of filesystem-use quotas into quota table

2016-11-04 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HBASE-16996:
---
Status: Patch Available  (was: Open)

> Implement storage/retrieval of filesystem-use quotas into quota table
> -
>
> Key: HBASE-16996
> URL: https://issues.apache.org/jira/browse/HBASE-16996
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Josh Elser
>Assignee: Josh Elser
> Fix For: 2.0.0
>
> Attachments: HBASE-16996.001.patch
>
>
> Provide read/write API for accessing the new filesystem-usage quotas in the 
> existing {{hbase:quota}} table.
> Make sure that both the client can read the quotas in the table and the 
> Master can perform the necessary update/delete actions per the quota RPCs.





[jira] [Commented] (HBASE-17004) Refactor IntegrationTestManyRegions to use @ClassRule for timing out

2016-11-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637835#comment-15637835
 ] 

Hudson commented on HBASE-17004:


FAILURE: Integrated in Jenkins build HBase-1.4 #518 (See 
[https://builds.apache.org/job/HBase-1.4/518/])
HBASE-17004  IntegrationTestManyRegions verifies that many regions get (appy: 
rev 9bc9f9b597a2cd5441cec08978a986eec5e58d8e)
* (edit) 
hbase-it/src/test/java/org/apache/hadoop/hbase/IntegrationTestManyRegions.java
* (edit) hbase-it/pom.xml


> Refactor IntegrationTestManyRegions to use @ClassRule for timing out
> 
>
> Key: HBASE-17004
> URL: https://issues.apache.org/jira/browse/HBASE-17004
> Project: HBase
>  Issue Type: Improvement
>Reporter: Appy
>Assignee: Appy
>Priority: Minor
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.4, 1.1.8
>
> Attachments: HBASE-17004.master.001.patch, 
> HBASE-17004.master.002.patch
>
>
> IntegrationTestManyRegions verifies that many regions get assigned within a 
> given time. To do so, it spawns a new thread and uses CountDownLatch.await() 
> to time out. This replaces that mechanism with a JUnit @ClassRule to time out 
> the test.





[jira] [Commented] (HBASE-17014) Add clearly marked starting and shutdown log messages for all services.

2016-11-04 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637822#comment-15637822
 ] 

Enis Soztutar commented on HBASE-17014:
---

Seems slightly easier to spot. Right now ours are like this:
{code}
2016-11-04 13:48:19,500 FATAL [10.22.7.15:53432.activeMasterManager] 
master.HMaster: Unhandled exception. Starting shutdown.
org.apache.hadoop.hbase.util.FileSystemVersionException: HBase file layout 
needs to be upgraded. You have version null and I want version 8. Consult 
http://hbase.apache.org/book.html for further informa
  at org.apache.hadoop.hbase.util.FSUtils.checkVersion(FSUtils.java:691)
  at 
org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:226)
  at 
org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:134)
  at 
org.apache.hadoop.hbase.master.MasterFileSystem.(MasterFileSystem.java:108)
  at 
org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:683)
  at org.apache.hadoop.hbase.master.HMaster.access$600(HMaster.java:193)
  at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1762)
  at java.lang.Thread.run(Thread.java:745)
2016-11-04 13:48:19,501 INFO  [10.22.7.15:53432.activeMasterManager] 
regionserver.HRegionServer: * STOPPING region server 
'10.22.7.15,53432,1478292498386' *
2016-11-04 13:48:19,501 INFO  [10.22.7.15:53432.activeMasterManager] 
regionserver.HRegionServer: STOPPED: Stopped by 
10.22.7.15:53432.activeMasterManager
2016-11-04 13:48:19,614 INFO  [main] mortbay.log: Started 
SelectChannelConnector@0.0.0.0:53436
{code}

> Add clearly marked starting and shutdown log messages for all services.
> ---
>
> Key: HBASE-17014
> URL: https://issues.apache.org/jira/browse/HBASE-17014
> Project: HBase
>  Issue Type: Improvement
>  Components: Operability
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17014.v1.patch
>
>
> From observing the log messages, clearly marked starting and shutdown 
> messages for services HMaster, HRegionServer, ThriftServer and RESTServer 
> will improve log readability.





[jira] [Commented] (HBASE-16838) Implement basic scan

2016-11-04 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637803#comment-15637803
 ] 

Enis Soztutar commented on HBASE-16838:
---

Sorry, a bit late, but we were discussing the small scan API with Devaraj 
yesterday. I understand the reason why we want to avoid 3 RPCs per scan if the 
scan is really small, but I think we should have made it so that ALL scans 
save RPCs and become "small" scans automatically, without the client using a 
different API or Scan.setSmall(). 

There is no reason for regular scans to have separate openScan() and next() 
calls. We can easily make it so that the scanner open returns the first batch 
of results. And we can make it so that the region server, when the region is 
exhausted at the end of the scan, automatically closes the scanner before 
returning the results to the client. So for a "small" scan, the first RPC 
would open the scanner, fetch all the results in the batch, and close the 
scanner, all in a single RPC, automatically. What do you guys think? We can 
open a separate issue to track this. 
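The combined open-fetch-close idea can be sketched as follows. The class and method names (`ScanBatch`, `RegionScannerStub`, `openOrNext`) are illustrative stand-ins, not real HBase classes or RPCs:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-ins for a single-RPC scan; not real HBase classes.
class ScanBatch {
    final List<String> rows;
    final boolean scannerClosed; // server already closed the scanner

    ScanBatch(List<String> rows, boolean scannerClosed) {
        this.rows = rows;
        this.scannerClosed = scannerClosed;
    }
}

class RegionScannerStub {
    private final List<String> regionRows;
    private int pos = 0;

    RegionScannerStub(List<String> regionRows) {
        this.regionRows = regionRows;
    }

    // Models one RPC: open (or continue) the scan and fetch up to batchSize
    // rows. If the region is exhausted, the server closes the scanner before
    // returning, so a small scan completes in a single round trip.
    ScanBatch openOrNext(int batchSize) {
        List<String> out = new ArrayList<>();
        while (pos < regionRows.size() && out.size() < batchSize) {
            out.add(regionRows.get(pos++));
        }
        return new ScanBatch(out, pos >= regionRows.size());
    }
}
```

With this shape, a scan over 3 rows and a batch size of 10 returns all rows with `scannerClosed == true` on the very first call; only scans larger than one batch pay for additional next RPCs, which is exactly the "every scan is automatically small" behavior described above.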

> Implement basic scan
> 
>
> Key: HBASE-16838
> URL: https://issues.apache.org/jira/browse/HBASE-16838
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0
>
> Attachments: HBASE-16838-v1.patch, HBASE-16838-v2.patch, 
> HBASE-16838.patch
>
>
> Implement a scan that works like a gRPC streaming call: all returned results 
> will be passed to a ScanObserver. The methods of the observer will be called 
> directly in the rpc framework threads, so it is not allowed to do 
> time-consuming work in those methods. In general only experts or the 
> implementations of other methods in AsyncTable should call this method 
> directly; that's why I call it a 'basic scan'.





[jira] [Commented] (HBASE-17014) Add clearly marked starting and shutdown log messages for all services.

2016-11-04 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637789#comment-15637789
 ] 

stack commented on HBASE-17014:
---

Here is namenode:

{code}
800172 2016-11-04 14:03:31,796 INFO 
org.apache.hadoop.hdfs.server.namenode.top.window.RollingWindowManager: topN 
size for command setReplication is: 1
800173 2016-11-04 14:04:02,150 ERROR 
org.apache.hadoop.hdfs.server.namenode.NameNode: RECEIVED SIGNAL 15: SIGTERM
800174 2016-11-04 14:04:02,154 INFO 
org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
800175 /
800176 SHUTDOWN_MSG: Shutting down NameNode at 
ve0524.halxg.cloudera.com/10.17.240.20
800177 /
800178 2016-11-04 14:04:44,798 INFO 
org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
800179 /
800180 STARTUP_MSG: Starting NameNode
800181 STARTUP_MSG:   host = ve0524.halxg.cloudera.com/10.17.240.20
800182 STARTUP_MSG:   args = []
800183 STARTUP_MSG:   version = 2.7.3-SNAPSHOT
{code}

Three lines to report startup, and the same for shutdown. It's formatted as a 
Java comment for good measure.

> Add clearly marked starting and shutdown log messages for all services.
> ---
>
> Key: HBASE-17014
> URL: https://issues.apache.org/jira/browse/HBASE-17014
> Project: HBase
>  Issue Type: Improvement
>  Components: Operability
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17014.v1.patch
>
>
> From observing the log messages, clearly marked starting and shutdown 
> messages for services HMaster, HRegionServer, ThriftServer and RESTServer 
> will improve log readability.





[jira] [Commented] (HBASE-17016) Reimplement per-region latency histogram metrics

2016-11-04 Thread Mikhail Antonov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637793#comment-15637793
 ] 

Mikhail Antonov commented on HBASE-17016:
-

I think in practice latency outliers are examined far more often at the 
per-server level than at the per-region level (unlike request rate)? It would 
be fine to remove/replace, I think.

> Reimplement per-region latency histogram metrics
> 
>
> Key: HBASE-17016
> URL: https://issues.apache.org/jira/browse/HBASE-17016
> Project: HBase
>  Issue Type: Task
>  Components: metrics
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Andrew Purtell
> Fix For: 2.0.0, 1.4.0
>
>
> Follow up from HBASE-10656, where [~enis] says:
> {quote}
> the main problem is that we have A LOT of per-region metrics that are latency 
> histograms. These latency histograms create many many Counter / LongAdder 
> objects. We should get rid of per-region latencies and maybe look at reducing 
> the per-region metric overhead.
> {quote}
> And [~ghelmling] gives us a good candidate to implement pre-region latency 
> histograms [HdrHistogram|https://github.com/HdrHistogram/HdrHistogram].
> Let's consider removing the per-region latency histograms and reimplement 
> using HdrHistogram.





[jira] [Commented] (HBASE-16960) RegionServer hang when aborting

2016-11-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637784#comment-15637784
 ] 

Hudson commented on HBASE-16960:


SUCCESS: Integrated in Jenkins build HBase-1.2-JDK8 #56 (See 
[https://builds.apache.org/job/HBase-1.2-JDK8/56/])
HBASE-16960 RegionServer hang when aborting (liyu: rev 
906257838c05156f6678d0b11535f90f56e3c95d)
* (add) 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestSyncFuture.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SyncFuture.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestWALLockup.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALActionsListener.java


> RegionServer hang when aborting
> ---
>
> Key: HBASE-16960
> URL: https://issues.apache.org/jira/browse/HBASE-16960
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.2.3, 1.1.7
>Reporter: binlijin
>Assignee: binlijin
>Priority: Critical
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.4, 1.1.8
>
> Attachments: 16960.ut.missing.final.piece.txt, 
> HBASE-16960.branch-1.1.v1.patch, HBASE-16960.branch-1.2.v1.patch, 
> HBASE-16960.branch-1.v1.patch, HBASE-16960.patch, 
> HBASE-16960_master_v2.patch, HBASE-16960_master_v3.patch, 
> HBASE-16960_master_v4.patch, RingBufferEventHandler.png, 
> RingBufferEventHandler_exception.png, SyncFuture.png, 
> SyncFuture_exception.png, rs1081.jstack
>
>
> We have seen the regionserver hang when aborting several times, taking all 
> regions on that regionserver out of service, at which point all affected 
> applications stop working.





[jira] [Commented] (HBASE-17004) Refactor IntegrationTestManyRegions to use @ClassRule for timing out

2016-11-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637733#comment-15637733
 ] 

Hudson commented on HBASE-17004:


FAILURE: Integrated in Jenkins build HBase-1.3-JDK7 #61 (See 
[https://builds.apache.org/job/HBase-1.3-JDK7/61/])
HBASE-17004  IntegrationTestManyRegions verifies that many regions get (appy: 
rev b1c17f0ef98c1c6674004f044b3160b1be37ca64)
* (edit) hbase-it/pom.xml
* (edit) 
hbase-it/src/test/java/org/apache/hadoop/hbase/IntegrationTestManyRegions.java


> Refactor IntegrationTestManyRegions to use @ClassRule for timing out
> 
>
> Key: HBASE-17004
> URL: https://issues.apache.org/jira/browse/HBASE-17004
> Project: HBase
>  Issue Type: Improvement
>Reporter: Appy
>Assignee: Appy
>Priority: Minor
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.4, 1.1.8
>
> Attachments: HBASE-17004.master.001.patch, 
> HBASE-17004.master.002.patch
>
>
> IntegrationTestManyRegions verifies that many regions get assigned within a 
> given time. To do so, it spawns a new thread and uses CountDownLatch.await() 
> to time out. This replaces that mechanism with a JUnit @ClassRule to time out 
> the test.





[jira] [Commented] (HBASE-17018) Spooling BufferedMutator

2016-11-04 Thread Mikhail Antonov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637715#comment-15637715
 ] 

Mikhail Antonov commented on HBASE-17018:
-

At a high level, the idea of having BufferedMutator or a similar client API 
manage separate persistent storage with atomicity/replay guarantees sounds 
somewhat weird to me. Is that a problem to be solved outside of HBase? Or 
should it be a bulk ingest of some sort, as mentioned above?

> Spooling BufferedMutator
> 
>
> Key: HBASE-17018
> URL: https://issues.apache.org/jira/browse/HBASE-17018
> Project: HBase
>  Issue Type: New Feature
>Reporter: Joep Rottinghuis
> Attachments: YARN-4061 HBase requirements for fault tolerant 
> writer.pdf
>
>
> For Yarn Timeline Service v2 we use HBase as a backing store.
> A big concern we would like to address is what to do if HBase is 
> (temporarily) down, for example in case of an HBase upgrade.
> Most of the high volume writes will be mostly on a best-effort basis, but 
> occasionally we do a flush. Mainly during application lifecycle events, 
> clients will call a flush on the timeline service API. In order to handle the 
> volume of writes we use a BufferedMutator. When flush gets called on our API, 
> we in turn call flush on the BufferedMutator.
> We would like our interface to HBase be able to spool the mutations to a 
> filesystems in case of HBase errors. If we use the Hadoop filesystem 
> interface, this can then be HDFS, gcs, s3, or any other distributed storage. 
> The mutations can then later be re-played, for example through a MapReduce 
> job.
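The spooling behavior described above can be sketched as a thin wrapper around the write path. The `Sink` interface and `SpoolingWriter` class here are hypothetical illustrations, not the eventual HBASE-17018 design, and the local-file spool is a stand-in for the Hadoop filesystem interface the description mentions:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration of a spooling writer; not the HBASE-17018 design.
interface Sink {
    void write(String mutation) throws IOException;
}

class SpoolingWriter implements Sink {
    private final Sink primary;   // stand-in for a BufferedMutator-backed sink
    private final Path spoolFile; // in a real design this would be a Hadoop
                                  // FileSystem path (HDFS, gcs, s3, ...)

    SpoolingWriter(Sink primary, Path spoolFile) {
        this.primary = primary;
        this.spoolFile = spoolFile;
    }

    @Override
    public void write(String mutation) {
        try {
            primary.write(mutation); // best-effort write to HBase
        } catch (IOException e) {
            spool(mutation);         // HBase is down: spool for later replay
        }
    }

    private void spool(String mutation) {
        try {
            Files.writeString(spoolFile, mutation + System.lineSeparator(),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException io) {
            // Spooling itself failed; surface it, since data would be lost.
            throw new UncheckedIOException(io);
        }
    }
}
```

This matches the best-effort semantics in the description: nothing is guaranteed durable in HBase at write time, but failed mutations land in a replayable spool instead of being dropped, and a later batch job (MapReduce or otherwise) can drain the spool.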





[jira] [Updated] (HBASE-16977) VerifyReplication should log a printable representation of the row keys

2016-11-04 Thread Ashu Pachauri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashu Pachauri updated HBASE-16977:
--
Fix Version/s: 2.0.0

> VerifyReplication should log a printable representation of the row keys
> ---
>
> Key: HBASE-16977
> URL: https://issues.apache.org/jira/browse/HBASE-16977
> Project: HBase
>  Issue Type: Improvement
>  Components: Replication
>Reporter: Ashu Pachauri
>Assignee: Ashu Pachauri
>Priority: Minor
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16977.V1.patch
>
>
> VerifyReplication prints out the row keys for offending rows in the task logs 
> for the MR job. However, the log is useless if the row key contains 
> non-printable characters. 





[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache

2016-11-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637663#comment-15637663
 ] 

Hadoop QA commented on HBASE-15560:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 5s 
{color} | {color:blue} The patch file was not named according to hbase's naming 
conventions. Please see 
https://yetus.apache.org/documentation/0.3.0/precommit-patchnames for 
instructions. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 7s {color} 
| {color:red} HBASE-15560 does not apply to master. Rebase required? Wrong 
Branch? See https://yetus.apache.org/documentation/0.3.0/precommit-patchnames 
for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12837285/branch-1.tinylfu.txt |
| JIRA Issue | HBASE-15560 |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/4339/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> TinyLFU-based BlockCache
> 
>
> Key: HBASE-15560
> URL: https://issues.apache.org/jira/browse/HBASE-15560
> Project: HBase
>  Issue Type: Improvement
>  Components: BlockCache
>Affects Versions: 2.0.0
>Reporter: Ben Manes
>Assignee: Ben Manes
> Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, 
> HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, 
> bc.hit.count, bc.miss.count, branch-1.tinylfu.txt, gets, tinylfu.patch
>
>
> LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and 
> recency of the working set. It achieves concurrency by using an O(n) 
> background thread to prioritize the entries and evict. Accessing an entry is 
> O(1) by a hash table lookup, recording its logical access time, and setting a 
> frequency flag. A write is performed in O(1) time by updating the hash table 
> and triggering an async eviction thread. This provides ideal concurrency and 
> minimizes the latencies by penalizing the thread instead of the caller. 
> However, the policy does not age the frequencies and may not be resilient to 
> various workload patterns.
> W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the 
> frequency in a counting sketch, ages periodically by halving the counters, 
> and orders entries by SLRU. An entry is discarded by comparing the frequency 
> of the new arrival (candidate) to the SLRU's victim, and keeping the one with 
> the highest frequency. This allows the operations to be performed in O(1) 
> time and, through the use of a compact sketch, a much larger history is 
> retained beyond the current working set. In a variety of real world traces 
> the policy had [near optimal hit 
> rates|https://github.com/ben-manes/caffeine/wiki/Efficiency].
> Concurrency is achieved by buffering and replaying the operations, similar to 
> a write-ahead log. A read is recorded into a striped ring buffer and writes 
> to a queue. The operations are applied in batches under a try-lock by an 
> asynchronous thread, thereby tracking the usage pattern without incurring high 
> latencies 
> ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]).
> In YCSB benchmarks the results were inconclusive. For a large cache (99% hit 
> rates) the two caches have near identical throughput and latencies with 
> LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a 
> 1-4% hit rate improvement and therefore lower latencies. The lackluster 
> result is because a synthetic Zipfian distribution is used, on which SLRU 
> performs optimally. In a more varied, real-world workload we'd expect to see 
> improvements by being able to make smarter predictions.
> The provided patch implements BlockCache using the 
> [Caffeine|https://github.com/ben-manes/caffeine] caching library (see 
> HighScalability 
> [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]).
> Edward Bortnikov and Eshcar Hillel have graciously provided guidance for 
> evaluating this patch ([github 
> branch|https://github.com/ben-manes/hbase/tree/tinylfu]).
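The counting-sketch-with-aging behavior described above can be sketched as follows (illustrative only: a single saturating counter array stands in for Caffeine's 4-bit CountMin sketch, and the admission rule compares candidate vs. victim frequency as the description says):

```java
// Simplified sketch of TinyLFU-style frequency counting with aging:
// bounded counters are periodically halved so old popularity decays.
public class AgingFrequencySketch {
    private final int[] counts;
    private int additions;
    private final int sampleSize;

    AgingFrequencySketch(int slots, int sampleSize) {
        this.counts = new int[slots];
        this.sampleSize = sampleSize;
    }

    private int slot(Object key) {
        return Math.floorMod(key.hashCode(), counts.length);
    }

    void increment(Object key) {
        int i = slot(key);
        if (counts[i] < 15) counts[i]++;     // saturate like a 4-bit counter
        if (++additions >= sampleSize) {     // aging: halve every counter
            for (int j = 0; j < counts.length; j++) counts[j] >>>= 1;
            additions = 0;
        }
    }

    int frequency(Object key) { return counts[slot(key)]; }

    // TinyLFU admission: keep whichever of candidate/victim is more frequent
    boolean admit(Object candidate, Object victim) {
        return frequency(candidate) > frequency(victim);
    }

    public static void main(String[] args) {
        AgingFrequencySketch s = new AgingFrequencySketch(64, 100);
        for (int i = 0; i < 5; i++) s.increment("hot");
        s.increment("cold");
        System.out.println(s.admit("hot", "cold"));   // true
    }
}
```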





[jira] [Commented] (HBASE-16993) BucketCache throw java.io.IOException: Invalid HFile block magic when DATA_BLOCK_ENCODING set to DIFF

2016-11-04 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637662#comment-15637662
 ] 

stack commented on HBASE-16993:
---

Why is this not a bug, @liubangchen? If folks use non-standard bucket.sizes, 
do they run into your issue above? Thank you.

> BucketCache throw java.io.IOException: Invalid HFile block magic when 
> DATA_BLOCK_ENCODING set to DIFF
> -
>
> Key: HBASE-16993
> URL: https://issues.apache.org/jira/browse/HBASE-16993
> Project: HBase
>  Issue Type: Bug
>  Components: io
>Affects Versions: 1.1.3
> Environment: hbase version 1.1.3
>Reporter: liubangchen
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> hbase-site.xml setting
> {noformat}
> <property>
>     <name>hbase.bucketcache.bucket.sizes</name>
>     <value>16384,32768,40960,46000,49152,51200,65536,131072,524288</value>
> </property>
> <property>
>     <name>hbase.bucketcache.size</name>
>     <value>16384</value>
> </property>
> <property>
>     <name>hbase.bucketcache.ioengine</name>
>     <value>offheap</value>
> </property>
> <property>
>     <name>hfile.block.cache.size</name>
>     <value>0.3</value>
> </property>
> <property>
>     <name>hfile.block.bloom.cacheonwrite</name>
>     <value>true</value>
> </property>
> <property>
>     <name>hbase.rs.cacheblocksonwrite</name>
>     <value>true</value>
> </property>
> <property>
>     <name>hfile.block.index.cacheonwrite</name>
>     <value>true</value>
> </property>
> {noformat}
> n_splits = 200
> create 'usertable',{NAME =>'family', COMPRESSION => 'snappy', VERSIONS => 
> 1,DATA_BLOCK_ENCODING => 'DIFF',CONFIGURATION => 
> {'hbase.hregion.memstore.block.multiplier' => 5}},{DURABILITY => 
> 'SKIP_WAL'},{SPLITS => (1..n_splits).map {|i| 
> "user#{1000+i*(-1000)/n_splits}"}}
> load data
> bin/ycsb load hbase10 -P workloads/workloada -p table=usertable -p 
> columnfamily=family -p fieldcount=10 -p fieldlength=100 -p 
> recordcount=2 -p insertorder=hashed -p insertstart=0 -p 
> clientbuffering=true -p durability=SKIP_WAL -threads 20 -s 
> run 
> bin/ycsb run hbase10 -P workloads/workloadb -p table=usertable -p 
> columnfamily=family -p fieldcount=10 -p fieldlength=100 -p 
> operationcount=2000 -p readallfields=true -p clientbuffering=true -p 
> requestdistribution=zipfian  -threads 10 -s
> log info
> 2016-11-02 20:20:20,261 ERROR 
> [RW.default.readRpcServer.handler=36,queue=21,port=6020] bucket.BucketCache: 
> Failed reading block fdcc7ed6f3b2498b9ef316cc8206c233_44819759 from bucket 
> cache
> java.io.IOException: Invalid HFile block magic: 
> \x00\x00\x00\x00\x00\x00\x00\x00
> at 
> org.apache.hadoop.hbase.io.hfile.BlockType.parse(BlockType.java:154)
> at org.apache.hadoop.hbase.io.hfile.BlockType.read(BlockType.java:167)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock.<init>(HFileBlock.java:273)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$1.deserialize(HFileBlock.java:134)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$1.deserialize(HFileBlock.java:121)
> at 
> org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.getBlock(BucketCache.java:427)
> at 
> org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.getBlock(CombinedBlockCache.java:85)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2.getCachedBlock(HFileReaderV2.java:266)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:403)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:269)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:634)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:584)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:247)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:156)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:363)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:217)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:2071)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:5369)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:2546)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2532)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2514)
> at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6558)
> at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6537)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:1935)
> at 
> 

[jira] [Commented] (HBASE-17023) Region left unassigned due to AM and SSH each thinking others would do the assignment work

2016-11-04 Thread Matteo Bertozzi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637638#comment-15637638
 ] 

Matteo Bertozzi commented on HBASE-17023:
-

makes sense to me, +1

> Region left unassigned due to AM and SSH each thinking others would do the 
> assignment work
> --
>
> Key: HBASE-17023
> URL: https://issues.apache.org/jira/browse/HBASE-17023
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 1.1.0
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Attachments: HBASE-17023.v0-branch-1.1.patch
>
>
> Another Assignment Manager and SSH issue.  This issue is similar to 
> HBASE-13330, except this time the code path goes through ClosedRegionHandler 
> and we should apply the same fix of HBASE-13330 to ClosedRegionHandler.
> Basically, the AssignmentManager thinks the ServerShutdownHandler would 
> assign the region and the ServerShutdownHandler thinks that the 
> AssignmentManager would assign the region. The region 
> (23e0186c4d2b5cc09f25de35fe174417) ultimately never gets assigned. Below is 
> an analysis from the logs that captures the flow of events.
> 1. The AssignmentManager had initially assigned this region to 
> {{rs42.prod.foo.com,16020,1476293566365}}.
> 2. The {{rs42.prod.foo.com,16020,1476293566365}} stops and sends the CLOSE 
> request to master.
> 3. ServerShutdownHandler(SSH) runs to assign this region to 
> {{rs44.prod.foo.com,16020,1476294287692}}, but assign failed.
> 4. When the master restarted, it did a scan of the meta to learn about the 
> regions in the cluster. It found this region still being assigned to
> {{rs42}} from the meta record.
> 5. However, this {{rs42}} server was not alive anymore. So, the 
> AssignmentManager queued up a ServerShutdownHandling task for this (that 
> asynchronously executes):
> 6. In the meantime, the AssignmentManager proceeded to read the RIT nodes 
> from ZK. It found this region as well is in RS_ZK_REGION_FAILED_OPEN in the 
> {{rs44}} RS.
> 7. The region was moved to CLOSED state:
> {noformat}
> 2016-10-12 17:45:11,637 DEBUG [AM.ZK.Worker-pool2-t6] 
> master.AssignmentManager: Handling RS_ZK_REGION_FAILED_OPEN, 
> server=rs44.prod.foo.com,16020,1476294287692, 
> region=23e0186c4d2b5cc09f25de35fe174417, 
> current_state={23e0186c4d2b5cc09f25de35fe174417 state=PENDING_OPEN, 
> ts=1476294311564, server=rs44.prod.foo.com,16020,1476294287692}
> 2016-10-12 17:45:11,637 INFO  [AM.ZK.Worker-pool2-t6] master.RegionStates: 
> Transition {23e0186c4d2b5cc09f25de35fe174417 state=PENDING_OPEN, 
> ts=1476294311564, server=rs44.prod.foo.com,16020,1476294287692} to 
> {23e0186c4d2b5cc09f25de35fe174417 state=CLOSED, ts=1476294311637, 
> server=rs44.prod.foo.com,16020,1476294287692}
> 2016-10-12 17:45:11,637 WARN  [AM.ZK.Worker-pool2-t6] master.RegionStates: 
> 23e0186c4d2b5cc09f25de35fe174417 moved to CLOSED on 
> rs44.prod.foo.com,16020,1476294287692, expected 
> rs42.prod.foo.com,16020,1476293566365
> {noformat}
> 8. After that the AssignmentManager tried to assign it again. However, the 
> assignment didn't happen because the ServerShutdownHandling task queued 
> earlier didn't yet execute:
> {noformat}
> 2016-10-12 17:45:11,637 DEBUG [AM.ZK.Worker-pool2-t6] 
> master.AssignmentManager: Found an existing plan for 
> table1,3025965238305402_2,1468091325259.23e0186c4d2b5cc09f25de35fe174417. 
> destination server is rs44.prod.foo.com,16020,1476294287692 accepted as a 
> dest server = false
> 2016-10-12 17:45:11,697 DEBUG [AM.ZK.Worker-pool2-t6] 
> master.AssignmentManager: No previous transition plan found (or ignoring an 
> existing plan) for 
> table1,3025965238305402_2,1468091325259.23e0186c4d2b5cc09f25de35fe174417.; 
> generated random 
> plan=hri=table1,3025965238305402_2,1468091325259.23e0186c4d2b5cc09f25de35fe174417.,
>  src=, dest=rs28.prod.foo.com,16020,1476294291314; 10 (online=11) available 
> servers, forceNewPlan=true
> 2016-10-12 17:45:11,697 DEBUG [AM.ZK.Worker-pool2-t6] 
> handler.ClosedRegionHandler: Handling CLOSED event for 
> 23e0186c4d2b5cc09f25de35fe174417
> 2016-10-12 17:45:11,697 WARN  [AM.ZK.Worker-pool2-t6] master.RegionStates: 
> 23e0186c4d2b5cc09f25de35fe174417 moved to CLOSED on 
> rs44.prod.foo.com,16020,1476294287692, expected 
> rs42.prod.foo.com,16020,1476293566365
> 2016-10-12 17:45:11,697 INFO  [AM.ZK.Worker-pool2-t6] 
> master.AssignmentManager: Skip assigning 
> table1,3025965238305402_2,1468091325259.23e0186c4d2b5cc09f25de35fe174417., 
> it's host rs42.prod.foo.com,16020,1476293566365 is dead but not processed yet
> 2016-10-12 17:45:11,884 INFO  [MASTER_SERVER_OPERATIONS-server01:16000-3] 
> master.RegionStates: Transitioning {23e0186c4d2b5cc09f25de35fe174417 
> state=CLOSED, ts=1476294311697, 

[jira] [Updated] (HBASE-16892) Use TableName instead of String in SnapshotDescription

2016-11-04 Thread Matteo Bertozzi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-16892:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Use TableName instead of String in SnapshotDescription
> --
>
> Key: HBASE-16892
> URL: https://issues.apache.org/jira/browse/HBASE-16892
> Project: HBase
>  Issue Type: Sub-task
>  Components: snapshots
>Affects Versions: 2.0.0
>Reporter: Matteo Bertozzi
>Assignee: Matteo Bertozzi
>Priority: Trivial
> Fix For: 2.0.0
>
> Attachments: HBASE-16892-v0.patch, HBASE-16892-v1.patch, 
> HBASE-16892-v2.patch
>
>
> mostly find & replace work:
> deprecate the SnapshotDescription constructors that take a String table name 
> in favor of the TableName ones. 
> Replace the scattered TableName.valueOf() calls with the new getTableName().
> Replace the TableName.getNameAsString() calls by just passing the TableName.





[jira] [Updated] (HBASE-16937) Replace SnapshotType protobuf conversion when we can directly use the pojo object

2016-11-04 Thread Matteo Bertozzi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-16937:

Labels: snapshot  (was: )

> Replace SnapshotType protobuf conversion when we can directly use the pojo 
> object
> -
>
> Key: HBASE-16937
> URL: https://issues.apache.org/jira/browse/HBASE-16937
> Project: HBase
>  Issue Type: Sub-task
>  Components: snapshots
>Reporter: Matteo Bertozzi
>Assignee: Matteo Bertozzi
>Priority: Trivial
> Fix For: 2.0.0
>
> Attachments: HBASE-16937-v0.patch, HBASE-16937-v1.patch
>
>
> mostly find & replace work:
> replace the back and forth protobuf conversion when we can just use the 
> client SnapshotType enum.





[jira] [Updated] (HBASE-16937) Replace SnapshotType protobuf conversion when we can directly use the pojo object

2016-11-04 Thread Matteo Bertozzi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-16937:

Labels:   (was: snapshot)

> Replace SnapshotType protobuf conversion when we can directly use the pojo 
> object
> -
>
> Key: HBASE-16937
> URL: https://issues.apache.org/jira/browse/HBASE-16937
> Project: HBase
>  Issue Type: Sub-task
>  Components: snapshots
>Reporter: Matteo Bertozzi
>Assignee: Matteo Bertozzi
>Priority: Trivial
> Fix For: 2.0.0
>
> Attachments: HBASE-16937-v0.patch, HBASE-16937-v1.patch
>
>
> mostly find & replace work:
> replace the back and forth protobuf conversion when we can just use the 
> client SnapshotType enum.





[jira] [Updated] (HBASE-16937) Replace SnapshotType protobuf conversion when we can directly use the pojo object

2016-11-04 Thread Matteo Bertozzi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-16937:

Component/s: snapshots

> Replace SnapshotType protobuf conversion when we can directly use the pojo 
> object
> -
>
> Key: HBASE-16937
> URL: https://issues.apache.org/jira/browse/HBASE-16937
> Project: HBase
>  Issue Type: Sub-task
>  Components: snapshots
>Reporter: Matteo Bertozzi
>Assignee: Matteo Bertozzi
>Priority: Trivial
> Fix For: 2.0.0
>
> Attachments: HBASE-16937-v0.patch, HBASE-16937-v1.patch
>
>
> mostly find & replace work:
> replace the back and forth protobuf conversion when we can just use the 
> client SnapshotType enum.





[jira] [Updated] (HBASE-16865) Procedure v2 - Inherit lock from root proc

2016-11-04 Thread Matteo Bertozzi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-16865:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Procedure v2 - Inherit lock from root proc
> --
>
> Key: HBASE-16865
> URL: https://issues.apache.org/jira/browse/HBASE-16865
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Affects Versions: 2.0.0
>Reporter: Matteo Bertozzi
>Assignee: Matteo Bertozzi
> Fix For: 2.0.0
>
> Attachments: HBASE-16865-v0.patch
>
>
> At the moment we support inheriting locks from the parent procedure for 
> 2-level procedures, but in the case of reopening table regions we have 
> 3-level procedures (ModifyTable -> ReOpen -> [Unassign/Assign]) and ReOpen 
> does not hold any locks of its own.





[jira] [Created] (HBASE-17028) New 2.0 blockcache (tinylfu) doesn't have inmemory partition, etc Update doc and codebase accordingly

2016-11-04 Thread stack (JIRA)
stack created HBASE-17028:
-

 Summary: New 2.0 blockcache (tinylfu) doesn't have inmemory 
partition, etc Update doc and codebase accordingly
 Key: HBASE-17028
 URL: https://issues.apache.org/jira/browse/HBASE-17028
 Project: HBase
  Issue Type: Sub-task
  Components: BlockCache
Reporter: stack


Intent is to make the parent tinylfu blockcache the default in 2.0, replacing 
our old lru blockcache. This issue is about making it clear in doc and code how 
the new blockcache differs from the old (you can put back the old lru 
blockcache with a config change).





[jira] [Updated] (HBASE-15560) TinyLFU-based BlockCache

2016-11-04 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-15560:
--
Attachment: branch-1.tinylfu.txt

My backport FYI. You can add LOG to this or just tell me what you'd like to 
see. Thanks [~ben.manes]

> TinyLFU-based BlockCache
> 
>
> Key: HBASE-15560
> URL: https://issues.apache.org/jira/browse/HBASE-15560
> Project: HBase
>  Issue Type: Improvement
>  Components: BlockCache
>Affects Versions: 2.0.0
>Reporter: Ben Manes
>Assignee: Ben Manes
> Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, 
> HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, 
> bc.hit.count, bc.miss.count, branch-1.tinylfu.txt, gets, tinylfu.patch
>
>
> LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and 
> recency of the working set. It achieves concurrency by using an O(n) 
> background thread to prioritize the entries and evict. Accessing an entry is 
> O(1) by a hash table lookup, recording its logical access time, and setting a 
> frequency flag. A write is performed in O(1) time by updating the hash table 
> and triggering an async eviction thread. This provides ideal concurrency and 
> minimizes the latencies by penalizing the thread instead of the caller. 
> However, the policy does not age the frequencies and may not be resilient to 
> various workload patterns.
> W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the 
> frequency in a counting sketch, ages periodically by halving the counters, 
> and orders entries by SLRU. An entry is discarded by comparing the frequency 
> of the new arrival (candidate) to the SLRU's victim, and keeping the one with 
> the highest frequency. This allows the operations to be performed in O(1) 
> time and, through the use of a compact sketch, a much larger history is 
> retained beyond the current working set. In a variety of real world traces 
> the policy had [near optimal hit 
> rates|https://github.com/ben-manes/caffeine/wiki/Efficiency].
> Concurrency is achieved by buffering and replaying the operations, similar to 
> a write-ahead log. A read is recorded into a striped ring buffer and writes 
> to a queue. The operations are applied in batches under a try-lock by an 
> asynchronous thread, thereby tracking the usage pattern without incurring high 
> latencies 
> ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]).
> In YCSB benchmarks the results were inconclusive. For a large cache (99% hit 
> rates) the two caches have near identical throughput and latencies with 
> LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a 
> 1-4% hit rate improvement and therefore lower latencies. The lackluster 
> result is because a synthetic Zipfian distribution is used, on which SLRU 
> performs optimally. In a more varied, real-world workload we'd expect to see 
> improvements by being able to make smarter predictions.
> The provided patch implements BlockCache using the 
> [Caffeine|https://github.com/ben-manes/caffeine] caching library (see 
> HighScalability 
> [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]).
> Edward Bortnikov and Eshcar Hillel have graciously provided guidance for 
> evaluating this patch ([github 
> branch|https://github.com/ben-manes/hbase/tree/tinylfu]).
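The buffer-and-replay concurrency scheme described above can be sketched as follows (simplified: one bounded queue and a try-lock stand in for Caffeine's striped ring buffers and async maintenance thread; illustrative only):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of the "buffer and replay" idea: reads are recorded into a
// bounded buffer and applied to the policy in batches under a try-lock,
// so callers never block on policy maintenance.
public class BufferedPolicySketch {
    private final ArrayBlockingQueue<String> readBuffer = new ArrayBlockingQueue<>(128);
    private final ReentrantLock evictionLock = new ReentrantLock();
    final List<String> applied = new ArrayList<>();  // stands in for the SLRU policy

    void recordRead(String key) {
        // best effort: if the buffer is full the event is simply dropped,
        // trading a little accuracy for never blocking the reader
        readBuffer.offer(key);
        tryDrain();
    }

    private void tryDrain() {
        if (evictionLock.tryLock()) {   // only one thread maintains the policy
            try {
                String key;
                while ((key = readBuffer.poll()) != null) {
                    applied.add(key);   // replay the buffered accesses
                }
            } finally {
                evictionLock.unlock();
            }
        }
    }

    public static void main(String[] args) {
        BufferedPolicySketch p = new BufferedPolicySketch();
        p.recordRead("a");
        p.recordRead("b");
        System.out.println(p.applied);   // [a, b]
    }
}
```

The try-lock is the key design choice: a reader that loses the race just leaves its event in the buffer, and whoever next acquires the lock replays the backlog.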





[jira] [Commented] (HBASE-16892) Use TableName instead of String in SnapshotDescription

2016-11-04 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637610#comment-15637610
 ] 

stack commented on HBASE-16892:
---

+1 Nice cleanup.

> Use TableName instead of String in SnapshotDescription
> --
>
> Key: HBASE-16892
> URL: https://issues.apache.org/jira/browse/HBASE-16892
> Project: HBase
>  Issue Type: Sub-task
>  Components: snapshots
>Affects Versions: 2.0.0
>Reporter: Matteo Bertozzi
>Assignee: Matteo Bertozzi
>Priority: Trivial
> Fix For: 2.0.0
>
> Attachments: HBASE-16892-v0.patch, HBASE-16892-v1.patch, 
> HBASE-16892-v2.patch
>
>
> mostly find & replace work:
> deprecate the SnapshotDescription constructors that take a String table name 
> in favor of the TableName ones. 
> Replace the scattered TableName.valueOf() calls with the new getTableName().
> Replace the TableName.getNameAsString() calls by just passing the TableName.





[jira] [Commented] (HBASE-16989) RowProcess#postBatchMutate doesn’t be executed before the mvcc transaction completion

2016-11-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637598#comment-15637598
 ] 

Hadoop QA commented on HBASE-16989:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
1s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
45s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
41s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 35s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
27m 23s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha1. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
49s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 97m 25s {color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
13s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 136m 48s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Timed out junit tests | 
org.apache.hadoop.hbase.master.procedure.TestServerCrashProcedure |
|   | org.apache.hadoop.hbase.io.hfile.TestHFileBlockIndex |
|   | org.apache.hadoop.hbase.master.procedure.TestRestoreSnapshotProcedure |
|   | org.apache.hadoop.hbase.master.procedure.TestTruncateTableProcedure |
|   | org.apache.hadoop.hbase.master.procedure.TestMasterProcedureWalLease |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.11.2 Server=1.11.2 Image:yetus/hbase:7bda515 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12837205/HBASE-16989.v2.patch |
| JIRA Issue | HBASE-16989 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux 702f6bc90750 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 05ee54f |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| unit | 
https://builds.apache.org/job/PreCommit-HBASE-Build/4334/artifact/patchprocess/patch-unit-hbase-server.txt
 |
| unit test logs |  
https://builds.apache.org/job/PreCommit-HBASE-Build/4334/artifact/patchprocess/patch-unit-hbase-server.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/4334/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | 
