[jira] [Commented] (HBASE-17018) Spooling BufferedMutator
[ https://issues.apache.org/jira/browse/HBASE-17018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15638727#comment-15638727 ]

Joep Rottinghuis commented on HBASE-17018:
------------------------------------------

Thanks for the comments. My thoughts around using MR were driven by ease of implementation and stemmed from my use case, where Yarn is present and therefore MR is trivially available. It is a fair point that for a standalone feature in HBase this doesn't have to be true. Using MR isn't a requirement; it was merely a (naive) suggestion.

I don't think that atomicity is a requirement, nor are we asking for "guarantees". If you want to be guaranteed to write something to HBase, you probably shouldn't use a BufferedMutator in the first place.

Please see the attached PDF, where I try to sketch out our use case and the behavior we're hoping to see.

> Spooling BufferedMutator
> ------------------------
>
>                 Key: HBASE-17018
>                 URL: https://issues.apache.org/jira/browse/HBASE-17018
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Joep Rottinghuis
>         Attachments: YARN-4061 HBase requirements for fault tolerant writer.pdf
>
>
> For Yarn Timeline Service v2 we use HBase as a backing store.
> A big concern we would like to address is what to do if HBase is (temporarily) down, for example in case of an HBase upgrade.
> Most of the high volume writes will be mostly on a best-effort basis, but occasionally we do a flush. Mainly during application lifecycle events, clients will call a flush on the timeline service API. In order to handle the volume of writes we use a BufferedMutator. When flush gets called on our API, we in turn call flush on the BufferedMutator.
> We would like our interface to HBase to be able to spool the mutations to a filesystem in case of HBase errors. If we use the Hadoop filesystem interface, this can then be HDFS, gcs, s3, or any other distributed storage. The mutations can then later be re-played, for example through a MapReduce job.
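The spool-on-failure behavior described above can be sketched as a thin wrapper around any batch sink. This is a minimal, hypothetical Python sketch of the idea, not the HBase BufferedMutator API: `SpoolingBufferedWriter`, its `sink` callable, and the JSON spool format are all invented for illustration; a real implementation would write through the Hadoop FileSystem interface so the spool could live on HDFS, gcs, or s3.

```python
import json
import os
import tempfile


class SpoolingBufferedWriter:
    """Best-effort buffered writer: buffer mutations, and when a flush to
    the backing store fails, spool the batch to a file so it can be
    replayed later (e.g. by a batch job). Hypothetical sketch only."""

    def __init__(self, sink, spool_dir):
        self.sink = sink            # callable that writes a batch; may raise
        self.spool_dir = spool_dir  # stand-in for a distributed filesystem
        self.buffer = []

    def mutate(self, row):
        self.buffer.append(row)

    def flush(self):
        if not self.buffer:
            return "empty"
        batch, self.buffer = self.buffer, []
        try:
            self.sink(batch)
            return "written"
        except Exception:
            # HBase is down: spool the batch instead of losing it.
            fd, _path = tempfile.mkstemp(suffix=".spool", dir=self.spool_dir)
            with os.fdopen(fd, "w") as f:
                json.dump(batch, f)
            return "spooled"

    @staticmethod
    def replay(spool_dir, sink):
        """Re-play spooled batches (the role the MR job would fill)."""
        for name in sorted(os.listdir(spool_dir)):
            if name.endswith(".spool"):
                path = os.path.join(spool_dir, name)
                with open(path) as f:
                    sink(json.load(f))
                os.remove(path)
```

Since the description says there are no atomicity guarantees, duplicate replays are acceptable as long as the mutations are idempotent puts; the sketch simply deletes each spool file after one successful replay.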
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same
[ https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15638675#comment-15638675 ]

stack commented on HBASE-16890:
-------------------------------

Smile. No worries. I'll try.

> Analyze the performance of AsyncWAL and fix the same
> ----------------------------------------------------
>
>                 Key: HBASE-16890
>                 URL: https://issues.apache.org/jira/browse/HBASE-16890
>             Project: HBase
>          Issue Type: Sub-task
>          Components: wal
>    Affects Versions: 2.0.0
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 2.0.0
>
>         Attachments: AsyncWAL_disruptor.patch, AsyncWAL_disruptor_1 (2).patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_4.patch, AsyncWAL_disruptor_6.patch, HBASE-16890-rc-v2.patch, HBASE-16890-rc-v3.patch, HBASE-16890-remove-contention-v1.patch, HBASE-16890-remove-contention.patch, Screen Shot 2016-10-25 at 7.34.47 PM.png, Screen Shot 2016-10-25 at 7.39.07 PM.png, Screen Shot 2016-10-25 at 7.39.48 PM.png, Screen Shot 2016-11-04 at 5.21.27 PM.png, Screen Shot 2016-11-04 at 5.30.18 PM.png, async.svg, classic.svg, contention.png, contention_defaultWAL.png
>
>
> Tests reveal that AsyncWAL under load in single node cluster performs slower than the Default WAL. This task is to analyze and see if we could fix it.
> See some discussions in the tail of JIRA HBASE-15536.
[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same
[ https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15638647#comment-15638647 ]

Duo Zhang commented on HBASE-16890:
-----------------------------------

Honestly I do not know... I have never changed it before. You can try 25 and 75 to see if there is any difference.
[jira] [Commented] (HBASE-17033) LogRoller makes a lot of allocations unnecessarily
[ https://issues.apache.org/jira/browse/HBASE-17033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15638648#comment-15638648 ]

Hadoop QA commented on HBASE-17033:
-----------------------------------

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
|  0 | reexec | 0m 16s | Docker mode activated. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| +1 | mvninstall | 3m 28s | master passed |
| +1 | compile | 0m 35s | master passed |
| +1 | checkstyle | 0m 44s | master passed |
| +1 | mvneclipse | 0m 13s | master passed |
| +1 | findbugs | 1m 40s | master passed |
| +1 | javadoc | 0m 26s | master passed |
| +1 | mvninstall | 0m 45s | the patch passed |
| +1 | compile | 0m 34s | the patch passed |
| +1 | javac | 0m 34s | the patch passed |
| +1 | checkstyle | 0m 45s | the patch passed |
| +1 | mvneclipse | 0m 13s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | hadoopcheck | 27m 38s | Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha1. |
| +1 | findbugs | 1m 54s | the patch passed |
| +1 | javadoc | 0m 30s | the patch passed |
| -1 | unit | 155m 2s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 16s | The patch does not generate ASF License warnings. |
|    |  | 195m 28s |  |

|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.regionserver.TestHRegion |
| Timed out junit tests | org.apache.hadoop.hbase.TestGlobalMemStoreSize |
| | org.apache.hadoop.hbase.security.visibility.TestVisibilityLabelsWithDeletes |
| | org.apache.hadoop.hbase.security.visibility.TestVisibilityLabelReplicationWithExpAsString |

|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:7bda515 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12837335/hbase-17033_v1.patch |
| JIRA Issue | HBASE-17033 |
| Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 812ec33d27d5 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 7e05d0f |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| unit | https://builds.apache.org/job/PreCommit-HBASE-Build/4345/artifact/patchprocess/patch-unit-hbase-server.txt |
| unit test logs | https://builds.apache.org/job/PreCommit-HBASE-Build/4345/artifact/patchprocess/patch-unit-hbase-server.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/4345/testReport/ |
| modules |
[jira] [Commented] (HBASE-17021) Use RingBuffer to reduce the contention in AsyncFSWAL
[ https://issues.apache.org/jira/browse/HBASE-17021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15638616#comment-15638616 ]

stack commented on HBASE-17021:
-------------------------------

I did a part-pass. Will do another later. [~ram_krish] You looking at this? What's the diff between this and your approach, boss? Thanks.

> Use RingBuffer to reduce the contention in AsyncFSWAL
> -----------------------------------------------------
>
>                 Key: HBASE-17021
>                 URL: https://issues.apache.org/jira/browse/HBASE-17021
>             Project: HBase
>          Issue Type: Sub-task
>          Components: wal
>    Affects Versions: 2.0.0
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>             Fix For: 2.0.0
>
>         Attachments: HBASE-17021.patch
>
>
> The WALPE result in HBASE-16890 shows that with disruptor's RingBuffer we can get a better performance.
[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same
[ https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15638594#comment-15638594 ]

stack commented on HBASE-16890:
-------------------------------

Takes an int. It defaults to 50. You want it at 100? [~Apache9]
[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same
[ https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15638548#comment-15638548 ]

stack commented on HBASE-16890:
-------------------------------

Let me try. Will report back in the morning.
[jira] [Commented] (HBASE-17032) CallQueueTooBigException and CallDroppedException should not be triggering PFFE
[ https://issues.apache.org/jira/browse/HBASE-17032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15638495#comment-15638495 ]

Hadoop QA commented on HBASE-17032:
-----------------------------------

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
|  0 | reexec | 8m 46s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|  0 | mvndep | 1m 16s | Maven dependency ordering for branch |
| +1 | mvninstall | 6m 58s | branch-1.3 passed |
| +1 | compile | 0m 46s | branch-1.3 passed with JDK v1.8.0_111 |
| +1 | compile | 0m 46s | branch-1.3 passed with JDK v1.7.0_80 |
| +1 | checkstyle | 0m 52s | branch-1.3 passed |
| +1 | mvneclipse | 0m 38s | branch-1.3 passed |
| +1 | findbugs | 2m 47s | branch-1.3 passed |
| +1 | javadoc | 0m 40s | branch-1.3 passed with JDK v1.8.0_111 |
| +1 | javadoc | 0m 45s | branch-1.3 passed with JDK v1.7.0_80 |
|  0 | mvndep | 0m 9s | Maven dependency ordering for patch |
| +1 | mvninstall | 0m 58s | the patch passed |
| +1 | compile | 0m 42s | the patch passed with JDK v1.8.0_111 |
| +1 | javac | 0m 42s | the patch passed |
| +1 | compile | 0m 44s | the patch passed with JDK v1.7.0_80 |
| +1 | javac | 0m 44s | the patch passed |
| +1 | checkstyle | 0m 42s | the patch passed |
| +1 | mvneclipse | 0m 25s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | hadoopcheck | 16m 0s | The patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. |
| +1 | hbaseprotoc | 0m 24s | the patch passed |
| +1 | findbugs | 3m 16s | the patch passed |
| +1 | javadoc | 0m 40s | the patch passed with JDK v1.8.0_111 |
| +1 | javadoc | 0m 49s | the patch passed with JDK v1.7.0_80 |
| +1 | unit | 1m 32s | hbase-client in the patch passed. |
| -1 | unit | 92m 3s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 37s | The patch does not generate ASF License warnings. |
|    |  | 144m 13s |  |

|| Reason || Tests ||
| Timed out junit tests | org.apache.hadoop.hbase.security.access.TestAccessController |
| | org.apache.hadoop.hbase.tool.TestCanaryTool |

|| Subsystem || Report/Notes ||
| Docker | Client=1.12.1 Server=1.12.1 Image:yetus/hbase:463e832 |
| JIRA Patch
[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same
[ https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15638480#comment-15638480 ]

Duo Zhang commented on HBASE-16890:
-----------------------------------

And for ioRatio, you need to cast the EventLoopGroup in AsyncFSWALProvider to NioEventLoopGroup and call its setIoRatio method. [~stack] It should be in (0, 100], which is the percentage of its time the EventLoop will spend doing I/O. Thanks.
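For intuition about what `setIoRatio` controls: after each I/O pass, Netty's NIO event loop time-boxes how long it runs queued tasks so that I/O gets roughly `ioRatio` percent of the loop's time. A small Python model of that budget calculation (the formula here is my recollection of Netty 4.x's accounting and should be treated as an assumption, not a spec):

```python
def task_time_budget_ns(io_time_ns, io_ratio):
    """After an I/O pass that took io_time_ns, return how long the loop
    may spend on queued tasks so I/O stays near io_ratio percent.
    io_ratio == 100 is special-cased (tasks run without a time box),
    modeled here as returning None."""
    if not 0 < io_ratio <= 100:
        raise ValueError("ioRatio must be in (0, 100]")
    if io_ratio == 100:
        return None  # no budget: run all pending tasks
    # io_ratio percent for I/O leaves (100 - io_ratio) percent for tasks.
    return io_time_ns * (100 - io_ratio) // io_ratio
```

So a lower ioRatio (e.g. 25) gives tasks a larger slice relative to I/O, and a higher one (e.g. 75) keeps the loop mostly doing I/O, which is why trying 25 vs 75 is a meaningful experiment.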
[jira] [Comment Edited] (HBASE-17021) Use RingBuffer to reduce the contention in AsyncFSWAL
[ https://issues.apache.org/jira/browse/HBASE-17021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15638455#comment-15638455 ]

Duo Zhang edited comment on HBASE-17021 at 11/5/16 3:13 AM:
------------------------------------------------------------

[~stack] If you can also make sure that this patch helps, then let's commit it first? Then I could work on the following part, such as limiting the concurrent sync requests. I do not want to put everything in a single big patch, as we do not know if the newly added code works... Thanks.

was (Author: apache9):
[~stack] If you can also make sure that this patch helps, then let's commit it first? Then I could work on the following part such as limit the concurrent sync requests. I do not want to put everything in a single big patch... Thanks.
[jira] [Commented] (HBASE-17021) Use RingBuffer to reduce the contention in AsyncFSWAL
[ https://issues.apache.org/jira/browse/HBASE-17021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15638455#comment-15638455 ]

Duo Zhang commented on HBASE-17021:
-----------------------------------

[~stack] If you can also make sure that this patch helps, then let's commit it first? Then I could work on the following part such as limit the concurrent sync requests. I do not want to put everything in a single big patch... Thanks.
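The contention-reducing pattern behind the RingBuffer patch — many producers publish cheaply, while a single consumer drains and batches entries so expensive work (like a WAL sync) is amortized per batch — can be modeled without the Disruptor. A rough Python analogue using a thread-safe queue in place of the ring (illustrative only; the actual patch uses the LMAX Disruptor's RingBuffer in Java):

```python
import queue
import threading


def run_single_consumer(events, process_batch):
    """Producers only publish; one consumer drains everything currently
    queued in one go, handing process_batch as large a batch as it can.
    Returns the list of batch sizes the consumer saw."""
    q = queue.Queue()
    batch_sizes = []

    def consumer():
        done = 0
        while done < len(events):
            batch = [q.get()]           # block for at least one entry
            while True:                 # then drain whatever else is queued
                try:
                    batch.append(q.get_nowait())
                except queue.Empty:
                    break
            process_batch(batch)        # one expensive call per batch
            batch_sizes.append(len(batch))
            done += len(batch)

    t = threading.Thread(target=consumer)
    t.start()
    for e in events:                    # producers just enqueue; no shared
        q.put(e)                        # mutable state beyond the queue
    t.join()
    return batch_sizes
```

The point of the pattern is that appenders never fight over the WAL's internal state; only the lone consumer touches it, and batching falls out naturally whenever the consumer briefly falls behind.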
[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same
[ https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15638414#comment-15638414 ]

Duo Zhang commented on HBASE-16890:
-----------------------------------

The sync request of AsyncFSWAL is asynchronous, so theoretically we could issue a sync for every append if the consumer task runs quickly enough... Anyway, let me try to limit the pending sync count to see if it helps for you, as I cannot observe the same result... Thanks.
[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same
[ https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15638395#comment-15638395 ]

stack commented on HBASE-16890:
-------------------------------

I'd think that AsyncWAL would aggregate more than the five threads FSHLog has running? I'd think the five threads would keep stamping on each other, making smaller packets than AsyncWAL is capable of, and therefore would aggregate less than it.
[jira] [Commented] (HBASE-17021) Use RingBuffer to reduce the contention in AsyncFSWAL
[ https://issues.apache.org/jira/browse/HBASE-17021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15638379#comment-15638379 ]

Duo Zhang commented on HBASE-17021:
-----------------------------------

At least one of the problems...
[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same
[ https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15638369#comment-15638369 ]

Duo Zhang commented on HBASE-16890:
-----------------------------------

If we have more sync requests for AsyncFSWAL then no doubt FSHLog does better at aggregating, and I think that is possible. We have five threads doing syncing for FSHLog, so the maximum number of pending sync requests is five. Once we reach that number we are forced to aggregate. But for AsyncFSWAL there is no such limitation. Maybe we could also introduce a limit for AsyncFSWAL. Let me have a try.
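A toy model of the limit being proposed: if at most `max_pending` syncs may be in flight, later appends are forced to piggyback on an already-pending sync, so fewer device syncs cover the same number of appends. Everything here — the function, its parameters, and the numbers — is illustrative, not a measurement of either WAL:

```python
def syncs_issued(num_appends, max_pending, completes_after):
    """Model: each append requests a sync. A new sync is issued only if
    fewer than max_pending are in flight; otherwise the append piggybacks
    on a pending sync. Every `completes_after` appends, the oldest
    pending sync completes. Returns how many syncs were issued."""
    pending = 0
    issued = 0
    for i in range(1, num_appends + 1):
        if pending < max_pending:
            pending += 1            # room in flight: issue a new sync
            issued += 1
        # else: piggyback on a pending sync, no new sync issued
        if i % completes_after == 0 and pending > 0:
            pending -= 1            # a device sync finishes
    return issued
```

With no cap the model issues one sync per append, while a small cap collapses most appends onto a handful of in-flight syncs, which is the aggregation effect the five FSHLog sync runners get for free.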
[jira] [Commented] (HBASE-17004) Refactor IntegrationTestManyRegions to use @ClassRule for timing out
[ https://issues.apache.org/jira/browse/HBASE-17004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15638347#comment-15638347 ]

Hudson commented on HBASE-17004:
--------------------------------

SUCCESS: Integrated in Jenkins build HBase-1.3-JDK8 #69 (See [https://builds.apache.org/job/HBase-1.3-JDK8/69/])
HBASE-17004 IntegrationTestManyRegions verifies that many regions get (appy: rev b1c17f0ef98c1c6674004f044b3160b1be37ca64)
* (edit) hbase-it/pom.xml
* (edit) hbase-it/src/test/java/org/apache/hadoop/hbase/IntegrationTestManyRegions.java

> Refactor IntegrationTestManyRegions to use @ClassRule for timing out
> --------------------------------------------------------------------
>
>                 Key: HBASE-17004
>                 URL: https://issues.apache.org/jira/browse/HBASE-17004
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Appy
>            Assignee: Appy
>            Priority: Minor
>             Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.4, 1.1.8
>
>         Attachments: HBASE-17004.master.001.patch, HBASE-17004.master.002.patch
>
>
> IntegrationTestManyRegions verifies that many regions get assigned within given time. To do so, it spawns a new thread and uses CountDownLatch.await() to timeout. Replacing this mechanism with junit @ClassRule to timeout the test.
[jira] [Commented] (HBASE-16982) Better integrate Apache CLI in AbstractHBaseTool
[ https://issues.apache.org/jira/browse/HBASE-16982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15638360#comment-15638360 ]

Hadoop QA commented on HBASE-16982:
-----------------------------------

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
|  0 | reexec | 30m 44s | Docker mode activated. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|  0 | mvndep | 3m 40s | Maven dependency ordering for branch |
| +1 | mvninstall | 8m 55s | master passed |
| +1 | compile | 4m 21s | master passed |
| +1 | checkstyle | 1m 32s | master passed |
| +1 | mvneclipse | 2m 27s | master passed |
|  0 | findbugs | 0m 0s | Skipped patched modules with no Java source: . |
| +1 | findbugs | 2m 54s | master passed |
| +1 | javadoc | 3m 27s | master passed |
|  0 | mvndep | 0m 8s | Maven dependency ordering for patch |
| +1 | mvninstall | 4m 50s | the patch passed |
| +1 | compile | 4m 1s | the patch passed |
| +1 | javac | 4m 1s | the patch passed |
| +1 | checkstyle | 1m 2s | the patch passed |
| +1 | mvneclipse | 1m 41s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 2s | The patch has no ill-formed XML file. |
| +1 | hadoopcheck | 33m 49s | Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha1. |
|  0 | findbugs | 0m 0s | Skipped patched modules with no Java source: . |
| +1 | findbugs | 2m 42s | the patch passed |
| +1 | javadoc | 2m 57s | the patch passed |
| +1 | unit | 1m 47s | hbase-common in the patch passed. |
| -1 | unit | 116m 53s | hbase-server in the patch failed. |
| -1 | unit | 128m 42s | root in the patch failed. |
| +1 | asflicense | 0m 48s | The patch does not generate ASF License warnings. |
|    |  | 358m 5s |  |

|| Reason || Tests ||
| Timed out junit tests | org.apache.hadoop.hbase.regionserver.wal.TestAsyncWALReplay |
| | org.apache.hadoop.hbase.regionserver.wal.TestAsyncLogRolling |
| | org.apache.hadoop.hbase.regionserver.wal.TestLogRollingNoCluster |
| | org.apache.hadoop.hbase.regionserver.wal.TestAsyncWALReplayCompressed |
| | org.apache.hadoop.hbase.regionserver.TestRegionReplicaFailover |
| | org.apache.hadoop.hbase.TestHBaseOnOtherDfsCluster |
| |
[jira] [Commented] (HBASE-17030) Procedure v2 - A couple of tweaks to the SplitTableRegionProcedure
[ https://issues.apache.org/jira/browse/HBASE-17030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638355#comment-15638355 ] Hadoop QA commented on HBASE-17030: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s {color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 19s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 15s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 7m 54s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 25s {color} | {color:green} master passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 17s {color} | {color:red} hbase-protocol-shaded in master has 24 extant Findbugs warnings. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 38s {color} | {color:green} master passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 29s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 7m 35s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 32m 14s {color} | {color:green} Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha1. {color} | | {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 0m 38s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 7s {color} | {color:red} hbase-protocol-shaded in the patch failed. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 43s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 8s {color} | {color:red} hbase-protocol-shaded in the patch failed. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 80m 22s {color} | {color:red} hbase-server in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s {color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 146m 34s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | Timed out junit tests | org.apache.hadoop.hbase.master.procedure.TestModifyTableProcedure | | | org.apache.hadoop.hbase.master.procedure.TestServerCrashProcedure | | | org.apache.hadoop.hbase.master.procedure.TestRestoreSnapshotProcedure | | | org.apache.hadoop.hbase.master.procedure.TestTruncateTableProcedure | | | org.apache.hadoop.hbase.master.procedure.TestMasterProcedureWalLease | \\ \\ || Subsystem || Report/Notes || | Docker | Client=1.11.2 Server=1.11.2 Image:yetus/hbase:7bda515 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12837322/HBASE-17030-v0.patch | | JIRA Issue | HBASE-17030 | | Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile cc
[jira] [Updated] (HBASE-17033) LogRoller makes a lot of allocations unnecessarily
[ https://issues.apache.org/jira/browse/HBASE-17033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-17033: -- Attachment: hbase-17033_v1.patch Simple patch reduces these kinds of allocations.
> LogRoller makes a lot of allocations unnecessarily
> --
>
> Key: HBASE-17033
> URL: https://issues.apache.org/jira/browse/HBASE-17033
> Project: HBase
> Issue Type: Bug
> Reporter: Enis Soztutar
> Assignee: Enis Soztutar
> Attachments: Screen Shot 2016-11-04 at 6.39.00 PM.png, hbase-17033_v1.patch
>
> I was looking at the other allocations for HBASE-17017. Seems that the log roller thread allocates 200MB, ~7% of the TLAB space. This is a lot of allocations.
> I think the reason is this:
> {code}
> while (true) {
>   if (this.safePointAttainedLatch.await(1, TimeUnit.NANOSECONDS)) {
>     break;
>   }
>   if (syncFuture.isThrowable()) {
>     throw new FailedSyncBeforeLogCloseException(syncFuture.getThrowable());
>   }
> }
> {code}
> This busy wait is causing a lot of allocations because the thread is added to the waiting list.
[jira] [Updated] (HBASE-17033) LogRoller makes a lot of allocations unnecessarily
[ https://issues.apache.org/jira/browse/HBASE-17033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-17033: -- Status: Patch Available (was: Open)
> LogRoller makes a lot of allocations unnecessarily
> --
>
> Key: HBASE-17033
> URL: https://issues.apache.org/jira/browse/HBASE-17033
> Project: HBase
> Issue Type: Bug
> Reporter: Enis Soztutar
> Assignee: Enis Soztutar
> Attachments: Screen Shot 2016-11-04 at 6.39.00 PM.png, hbase-17033_v1.patch
>
> I was looking at the other allocations for HBASE-17017. Seems that the log roller thread allocates 200MB, ~7% of the TLAB space. This is a lot of allocations.
> I think the reason is this:
> {code}
> while (true) {
>   if (this.safePointAttainedLatch.await(1, TimeUnit.NANOSECONDS)) {
>     break;
>   }
>   if (syncFuture.isThrowable()) {
>     throw new FailedSyncBeforeLogCloseException(syncFuture.getThrowable());
>   }
> }
> {code}
> This busy wait is causing a lot of allocations because the thread is added to the waiting list.
[jira] [Commented] (HBASE-17017) Remove the current per-region latency histogram metrics
[ https://issues.apache.org/jira/browse/HBASE-17017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638326#comment-15638326 ] Hadoop QA commented on HBASE-17017: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s {color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. 
{color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 51s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 25s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 52s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 24s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 30s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 15s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 46s {color} | {color:green} master passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 7s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 56s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 56s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 30s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 31m 16s {color} | {color:green} Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha1. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 51s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 17s {color} | {color:green} hbase-hadoop-compat in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s {color} | {color:green} hbase-hadoop2-compat in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 103m 8s {color} | {color:red} hbase-server in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 36s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 153m 1s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hbase.regionserver.TestRegionServerMetrics | | | hadoop.hbase.regionserver.TestHRegionWithInMemoryFlush | | Timed out junit tests | org.apache.hadoop.hbase.security.access.TestAccessController2 | | | org.apache.hadoop.hbase.TestMovedRegionsCleaner | | | org.apache.hadoop.hbase.security.access.TestCellACLWithMultipleVersions | | | org.apache.hadoop.hbase.security.access.TestWithDisabledAuthorization | | | org.apache.hadoop.hbase.security.access.TestAccessController | \\ \\ || Subsystem || Report/Notes || | Docker | Client=1.11.2 Server=1.11.2 Image:yetus/hbase:7bda515 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12837314/hbase-17017_v1.patch | | JIRA Issue | HBASE-17017 | | Optional Tests | asflicense javac
[jira] [Updated] (HBASE-17033) LogRoller makes a lot of allocations unnecessarily
[ https://issues.apache.org/jira/browse/HBASE-17033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-17033: -- Attachment: Screen Shot 2016-11-04 at 6.39.00 PM.png Screenshot.
> LogRoller makes a lot of allocations unnecessarily
> --
>
> Key: HBASE-17033
> URL: https://issues.apache.org/jira/browse/HBASE-17033
> Project: HBase
> Issue Type: Bug
> Reporter: Enis Soztutar
> Assignee: Enis Soztutar
> Attachments: Screen Shot 2016-11-04 at 6.39.00 PM.png
>
> I was looking at the other allocations for HBASE-17017. Seems that the log roller thread allocates 200MB, ~7% of the TLAB space. This is a lot of allocations.
> I think the reason is this:
> {code}
> while (true) {
>   if (this.safePointAttainedLatch.await(1, TimeUnit.NANOSECONDS)) {
>     break;
>   }
>   if (syncFuture.isThrowable()) {
>     throw new FailedSyncBeforeLogCloseException(syncFuture.getThrowable());
>   }
> }
> {code}
> This busy wait is causing a lot of allocations because the thread is added to the waiting list.
[jira] [Commented] (HBASE-17004) Refactor IntegrationTestManyRegions to use @ClassRule for timing out
[ https://issues.apache.org/jira/browse/HBASE-17004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638307#comment-15638307 ] Hudson commented on HBASE-17004: SUCCESS: Integrated in Jenkins build HBase-1.2-JDK8 #57 (See [https://builds.apache.org/job/HBase-1.2-JDK8/57/]) HBASE-17004 IntegrationTestManyRegions verifies that many regions get (appy: rev 804ce850030f607acf855876223d5fa7b3825d0a) * (edit) hbase-it/src/test/java/org/apache/hadoop/hbase/IntegrationTestManyRegions.java * (edit) hbase-it/pom.xml > Refactor IntegrationTestManyRegions to use @ClassRule for timing out > > > Key: HBASE-17004 > URL: https://issues.apache.org/jira/browse/HBASE-17004 > Project: HBase > Issue Type: Improvement >Reporter: Appy >Assignee: Appy >Priority: Minor > Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.4, 1.1.8 > > Attachments: HBASE-17004.master.001.patch, > HBASE-17004.master.002.patch > > > IntegrationTestManyRegions verifies that many regions get assigned within > given time. To do so, it spawns a new thread and uses CountDownLatch.await() > to timeout. Replacing this mechanism with junit @ClassRule to timeout the > test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17033) LogRoller makes a lot of allocations unnecessarily
Enis Soztutar created HBASE-17033: - Summary: LogRoller makes a lot of allocations unnecessarily Key: HBASE-17033 URL: https://issues.apache.org/jira/browse/HBASE-17033 Project: HBase Issue Type: Bug Reporter: Enis Soztutar Assignee: Enis Soztutar
I was looking at the other allocations for HBASE-17017. Seems that the log roller thread allocates 200MB, ~7% of the TLAB space. This is a lot of allocations.
I think the reason is this:
{code}
while (true) {
  if (this.safePointAttainedLatch.await(1, TimeUnit.NANOSECONDS)) {
    break;
  }
  if (syncFuture.isThrowable()) {
    throw new FailedSyncBeforeLogCloseException(syncFuture.getThrowable());
  }
}
{code}
This busy wait is causing a lot of allocations because the thread is re-added to the latch's waiting list on every iteration.
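One way out of the 1-nanosecond polling is to block on the latch in coarser slices, so the waiter is enqueued roughly once per millisecond instead of on every nanosecond-scale timeout. The sketch below is not the actual HBASE-17033 patch; `safePointAttainedLatch` and the sync-failure check are stood in by plain `java.util.concurrent` types.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class SafePointWait {
    // Stand-ins for the real fields: the latch the roller waits on and
    // a slot where a sync failure would be published.
    static final CountDownLatch safePointAttainedLatch = new CountDownLatch(1);
    static final AtomicReference<Throwable> syncFailure = new AtomicReference<>();

    // Waiting in 1 ms slices instead of 1 ns slices: each await() that times
    // out still allocates a wait-queue node, but ~1000/s rather than millions.
    static void waitForSafePoint() throws Exception {
        while (!safePointAttainedLatch.await(1, TimeUnit.MILLISECONDS)) {
            Throwable t = syncFailure.get();
            if (t != null) {
                throw new Exception("sync failed before log close", t);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Release the latch from another thread after a short delay,
        // as the sync runner would after reaching the safe point.
        new Thread(() -> {
            try { Thread.sleep(20); } catch (InterruptedException ignored) { }
            safePointAttainedLatch.countDown();
        }).start();
        waitForSafePoint();
        System.out.println("safe point attained");
    }
}
```

The failure check still runs between slices, so a sync error is noticed within a millisecond; the trade-off is latency granularity versus allocation rate.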
[jira] [Updated] (HBASE-17032) CallQueueTooBigException and CallDroppedException should not be triggering PFFE
[ https://issues.apache.org/jira/browse/HBASE-17032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Antonov updated HBASE-17032: Status: Patch Available (was: Open)
> CallQueueTooBigException and CallDroppedException should not be triggering PFFE
> ---
>
> Key: HBASE-17032
> URL: https://issues.apache.org/jira/browse/HBASE-17032
> Project: HBase
> Issue Type: Bug
> Components: Client
> Affects Versions: 1.3.0
> Reporter: Mikhail Antonov
> Assignee: Mikhail Antonov
> Fix For: 2.0.0, 1.3.0
>
> Attachments: HBASE-17032.branch-1.3.v1.patch, HBASE-17032.branch-1.3.v2.patch
>
> Back in HBASE-15137 we made it so that CQTBE causes a preemptive fast fail exception on the client.
> It seems those two load control mechanisms don't exactly align here. The server throws CallQueueTooBigException or CallDroppedException (from the deadline scheduler) when it feels overloaded. The client should accept that behavior and retry. When the server sheds load and the client also bails out, the load shedding bubbles up too high, and the impact on client applications seems worse with PFFE turned on than without.
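The behavior being proposed boils down to a predicate over exception types: load-shedding responses mean "alive but busy" and should be retried, not counted toward preemptive fast fail. A minimal sketch of that classification, with the two exception classes defined here as stand-ins for the real HBase types (this is not the actual patch):

```java
// Stand-ins for the real org.apache.hadoop.hbase exception types.
class CallQueueTooBigException extends Exception { }
class CallDroppedException extends Exception { }

public class FastFailPolicy {
    /**
     * Load-shedding responses mean the server is alive but overloaded; the
     * client should back off and retry. Only other connection-level failures
     * should count toward marking a server for preemptive fast fail.
     */
    static boolean countsTowardFastFail(Throwable t) {
        return !(t instanceof CallQueueTooBigException
                 || t instanceof CallDroppedException);
    }

    public static void main(String[] args) {
        System.out.println(countsTowardFastFail(new CallQueueTooBigException())); // false
        System.out.println(countsTowardFastFail(new CallDroppedException()));     // false
        System.out.println(countsTowardFastFail(new java.net.ConnectException())); // true
    }
}
```

With such a predicate in the failure-interceptor path, shedding on the server no longer snowballs into client-side fast-fail mode, which is the high-level complaint in the description.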
[jira] [Updated] (HBASE-17032) CallQueueTooBigException and CallDroppedException should not be triggering PFFE
[ https://issues.apache.org/jira/browse/HBASE-17032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Antonov updated HBASE-17032: Attachment: HBASE-17032.branch-1.3.v2.patch v2 patch with fixed test
> CallQueueTooBigException and CallDroppedException should not be triggering PFFE
> ---
>
> Key: HBASE-17032
> URL: https://issues.apache.org/jira/browse/HBASE-17032
> Project: HBase
> Issue Type: Bug
> Components: Client
> Affects Versions: 1.3.0
> Reporter: Mikhail Antonov
> Assignee: Mikhail Antonov
> Fix For: 2.0.0, 1.3.0
>
> Attachments: HBASE-17032.branch-1.3.v1.patch, HBASE-17032.branch-1.3.v2.patch
>
> Back in HBASE-15137 we made it so that CQTBE causes a preemptive fast fail exception on the client.
> It seems those two load control mechanisms don't exactly align here. The server throws CallQueueTooBigException or CallDroppedException (from the deadline scheduler) when it feels overloaded. The client should accept that behavior and retry. When the server sheds load and the client also bails out, the load shedding bubbles up too high, and the impact on client applications seems worse with PFFE turned on than without.
[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same
[ https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638213#comment-15638213 ] stack commented on HBASE-16890: --- 48 cores.
bq. Seems the problem is AsyncFSWAL can not use more CPUs even if there is no contention
How is this bottlenecking us? The ringbuffer consumer is a single thread in both cases? Then in DFSClient it goes into a Q consumed by one thread. AsyncWAL should still be blowing FSHLog away. Yeah, tell me about ioRatio. I'm gone for an hour but will be back on. Can run anything you like. See above for stats on FSHLog doing a better job aggregating syncs.
> Analyze the performance of AsyncWAL and fix the same
>
> Key: HBASE-16890
> URL: https://issues.apache.org/jira/browse/HBASE-16890
> Project: HBase
> Issue Type: Sub-task
> Components: wal
> Affects Versions: 2.0.0
> Reporter: ramkrishna.s.vasudevan
> Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: AsyncWAL_disruptor.patch, AsyncWAL_disruptor_1 (2).patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_4.patch, AsyncWAL_disruptor_6.patch, HBASE-16890-rc-v2.patch, HBASE-16890-rc-v3.patch, HBASE-16890-remove-contention-v1.patch, HBASE-16890-remove-contention.patch, Screen Shot 2016-10-25 at 7.34.47 PM.png, Screen Shot 2016-10-25 at 7.39.07 PM.png, Screen Shot 2016-10-25 at 7.39.48 PM.png, Screen Shot 2016-11-04 at 5.21.27 PM.png, Screen Shot 2016-11-04 at 5.30.18 PM.png, async.svg, classic.svg, contention.png, contention_defaultWAL.png
>
> Tests reveal that AsyncWAL under load in a single node cluster performs slower than the Default WAL. This task is to analyze and see if we could fix it.
> See some discussions in the tail of JIRA HBASE-15536.
[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same
[ https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638208#comment-15638208 ] stack commented on HBASE-16890: --- We should try and get metrics on packet sizes. FSHLog is making fatter packets? > Analyze the performance of AsyncWAL and fix the same > > > Key: HBASE-16890 > URL: https://issues.apache.org/jira/browse/HBASE-16890 > Project: HBase > Issue Type: Sub-task > Components: wal >Affects Versions: 2.0.0 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan > Fix For: 2.0.0 > > Attachments: AsyncWAL_disruptor.patch, AsyncWAL_disruptor_1 > (2).patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_3.patch, > AsyncWAL_disruptor_4.patch, AsyncWAL_disruptor_6.patch, > HBASE-16890-rc-v2.patch, HBASE-16890-rc-v3.patch, > HBASE-16890-remove-contention-v1.patch, HBASE-16890-remove-contention.patch, > Screen Shot 2016-10-25 at 7.34.47 PM.png, Screen Shot 2016-10-25 at 7.39.07 > PM.png, Screen Shot 2016-10-25 at 7.39.48 PM.png, Screen Shot 2016-11-04 at > 5.21.27 PM.png, Screen Shot 2016-11-04 at 5.30.18 PM.png, async.svg, > classic.svg, contention.png, contention_defaultWAL.png > > > Tests reveal that AsyncWAL under load in single node cluster performs slower > than the Default WAL. This task is to analyze and see if we could fix it. > See some discussions in the tail of JIRA HBASE-15536. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same
[ https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638205#comment-15638205 ] Duo Zhang commented on HBASE-16890: --- So what's the hardware of your machine, [~stack]? Seems the problem is that AsyncFSWAL can not use more CPUs even if there is no contention? Maybe you could try increasing/decreasing the ioRatio of netty to see if the result changes? Let me find the way to change ioRatio for netty.
> Analyze the performance of AsyncWAL and fix the same
>
> Key: HBASE-16890
> URL: https://issues.apache.org/jira/browse/HBASE-16890
> Project: HBase
> Issue Type: Sub-task
> Components: wal
> Affects Versions: 2.0.0
> Reporter: ramkrishna.s.vasudevan
> Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: AsyncWAL_disruptor.patch, AsyncWAL_disruptor_1 (2).patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_4.patch, AsyncWAL_disruptor_6.patch, HBASE-16890-rc-v2.patch, HBASE-16890-rc-v3.patch, HBASE-16890-remove-contention-v1.patch, HBASE-16890-remove-contention.patch, Screen Shot 2016-10-25 at 7.34.47 PM.png, Screen Shot 2016-10-25 at 7.39.07 PM.png, Screen Shot 2016-10-25 at 7.39.48 PM.png, Screen Shot 2016-11-04 at 5.21.27 PM.png, Screen Shot 2016-11-04 at 5.30.18 PM.png, async.svg, classic.svg, contention.png, contention_defaultWAL.png
>
> Tests reveal that AsyncWAL under load in a single node cluster performs slower than the Default WAL. This task is to analyze and see if we could fix it.
> See some discussions in the tail of JIRA HBASE-15536.
[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same
[ https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638206#comment-15638206 ] stack commented on HBASE-16890: --- {code} -- Histograms -- org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.latencyHistogram.nanos count = 2101170 min = 460988 max = 179884304 mean = 5793567.54 stddev = 19665482.79 median = 2129343.00 75% <= 2639978.00 95% <= 7591455.00 98% <= 106766212.00 99% <= 120363544.00 99.9% <= 179884304.00 org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncCountHistogram.countPerSync count = 283081 min = 0 max = 16 mean = 6.46 stddev = 4.28 median = 8.00 75% <= 10.00 95% <= 12.00 98% <= 13.00 99% <= 14.00 99.9% <= 16.00 org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncHistogram.nanos-between-syncs count = 283083 min = 747201 max = 179577999 mean = 5808497.64 stddev = 20550849.80 median = 1919769.00 75% <= 2594564.00 95% <= 6725774.00 98% <= 104668538.00 99% <= 126351306.00 99.9% <= 179577999.00 -- Meters -- org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.appendMeter.bytes count = 14690526510 mean rate = 69331875.36 events/second 1-minute rate = 40171021.67 events/second 5-minute rate = 73866875.49 events/second 15-minute rate = 83653584.79 events/second org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncMeter.syncs count = 283104 mean rate = 1336.08 events/second 1-minute rate = 780.51 events/second 5-minute rate = 1324.92 events/second 15-minute rate = 1452.57 events/second {code} Looks like FSHLog is aggregating more syncs per actual sync 21 vs 6.5 > Analyze the performance of AsyncWAL and fix the same > > > Key: HBASE-16890 > URL: https://issues.apache.org/jira/browse/HBASE-16890 > Project: HBase > Issue Type: Sub-task > Components: wal >Affects Versions: 2.0.0 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan > Fix For: 2.0.0 > > Attachments: AsyncWAL_disruptor.patch, AsyncWAL_disruptor_1 > (2).patch, AsyncWAL_disruptor_3.patch, 
AsyncWAL_disruptor_3.patch, > AsyncWAL_disruptor_4.patch, AsyncWAL_disruptor_6.patch, > HBASE-16890-rc-v2.patch, HBASE-16890-rc-v3.patch, > HBASE-16890-remove-contention-v1.patch, HBASE-16890-remove-contention.patch, > Screen Shot 2016-10-25 at 7.34.47 PM.png, Screen Shot 2016-10-25 at 7.39.07 > PM.png, Screen Shot 2016-10-25 at 7.39.48 PM.png, Screen Shot 2016-11-04 at > 5.21.27 PM.png, Screen Shot 2016-11-04 at 5.30.18 PM.png, async.svg, > classic.svg, contention.png, contention_defaultWAL.png > > > Tests reveal that AsyncWAL under load in single node cluster performs slower > than the Default WAL. This task is to analyze and see if we could fix it. > See some discussions in the tail of JIRA HBASE-15536. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
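The aggregation claim can be sanity-checked directly from the two WALPerformanceEvaluation dumps in this thread by dividing each run's latency-histogram count (one entry per append) by its sync count. The numbers below are copied from the dumps; the ratios land near the countPerSync means stack quotes (21 vs 6.5), with small differences because the counters are sampled at slightly different moments.

```java
public class SyncAggregation {
    public static void main(String[] args) {
        // Counts copied from the FSHLog and AsyncFSWAL histogram dumps above.
        long fshlogAppends = 8_461_245L, fshlogSyncs = 412_764L;
        long asyncAppends  = 2_101_170L, asyncSyncs  = 283_081L;

        // Appends batched into each actual sync: higher means better
        // amortization of the sync cost across writers.
        double fshlogPerSync = (double) fshlogAppends / fshlogSyncs; // ~20.5
        double asyncPerSync  = (double) asyncAppends  / asyncSyncs;  // ~7.4

        System.out.printf("FSHLog   appends/sync: %.1f%n", fshlogPerSync);
        System.out.printf("AsyncWAL appends/sync: %.1f%n", asyncPerSync);
    }
}
```

Roughly 3x more appends ride on each FSHLog sync, which lines up with FSHLog's higher append throughput despite a similar syncs/second rate.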
[jira] [Commented] (HBASE-17032) CallQueueTooBigException and CallDroppedException should not be triggering PFFE
[ https://issues.apache.org/jira/browse/HBASE-17032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638188#comment-15638188 ] Mikhail Antonov commented on HBASE-17032: - Seems like it'd break TestFastFail. Will update the patch soon.
> CallQueueTooBigException and CallDroppedException should not be triggering PFFE
> ---
>
> Key: HBASE-17032
> URL: https://issues.apache.org/jira/browse/HBASE-17032
> Project: HBase
> Issue Type: Bug
> Components: Client
> Affects Versions: 1.3.0
> Reporter: Mikhail Antonov
> Assignee: Mikhail Antonov
> Fix For: 2.0.0, 1.3.0
>
> Attachments: HBASE-17032.branch-1.3.v1.patch
>
> Back in HBASE-15137 we made it so that CQTBE causes a preemptive fast fail exception on the client.
> It seems those two load control mechanisms don't exactly align here. The server throws CallQueueTooBigException or CallDroppedException (from the deadline scheduler) when it feels overloaded. The client should accept that behavior and retry. When the server sheds load and the client also bails out, the load shedding bubbles up too high, and the impact on client applications seems worse with PFFE turned on than without.
[jira] [Updated] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same
[ https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-16890: -- Attachment: Screen Shot 2016-11-04 at 5.30.18 PM.png Screen Shot 2016-11-04 at 5.21.27 PM.png The methods that consumer .5% or greater > Analyze the performance of AsyncWAL and fix the same > > > Key: HBASE-16890 > URL: https://issues.apache.org/jira/browse/HBASE-16890 > Project: HBase > Issue Type: Sub-task > Components: wal >Affects Versions: 2.0.0 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan > Fix For: 2.0.0 > > Attachments: AsyncWAL_disruptor.patch, AsyncWAL_disruptor_1 > (2).patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_3.patch, > AsyncWAL_disruptor_4.patch, AsyncWAL_disruptor_6.patch, > HBASE-16890-rc-v2.patch, HBASE-16890-rc-v3.patch, > HBASE-16890-remove-contention-v1.patch, HBASE-16890-remove-contention.patch, > Screen Shot 2016-10-25 at 7.34.47 PM.png, Screen Shot 2016-10-25 at 7.39.07 > PM.png, Screen Shot 2016-10-25 at 7.39.48 PM.png, Screen Shot 2016-11-04 at > 5.21.27 PM.png, Screen Shot 2016-11-04 at 5.30.18 PM.png, async.svg, > classic.svg, contention.png, contention_defaultWAL.png > > > Tests reveal that AsyncWAL under load in single node cluster performs slower > than the Default WAL. This task is to analyze and see if we could fix it. > See some discussions in the tail of JIRA HBASE-15536. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same
[ https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638184#comment-15638184 ] stack commented on HBASE-16890: --- I ran the tests a few times and results consistent. Looking in FSHLog run w/ JFR, I see more points of contention reported -- inside DFSClient. It uses maybe 25% more CPU probably because of the upped throughput. Otherwise, looking w/ JFR nothing jumps out. Let me put up pictures of the 'hot methods' It is almost as though FSHLog is doing more work (The top consumers are the WALPE random generation... we should fix that). The FSHLog must have a better 'flow' going on. Here is histograms for FSHLog: {code} -- Histograms -- org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.latencyHistogram.nanos count = 8461245 min = 838241 max = 115799121 mean = 2696785.63 stddev = 6486391.73 median = 2199081.00 75% <= 2571547.00 95% <= 3237948.00 98% <= 3621166.00 99% <= 5216818.00 99.9% <= 115799121.00 org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncCountHistogram.countPerSync count = 412764 min = 1 max = 86 mean = 21.04 stddev = 16.98 median = 17.00 75% <= 34.00 95% <= 53.00 98% <= 58.00 99% <= 62.00 99.9% <= 86.00 org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncHistogram.nanos-between-syncs count = 412764 min = 405379 max = 129879546 mean = 1680258.91 stddev = 7343616.88 median = 1127074.00 75% <= 1448611.00 95% <= 1812916.00 98% <= 1978098.00 99% <= 2150048.00 99.9% <= 122766311.00 -- Meters -- org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.appendMeter.bytes count = 59144801550 mean rate = 244727411.22 events/second 1-minute rate = 245882558.80 events/second 5-minute rate = 199668915.99 events/second 15-minute rate = 166822622.37 events/second org.apache.hadoop.hbase.wal.WALPerformanceEvaluation.syncMeter.syncs count = 412764 mean rate = 1707.90 events/second 1-minute rate = 1715.17 events/second 5-minute rate = 1342.77 events/second 15-minute rate = 1077.71 
events/second {code} Let me get them for asyncwal... > Analyze the performance of AsyncWAL and fix the same > > > Key: HBASE-16890 > URL: https://issues.apache.org/jira/browse/HBASE-16890 > Project: HBase > Issue Type: Sub-task > Components: wal >Affects Versions: 2.0.0 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan > Fix For: 2.0.0 > > Attachments: AsyncWAL_disruptor.patch, AsyncWAL_disruptor_1 > (2).patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_3.patch, > AsyncWAL_disruptor_4.patch, AsyncWAL_disruptor_6.patch, > HBASE-16890-rc-v2.patch, HBASE-16890-rc-v3.patch, > HBASE-16890-remove-contention-v1.patch, HBASE-16890-remove-contention.patch, > Screen Shot 2016-10-25 at 7.34.47 PM.png, Screen Shot 2016-10-25 at 7.39.07 > PM.png, Screen Shot 2016-10-25 at 7.39.48 PM.png, async.svg, classic.svg, > contention.png, contention_defaultWAL.png > > > Tests reveal that AsyncWAL under load in single node cluster performs slower > than the Default WAL. This task is to analyze and see if we could fix it. > See some discussions in the tail of JIRA HBASE-15536. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17032) CallQueueTooBigException and CallDroppedException should not be triggering PFFE
[ https://issues.apache.org/jira/browse/HBASE-17032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Antonov updated HBASE-17032: Attachment: HBASE-17032.branch-1.3.v1.patch trivial patch > CallQueueTooBigException and CallDroppedException should not be triggering > PFFE > --- > > Key: HBASE-17032 > URL: https://issues.apache.org/jira/browse/HBASE-17032 > Project: HBase > Issue Type: Bug > Components: Client >Affects Versions: 1.3.0 >Reporter: Mikhail Antonov >Assignee: Mikhail Antonov > Fix For: 2.0.0, 1.3.0 > > Attachments: HBASE-17032.branch-1.3.v1.patch > > > Back in HBASE-15137 we made it so that CQTBE causes preemptive fast fail > exception on the client. > It seems those 2 load control mechanists don't exactly align here. Server > throws CallQueueTooBigException, CallDroppedException (from deadline > scheduler) when it feels overloaded. Client should accept that behavior and > retry. When servers sheds the load, and client also bails out, the load > shedding bubbles up too high and high level impact on the client > applications seems worse with PFFE turned on then without. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HBASE-17032) CallQueueTooBigException and CallDroppedException should not be triggering PFFE
[ https://issues.apache.org/jira/browse/HBASE-17032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Antonov reassigned HBASE-17032: --- Assignee: Mikhail Antonov > CallQueueTooBigException and CallDroppedException should not be triggering > PFFE > --- > > Key: HBASE-17032 > URL: https://issues.apache.org/jira/browse/HBASE-17032 > Project: HBase > Issue Type: Bug > Components: Client >Affects Versions: 1.3.0 >Reporter: Mikhail Antonov >Assignee: Mikhail Antonov > Fix For: 2.0.0, 1.3.0 > > > Back in HBASE-15137 we made it so that CQTBE causes preemptive fast fail > exception on the client. > It seems those 2 load control mechanists don't exactly align here. Server > throws CallQueueTooBigException, CallDroppedException (from deadline > scheduler) when it feels overloaded. Client should accept that behavior and > retry. When servers sheds the load, and client also bails out, the load > shedding bubbles up too high and high level impact on the client > applications seems worse with PFFE turned on then without. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17032) CallQueueTooBigException and CallDroppedException should not be triggering PFFE
[ https://issues.apache.org/jira/browse/HBASE-17032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Antonov updated HBASE-17032: Fix Version/s: 1.3.0 2.0.0 > CallQueueTooBigException and CallDroppedException should not be triggering > PFFE > --- > > Key: HBASE-17032 > URL: https://issues.apache.org/jira/browse/HBASE-17032 > Project: HBase > Issue Type: Bug > Components: Client >Affects Versions: 1.3.0 >Reporter: Mikhail Antonov > Fix For: 2.0.0, 1.3.0 > > > Back in HBASE-15137 we made it so that CQTBE causes preemptive fast fail > exception on the client. > It seems those 2 load control mechanists don't exactly align here. Server > throws CallQueueTooBigException, CallDroppedException (from deadline > scheduler) when it feels overloaded. Client should accept that behavior and > retry. When servers sheds the load, and client also bails out, the load > shedding bubbles up too high and high level impact on the client > applications seems worse with PFFE turned on then without. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17032) CallQueueTooBigException and CallDroppedException should not be triggering PFFE
[ https://issues.apache.org/jira/browse/HBASE-17032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Antonov updated HBASE-17032: Affects Version/s: 1.3.0 > CallQueueTooBigException and CallDroppedException should not be triggering > PFFE > --- > > Key: HBASE-17032 > URL: https://issues.apache.org/jira/browse/HBASE-17032 > Project: HBase > Issue Type: Bug > Components: Client >Affects Versions: 1.3.0 >Reporter: Mikhail Antonov > Fix For: 2.0.0, 1.3.0 > > > Back in HBASE-15137 we made it so that CQTBE causes preemptive fast fail > exception on the client. > It seems those 2 load control mechanists don't exactly align here. Server > throws CallQueueTooBigException, CallDroppedException (from deadline > scheduler) when it feels overloaded. Client should accept that behavior and > retry. When servers sheds the load, and client also bails out, the load > shedding bubbles up too high and high level impact on the client > applications seems worse with PFFE turned on then without. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17032) CallQueueTooBigException and CallDroppedException should not be triggering PFFE
[ https://issues.apache.org/jira/browse/HBASE-17032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Antonov updated HBASE-17032: Component/s: Client > CallQueueTooBigException and CallDroppedException should not be triggering > PFFE > --- > > Key: HBASE-17032 > URL: https://issues.apache.org/jira/browse/HBASE-17032 > Project: HBase > Issue Type: Bug > Components: Client >Reporter: Mikhail Antonov > > Back in HBASE-15137 we made it so that CQTBE causes preemptive fast fail > exception on the client. > It seems those 2 load control mechanists don't exactly align here. Server > throws CallQueueTooBigException, CallDroppedException (from deadline > scheduler) when it feels overloaded. Client should accept that behavior and > retry. When servers sheds the load, and client also bails out, the load > shedding bubbles up too high and high level impact on the client > applications seems worse with PFFE turned on then without. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17032) CallQueueTooBigException and CallDroppedException should not be triggering PFFE
Mikhail Antonov created HBASE-17032: --- Summary: CallQueueTooBigException and CallDroppedException should not be triggering PFFE Key: HBASE-17032 URL: https://issues.apache.org/jira/browse/HBASE-17032 Project: HBase Issue Type: Bug Reporter: Mikhail Antonov Back in HBASE-15137 we made it so that CQTBE causes a preemptive fast fail exception on the client. It seems those 2 load control mechanisms don't exactly align here. The server throws CallQueueTooBigException and CallDroppedException (from the deadline scheduler) when it feels overloaded. The client should accept that behavior and retry. When the server sheds load and the client also bails out, the load shedding bubbles up too high, and the high-level impact on client applications seems worse with PFFE turned on than without. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
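The fix direction described in HBASE-17032 can be sketched as an exception classifier on the client side: load-shedding responses mean the server is alive but busy, so they should be retried rather than counted toward preemptive fast fail. The class and method below are a hypothetical illustration, not the actual `HBASE-17032.branch-1.3.v1.patch`; the exception classes are local stand-ins for the real `org.apache.hadoop.hbase` ones so the sketch is self-contained.

```java
// Local stand-ins for the real HBase exceptions (illustration only).
class CallQueueTooBigException extends Exception {}
class CallDroppedException extends Exception {}
class ConnectException extends Exception {}

public class PffeClassifier {
    /**
     * Only connectivity-style failures should feed the preemptive-fast-fail
     * (PFFE) tracker. Load shedding means the server is up but overloaded:
     * the client should accept that behavior and retry.
     */
    public static boolean countsTowardFastFail(Throwable t) {
        if (t instanceof CallQueueTooBigException || t instanceof CallDroppedException) {
            return false; // server-side load control, not evidence of a dead server
        }
        return t instanceof ConnectException;
    }

    public static void main(String[] args) {
        System.out.println(countsTowardFastFail(new CallQueueTooBigException())); // false
        System.out.println(countsTowardFastFail(new ConnectException()));         // true
    }
}
```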
[jira] [Commented] (HBASE-17017) Remove the current per-region latency histogram metrics
[ https://issues.apache.org/jira/browse/HBASE-17017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638149#comment-15638149 ] Gary Helmling commented on HBASE-17017: --- The Counter metrics are much less expensive (1 Counter instance vs 260 instances per histogram). And they're useful for identifying hot regions, so I think we should keep those around. In theory the size histograms could also be useful for that, but I can't say I've used them much. So dumping the time and size histograms seems okay to me. > Remove the current per-region latency histogram metrics > --- > > Key: HBASE-17017 > URL: https://issues.apache.org/jira/browse/HBASE-17017 > Project: HBase > Issue Type: Sub-task > Components: metrics >Affects Versions: 2.0.0, 1.4.0 >Reporter: Duo Zhang >Assignee: Enis Soztutar > Fix For: 2.0.0, 1.4.0 > > Attachments: Screen Shot 2016-11-04 at 3.00.21 PM.png, Screen Shot > 2016-11-04 at 3.38.42 PM.png, hbase-17017_v1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17022) TestMasterFailoverWithProcedures#testTruncateWithFailover fails constantly in branch-1.1
[ https://issues.apache.org/jira/browse/HBASE-17022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-17022: Resolution: Fixed Fix Version/s: 1.1.8 Status: Resolved (was: Patch Available) > TestMasterFailoverWithProcedures#testTruncateWithFailover fails constantly in > branch-1.1 > > > Key: HBASE-17022 > URL: https://issues.apache.org/jira/browse/HBASE-17022 > Project: HBase > Issue Type: Bug >Affects Versions: 1.1.7 >Reporter: Yu Li >Assignee: Matteo Bertozzi > Fix For: 1.1.8 > > Attachments: HBASE-17022-v0.branch-1.1.patch, > HBASE-17022-v0_branch-1.1.patch > > > As titled, checking recent pre-commit UT of branch-1.1 we could find > {{TestMasterFailoverWithProcedures#testTruncateWithFailover}} keeps failing -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same
[ https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638140#comment-15638140 ] stack commented on HBASE-16890: --- I ran WALPE w/ log roll disabled against a single, remote DN. I see that FSHLog is 2x AsyncWAL even w/ HBASE-17021 patch in place. FSHLog, Default Master Branch {code} 2016-11-04 16:09:05,210 INFO [main] wal.WALPerformanceEvaluation: Summary: threads=100, iterations=10, syncInterval=0 took 269.595s 37092.676ops/s Performance counter stats for './hbase/bin/hbase --config /home/stack/conf_hbase org.apache.hadoop.hbase.wal.WALPerformanceEvaluation -threads 100 -iterations 10 -qualifiers 25 -keySize 50 -valueSize 200': 2970796.406680 task-clock (msec) # 10.831 CPUs utilized 19,589,972 context-switches #0.007 M/sec 2,862,328 cpu-migrations#0.963 K/sec 7,026,111 page-faults #0.002 M/sec 5,189,096,974,913 cycles#1.747 GHz stalled-cycles-frontend stalled-cycles-backend 2,899,414,852,894 instructions #0.56 insns per cycle 472,244,057,677 branches # 158.962 M/sec 4,717,852,912 branch-misses #1.00% of all branches 274.288161881 seconds time elapsed {code} Current State of AsyncFSWAL in master branch {code} 2016-11-04 16:19:01,247 INFO [main] wal.WALPerformanceEvaluation: Summary: threads=100, iterations=10, syncInterval=0 took 541.682s 18461.016ops/s Performance counter stats for './hbase/bin/hbase --config /home/stack/conf_hbase org.apache.hadoop.hbase.wal.WALPerformanceEvaluation -threads 100 -iterations 10 -qualifiers 25 -keySize 50 -valueSize 200': 3032840.986653 task-clock (msec) #5.484 CPUs utilized 15,400,858 context-switches #0.005 M/sec 3,205,052 cpu-migrations#0.001 M/sec 12,901,416 page-faults #0.004 M/sec 5,212,559,898,743 cycles#1.719 GHz stalled-cycles-frontend stalled-cycles-backend 2,676,707,056,681 instructions #0.51 insns per cycle 445,557,848,140 branches # 146.911 M/sec 6,372,744,336 branch-misses #1.43% of all branches 553.074446643 seconds time elapsed {code} Patched 
AsyncWAL {code} 2016-11-04 16:36:12,872 INFO [main] wal.WALPerformanceEvaluation: Summary: threads=100, iterations=10, syncInterval=0 took 449.542s 22244.863ops/s Performance counter stats for './hbase/bin/hbase --config /home/stack/conf_hbase org.apache.hadoop.hbase.wal.WALPerformanceEvaluation -threads 100 -iterations 10 -qualifiers 25 -keySize 50 -valueSize 200': 2847554.990457 task-clock (msec) #6.151 CPUs utilized 11,158,364 context-switches #0.004 M/sec 1,697,560 cpu-migrations#0.596 K/sec 8,239,210 page-faults #0.003 M/sec 5,082,916,581,506 cycles#1.785 GHz stalled-cycles-frontend stalled-cycles-backend 2,443,254,158,990 instructions #0.48 insns per cycle 392,726,539,853 branches # 137.917 M/sec 5,782,766,858 branch-misses #1.47% of all branches 462.937995983 seconds time elapsed {code} Looking in flight recorder, I don't see any contention reported any more w/ the patched asyncwal so that is good. > Analyze the performance of AsyncWAL and fix the same > > > Key: HBASE-16890 > URL: https://issues.apache.org/jira/browse/HBASE-16890 > Project: HBase > Issue Type: Sub-task > Components: wal >Affects Versions: 2.0.0 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan > Fix For: 2.0.0 > > Attachments: AsyncWAL_disruptor.patch, AsyncWAL_disruptor_1 > (2).patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_3.patch, > AsyncWAL_disruptor_4.patch, AsyncWAL_disruptor_6.patch, > HBASE-16890-rc-v2.patch, HBASE-16890-rc-v3.patch, > HBASE-16890-remove-contention-v1.patch, HBASE-16890-remove-contention.patch, > Screen Shot 2016-10-25 at 7.34.47 PM.png, Screen Shot 2016-10-25 at 7.39.07 > PM.png, Screen Shot 2016-10-25 at 7.39.48 PM.png, async.svg, classic.svg, > contention.png, contention_defaultWAL.png > > > Tests reveal that AsyncWAL under load in single node cluster performs slower > than the Default WAL. This task is to analyze and see if we could fix it. > See some discussions in the tail of JIRA HBASE-15536. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
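The "FSHLog is 2x AsyncWAL" claim in the comment above can be checked directly from the reported WALPE summary lines; a quick computation with those ops/s figures:

```java
public class WalpeRatios {
    public static void main(String[] args) {
        // ops/s figures copied from the WALPerformanceEvaluation summaries above.
        double fshlog = 37092.676;       // FSHLog, default master branch
        double asyncCurrent = 18461.016; // AsyncFSWAL as of master
        double asyncPatched = 22244.863; // AsyncFSWAL with the HBASE-17021 patch
        System.out.printf("FSHLog vs current async: %.2fx%n", fshlog / asyncCurrent);
        System.out.printf("FSHLog vs patched async: %.2fx%n", fshlog / asyncPatched);
    }
}
```

The first ratio comes out at roughly 2.0, matching the comment; the patch narrows the gap to roughly 1.7x.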
[jira] [Commented] (HBASE-17017) Remove the current per-region latency histogram metrics
[ https://issues.apache.org/jira/browse/HBASE-17017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638131#comment-15638131 ] Andrew Purtell commented on HBASE-17017: /cc [~mantonov] [~ghelmling] [~eclark] > Remove the current per-region latency histogram metrics > --- > > Key: HBASE-17017 > URL: https://issues.apache.org/jira/browse/HBASE-17017 > Project: HBase > Issue Type: Sub-task > Components: metrics >Affects Versions: 2.0.0, 1.4.0 >Reporter: Duo Zhang >Assignee: Enis Soztutar > Fix For: 2.0.0, 1.4.0 > > Attachments: Screen Shot 2016-11-04 at 3.00.21 PM.png, Screen Shot > 2016-11-04 at 3.38.42 PM.png, hbase-17017_v1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17016) Reimplement per-region latency histogram metrics
[ https://issues.apache.org/jira/browse/HBASE-17016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638133#comment-15638133 ] Mikhail Antonov commented on HBASE-17016: - [~enis] yeah, +1 to that approach. > Reimplement per-region latency histogram metrics > > > Key: HBASE-17016 > URL: https://issues.apache.org/jira/browse/HBASE-17016 > Project: HBase > Issue Type: Task > Components: metrics >Affects Versions: 2.0.0, 1.4.0 >Reporter: Andrew Purtell > Fix For: 2.0.0, 1.4.0 > > > Follow up from HBASE-10656, where [~enis] says: > {quote} > the main problem is that we have A LOT of per-region metrics that are latency > histograms. These latency histograms create many many Counter / LongAdder > objects. We should get rid of per-region latencies and maybe look at reducing > the per-region metric overhead. > {quote} > And [~ghelmling] gives us a good candidate to implement pre-region latency > histograms [HdrHistogram|https://github.com/HdrHistogram/HdrHistogram]. > Let's consider removing the per-region latency histograms and reimplement > using HdrHistogram. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17031) Scanners should check for null start and end rows
Ashu Pachauri created HBASE-17031: - Summary: Scanners should check for null start and end rows Key: HBASE-17031 URL: https://issues.apache.org/jira/browse/HBASE-17031 Project: HBase Issue Type: Bug Components: Scanners Reporter: Ashu Pachauri Priority: Minor If a scan is passed with a null start row, it fails very deep in the call stack. We should validate start and end rows for not null before launching the scan. Here is the associated jstack: {code} java.lang.RuntimeException: java.lang.NullPointerException at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:219) at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:326) at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:301) at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:166) at org.apache.hadoop.hbase.client.ClientScanner.(ClientScanner.java:161) at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:798) Caused by: java.lang.NullPointerException at org.apache.hadoop.hbase.util.Bytes.compareTo(Bytes.java:1225) at org.apache.hadoop.hbase.util.Bytes$ByteArrayComparator.compare(Bytes.java:158) at org.apache.hadoop.hbase.util.Bytes$ByteArrayComparator.compare(Bytes.java:147) at org.apache.hadoop.hbase.types.CopyOnWriteArrayMap$ArrayHolder.find(CopyOnWriteArrayMap.java:892) at org.apache.hadoop.hbase.types.CopyOnWriteArrayMap.floorEntry(CopyOnWriteArrayMap.java:169) at org.apache.hadoop.hbase.client.MetaCache.getCachedLocation(MetaCache.java:79) at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getCachedLocation(ConnectionManager.java:1391) at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1231) at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1183) at 
org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:305) at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:156) at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:60) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:211) ... 30 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
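The validation proposed in HBASE-17031 can be sketched as an early check that fails fast with a clear message instead of the deep NullPointerException above. `checkScanRows` is a hypothetical helper, not the actual HBase client API; in the real client the "unbounded" markers are the empty byte arrays `HConstants.EMPTY_START_ROW` / `HConstants.EMPTY_END_ROW`.

```java
public class ScanValidation {
    /**
     * Reject null start/stop rows before the scan reaches the meta-cache
     * lookup, which otherwise NPEs in Bytes.compareTo deep in the stack.
     * Empty byte arrays are the legitimate "unbounded" markers and pass.
     */
    public static void checkScanRows(byte[] startRow, byte[] stopRow) {
        if (startRow == null) {
            throw new IllegalArgumentException(
                "start row must not be null; use an empty byte[] for an unbounded scan");
        }
        if (stopRow == null) {
            throw new IllegalArgumentException(
                "stop row must not be null; use an empty byte[] for an unbounded scan");
        }
    }

    public static void main(String[] args) {
        checkScanRows(new byte[0], new byte[0]); // unbounded scan: fine
        try {
            checkScanRows(null, new byte[0]);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```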
[jira] [Commented] (HBASE-17016) Reimplement per-region latency histogram metrics
[ https://issues.apache.org/jira/browse/HBASE-17016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638127#comment-15638127 ] Enis Soztutar commented on HBASE-17016: --- bq. If we can bring them back, and they are cheap - sure, why not. Fair enough. bq. If we find out that any latency histograms are relatively expensive (in some visible form) I'd be in favor or removing them, unless someone has the usecase when they are actually useful. I think the findings at HBASE-17017 justifies the removal, other than object allocation, there is 17% perf boost with basic testing. We can only bring them back if we do the same test with a new patch and there is no impact for the same test (both object allocation, and perf impact). > Reimplement per-region latency histogram metrics > > > Key: HBASE-17016 > URL: https://issues.apache.org/jira/browse/HBASE-17016 > Project: HBase > Issue Type: Task > Components: metrics >Affects Versions: 2.0.0, 1.4.0 >Reporter: Andrew Purtell > Fix For: 2.0.0, 1.4.0 > > > Follow up from HBASE-10656, where [~enis] says: > {quote} > the main problem is that we have A LOT of per-region metrics that are latency > histograms. These latency histograms create many many Counter / LongAdder > objects. We should get rid of per-region latencies and maybe look at reducing > the per-region metric overhead. > {quote} > And [~ghelmling] gives us a good candidate to implement pre-region latency > histograms [HdrHistogram|https://github.com/HdrHistogram/HdrHistogram]. > Let's consider removing the per-region latency histograms and reimplement > using HdrHistogram. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17017) Remove the current per-region latency histogram metrics
[ https://issues.apache.org/jira/browse/HBASE-17017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638116#comment-15638116 ] Enis Soztutar commented on HBASE-17017: --- Thousands of counters are not that bad compared to millions at least. However agreed that we can think about purging these all together. We now have per-table metrics which should be the way to expose information, rather than per-region. In our deployments, we always disable per-region metrics because customers end up with tens of thousands of regions in total, and there is no way to look at per-region metrics without proper tooling. If you have more than 100 regions, the information is not that useful unless again there is some good tooling which most of the users would lack. FB was using per-region metrics, so we can see whether they are fine with that. > Remove the current per-region latency histogram metrics > --- > > Key: HBASE-17017 > URL: https://issues.apache.org/jira/browse/HBASE-17017 > Project: HBase > Issue Type: Sub-task > Components: metrics >Affects Versions: 2.0.0, 1.4.0 >Reporter: Duo Zhang >Assignee: Enis Soztutar > Fix For: 2.0.0, 1.4.0 > > Attachments: Screen Shot 2016-11-04 at 3.00.21 PM.png, Screen Shot > 2016-11-04 at 3.38.42 PM.png, hbase-17017_v1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17016) Reimplement per-region latency histogram metrics
[ https://issues.apache.org/jira/browse/HBASE-17016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638102#comment-15638102 ] Mikhail Antonov commented on HBASE-17016: - [~enis] not necessarily close as won't fix; I meant to say that I think the unit of request rate outliers is often a single hot region; the unit of latency outliers is mostly (almost always?) the RS - a GC stall, a WAL append failing due to the dfsclient hitting an error, that kind of thing - which makes per-region latency not super useful imo. If we remove them and see any improvement in terms of "fewer latency outliers since fewer Counters etc" - great, let's remove them. If we can bring them back, and they are cheap - sure, why not. If we find out that any latency histograms are relatively expensive (in some visible form) I'd be in favor of removing them, unless someone has a use case where they are actually useful. > Reimplement per-region latency histogram metrics > > > Key: HBASE-17016 > URL: https://issues.apache.org/jira/browse/HBASE-17016 > Project: HBase > Issue Type: Task > Components: metrics >Affects Versions: 2.0.0, 1.4.0 >Reporter: Andrew Purtell > Fix For: 2.0.0, 1.4.0 > > > Follow up from HBASE-10656, where [~enis] says: > {quote} > the main problem is that we have A LOT of per-region metrics that are latency > histograms. These latency histograms create many many Counter / LongAdder > objects. We should get rid of per-region latencies and maybe look at reducing > the per-region metric overhead. > {quote} > And [~ghelmling] gives us a good candidate to implement per-region latency > histograms [HdrHistogram|https://github.com/HdrHistogram/HdrHistogram]. > Let's consider removing the per-region latency histograms and reimplement > using HdrHistogram. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
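To illustrate why the HdrHistogram-style recording proposed in HBASE-17016 is cheaper than many Counter/LongAdder objects per histogram: a single long[] of logarithmic buckets records latencies with no per-sample allocation and fixed memory per region. The toy class below only sketches that idea; the real HdrHistogram library offers configurable precision, auto-resizing, and lossless encoding, and is not reproduced here.

```java
public class TinyLatencyHistogram {
    // One bucket per power of two; index = floor(log2(value)). Fixed ~31 longs
    // per histogram, versus one heavyweight counter object per bucket.
    private final long[] buckets = new long[31];
    private long total;

    public void record(long nanos) {
        int idx = 63 - Long.numberOfLeadingZeros(Math.max(1, nanos));
        buckets[Math.min(idx, buckets.length - 1)]++;
        total++;
    }

    /** Upper bound (in nanos) of the bucket containing the given percentile. */
    public long valueAtPercentile(double pct) {
        long rank = (long) Math.ceil(pct / 100.0 * total);
        long seen = 0;
        for (int i = 0; i < buckets.length; i++) {
            seen += buckets[i];
            if (seen >= rank) {
                return 1L << (i + 1);
            }
        }
        return 1L << buckets.length;
    }

    public static void main(String[] args) {
        TinyLatencyHistogram h = new TinyLatencyHistogram();
        for (int i = 0; i < 1000; i++) {
            h.record(2_000_000L + i); // ~2 ms appends
        }
        System.out.println("p99 bucket bound: " + h.valueAtPercentile(99.0) + " ns");
    }
}
```

The trade-off is precision: a power-of-two bucket bound is coarse, which is exactly what the real HdrHistogram's significant-digits parameter controls.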
[jira] [Commented] (HBASE-17004) Refactor IntegrationTestManyRegions to use @ClassRule for timing out
[ https://issues.apache.org/jira/browse/HBASE-17004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638098#comment-15638098 ] Hudson commented on HBASE-17004: FAILURE: Integrated in Jenkins build HBase-1.1-JDK7 #1811 (See [https://builds.apache.org/job/HBase-1.1-JDK7/1811/]) HBASE-17004 IntegrationTestManyRegions verifies that many regions get (appy: rev 71a2e1f225879d68e69fcedcd4ddfa281eae6030) * (edit) hbase-it/pom.xml * (edit) hbase-it/src/test/java/org/apache/hadoop/hbase/IntegrationTestManyRegions.java > Refactor IntegrationTestManyRegions to use @ClassRule for timing out > > > Key: HBASE-17004 > URL: https://issues.apache.org/jira/browse/HBASE-17004 > Project: HBase > Issue Type: Improvement >Reporter: Appy >Assignee: Appy >Priority: Minor > Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.4, 1.1.8 > > Attachments: HBASE-17004.master.001.patch, > HBASE-17004.master.002.patch > > > IntegrationTestManyRegions verifies that many regions get assigned within > given time. To do so, it spawns a new thread and uses CountDownLatch.await() > to timeout. Replacing this mechanism with junit @ClassRule to timeout the > test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16960) RegionServer hang when aborting
[ https://issues.apache.org/jira/browse/HBASE-16960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638097#comment-15638097 ] Hudson commented on HBASE-16960: FAILURE: Integrated in Jenkins build HBase-1.1-JDK7 #1811 (See [https://builds.apache.org/job/HBase-1.1-JDK7/1811/]) HBASE-16960 RegionServer hang when aborting (liyu: rev f42f6fa2443f0aee76962e22d5233a124a18d49a) * (add) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestSyncFuture.java * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALActionsListener.java * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SyncFuture.java * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestWALLockup.java > RegionServer hang when aborting > --- > > Key: HBASE-16960 > URL: https://issues.apache.org/jira/browse/HBASE-16960 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.2.3, 1.1.7 >Reporter: binlijin >Assignee: binlijin >Priority: Critical > Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.4, 1.1.8 > > Attachments: 16960.ut.missing.final.piece.txt, > HBASE-16960.branch-1.1.v1.patch, HBASE-16960.branch-1.2.v1.patch, > HBASE-16960.branch-1.v1.patch, HBASE-16960.patch, > HBASE-16960_master_v2.patch, HBASE-16960_master_v3.patch, > HBASE-16960_master_v4.patch, RingBufferEventHandler.png, > RingBufferEventHandler_exception.png, SyncFuture.png, > SyncFuture_exception.png, rs1081.jstack > > > We see regionserver hang when aborting several times and cause all regions on > this regionserver out of service and then all affected applications stop > works. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14417) Incremental backup and bulk loading
[ https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638081#comment-15638081 ] Devaraj Das commented on HBASE-14417: - A summary of some internal discussions on the high-level flow that doesn't use ZK... 1. Client updates the hbase:backup table with a set of paths that are to be bulkloaded (if the tables in question have been fully backed up at least once in the past) 2. Client performs the bulkload of the data. If the client fails before the bulkload was fully complete, the cleaner chore in (5) would take care of cleaning up the unneeded entries from hbase:backup 3. There is a HFileCleaner that makes sure that paths that came about due to (1) are held until the next incremental backup 4. As part of the incremental backup, the hbase:backup table is updated to reflect the right location where the earlier bulkloaded file got copied to 5. A chore runs periodically (in the BackupController) that eliminates entries from the hbase:backup table if the corresponding paths don't exist in the filesystem until after a configured time period (default, say 24 hours; bulkload timeout is assumed to be much smaller than this, and hence all bulkloads that are meant to successfully complete would complete). Thoughts? > Incremental backup and bulk loading > --- > > Key: HBASE-14417 > URL: https://issues.apache.org/jira/browse/HBASE-14417 > Project: HBase > Issue Type: New Feature >Affects Versions: 2.0.0 >Reporter: Vladimir Rodionov >Assignee: Ted Yu >Priority: Critical > Labels: backup > Fix For: 2.0.0 > > Attachments: 14417.v1.txt, 14417.v11.txt, 14417.v13.txt, > 14417.v2.txt, 14417.v21.txt, 14417.v23.txt, 14417.v24.txt, 14417.v25.txt, > 14417.v6.txt > > > Currently, incremental backup is based on WAL files. Bulk data loading > bypasses WALs for obvious reasons, breaking incremental backups. The only way > to continue backups after bulk loading is to create new full backup of a > table. 
This may not be feasible for customers who do bulk loading regularly > (say, every day). > Google doc for design: > https://docs.google.com/document/d/1ACCLsecHDvzVSasORgqqRNrloGx4mNYIbvAU7lq5lJE -- This message was sent by Atlassian JIRA (v6.3.4#6332)
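Step 5 of the flow above (the periodic chore that drops hbase:backup bulk-load entries whose files have vanished and whose age exceeds the grace period) can be sketched as a pure pruning function. All types here — `BulkLoadEntry`, the path-existence check — are hypothetical stand-ins for whatever the eventual patch uses, not the actual HBASE-14417 code.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class BackupEntryCleaner {
    /** Default grace period; bulkload timeout is assumed to be much smaller. */
    public static final long DEFAULT_GRACE_MS = 24L * 60 * 60 * 1000; // 24 hours

    public static class BulkLoadEntry {
        final String path;          // HFile path registered by the client (step 1)
        final long registeredAtMs;  // when the entry was written to hbase:backup

        public BulkLoadEntry(String path, long registeredAtMs) {
            this.path = path;
            this.registeredAtMs = registeredAtMs;
        }
    }

    /**
     * Keep an entry if its path still exists (the file is awaiting the next
     * incremental backup) or if it is younger than the grace period (the
     * bulkload may still be in flight). Everything else is a leftover from a
     * failed client and can be dropped.
     */
    public static List<BulkLoadEntry> prune(List<BulkLoadEntry> entries,
            Predicate<String> pathExists, long nowMs, long graceMs) {
        return entries.stream()
            .filter(e -> pathExists.test(e.path) || nowMs - e.registeredAtMs < graceMs)
            .collect(Collectors.toList());
    }
}
```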
[jira] [Updated] (HBASE-17030) Procedure v2 - A couple of tweaks to the SplitTableRegionProcedure
[ https://issues.apache.org/jira/browse/HBASE-17030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-17030: Status: Patch Available (was: Open) > Procedure v2 - A couple of tweaks to the SplitTableRegionProcedure > -- > > Key: HBASE-17030 > URL: https://issues.apache.org/jira/browse/HBASE-17030 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Affects Versions: 2.0.0 >Reporter: Matteo Bertozzi >Assignee: Matteo Bertozzi >Priority: Trivial > Fix For: 2.0.0 > > Attachments: HBASE-17030-v0.patch > > > Make a couple of tweaks to HBASE-14551 split procedure > - remove tableName from SplitTableRegionProcedure ctor since we have the > RegionInfo that contains the name already > - move the checkRow in the constructor of the SplitTableRegionProcedure, > since the splitRow will never change and we can avoid to start the proc if we > have a bad splitRow. > - use the base AbstractStateMachineTableProcedure for the "user" field > - remove protobuf fields that can be extrapolated from other info > (table_name, split_row) > - avoid htd lookup every family iteration of splitStoreFiles() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17030) Procedure v2 - A couple of tweaks to the SplitTableRegionProcedure
[ https://issues.apache.org/jira/browse/HBASE-17030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-17030: Attachment: HBASE-17030-v0.patch > Procedure v2 - A couple of tweaks to the SplitTableRegionProcedure > -- > > Key: HBASE-17030 > URL: https://issues.apache.org/jira/browse/HBASE-17030 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Affects Versions: 2.0.0 >Reporter: Matteo Bertozzi >Assignee: Matteo Bertozzi >Priority: Trivial > Fix For: 2.0.0 > > Attachments: HBASE-17030-v0.patch > > > Make a couple of tweaks to HBASE-14551 split procedure > - remove tableName from SplitTableRegionProcedure ctor since we have the > RegionInfo that contains the name already > - move the checkRow in the constructor of the SplitTableRegionProcedure, > since the splitRow will never change and we can avoid to start the proc if we > have a bad splitRow. > - use the base AbstractStateMachineTableProcedure for the "user" field > - remove protobuf fields that can be extrapolated from other info > (table_name, split_row) > - avoid htd lookup every family iteration of splitStoreFiles() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HBASE-17017) Remove the current per-region latency histogram metrics
[ https://issues.apache.org/jira/browse/HBASE-17017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638039#comment-15638039 ] Andrew Purtell edited comment on HBASE-17017 at 11/4/16 11:32 PM: -- So still get and scan counters per region? Can these go too? And the other per region counters? Can still amount to thousands of counters given thousands of regions. was (Author: apurtell): So still get and scan counters per region? Can these go too? > Remove the current per-region latency histogram metrics > --- > > Key: HBASE-17017 > URL: https://issues.apache.org/jira/browse/HBASE-17017 > Project: HBase > Issue Type: Sub-task > Components: metrics >Affects Versions: 2.0.0, 1.4.0 >Reporter: Duo Zhang >Assignee: Enis Soztutar > Fix For: 2.0.0, 1.4.0 > > Attachments: Screen Shot 2016-11-04 at 3.00.21 PM.png, Screen Shot > 2016-11-04 at 3.38.42 PM.png, hbase-17017_v1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-17029) Procedure v2 - A couple of tweaks to the SplitTableRegionProcedure
[ https://issues.apache.org/jira/browse/HBASE-17029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi resolved HBASE-17029. - Resolution: Duplicate A double click created two issues, HBASE-17029/HBASE-17030; closing this one. > Procedure v2 - A couple of tweaks to the SplitTableRegionProcedure > -- > > Key: HBASE-17029 > URL: https://issues.apache.org/jira/browse/HBASE-17029 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Affects Versions: 2.0.0 >Reporter: Matteo Bertozzi >Assignee: Matteo Bertozzi >Priority: Trivial > Fix For: 2.0.0 > > > Make a couple of tweaks to HBASE-14551 split procedure > - remove tableName from SplitTableRegionProcedure ctor since we have the > RegionInfo that contains the name already > - move the checkRow in the constructor of the SplitTableRegionProcedure, > since the splitRow will never change and we can avoid to start the proc if we > have a bad splitRow. > - use the base AbstractStateMachineTableProcedure for the "user" field > - remove protobuf fields that can be extrapolated from other info > (table_name, split_row) > - avoid htd lookup every family iteration of splitStoreFiles() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17017) Remove the current per-region latency histogram metrics
[ https://issues.apache.org/jira/browse/HBASE-17017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638039#comment-15638039 ] Andrew Purtell commented on HBASE-17017: So still get and scan counters per region? Can these go too? > Remove the current per-region latency histogram metrics > --- > > Key: HBASE-17017 > URL: https://issues.apache.org/jira/browse/HBASE-17017 > Project: HBase > Issue Type: Sub-task > Components: metrics >Affects Versions: 2.0.0, 1.4.0 >Reporter: Duo Zhang >Assignee: Enis Soztutar > Fix For: 2.0.0, 1.4.0 > > Attachments: Screen Shot 2016-11-04 at 3.00.21 PM.png, Screen Shot > 2016-11-04 at 3.38.42 PM.png, hbase-17017_v1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17004) Refactor IntegrationTestManyRegions to use @ClassRule for timing out
[ https://issues.apache.org/jira/browse/HBASE-17004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638042#comment-15638042 ] Hudson commented on HBASE-17004: SUCCESS: Integrated in Jenkins build HBase-Trunk_matrix #1916 (See [https://builds.apache.org/job/HBase-Trunk_matrix/1916/]) HBASE-17004 IntegrationTestManyRegions verifies that many regions get (appy: rev 9564849ba181391d9716acb0172d241675ff25f2) * (edit) hbase-it/pom.xml * (edit) hbase-it/src/test/java/org/apache/hadoop/hbase/IntegrationTestManyRegions.java > Refactor IntegrationTestManyRegions to use @ClassRule for timing out > > > Key: HBASE-17004 > URL: https://issues.apache.org/jira/browse/HBASE-17004 > Project: HBase > Issue Type: Improvement >Reporter: Appy >Assignee: Appy >Priority: Minor > Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.4, 1.1.8 > > Attachments: HBASE-17004.master.001.patch, > HBASE-17004.master.002.patch > > > IntegrationTestManyRegions verifies that many regions get assigned within > given time. To do so, it spawns a new thread and uses CountDownLatch.await() > to timeout. Replacing this mechanism with junit @ClassRule to timeout the > test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
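The mechanism being replaced — a spawned thread plus CountDownLatch.await() with a timeout — looks roughly like this (a simplified stand-alone sketch, not the actual IntegrationTestManyRegions code):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Simplified sketch of the old timeout mechanism: a worker thread counts the
// latch down when the condition is met; the test thread awaits with a timeout
// and treats a false return (timeout elapsed) as failure. The @ClassRule
// approach replaces all of this hand-rolled plumbing with a declarative rule.
class LatchTimeoutSketch {
    static boolean waitForCondition(Runnable work, long timeoutMillis)
            throws InterruptedException {
        CountDownLatch done = new CountDownLatch(1);
        Thread t = new Thread(() -> {
            work.run();       // e.g. poll until all regions are assigned
            done.countDown(); // signal completion to the waiting test thread
        });
        t.setDaemon(true);
        t.start();
        // true if the latch reached zero in time, false if the timeout elapsed.
        return done.await(timeoutMillis, TimeUnit.MILLISECONDS);
    }
}
```

With a junit @ClassRule timeout, the framework owns the deadline and the test body stays linear, which is presumably why the refactor was worthwhile.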
[jira] [Commented] (HBASE-16892) Use TableName instead of String in SnapshotDescription
[ https://issues.apache.org/jira/browse/HBASE-16892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638044#comment-15638044 ] Hudson commented on HBASE-16892: SUCCESS: Integrated in Jenkins build HBase-Trunk_matrix #1916 (See [https://builds.apache.org/job/HBase-Trunk_matrix/1916/]) HBASE-16892 Use TableName instead of String in SnapshotDescription (matteo.bertozzi: rev 00ea7aeafe6f0070dedf86a296eefd5d3c453077) * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestSnapshotFromClient.java * (edit) hbase-client/src/main/java/org/apache/hadoop/hbase/shaded/protobuf/ProtobufUtil.java * (edit) hbase-client/src/test/java/org/apache/hadoop/hbase/client/TestSnapshotFromAdmin.java * (edit) hbase-server/src/main/resources/hbase-webapps/master/snapshot.jsp * (edit) hbase-client/src/main/java/org/apache/hadoop/hbase/client/SnapshotDescription.java * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/SnapshotInfo.java * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/snapshot/TestFlushSnapshotFromClient.java * (edit) hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/master/MasterStatusTmpl.jamon * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/CreateSnapshot.java * (edit) hbase-client/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/master/cleaner/TestSnapshotFromMaster.java * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/snapshot/SnapshotTestingUtils.java > Use TableName instead of String in SnapshotDescription > -- > > Key: HBASE-16892 > URL: https://issues.apache.org/jira/browse/HBASE-16892 > Project: HBase > Issue Type: Sub-task > Components: snapshots >Affects Versions: 2.0.0 >Reporter: Matteo Bertozzi >Assignee: Matteo Bertozzi >Priority: Trivial > Fix For: 2.0.0 > > Attachments: HBASE-16892-v0.patch, HBASE-16892-v1.patch, > HBASE-16892-v2.patch > > > mostly find & replace work: > deprecate the 
SnapshotDescription constructors with the String argument in > favor of the TableName ones. > Replace the TableName.valueOf() around with the new getTableName() > Replace the TableName.getNameAsString() by just passing the TableName -- This message was sent by Atlassian JIRA (v6.3.4#6332)
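The deprecation pattern described above can be sketched like this; both classes are hypothetical, simplified stand-ins (the real TableName and SnapshotDescription carry more state), shown only to illustrate the delegation:

```java
// Hypothetical stand-in for org.apache.hadoop.hbase.TableName.
class TableNameSketch {
    private final String name;
    private TableNameSketch(String name) { this.name = name; }
    static TableNameSketch valueOf(String name) { return new TableNameSketch(name); }
    String getNameAsString() { return name; }
}

// Hypothetical stand-in for the client SnapshotDescription: the String-based
// constructor is kept for compatibility but deprecated and delegates to the
// TableName-based one, so callers can drop the valueOf()/getNameAsString()
// round-trips mentioned in the description.
class SnapshotDescriptionSketch {
    private final String snapshotName;
    private final TableNameSketch table;

    /** Preferred: take a TableName directly. */
    SnapshotDescriptionSketch(String snapshotName, TableNameSketch table) {
        this.snapshotName = snapshotName;
        this.table = table;
    }

    /** @deprecated use the TableName constructor instead. */
    @Deprecated
    SnapshotDescriptionSketch(String snapshotName, String tableName) {
        this(snapshotName, TableNameSketch.valueOf(tableName));
    }

    TableNameSketch getTableName() { return table; }
    String getSnapshotName() { return snapshotName; }
}
```

Old call sites keep compiling (with a deprecation warning) while new code passes the typed TableName, which is the usual migration path for this kind of find-and-replace change.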
[jira] [Commented] (HBASE-16865) Procedure v2 - Inherit lock from root proc
[ https://issues.apache.org/jira/browse/HBASE-16865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638043#comment-15638043 ] Hudson commented on HBASE-16865: SUCCESS: Integrated in Jenkins build HBase-Trunk_matrix #1916 (See [https://builds.apache.org/job/HBase-Trunk_matrix/1916/]) HBASE-16865 Procedure v2 - Inherit lock from root proc (matteo.bertozzi: rev efe0a0eeadac14c2804a3d1590761502e5f247ee) * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestMasterProcedureScheduler.java * (edit) hbase-procedure/src/test/java/org/apache/hadoop/hbase/procedure2/ProcedureTestingUtility.java * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/MasterProcedureScheduler.java > Procedure v2 - Inherit lock from root proc > -- > > Key: HBASE-16865 > URL: https://issues.apache.org/jira/browse/HBASE-16865 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Affects Versions: 2.0.0 >Reporter: Matteo Bertozzi >Assignee: Matteo Bertozzi > Fix For: 2.0.0 > > Attachments: HBASE-16865-v0.patch > > > At the moment we support inheriting locks from the parent procedure for a 2 > level procedures, but in case of reopen table regions we have a 3 level > procedures (ModifyTable -> ReOpen -> [Unassign/Assign]) and reopen does not > have any locks on its own. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16937) Replace SnapshotType protobuf conversion when we can directly use the pojo object
[ https://issues.apache.org/jira/browse/HBASE-16937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638041#comment-15638041 ] Hudson commented on HBASE-16937: SUCCESS: Integrated in Jenkins build HBase-Trunk_matrix #1916 (See [https://builds.apache.org/job/HBase-Trunk_matrix/1916/]) HBASE-16937 Replace SnapshotType protobuf conversion when we can (matteo.bertozzi: rev 7e05d0f161baef581d06f0dd978cd2e9b28e) * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/snapshot/TestFlushSnapshotFromClient.java * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/snapshot/SnapshotTestingUtils.java * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/CreateSnapshot.java * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/snapshot/TestRestoreFlushSnapshotFromClient.java > Replace SnapshotType protobuf conversion when we can directly use the pojo > object > - > > Key: HBASE-16937 > URL: https://issues.apache.org/jira/browse/HBASE-16937 > Project: HBase > Issue Type: Sub-task > Components: snapshots >Reporter: Matteo Bertozzi >Assignee: Matteo Bertozzi >Priority: Trivial > Fix For: 2.0.0 > > Attachments: HBASE-16937-v0.patch, HBASE-16937-v1.patch > > > mostly find & replace work: > replace the back and forth protobuf conversion when we can just use the > client SnapshotType enum. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17030) Procedure v2 - A couple of tweaks to the SplitTableRegionProcedure
Matteo Bertozzi created HBASE-17030: --- Summary: Procedure v2 - A couple of tweaks to the SplitTableRegionProcedure Key: HBASE-17030 URL: https://issues.apache.org/jira/browse/HBASE-17030 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 2.0.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Trivial Fix For: 2.0.0 Make a couple of tweaks to the HBASE-14551 split procedure: - remove tableName from the SplitTableRegionProcedure ctor, since the RegionInfo we have already contains the name - move checkRow into the constructor of SplitTableRegionProcedure, since the splitRow will never change and we can avoid starting the proc if we have a bad splitRow - use the base AbstractStateMachineTableProcedure for the "user" field - remove protobuf fields that can be extrapolated from other info (table_name, split_row) - avoid an htd lookup on every family iteration of splitStoreFiles() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17029) Procedure v2 - A couple of tweaks to the SplitTableRegionProcedure
Matteo Bertozzi created HBASE-17029: --- Summary: Procedure v2 - A couple of tweaks to the SplitTableRegionProcedure Key: HBASE-17029 URL: https://issues.apache.org/jira/browse/HBASE-17029 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 2.0.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Trivial Fix For: 2.0.0 Make a couple of tweaks to the HBASE-14551 split procedure: - remove tableName from the SplitTableRegionProcedure ctor, since the RegionInfo we have already contains the name - move checkRow into the constructor of SplitTableRegionProcedure, since the splitRow will never change and we can avoid starting the proc if we have a bad splitRow - use the base AbstractStateMachineTableProcedure for the "user" field - remove protobuf fields that can be extrapolated from other info (table_name, split_row) - avoid an htd lookup on every family iteration of splitStoreFiles() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16960) RegionServer hang when aborting
[ https://issues.apache.org/jira/browse/HBASE-16960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638033#comment-15638033 ] Hudson commented on HBASE-16960: FAILURE: Integrated in Jenkins build HBase-1.1-JDK8 #1895 (See [https://builds.apache.org/job/HBase-1.1-JDK8/1895/]) HBASE-16960 RegionServer hang when aborting (liyu: rev f42f6fa2443f0aee76962e22d5233a124a18d49a) * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALActionsListener.java * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SyncFuture.java * (add) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestSyncFuture.java * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestWALLockup.java * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java > RegionServer hang when aborting > --- > > Key: HBASE-16960 > URL: https://issues.apache.org/jira/browse/HBASE-16960 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.2.3, 1.1.7 >Reporter: binlijin >Assignee: binlijin >Priority: Critical > Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.4, 1.1.8 > > Attachments: 16960.ut.missing.final.piece.txt, > HBASE-16960.branch-1.1.v1.patch, HBASE-16960.branch-1.2.v1.patch, > HBASE-16960.branch-1.v1.patch, HBASE-16960.patch, > HBASE-16960_master_v2.patch, HBASE-16960_master_v3.patch, > HBASE-16960_master_v4.patch, RingBufferEventHandler.png, > RingBufferEventHandler_exception.png, SyncFuture.png, > SyncFuture_exception.png, rs1081.jstack > > > We see regionserver hang when aborting several times and cause all regions on > this regionserver out of service and then all affected applications stop > works. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17004) Refactor IntegrationTestManyRegions to use @ClassRule for timing out
[ https://issues.apache.org/jira/browse/HBASE-17004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638034#comment-15638034 ] Hudson commented on HBASE-17004: FAILURE: Integrated in Jenkins build HBase-1.1-JDK8 #1895 (See [https://builds.apache.org/job/HBase-1.1-JDK8/1895/]) HBASE-17004 IntegrationTestManyRegions verifies that many regions get (appy: rev 71a2e1f225879d68e69fcedcd4ddfa281eae6030) * (edit) hbase-it/pom.xml * (edit) hbase-it/src/test/java/org/apache/hadoop/hbase/IntegrationTestManyRegions.java > Refactor IntegrationTestManyRegions to use @ClassRule for timing out > > > Key: HBASE-17004 > URL: https://issues.apache.org/jira/browse/HBASE-17004 > Project: HBase > Issue Type: Improvement >Reporter: Appy >Assignee: Appy >Priority: Minor > Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.4, 1.1.8 > > Attachments: HBASE-17004.master.001.patch, > HBASE-17004.master.002.patch > > > IntegrationTestManyRegions verifies that many regions get assigned within > given time. To do so, it spawns a new thread and uses CountDownLatch.await() > to timeout. Replacing this mechanism with junit @ClassRule to timeout the > test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17026) VerifyReplication log should distinguish whether good row key is result of revalidation
[ https://issues.apache.org/jira/browse/HBASE-17026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638031#comment-15638031 ] Hadoop QA commented on HBASE-17026: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 41s {color} | {color:blue} Docker mode activated. {color} | | {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 4s {color} | {color:blue} The patch file was not named according to hbase's naming conventions. Please see https://yetus.apache.org/documentation/0.3.0/precommit-patchnames for instructions. {color} | | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s {color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. 
{color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 17s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 0s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 22s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 21s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 47s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 21s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 59s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 56s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 18s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 45m 19s {color} | {color:green} Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha1. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 43s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 44s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 140m 4s {color} | {color:red} hbase-server in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 38s {color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 208m 29s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | Timed out junit tests | org.apache.hadoop.hbase.security.visibility.TestVisibilityLabelsWithDeletes | | | org.apache.hadoop.hbase.replication.regionserver.TestReplicationWALReaderManager | | | org.apache.hadoop.hbase.replication.TestMasterReplication | \\ \\ || Subsystem || Report/Notes || | Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:7bda515 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12837265/17026.v1.txt | | JIRA Issue | HBASE-17026 | | Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile | | uname | Linux be4c61546d76 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 20:42:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / 9564849 | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | unit | https://builds.apache.org/job/PreCommit-HBASE-Build/4337/artifact/patchprocess/patch-unit-hbase-server.txt | | unit test logs |
[jira] [Commented] (HBASE-16838) Implement basic scan
[ https://issues.apache.org/jira/browse/HBASE-16838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637995#comment-15637995 ] Duo Zhang commented on HBASE-16838: --- Another reason for smallScan is the limit. Maybe we could add it to Scan? Otherwise the RS cannot know when the scan is exhausted. We would also need to modify the RS logic to support it? And the small flag would then be deprecated? In general, the scan method introduced here is only for experts; we do not want every user to call it directly. But a small scan just returns a CompletableFuture, so it is much easier to use. Thanks. > Implement basic scan > > > Key: HBASE-16838 > URL: https://issues.apache.org/jira/browse/HBASE-16838 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Duo Zhang >Assignee: Duo Zhang > Fix For: 2.0.0 > > Attachments: HBASE-16838-v1.patch, HBASE-16838-v2.patch, > HBASE-16838.patch > > > Implement a scan works like the grpc streaming call that all returned results > will be passed to a ScanObserver. The methods of the observer will be called > directly in the rpc framework threads so it is not allowed to do time > consuming work in the methods. So in general only experts or the > implementation of other methods in AsyncTable can call this method directly, > that's why I call it 'basic scan'. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
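The ease-of-use argument for small scans can be illustrated with a plain CompletableFuture; the method and types here are simplified stand-ins, not the actual AsyncTable API:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Illustrative contrast: an observer-style ("basic") scan pushes each batch of
// results into callbacks running on RPC framework threads, while a small scan
// can simply hand back one future holding the complete result list — there is
// nothing for the caller to get wrong. Types are stand-ins for the real API.
class SmallScanSketch {
    // Stand-in for a small scan with a row limit: everything fits in one
    // response, so a single CompletableFuture of the whole list is natural.
    static CompletableFuture<List<String>> smallScan(List<String> tableRows, int limit) {
        return CompletableFuture.supplyAsync(
            () -> tableRows.subList(0, Math.min(limit, tableRows.size())));
    }
}
```

This also shows why the limit matters for the server side: with an explicit limit the RS knows when the scan is exhausted and can close it in the same round trip.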
[jira] [Commented] (HBASE-17016) Reimplement per-region latency histogram metrics
[ https://issues.apache.org/jira/browse/HBASE-17016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637975#comment-15637975 ] Enis Soztutar commented on HBASE-17016: --- Attached a patch to subtask. Mikhail, are you saying that we should close this as won't fix after the subtask? I think it should be fine. > Reimplement per-region latency histogram metrics > > > Key: HBASE-17016 > URL: https://issues.apache.org/jira/browse/HBASE-17016 > Project: HBase > Issue Type: Task > Components: metrics >Affects Versions: 2.0.0, 1.4.0 >Reporter: Andrew Purtell > Fix For: 2.0.0, 1.4.0 > > > Follow up from HBASE-10656, where [~enis] says: > {quote} > the main problem is that we have A LOT of per-region metrics that are latency > histograms. These latency histograms create many many Counter / LongAdder > objects. We should get rid of per-region latencies and maybe look at reducing > the per-region metric overhead. > {quote} > And [~ghelmling] gives us a good candidate to implement pre-region latency > histograms [HdrHistogram|https://github.com/HdrHistogram/HdrHistogram]. > Let's consider removing the per-region latency histograms and reimplement > using HdrHistogram. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17017) Remove the current per-region latency histogram metrics
[ https://issues.apache.org/jira/browse/HBASE-17017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-17017: -- Hadoop Flags: Incompatible change Release Note: Removes per-region level (get size, get time, scan size and scan time histogram) metrics that was exposed before. Per-region histogram metrics with 1000+ regions causes millions of objects to be allocated on heap. The patch introduces getCount and scanCount as counters rather than histograms. Other per-region level metrics are kept as they are. > Remove the current per-region latency histogram metrics > --- > > Key: HBASE-17017 > URL: https://issues.apache.org/jira/browse/HBASE-17017 > Project: HBase > Issue Type: Sub-task > Components: metrics >Affects Versions: 2.0.0, 1.4.0 >Reporter: Duo Zhang >Assignee: Enis Soztutar > Fix For: 2.0.0, 1.4.0 > > Attachments: Screen Shot 2016-11-04 at 3.00.21 PM.png, Screen Shot > 2016-11-04 at 3.38.42 PM.png, hbase-17017_v1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17017) Remove the current per-region latency histogram metrics
[ https://issues.apache.org/jira/browse/HBASE-17017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-17017: -- Attachment: hbase-17017_v1.patch Attaching v1 patch. > Remove the current per-region latency histogram metrics > --- > > Key: HBASE-17017 > URL: https://issues.apache.org/jira/browse/HBASE-17017 > Project: HBase > Issue Type: Sub-task > Components: metrics >Affects Versions: 2.0.0, 1.4.0 >Reporter: Duo Zhang >Assignee: Enis Soztutar > Fix For: 2.0.0, 1.4.0 > > Attachments: Screen Shot 2016-11-04 at 3.00.21 PM.png, Screen Shot > 2016-11-04 at 3.38.42 PM.png, hbase-17017_v1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17017) Remove the current per-region latency histogram metrics
[ https://issues.apache.org/jira/browse/HBASE-17017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-17017: -- Status: Patch Available (was: Open) > Remove the current per-region latency histogram metrics > --- > > Key: HBASE-17017 > URL: https://issues.apache.org/jira/browse/HBASE-17017 > Project: HBase > Issue Type: Sub-task > Components: metrics >Affects Versions: 2.0.0, 1.4.0 >Reporter: Duo Zhang >Assignee: Enis Soztutar > Fix For: 2.0.0, 1.4.0 > > Attachments: Screen Shot 2016-11-04 at 3.00.21 PM.png, Screen Shot > 2016-11-04 at 3.38.42 PM.png, hbase-17017_v1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16996) Implement storage/retrieval of filesystem-use quotas into quota table
[ https://issues.apache.org/jira/browse/HBASE-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637965#comment-15637965 ] Hadoop QA commented on HBASE-16996: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s {color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 31s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 28s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 50s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 28s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 2s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 56s {color} | {color:green} master passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:red}-1{color} | 
{color:red} mvninstall {color} | {color:red} 0m 17s {color} | {color:red} hbase-client in the patch failed. {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 26s {color} | {color:red} hbase-server in the patch failed. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 17s {color} | {color:red} hbase-client in the patch failed. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 26s {color} | {color:red} hbase-server in the patch failed. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 17s {color} | {color:red} hbase-client in the patch failed. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 26s {color} | {color:red} hbase-server in the patch failed. {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 49s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 25s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 1m 3s {color} | {color:red} The patch causes 16 errors with Hadoop v2.6.1. {color} | | {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 2m 11s {color} | {color:red} The patch causes 16 errors with Hadoop v2.6.2. {color} | | {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 3m 9s {color} | {color:red} The patch causes 16 errors with Hadoop v2.6.3. {color} | | {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 4m 6s {color} | {color:red} The patch causes 16 errors with Hadoop v2.6.4. 
{color} | | {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 5m 7s {color} | {color:red} The patch causes 16 errors with Hadoop v2.6.5. {color} | | {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 6m 4s {color} | {color:red} The patch causes 16 errors with Hadoop v2.7.1. {color} | | {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 7m 3s {color} | {color:red} The patch causes 16 errors with Hadoop v2.7.2. {color} | | {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 8m 0s {color} | {color:red} The patch causes 16 errors with Hadoop v2.7.3. {color} | | {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 8m 56s {color} | {color:red} The patch causes 16 errors with Hadoop v3.0.0-alpha1. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 11s {color} | {color:red} hbase-client in the patch failed. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 24s {color} | {color:red} hbase-server in the patch failed. {color} |
[jira] [Commented] (HBASE-17017) Remove the current per-region latency histogram metrics
[ https://issues.apache.org/jira/browse/HBASE-17017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637930#comment-15637930 ] Enis Soztutar commented on HBASE-17017: --- Runtimes: With patch:
{code}
2016-11-04 15:19:44,017 INFO [TestClient-20] hbase.PerformanceEvaluation: Finished TestClient-20 in 117708ms over 10 rows
{code}
w/o patch:
{code}
2016-11-04 14:53:33,082 INFO [TestClient-20] hbase.PerformanceEvaluation: Finished TestClient-20 in 140958ms over 10 rows
{code}
> Remove the current per-region latency histogram metrics > --- > > Key: HBASE-17017 > URL: https://issues.apache.org/jira/browse/HBASE-17017 > Project: HBase > Issue Type: Sub-task > Components: metrics >Affects Versions: 2.0.0, 1.4.0 >Reporter: Duo Zhang >Assignee: Enis Soztutar > Fix For: 2.0.0, 1.4.0 > > Attachments: Screen Shot 2016-11-04 at 3.00.21 PM.png, Screen Shot > 2016-11-04 at 3.38.42 PM.png > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17017) Remove the current per-region latency histogram metrics
[ https://issues.apache.org/jira/browse/HBASE-17017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-17017: -- Attachment: Screen Shot 2016-11-04 at 3.38.42 PM.png Screen Shot 2016-11-04 at 3.00.21 PM.png > Remove the current per-region latency histogram metrics > --- > > Key: HBASE-17017 > URL: https://issues.apache.org/jira/browse/HBASE-17017 > Project: HBase > Issue Type: Sub-task > Components: metrics >Affects Versions: 2.0.0, 1.4.0 >Reporter: Duo Zhang >Assignee: Enis Soztutar > Fix For: 2.0.0, 1.4.0 > > Attachments: Screen Shot 2016-11-04 at 3.00.21 PM.png, Screen Shot > 2016-11-04 at 3.38.42 PM.png > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17017) Remove the current per-region latency histogram metrics
[ https://issues.apache.org/jira/browse/HBASE-17017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637925#comment-15637925 ] Enis Soztutar commented on HBASE-17017: --- I've run PE with 1000 regions on a single server:
{code}
bin/hbase pe --latency --nomapred --presplit=1000 --valueSize=1000 --rows=10 sequentialWrite 30
{code}
We are allocating ~1M LongAdder (former Counter) objects, which is crazy. With a simple patch, the allocations go down to less than 0.5% of heap, so JFR no longer shows them. The runtime for PE improves by 17% because we no longer spend time on this code path:
{code}
private LongAdder[] createCounters(int numBins) {
  return Stream.generate(LongAdder::new).limit(numBins + 3).toArray(LongAdder[]::new);
}
{code}
See attached screenshots. > Remove the current per-region latency histogram metrics > --- > > Key: HBASE-17017 > URL: https://issues.apache.org/jira/browse/HBASE-17017 > Project: HBase > Issue Type: Sub-task > Components: metrics >Affects Versions: 2.0.0, 1.4.0 >Reporter: Duo Zhang >Assignee: Enis Soztutar > Fix For: 2.0.0, 1.4.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
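The scale of the allocation is easy to reproduce from the numbers above. The sketch below reuses the shape of the quoted {{createCounters}} method; the per-region histogram count and bin count are illustrative assumptions, not the actual HBase values:

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.Stream;

public class HistogramAllocationSketch {
    // Same shape as the hot method quoted above: one LongAdder per bin,
    // plus three extra slots.
    static LongAdder[] createCounters(int numBins) {
        return Stream.generate(LongAdder::new)
                .limit(numBins + 3)
                .toArray(LongAdder[]::new);
    }

    public static void main(String[] args) {
        int regions = 1000;           // matches --presplit=1000 in the PE run
        int histogramsPerRegion = 30; // assumption: per-op latency histograms
        int binsPerHistogram = 32;    // assumption: bins in each histogram

        long adders = (long) regions * histogramsPerRegion
                * createCounters(binsPerHistogram).length;
        System.out.println(adders); // prints 1050000
    }
}
```

With plausible per-region metric counts the object count lands right around the ~1M LongAdders observed in JFR, which is why dropping the per-region histograms removes the allocation hot spot.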
[jira] [Commented] (HBASE-17007) Move ZooKeeper logging to its own log file
[ https://issues.apache.org/jira/browse/HBASE-17007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637912#comment-15637912 ] stack commented on HBASE-17007: --- NP. Just thought the occasional pain of correlating between two logs rather than one, in the rare case of a zk issue, a small price to pay for some general clean up. Let's see if there are any other opinions. On sessionids, they will still be in the logs reported by our RecoverableZooKeeper. We can't remove the duplication: ZK spews at INFO level and logs properties and the CLASSPATH, duplicating our own emissions of the same. Can't turn it off. Then there are also the occasional complaints from the client like below (here it is timing around shutdown):
2016-11-03 12:39:49,832 INFO [M:0;172.21.1.131:61739] zookeeper.MiniZooKeeperCluster: Shutdown MiniZK cluster with all ZK servers
2016-11-03 12:39:49,982 INFO [172.21.1.131:61739.activeMasterManager-SendThread(localhost:2181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error)
2016-11-03 12:39:49,983 WARN [172.21.1.131:61739.activeMasterManager-SendThread(localhost:2181)] zookeeper.ClientCnxn: Session 0x1582ba35a740006 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
 at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
 at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
 at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
 at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
They are harmless, but to a noobie operator they probably look worrisome. Thanks.
> Move ZooKeeper logging to its own log file > -- > > Key: HBASE-17007 > URL: https://issues.apache.org/jira/browse/HBASE-17007 > Project: HBase > Issue Type: Bug > Components: Zookeeper >Reporter: Esteban Gutierrez >Assignee: Esteban Gutierrez >Priority: Trivial > Attachments: > 0001-HBASE-17007-Move-ZooKeeper-logging-to-its-own-log-fi.patch > > > ZooKeeper logging can be too verbose. Lets move ZooKeeper logging to a > different log file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
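The split the issue asks for can be sketched as a log4j configuration fragment; the appender name, file name, and layout below are illustrative assumptions, not the contents of the attached patch:

```properties
# Send org.apache.zookeeper.* to its own rolling file (names illustrative).
log4j.appender.ZK=org.apache.log4j.RollingFileAppender
log4j.appender.ZK.File=${hbase.log.dir}/zookeeper.log
log4j.appender.ZK.MaxFileSize=10MB
log4j.appender.ZK.MaxBackupIndex=5
log4j.appender.ZK.layout=org.apache.log4j.PatternLayout
log4j.appender.ZK.layout.ConversionPattern=%d{ISO8601} %-5p [%t] %c{2}: %m%n

# Route the ZooKeeper client logger there, and disable additivity so the
# same events stop reaching the main HBase log.
log4j.logger.org.apache.zookeeper=INFO,ZK
log4j.additivity.org.apache.zookeeper=false
```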
[jira] [Commented] (HBASE-17007) Move ZooKeeper logging to its own log file
[ https://issues.apache.org/jira/browse/HBASE-17007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637883#comment-15637883 ] Esteban Gutierrez commented on HBASE-17007: --- We thought about removing only the classpath initially, but that requires patching ZooKeeper to change the client logging level for ZK. Also, ZooKeeper is used by some coprocessors like Tephra and Phoenix, and logs get polluted quite easily due to other tasks done by those CPs. There is another alternative, and that's removing the duplicated classpath from the logs by adding CLASSPATH to the list of skipwords in ServerCommandLine, but usually the CLASSPATH environment string is shorter than java.class.path as reported by the jvm, which is what ZK is dumping. In a quick test the whole line with java.class.path is 63076 bytes long vs 14293 bytes for the string that contains the CLASSPATH. > Move ZooKeeper logging to its own log file > -- > > Key: HBASE-17007 > URL: https://issues.apache.org/jira/browse/HBASE-17007 > Project: HBase > Issue Type: Bug > Components: Zookeeper >Reporter: Esteban Gutierrez >Assignee: Esteban Gutierrez >Priority: Trivial > Attachments: > 0001-HBASE-17007-Move-ZooKeeper-logging-to-its-own-log-fi.patch > > > ZooKeeper logging can be too verbose. Lets move ZooKeeper logging to a > different log file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17004) Refactor IntegrationTestManyRegions to use @ClassRule for timing out
[ https://issues.apache.org/jira/browse/HBASE-17004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637859#comment-15637859 ] Hudson commented on HBASE-17004: SUCCESS: Integrated in Jenkins build HBase-1.2-JDK7 #63 (See [https://builds.apache.org/job/HBase-1.2-JDK7/63/]) HBASE-17004 IntegrationTestManyRegions verifies that many regions get (appy: rev 804ce850030f607acf855876223d5fa7b3825d0a) * (edit) hbase-it/src/test/java/org/apache/hadoop/hbase/IntegrationTestManyRegions.java * (edit) hbase-it/pom.xml > Refactor IntegrationTestManyRegions to use @ClassRule for timing out > > > Key: HBASE-17004 > URL: https://issues.apache.org/jira/browse/HBASE-17004 > Project: HBase > Issue Type: Improvement >Reporter: Appy >Assignee: Appy >Priority: Minor > Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.4, 1.1.8 > > Attachments: HBASE-17004.master.001.patch, > HBASE-17004.master.002.patch > > > IntegrationTestManyRegions verifies that many regions get assigned within > given time. To do so, it spawns a new thread and uses CountDownLatch.await() > to timeout. Replacing this mechanism with junit @ClassRule to timeout the > test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HBASE-17017) Remove the current per-region latency histogram metrics
[ https://issues.apache.org/jira/browse/HBASE-17017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar reassigned HBASE-17017: - Assignee: Enis Soztutar > Remove the current per-region latency histogram metrics > --- > > Key: HBASE-17017 > URL: https://issues.apache.org/jira/browse/HBASE-17017 > Project: HBase > Issue Type: Sub-task > Components: metrics >Affects Versions: 2.0.0, 1.4.0 >Reporter: Duo Zhang >Assignee: Enis Soztutar > Fix For: 2.0.0, 1.4.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16996) Implement storage/retrieval of filesystem-use quotas into quota table
[ https://issues.apache.org/jira/browse/HBASE-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Elser updated HBASE-16996: --- Status: Patch Available (was: Open) > Implement storage/retrieval of filesystem-use quotas into quota table > - > > Key: HBASE-16996 > URL: https://issues.apache.org/jira/browse/HBASE-16996 > Project: HBase > Issue Type: Sub-task >Reporter: Josh Elser >Assignee: Josh Elser > Fix For: 2.0.0 > > Attachments: HBASE-16996.001.patch > > > Provide a read/write API for accessing the new filesystem-usage quotas in the > existing {{hbase:quota}} table. > Make sure that the client can read the quotas in the table and that the Master > can perform the necessary update/delete actions per the quota RPCs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17004) Refactor IntegrationTestManyRegions to use @ClassRule for timing out
[ https://issues.apache.org/jira/browse/HBASE-17004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637835#comment-15637835 ] Hudson commented on HBASE-17004: FAILURE: Integrated in Jenkins build HBase-1.4 #518 (See [https://builds.apache.org/job/HBase-1.4/518/]) HBASE-17004 IntegrationTestManyRegions verifies that many regions get (appy: rev 9bc9f9b597a2cd5441cec08978a986eec5e58d8e) * (edit) hbase-it/src/test/java/org/apache/hadoop/hbase/IntegrationTestManyRegions.java * (edit) hbase-it/pom.xml > Refactor IntegrationTestManyRegions to use @ClassRule for timing out > > > Key: HBASE-17004 > URL: https://issues.apache.org/jira/browse/HBASE-17004 > Project: HBase > Issue Type: Improvement >Reporter: Appy >Assignee: Appy >Priority: Minor > Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.4, 1.1.8 > > Attachments: HBASE-17004.master.001.patch, > HBASE-17004.master.002.patch > > > IntegrationTestManyRegions verifies that many regions get assigned within > given time. To do so, it spawns a new thread and uses CountDownLatch.await() > to timeout. Replacing this mechanism with junit @ClassRule to timeout the > test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17014) Add clearly marked starting and shutdown log messages for all services.
[ https://issues.apache.org/jira/browse/HBASE-17014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637822#comment-15637822 ] Enis Soztutar commented on HBASE-17014: --- Seems slightly easier to spot. Right now ours are like this:
{code}
2016-11-04 13:48:19,500 FATAL [10.22.7.15:53432.activeMasterManager] master.HMaster: Unhandled exception. Starting shutdown.
org.apache.hadoop.hbase.util.FileSystemVersionException: HBase file layout needs to be upgraded. You have version null and I want version 8. Consult http://hbase.apache.org/book.html for further informa
 at org.apache.hadoop.hbase.util.FSUtils.checkVersion(FSUtils.java:691)
 at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:226)
 at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:134)
 at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:108)
 at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:683)
 at org.apache.hadoop.hbase.master.HMaster.access$600(HMaster.java:193)
 at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1762)
 at java.lang.Thread.run(Thread.java:745)
2016-11-04 13:48:19,501 INFO [10.22.7.15:53432.activeMasterManager] regionserver.HRegionServer: * STOPPING region server '10.22.7.15,53432,1478292498386' *
2016-11-04 13:48:19,501 INFO [10.22.7.15:53432.activeMasterManager] regionserver.HRegionServer: STOPPED: Stopped by 10.22.7.15:53432.activeMasterManager
2016-11-04 13:48:19,614 INFO [main] mortbay.log: Started SelectChannelConnector@0.0.0.0:53436
{code}
> Add clearly marked starting and shutdown log messages for all services.
> --- > > Key: HBASE-17014 > URL: https://issues.apache.org/jira/browse/HBASE-17014 > Project: HBase > Issue Type: Improvement > Components: Operability >Reporter: Umesh Agashe >Assignee: Umesh Agashe >Priority: Minor > Fix For: 2.0.0 > > Attachments: HBASE-17014.v1.patch > > > From observing the log messages, clearly marked starting and shutdown > messages for services HMaster, HRegionServer, ThriftServer and RESTServer > will improve log readability. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16838) Implement basic scan
[ https://issues.apache.org/jira/browse/HBASE-16838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637803#comment-15637803 ] Enis Soztutar commented on HBASE-16838: --- Sorry, a bit late, but we were discussing the small scan API with Devaraj yesterday. I understand the reason why we want to avoid 3 RPCs per scan if the scan is really small, but I think we should have made it so that ALL scans save RPCs and become "small" scans automatically, without the client using a different API or Scan.setSmall(). There is no reason for regular scans to have openScan() and next() calls separately. We can easily make it so that scanner open returns the first batch of results, and that the region server, when the region is exhausted at the end of the scan, automatically closes the scanner before returning the results to the client. So for a "small" scan, a single RPC will open the scanner, fetch all the results in the batch, and close the scanner automatically. What do you guys think? We can open a separate issue to track this. > Implement basic scan > > > Key: HBASE-16838 > URL: https://issues.apache.org/jira/browse/HBASE-16838 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Duo Zhang >Assignee: Duo Zhang > Fix For: 2.0.0 > > Attachments: HBASE-16838-v1.patch, HBASE-16838-v2.patch, > HBASE-16838.patch > > > Implement a scan works like the grpc streaming call that all returned results > will be passed to a ScanObserver. The methods of the observer will be called > directly in the rpc framework threads so it is not allowed to do time > consuming work in the methods. So in general only experts or the > implementation of other methods in AsyncTable can call this method directly, > that's why I call it 'basic scan'. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
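The proposed single-round-trip behavior can be modeled in a toy form. Nothing below is the HBase API; the class and method names are stand-ins for the idea that "open" already carries back the first batch and the server closes an exhausted scanner itself:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of the proposal: opening a scanner already returns the first
// batch, and the server closes the scanner itself once the region is
// exhausted, so a small scan costs a single round trip.
public class SmallScanSketch {
    static class OpenScanResponse {
        final List<String> firstBatch;
        final boolean scannerClosed; // server closed it: no close() RPC needed
        OpenScanResponse(List<String> batch, boolean closed) {
            this.firstBatch = batch;
            this.scannerClosed = closed;
        }
    }

    // "Server side": open + fetch in one call; close automatically when the
    // first batch drained the region.
    static OpenScanResponse openScanner(List<String> regionRows, int batchSize) {
        List<String> batch = new ArrayList<>(
                regionRows.subList(0, Math.min(batchSize, regionRows.size())));
        boolean exhausted = batch.size() == regionRows.size();
        return new OpenScanResponse(batch, exhausted);
    }

    public static void main(String[] args) {
        OpenScanResponse r = openScanner(Arrays.asList("r1", "r2", "r3"), 100);
        System.out.println(r.firstBatch.size() + " " + r.scannerClosed); // prints "3 true"
    }
}
```

A client that sees scannerClosed == true owes the server neither a next() nor a close() RPC, which is exactly the "every scan is a small scan when it can be" behavior described.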
[jira] [Commented] (HBASE-17014) Add clearly marked starting and shutdown log messages for all services.
[ https://issues.apache.org/jira/browse/HBASE-17014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637789#comment-15637789 ] stack commented on HBASE-17014: --- Here is the namenode:
{code}
800172 2016-11-04 14:03:31,796 INFO org.apache.hadoop.hdfs.server.namenode.top.window.RollingWindowManager: topN size for command setReplication is: 1
800173 2016-11-04 14:04:02,150 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: RECEIVED SIGNAL 15: SIGTERM
800174 2016-11-04 14:04:02,154 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
800175 /************************************************************
800176 SHUTDOWN_MSG: Shutting down NameNode at ve0524.halxg.cloudera.com/10.17.240.20
800177 ************************************************************/
800178 2016-11-04 14:04:44,798 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
800179 /************************************************************
800180 STARTUP_MSG: Starting NameNode
800181 STARTUP_MSG: host = ve0524.halxg.cloudera.com/10.17.240.20
800182 STARTUP_MSG: args = []
800183 STARTUP_MSG: version = 2.7.3-SNAPSHOT
{code}
Three lines to report startup. Same for shutdown. It's formatted as a java comment for good measure. > Add clearly marked starting and shutdown log messages for all services. > --- > > Key: HBASE-17014 > URL: https://issues.apache.org/jira/browse/HBASE-17014 > Project: HBase > Issue Type: Improvement > Components: Operability >Reporter: Umesh Agashe >Assignee: Umesh Agashe >Priority: Minor > Fix For: 2.0.0 > > Attachments: HBASE-17014.v1.patch > > > From observing the log messages, clearly marked starting and shutdown > messages for services HMaster, HRegionServer, ThriftServer and RESTServer > will improve log readability. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
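The banner format stack quotes is straightforward to emulate. A minimal sketch with illustrative names (this is not the attached HBASE-17014 patch):

```java
// Hedged sketch of a clearly-delimited start/stop banner, modeled on the
// NameNode STARTUP_MSG / SHUTDOWN_MSG format quoted above.
public class StartupBanner {

    static String banner(String tag, String message) {
        StringBuilder stars = new StringBuilder();
        for (int i = 0; i < 60; i++) {
            stars.append('*');
        }
        // Framed as a Java comment so the banner stands out when scanning logs.
        return tag + "_MSG:\n"
                + "/" + stars + "\n"
                + tag + "_MSG: " + message + "\n"
                + stars + "/";
    }

    public static void main(String[] args) {
        System.out.println(banner("STARTUP", "Starting HMaster at host.example.com/10.0.0.1"));
        System.out.println(banner("SHUTDOWN", "Shutting down HMaster at host.example.com/10.0.0.1"));
    }
}
```

Each service (HMaster, HRegionServer, ThriftServer, RESTServer) would emit one such banner at start and one at shutdown, making service boundaries greppable.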
[jira] [Commented] (HBASE-17016) Reimplement per-region latency histogram metrics
[ https://issues.apache.org/jira/browse/HBASE-17016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637793#comment-15637793 ] Mikhail Antonov commented on HBASE-17016: - I think in practice latency outliers are far more often examined at the per-server level than at the per-region level (unlike request rate)? Would be fine to remove/replace, I think. > Reimplement per-region latency histogram metrics > > > Key: HBASE-17016 > URL: https://issues.apache.org/jira/browse/HBASE-17016 > Project: HBase > Issue Type: Task > Components: metrics >Affects Versions: 2.0.0, 1.4.0 >Reporter: Andrew Purtell > Fix For: 2.0.0, 1.4.0 > > > Follow up from HBASE-10656, where [~enis] says: > {quote} > the main problem is that we have A LOT of per-region metrics that are latency > histograms. These latency histograms create many many Counter / LongAdder > objects. We should get rid of per-region latencies and maybe look at reducing > the per-region metric overhead. > {quote} > And [~ghelmling] gives us a good candidate for implementing per-region latency > histograms: [HdrHistogram|https://github.com/HdrHistogram/HdrHistogram]. > Let's consider removing the per-region latency histograms and reimplementing > them using HdrHistogram. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
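The memory argument behind the HdrHistogram suggestion can be made concrete. The sketch below is not the HdrHistogram API; it only illustrates the underlying idea of one primitive array per histogram instead of one LongAdder object per bin (linear bins for brevity, and thread safety, which HdrHistogram's Recorder addresses, is ignored):

```java
// One long[] per histogram instead of one heap object per bin: a server
// hosting many regions allocates thousands of arrays rather than ~1M adders.
public class PackedHistogram {
    private final long[] bins;
    private final long maxValue;

    PackedHistogram(int numBins, long maxValue) {
        this.bins = new long[numBins];  // a single object, however many bins
        this.maxValue = maxValue;
    }

    // Map a latency value to a bin; linear bucketing for brevity
    // (HdrHistogram uses logarithmic bucketing for wide value ranges).
    void record(long value) {
        int idx = (int) Math.min(bins.length - 1,
                value * bins.length / (maxValue + 1));
        bins[idx]++;
    }

    long countAt(int bin) {
        return bins[bin];
    }

    public static void main(String[] args) {
        PackedHistogram h = new PackedHistogram(10, 100);
        h.record(5);    // lands in bin 0
        h.record(99);   // lands in bin 9
        h.record(250);  // out of range, clamped into the last bin
        System.out.println(h.countAt(0) + " " + h.countAt(9)); // prints "1 2"
    }
}
```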
[jira] [Commented] (HBASE-16960) RegionServer hang when aborting
[ https://issues.apache.org/jira/browse/HBASE-16960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637784#comment-15637784 ] Hudson commented on HBASE-16960: SUCCESS: Integrated in Jenkins build HBase-1.2-JDK8 #56 (See [https://builds.apache.org/job/HBase-1.2-JDK8/56/]) HBASE-16960 RegionServer hang when aborting (liyu: rev 906257838c05156f6678d0b11535f90f56e3c95d) * (add) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestSyncFuture.java * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SyncFuture.java * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestWALLockup.java * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALActionsListener.java > RegionServer hang when aborting > --- > > Key: HBASE-16960 > URL: https://issues.apache.org/jira/browse/HBASE-16960 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.2.3, 1.1.7 >Reporter: binlijin >Assignee: binlijin >Priority: Critical > Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.4, 1.1.8 > > Attachments: 16960.ut.missing.final.piece.txt, > HBASE-16960.branch-1.1.v1.patch, HBASE-16960.branch-1.2.v1.patch, > HBASE-16960.branch-1.v1.patch, HBASE-16960.patch, > HBASE-16960_master_v2.patch, HBASE-16960_master_v3.patch, > HBASE-16960_master_v4.patch, RingBufferEventHandler.png, > RingBufferEventHandler_exception.png, SyncFuture.png, > SyncFuture_exception.png, rs1081.jstack > > > We see regionserver hang when aborting several times and cause all regions on > this regionserver out of service and then all affected applications stop > works. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17004) Refactor IntegrationTestManyRegions to use @ClassRule for timing out
[ https://issues.apache.org/jira/browse/HBASE-17004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637733#comment-15637733 ] Hudson commented on HBASE-17004: FAILURE: Integrated in Jenkins build HBase-1.3-JDK7 #61 (See [https://builds.apache.org/job/HBase-1.3-JDK7/61/]) HBASE-17004 IntegrationTestManyRegions verifies that many regions get (appy: rev b1c17f0ef98c1c6674004f044b3160b1be37ca64) * (edit) hbase-it/pom.xml * (edit) hbase-it/src/test/java/org/apache/hadoop/hbase/IntegrationTestManyRegions.java > Refactor IntegrationTestManyRegions to use @ClassRule for timing out > > > Key: HBASE-17004 > URL: https://issues.apache.org/jira/browse/HBASE-17004 > Project: HBase > Issue Type: Improvement >Reporter: Appy >Assignee: Appy >Priority: Minor > Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.4, 1.1.8 > > Attachments: HBASE-17004.master.001.patch, > HBASE-17004.master.002.patch > > > IntegrationTestManyRegions verifies that many regions get assigned within > given time. To do so, it spawns a new thread and uses CountDownLatch.await() > to timeout. Replacing this mechanism with junit @ClassRule to timeout the > test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17018) Spooling BufferedMutator
[ https://issues.apache.org/jira/browse/HBASE-17018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637715#comment-15637715 ] Mikhail Antonov commented on HBASE-17018: - At a high level, the idea of having BufferedMutator or a similar client API manage separate persistent storage with atomicity / replay guarantees sounds somewhat weird to me. Is that a problem to be solved outside of HBase? Or should it be bulk ingest of some sort, as mentioned above? > Spooling BufferedMutator > > > Key: HBASE-17018 > URL: https://issues.apache.org/jira/browse/HBASE-17018 > Project: HBase > Issue Type: New Feature >Reporter: Joep Rottinghuis > Attachments: YARN-4061 HBase requirements for fault tolerant > writer.pdf > > > For Yarn Timeline Service v2 we use HBase as a backing store. > A big concern we would like to address is what to do if HBase is > (temporarily) down, for example in case of an HBase upgrade. > Most of the high volume writes will be mostly on a best-effort basis, but > occasionally we do a flush. Mainly during application lifecycle events, > clients will call a flush on the timeline service API. In order to handle the > volume of writes we use a BufferedMutator. When flush gets called on our API, > we in turn call flush on the BufferedMutator. > We would like our interface to HBase to be able to spool the mutations to a > filesystem in case of HBase errors. If we use the Hadoop filesystem > interface, this can then be HDFS, gcs, s3, or any other distributed storage. > The mutations can then later be re-played, for example through a MapReduce > job. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
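The behavior the reporter describes (best-effort writes that fall back to a spool for later replay, with no atomicity guarantees) can be sketched with stand-in types. "Mutation" is simplified here to a row/value pair and Sink stands in for the HBase write path; none of this is the HBase client API:

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Toy sketch of the spooling idea: try the normal buffered write path and,
// on failure, append the mutation to a spool stream for later replay.
public class SpoolingMutatorSketch {
    interface Sink {
        void mutate(String row, String value) throws IOException;
    }

    private final Sink delegate; // the real HBase write path
    private final Writer spool;  // e.g. a Hadoop FileSystem stream: HDFS, gcs, s3

    SpoolingMutatorSketch(Sink delegate, Writer spool) {
        this.delegate = delegate;
        this.spool = spool;
    }

    void mutate(String row, String value) throws IOException {
        try {
            delegate.mutate(row, value);       // best-effort path to HBase
        } catch (IOException e) {
            // HBase is down: spool the mutation instead of losing it.
            spool.write(row + "\t" + value + "\n");
            spool.flush();
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter spool = new StringWriter(); // stands in for a filesystem stream
        Sink failing = (row, value) -> { throw new IOException("HBase is down"); };
        SpoolingMutatorSketch mutator = new SpoolingMutatorSketch(failing, spool);
        mutator.mutate("entity!row!1", "value1");
        System.out.print(spool); // the spooled mutation, ready for later replay
    }
}
```

Replay is then a separate batch pass over the spool (for example a MapReduce job, as the issue suggests), re-issuing each line as a mutation once HBase is back.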
[jira] [Updated] (HBASE-16977) VerifyReplication should log a printable representation of the row keys
[ https://issues.apache.org/jira/browse/HBASE-16977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashu Pachauri updated HBASE-16977: -- Fix Version/s: 2.0.0 > VerifyReplication should log a printable representation of the row keys > --- > > Key: HBASE-16977 > URL: https://issues.apache.org/jira/browse/HBASE-16977 > Project: HBase > Issue Type: Improvement > Components: Replication >Reporter: Ashu Pachauri >Assignee: Ashu Pachauri >Priority: Minor > Fix For: 2.0.0, 1.4.0 > > Attachments: HBASE-16977.V1.patch > > > VerifyReplication prints out the row keys for offending rows in the task logs > for the MR job. However, the log is useless if the row key contains non > printable characters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637663#comment-15637663 ] Hadoop QA commented on HBASE-15560: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 5s {color} | {color:blue} The patch file was not named according to hbase's naming conventions. Please see https://yetus.apache.org/documentation/0.3.0/precommit-patchnames for instructions. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 7s {color} | {color:red} HBASE-15560 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/0.3.0/precommit-patchnames for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12837285/branch-1.tinylfu.txt | | JIRA Issue | HBASE-15560 | | Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/4339/console | | Powered by | Apache Yetus 0.3.0 http://yetus.apache.org | This message was automatically generated. > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 2.0.0 >Reporter: Ben Manes >Assignee: Ben Manes > Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > bc.hit.count, bc.miss.count, branch-1.tinylfu.txt, gets, tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. 
Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, through the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby tracking the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lackluster > result is because a synthetic Zipfian distribution is used, on which SLRU > performs optimally. 
In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
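The admission mechanism the description walks through (counting sketch, periodic halving, candidate-vs-victim comparison) can be shown in a toy form. This is a deliberately collapsed single-row "sketch", not Caffeine's 4-bit CountMin sketch, so hash collisions inflate estimates here:

```java
// Toy sketch of the W-TinyLFU admission idea: frequencies live in a compact
// counting structure, age by halving, and on eviction the new arrival
// (candidate) is admitted only if its estimated frequency beats the victim's.
public class TinyLfuSketch {
    private final int[] counts; // single row for brevity; real designs use
                                // a multi-hash 4-bit CountMin sketch

    TinyLfuSketch(int size) {
        counts = new int[size];
    }

    private int slot(Object key) {
        return Math.floorMod(key.hashCode(), counts.length);
    }

    void recordAccess(Object key) {
        counts[slot(key)]++;
    }

    int frequency(Object key) {
        return counts[slot(key)];
    }

    // Periodic aging: halve every counter so stale popularity decays.
    void age() {
        for (int i = 0; i < counts.length; i++) {
            counts[i] >>>= 1;
        }
    }

    // Admission on eviction: keep whichever of candidate/victim is hotter.
    boolean admit(Object candidate, Object victim) {
        return frequency(candidate) > frequency(victim);
    }

    public static void main(String[] args) {
        TinyLfuSketch sketch = new TinyLfuSketch(64);
        for (int i = 0; i < 3; i++) sketch.recordAccess("hot-block");
        sketch.recordAccess("cold-block");
        // The frequently-read block wins admission over the one-hit wonder.
        System.out.println(sketch.admit("hot-block", "cold-block")); // prints true
    }
}
```

The one-hit-wonder rejection shown here is what protects the cache from scan pollution, which an SLRU alone cannot do.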
[jira] [Commented] (HBASE-16993) BucketCache throw java.io.IOException: Invalid HFile block magic when DATA_BLOCK_ENCODING set to DIFF
[ https://issues.apache.org/jira/browse/HBASE-16993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637662#comment-15637662 ] stack commented on HBASE-16993: --- Why is this not a bug @liubangchen ? If folks use non-standard bucket.sizes, do they run into your issue above? Thank you. > BucketCache throw java.io.IOException: Invalid HFile block magic when > DATA_BLOCK_ENCODING set to DIFF > - > > Key: HBASE-16993 > URL: https://issues.apache.org/jira/browse/HBASE-16993 > Project: HBase > Issue Type: Bug > Components: io >Affects Versions: 1.1.3 > Environment: hbase version 1.1.3 >Reporter: liubangchen > Original Estimate: 336h > Remaining Estimate: 336h > >
> hbase-site.xml settings:
> <property>
> <name>hbase.bucketcache.bucket.sizes</name>
> <value>16384,32768,40960,46000,49152,51200,65536,131072,524288</value>
> </property>
> <property>
> <name>hbase.bucketcache.size</name>
> <value>16384</value>
> </property>
> <property>
> <name>hbase.bucketcache.ioengine</name>
> <value>offheap</value>
> </property>
> <property>
> <name>hfile.block.cache.size</name>
> <value>0.3</value>
> </property>
> <property>
> <name>hfile.block.bloom.cacheonwrite</name>
> <value>true</value>
> </property>
> <property>
> <name>hbase.rs.cacheblocksonwrite</name>
> <value>true</value>
> </property>
> <property>
> <name>hfile.block.index.cacheonwrite</name>
> <value>true</value>
> </property>
> n_splits = 200
> create 'usertable',{NAME =>'family', COMPRESSION => 'snappy', VERSIONS => 1,DATA_BLOCK_ENCODING => 'DIFF',CONFIGURATION => {'hbase.hregion.memstore.block.multiplier' => 5}},{DURABILITY => 'SKIP_WAL'},{SPLITS => (1..n_splits).map {|i| "user#{1000+i*(-1000)/n_splits}"}}
> load data:
> bin/ycsb load hbase10 -P workloads/workloada -p table=usertable -p columnfamily=family -p fieldcount=10 -p fieldlength=100 -p recordcount=2 -p insertorder=hashed -p insertstart=0 -p clientbuffering=true -p durability=SKIP_WAL -threads 20 -s
> run:
> bin/ycsb run hbase10 -P workloads/workloadb -p table=usertable -p columnfamily=family -p fieldcount=10 -p fieldlength=100 -p operationcount=2000 -p readallfields=true -p clientbuffering=true -p requestdistribution=zipfian -threads 10 -s
> log info:
> 2016-11-02 20:20:20,261 ERROR [RW.default.readRpcServer.handler=36,queue=21,port=6020] bucket.BucketCache: Failed reading
> block fdcc7ed6f3b2498b9ef316cc8206c233_44819759 from bucket cache
> java.io.IOException: Invalid HFile block magic: \x00\x00\x00\x00\x00\x00\x00\x00
> at org.apache.hadoop.hbase.io.hfile.BlockType.parse(BlockType.java:154)
> at org.apache.hadoop.hbase.io.hfile.BlockType.read(BlockType.java:167)
> at org.apache.hadoop.hbase.io.hfile.HFileBlock.<init>(HFileBlock.java:273)
> at org.apache.hadoop.hbase.io.hfile.HFileBlock$1.deserialize(HFileBlock.java:134)
> at org.apache.hadoop.hbase.io.hfile.HFileBlock$1.deserialize(HFileBlock.java:121)
> at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.getBlock(BucketCache.java:427)
> at org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.getBlock(CombinedBlockCache.java:85)
> at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.getCachedBlock(HFileReaderV2.java:266)
> at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:403)
> at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:269)
> at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:634)
> at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:584)
> at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:247)
> at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:156)
> at org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:363)
> at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:217)
> at org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:2071)
> at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:5369)
> at org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:2546)
> at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2532)
> at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2514)
> at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6558)
> at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6537)
> at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:1935)
> at
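For context on the failure above: when a block is read back from the bucket cache, the deserializer checks the first eight bytes against the known HFile block magics, and the all-zero prefix in the log means the bytes returned from the bucket don't begin with any of them. A minimal sketch of that style of check (illustrative only; the class and method names here are not HBase's actual `BlockType` API, though "DATABLK*" is the data-block magic string):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch of the first check that fails in the stack trace above:
// the reader compares the leading 8 bytes of a cached block against a
// known magic and rejects the buffer on a mismatch.
public class BlockMagicCheck {
    // 8-byte magic used by HBase data blocks; other block types have
    // their own magics.
    static final byte[] DATA_MAGIC = "DATABLK*".getBytes(StandardCharsets.US_ASCII);

    /** True if the buffer begins with the expected 8-byte data-block magic. */
    static boolean hasValidMagic(byte[] buf) {
        return buf.length >= DATA_MAGIC.length
            && Arrays.equals(Arrays.copyOfRange(buf, 0, DATA_MAGIC.length), DATA_MAGIC);
    }

    public static void main(String[] args) {
        byte[] fromCache = new byte[16]; // all zeros, like the \x00\x00... prefix in the log
        System.out.println(hasValidMagic(fromCache)); // prints false
    }
}
```

If a non-standard bucket.sizes layout caused a block to be written into, or read out of, a wrong-sized bucket slot, the deserializer would see garbage (or zeros) at offset 0 and fail exactly this kind of check.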
[jira] [Commented] (HBASE-17023) Region left unassigned due to AM and SSH each thinking others would do the assignment work
[ https://issues.apache.org/jira/browse/HBASE-17023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637638#comment-15637638 ] Matteo Bertozzi commented on HBASE-17023: - makes sense to me, +1
> Region left unassigned due to AM and SSH each thinking others would do the assignment work
> --
>
> Key: HBASE-17023
> URL: https://issues.apache.org/jira/browse/HBASE-17023
> Project: HBase
> Issue Type: Bug
> Components: Region Assignment
> Affects Versions: 1.1.0
> Reporter: Stephen Yuan Jiang
> Assignee: Stephen Yuan Jiang
> Attachments: HBASE-17023.v0-branch-1.1.patch
>
>
> Another Assignment Manager and SSH issue. This issue is similar to HBASE-13330, except this time the code path goes through ClosedRegionHandler, and we should apply the same fix as HBASE-13330 to ClosedRegionHandler.
> Basically, the AssignmentManager thinks the ServerShutdownHandler will assign the region, and the ServerShutdownHandler thinks the AssignmentManager will assign the region. The region (23e0186c4d2b5cc09f25de35fe174417) ultimately never gets assigned. Below is an analysis from the logs that captures the flow of events.
> 1. The AssignmentManager had initially assigned this region to {{rs42.prod.foo.com,16020,1476293566365}}.
> 2. The {{rs42.prod.foo.com,16020,1476293566365}} stops and sends the CLOSE request to the master.
> 3. ServerShutdownHandler (SSH) runs to assign this region to {{rs44.prod.foo.com,16020,1476294287692}}, but the assignment failed.
> 4. When the master restarted, it did a scan of meta to learn about the regions in the cluster. It found this region still assigned to {{rs42}} from the meta record.
> 5. However, this {{rs42}} server was not alive anymore, so the AssignmentManager queued up a ServerShutdownHandling task for it (that executes asynchronously).
> 6. In the meantime, the AssignmentManager proceeded to read the RIT nodes from ZK.
> It found that this region was likewise in RS_ZK_REGION_FAILED_OPEN on the {{rs44}} RS.
> 7. The region was moved to CLOSED state:
> {noformat}
> 2016-10-12 17:45:11,637 DEBUG [AM.ZK.Worker-pool2-t6] master.AssignmentManager: Handling RS_ZK_REGION_FAILED_OPEN, server=rs44.prod.foo.com,16020,1476294287692, region=23e0186c4d2b5cc09f25de35fe174417, current_state={23e0186c4d2b5cc09f25de35fe174417 state=PENDING_OPEN, ts=1476294311564, server=rs44.prod.foo.com,16020,1476294287692}
> 2016-10-12 17:45:11,637 INFO [AM.ZK.Worker-pool2-t6] master.RegionStates: Transition {23e0186c4d2b5cc09f25de35fe174417 state=PENDING_OPEN, ts=1476294311564, server=rs44.prod.foo.com,16020,1476294287692} to {23e0186c4d2b5cc09f25de35fe174417 state=CLOSED, ts=1476294311637, server=rs44.prod.foo.com,16020,1476294287692}
> 2016-10-12 17:45:11,637 WARN [AM.ZK.Worker-pool2-t6] master.RegionStates: 23e0186c4d2b5cc09f25de35fe174417 moved to CLOSED on rs44.prod.foo.com,16020,1476294287692, expected rs42.prod.foo.com,16020,1476293566365
> {noformat}
> 8. After that, the AssignmentManager tried to assign the region again. However, the assignment didn't happen because the ServerShutdownHandling task queued earlier hadn't yet executed:
> {noformat}
> 2016-10-12 17:45:11,637 DEBUG [AM.ZK.Worker-pool2-t6] master.AssignmentManager: Found an existing plan for table1,3025965238305402_2,1468091325259.23e0186c4d2b5cc09f25de35fe174417.
> destination server is rs44.prod.foo.com,16020,1476294287692 accepted as a dest server = false
> 2016-10-12 17:45:11,697 DEBUG [AM.ZK.Worker-pool2-t6] master.AssignmentManager: No previous transition plan found (or ignoring an existing plan) for table1,3025965238305402_2,1468091325259.23e0186c4d2b5cc09f25de35fe174417.; generated random plan=hri=table1,3025965238305402_2,1468091325259.23e0186c4d2b5cc09f25de35fe174417., src=, dest=rs28.prod.foo.com,16020,1476294291314; 10 (online=11) available servers, forceNewPlan=true
> 2016-10-12 17:45:11,697 DEBUG [AM.ZK.Worker-pool2-t6] handler.ClosedRegionHandler: Handling CLOSED event for 23e0186c4d2b5cc09f25de35fe174417
> 2016-10-12 17:45:11,697 WARN [AM.ZK.Worker-pool2-t6] master.RegionStates: 23e0186c4d2b5cc09f25de35fe174417 moved to CLOSED on rs44.prod.foo.com,16020,1476294287692, expected rs42.prod.foo.com,16020,1476293566365
> 2016-10-12 17:45:11,697 INFO [AM.ZK.Worker-pool2-t6] master.AssignmentManager: Skip assigning table1,3025965238305402_2,1468091325259.23e0186c4d2b5cc09f25de35fe174417., it's host rs42.prod.foo.com,16020,1476293566365 is dead but not processed yet
> 2016-10-12 17:45:11,884 INFO [MASTER_SERVER_OPERATIONS-server01:16000-3] master.RegionStates: Transitioning {23e0186c4d2b5cc09f25de35fe174417 state=CLOSED, ts=1476294311697,
[jira] [Updated] (HBASE-16892) Use TableName instead of String in SnapshotDescription
[ https://issues.apache.org/jira/browse/HBASE-16892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-16892: Resolution: Fixed Status: Resolved (was: Patch Available) > Use TableName instead of String in SnapshotDescription > -- > > Key: HBASE-16892 > URL: https://issues.apache.org/jira/browse/HBASE-16892 > Project: HBase > Issue Type: Sub-task > Components: snapshots >Affects Versions: 2.0.0 >Reporter: Matteo Bertozzi >Assignee: Matteo Bertozzi >Priority: Trivial > Fix For: 2.0.0 > > Attachments: HBASE-16892-v0.patch, HBASE-16892-v1.patch, > HBASE-16892-v2.patch > > > mostly find & replace work: > deprecate the SnapshotDescription constructors with the String argument in > favor of the TableName ones. > Replace the TableName.valueOf() around with the new getTableName() > Replace the TableName.getNameAsString() by just passing the TableName -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16937) Replace SnapshotType protobuf conversion when we can directly use the pojo object
[ https://issues.apache.org/jira/browse/HBASE-16937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-16937: Labels: snapshot (was: ) > Replace SnapshotType protobuf conversion when we can directly use the pojo > object > - > > Key: HBASE-16937 > URL: https://issues.apache.org/jira/browse/HBASE-16937 > Project: HBase > Issue Type: Sub-task > Components: snapshots >Reporter: Matteo Bertozzi >Assignee: Matteo Bertozzi >Priority: Trivial > Fix For: 2.0.0 > > Attachments: HBASE-16937-v0.patch, HBASE-16937-v1.patch > > > mostly find & replace work: > replace the back and forth protobuf conversion when we can just use the > client SnapshotType enum. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16937) Replace SnapshotType protobuf conversion when we can directly use the pojo object
[ https://issues.apache.org/jira/browse/HBASE-16937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-16937: Labels: (was: snapshot) > Replace SnapshotType protobuf conversion when we can directly use the pojo > object > - > > Key: HBASE-16937 > URL: https://issues.apache.org/jira/browse/HBASE-16937 > Project: HBase > Issue Type: Sub-task > Components: snapshots >Reporter: Matteo Bertozzi >Assignee: Matteo Bertozzi >Priority: Trivial > Fix For: 2.0.0 > > Attachments: HBASE-16937-v0.patch, HBASE-16937-v1.patch > > > mostly find & replace work: > replace the back and forth protobuf conversion when we can just use the > client SnapshotType enum. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16937) Replace SnapshotType protobuf conversion when we can directly use the pojo object
[ https://issues.apache.org/jira/browse/HBASE-16937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-16937: Component/s: snapshots > Replace SnapshotType protobuf conversion when we can directly use the pojo > object > - > > Key: HBASE-16937 > URL: https://issues.apache.org/jira/browse/HBASE-16937 > Project: HBase > Issue Type: Sub-task > Components: snapshots >Reporter: Matteo Bertozzi >Assignee: Matteo Bertozzi >Priority: Trivial > Fix For: 2.0.0 > > Attachments: HBASE-16937-v0.patch, HBASE-16937-v1.patch > > > mostly find & replace work: > replace the back and forth protobuf conversion when we can just use the > client SnapshotType enum. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16865) Procedure v2 - Inherit lock from root proc
[ https://issues.apache.org/jira/browse/HBASE-16865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-16865: Resolution: Fixed Status: Resolved (was: Patch Available) > Procedure v2 - Inherit lock from root proc > -- > > Key: HBASE-16865 > URL: https://issues.apache.org/jira/browse/HBASE-16865 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Affects Versions: 2.0.0 >Reporter: Matteo Bertozzi >Assignee: Matteo Bertozzi > Fix For: 2.0.0 > > Attachments: HBASE-16865-v0.patch > > > At the moment we support inheriting locks from the parent procedure for a 2 > level procedures, but in case of reopen table regions we have a 3 level > procedures (ModifyTable -> ReOpen -> [Unassign/Assign]) and reopen does not > have any locks on its own. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17028) New 2.0 blockcache (tinylfu) doesn't have inmemory partition, etc Update doc and codebase accordingly
stack created HBASE-17028: - Summary: New 2.0 blockcache (tinylfu) doesn't have inmemory partition, etc Update doc and codebase accordingly Key: HBASE-17028 URL: https://issues.apache.org/jira/browse/HBASE-17028 Project: HBase Issue Type: Sub-task Components: BlockCache Reporter: stack Intent is to make the parent tinylfu blockcache default on in 2.0 replacing our old lru blockcache. This issue is about making it clear in doc and code how the new blockcache differs from the old (You can put back the old lru blockcache with config change). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-15560: -- Attachment: branch-1.tinylfu.txt My backport FYI. You can add LOG to this or just tell me what you'd like to see. Thanks [~ben.manes]
> TinyLFU-based BlockCache
>
>
> Key: HBASE-15560
> URL: https://issues.apache.org/jira/browse/HBASE-15560
> Project: HBase
> Issue Type: Improvement
> Components: BlockCache
> Affects Versions: 2.0.0
> Reporter: Ben Manes
> Assignee: Ben Manes
> Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, bc.hit.count, bc.miss.count, branch-1.tinylfu.txt, gets, tinylfu.patch
>
>
> LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and recency of the working set. It achieves concurrency by using an O(n) background thread to prioritize the entries and evict. Accessing an entry is O(1) by a hash table lookup, recording its logical access time, and setting a frequency flag. A write is performed in O(1) time by updating the hash table and triggering an async eviction thread. This provides ideal concurrency and minimizes the latencies by penalizing the background thread instead of the caller. However, the policy does not age the frequencies and may not be resilient to various workload patterns.
> W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the frequency in a counting sketch, ages periodically by halving the counters, and orders entries by SLRU. An entry is discarded by comparing the frequency of the new arrival (candidate) to the SLRU's victim, and keeping the one with the higher frequency. This allows the operations to be performed in O(1) time and, through the use of a compact sketch, a much larger history is retained beyond the current working set.
> In a variety of real-world traces the policy had [near-optimal hit rates|https://github.com/ben-manes/caffeine/wiki/Efficiency].
> Concurrency is achieved by buffering and replaying the operations, similar to a write-ahead log. A read is recorded into a striped ring buffer and writes into a queue. The operations are applied in batches under a try-lock by an asynchronous thread, thereby tracking the usage pattern without incurring high latencies ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]).
> In YCSB benchmarks the results were inconclusive. For a large cache (99% hit rates) the two caches have near-identical throughput and latencies, with LruBlockCache narrowly winning. At medium and small cache sizes, TinyLFU had a 1-4% hit-rate improvement and therefore lower latencies. The lackluster result is because a synthetic Zipfian distribution is used, on which SLRU performs optimally. In a more varied, real-world workload we'd expect to see improvements from being able to make smarter predictions.
> The provided patch implements BlockCache using the [Caffeine|https://github.com/ben-manes/caffeine] caching library (see the HighScalability [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]).
> Edward Bortnikov and Eshcar Hillel have graciously provided guidance for evaluating this patch ([github branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
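The aging step described above (periodically halving all counters so stale popularity decays) and the candidate-vs-victim admission test can be sketched roughly as follows. This is a toy illustration of the idea only, not Caffeine's actual CountMin-based frequency sketch; all class and method names here are invented for the example:

```java
import java.util.HashMap;
import java.util.Map;

/** Toy frequency sketch with TinyLFU-style aging (illustrative only). */
public class ToyFrequencySketch {
    private final Map<String, Integer> counts = new HashMap<>();
    private final int sampleSize; // how many increments before aging
    private int increments;

    ToyFrequencySketch(int sampleSize) {
        this.sampleSize = sampleSize;
    }

    void increment(String key) {
        counts.merge(key, 1, Integer::sum);
        if (++increments >= sampleSize) {
            age();
        }
    }

    int frequency(String key) {
        return counts.getOrDefault(key, 0);
    }

    /** Halve every counter so old popularity decays over time. */
    private void age() {
        counts.replaceAll((k, v) -> v / 2);
        counts.values().removeIf(v -> v == 0);
        increments = 0;
    }

    /** TinyLFU admission: keep whichever of candidate/victim is more frequent. */
    boolean admit(String candidate, String victim) {
        return frequency(candidate) > frequency(victim);
    }
}
```

The real implementation uses a compact 4-bit counting sketch rather than a hash map, which is what lets it retain a history far larger than the cache itself at minimal memory cost.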
[jira] [Commented] (HBASE-16892) Use TableName instead of String in SnapshotDescription
[ https://issues.apache.org/jira/browse/HBASE-16892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637610#comment-15637610 ] stack commented on HBASE-16892: --- +1 Nice cleanup. > Use TableName instead of String in SnapshotDescription > -- > > Key: HBASE-16892 > URL: https://issues.apache.org/jira/browse/HBASE-16892 > Project: HBase > Issue Type: Sub-task > Components: snapshots >Affects Versions: 2.0.0 >Reporter: Matteo Bertozzi >Assignee: Matteo Bertozzi >Priority: Trivial > Fix For: 2.0.0 > > Attachments: HBASE-16892-v0.patch, HBASE-16892-v1.patch, > HBASE-16892-v2.patch > > > mostly find & replace work: > deprecate the SnapshotDescription constructors with the String argument in > favor of the TableName ones. > Replace the TableName.valueOf() around with the new getTableName() > Replace the TableName.getNameAsString() by just passing the TableName -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16989) RowProcess#postBatchMutate doesn’t be executed before the mvcc transaction completion
[ https://issues.apache.org/jira/browse/HBASE-16989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637598#comment-15637598 ] Hadoop QA commented on HBASE-16989: --- (x) *-1 overall*
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 14s | Docker mode activated. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| +1 | mvninstall | 3m 1s | master passed |
| +1 | compile | 0m 35s | master passed |
| +1 | checkstyle | 0m 45s | master passed |
| +1 | mvneclipse | 0m 13s | master passed |
| +1 | findbugs | 1m 41s | master passed |
| +1 | javadoc | 0m 26s | master passed |
| +1 | mvninstall | 0m 45s | the patch passed |
| +1 | compile | 0m 35s | the patch passed |
| +1 | javac | 0m 35s | the patch passed |
| +1 | checkstyle | 0m 45s | the patch passed |
| +1 | mvneclipse | 0m 13s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | hadoopcheck | 27m 23s | Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha1. |
| +1 | findbugs | 1m 49s | the patch passed |
| +1 | javadoc | 0m 27s | the patch passed |
| -1 | unit | 97m 25s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 13s | The patch does not generate ASF License warnings. |
| | | 136m 48s | |
|| Reason || Tests ||
| Timed out junit tests | org.apache.hadoop.hbase.master.procedure.TestServerCrashProcedure |
| | org.apache.hadoop.hbase.io.hfile.TestHFileBlockIndex |
| | org.apache.hadoop.hbase.master.procedure.TestRestoreSnapshotProcedure |
| | org.apache.hadoop.hbase.master.procedure.TestTruncateTableProcedure |
| | org.apache.hadoop.hbase.master.procedure.TestMasterProcedureWalLease |
|| Subsystem || Report/Notes ||
| Docker | Client=1.11.2 Server=1.11.2 Image:yetus/hbase:7bda515 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12837205/HBASE-16989.v2.patch |
| JIRA Issue | HBASE-16989 |
| Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 702f6bc90750 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 05ee54f |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| unit | https://builds.apache.org/job/PreCommit-HBASE-Build/4334/artifact/patchprocess/patch-unit-hbase-server.txt |
| unit test logs | https://builds.apache.org/job/PreCommit-HBASE-Build/4334/artifact/patchprocess/patch-unit-hbase-server.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/4334/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output |