[jira] [Commented] (HBASE-22471) Our nightly jobs for master and branch-2 are still using hadoop-2.7.1 in integration test
[ https://issues.apache.org/jira/browse/HBASE-22471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848345#comment-16848345 ] Duo Zhang commented on HBASE-22471: --- [~busbey] Haven't found a way to clean the workspace for our nightly jobs... > Our nightly jobs for master and branch-2 are still using hadoop-2.7.1 in > integration test > - > > Key: HBASE-22471 > URL: https://issues.apache.org/jira/browse/HBASE-22471 > Project: HBase > Issue Type: Bug > Components: build >Reporter: Duo Zhang >Priority: Major > > We use ls to get the hadoop 2 jars, so maybe the problem is that the 2.7.1 > jars have been there for a long time. We need to clean the workspace. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-22471) Our nightly jobs for master and branch-2 are still using hadoop-2.7.1 in integration test
Duo Zhang created HBASE-22471: - Summary: Our nightly jobs for master and branch-2 are still using hadoop-2.7.1 in integration test Key: HBASE-22471 URL: https://issues.apache.org/jira/browse/HBASE-22471 Project: HBase Issue Type: Bug Components: build Reporter: Duo Zhang We use ls to get the hadoop 2 jars, so maybe the problem is that the 2.7.1 jars have been there for a long time. We need to clean the workspace.
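As a sketch of the failure mode described above (directory layout and version numbers are illustrative, not taken from the actual Jenkins job): an `ls`-based glob happily matches any leftover hadoop-2.7.1 jars from earlier runs, so the staging directory has to be cleaned before resolving the jars.

```shell
#!/bin/sh
# Illustrative only: recreate a stale workspace where an old
# hadoop-2.7.1 jar sits next to the freshly staged 2.8.5 jar.
ws=$(mktemp -d)
mkdir -p "$ws/hadoop-jars"
touch "$ws/hadoop-jars/hadoop-common-2.7.1.jar"   # leftover from an old run
touch "$ws/hadoop-jars/hadoop-common-2.8.5.jar"   # the version we want

# A naive glob picks up both, so the stale 2.7.1 jar can win:
ls "$ws/hadoop-jars"/hadoop-common-*.jar

# Cleaning the staging dir before (re)staging jars avoids the problem:
rm -f "$ws/hadoop-jars"/*.jar
touch "$ws/hadoop-jars/hadoop-common-2.8.5.jar"
ls "$ws/hadoop-jars"/hadoop-common-*.jar          # now only 2.8.5 matches

rm -rf "$ws"
```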
[jira] [Updated] (HBASE-22422) Retain an ByteBuff with refCnt=0 when getBlock from LRUCache
[ https://issues.apache.org/jira/browse/HBASE-22422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Hu updated HBASE-22422: - Status: Patch Available (was: Open) > Retain an ByteBuff with refCnt=0 when getBlock from LRUCache > > > Key: HBASE-22422 > URL: https://issues.apache.org/jira/browse/HBASE-22422 > Project: HBase > Issue Type: Sub-task > Components: BlockCache >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Major > Attachments: 0001-debug2.patch, 0001-debug2.patch, 0001-debug2.patch, > 0001-debug3.patch, 0001-debug4.patch, > HBASE-22422-qps-after-fix-the-zero-retain-bug.png, > HBASE-22422.HBASE-21879.v01.patch, HBASE-22422.HBASE-21879.v02.patch, > LRUBlockCache-getBlock.png, debug.patch, > failed-to-check-positive-on-web-ui.png, image-2019-05-15-12-00-03-641.png > > > After running the YCSB scan/get benchmark in our XiaoMi cluster, we found the get > QPS dropped from 25000/s to hundreds per second in a cluster with five > nodes. > After enabling the debug log on the YCSB client side, I found the following > stacktrace, see > https://issues.apache.org/jira/secure/attachment/12968745/image-2019-05-15-12-00-03-641.png. > > After looking into the stacktrace, I can confirm that the zero refCnt block is > an intermediate index block, see [2] http://hbase.apache.org/images/hfilev2.png > Need a patch to fix this.
[jira] [Updated] (HBASE-22422) Retain an ByteBuff with refCnt=0 when getBlock from LRUCache
[ https://issues.apache.org/jira/browse/HBASE-22422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Hu updated HBASE-22422: - Attachment: HBASE-22422.HBASE-21879.v02.patch > Retain an ByteBuff with refCnt=0 when getBlock from LRUCache > > > Key: HBASE-22422 > URL: https://issues.apache.org/jira/browse/HBASE-22422 > Project: HBase > Issue Type: Sub-task > Components: BlockCache >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Major > Attachments: 0001-debug2.patch, 0001-debug2.patch, 0001-debug2.patch, > 0001-debug3.patch, 0001-debug4.patch, > HBASE-22422-qps-after-fix-the-zero-retain-bug.png, > HBASE-22422.HBASE-21879.v01.patch, HBASE-22422.HBASE-21879.v02.patch, > LRUBlockCache-getBlock.png, debug.patch, > failed-to-check-positive-on-web-ui.png, image-2019-05-15-12-00-03-641.png > > > After running the YCSB scan/get benchmark in our XiaoMi cluster, we found the get > QPS dropped from 25000/s to hundreds per second in a cluster with five > nodes. > After enabling the debug log on the YCSB client side, I found the following > stacktrace, see > https://issues.apache.org/jira/secure/attachment/12968745/image-2019-05-15-12-00-03-641.png. > > After looking into the stacktrace, I can confirm that the zero refCnt block is > an intermediate index block, see [2] http://hbase.apache.org/images/hfilev2.png > Need a patch to fix this.
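The symptom above (a block handed out with refCnt=0, and the "failed to check positive" error on the web UI) comes from the reference counting on cached blocks. Below is a minimal, self-contained sketch of the invariant involved — not HBase's actual ByteBuff implementation, all names are illustrative: retain() must reject a buffer whose count has already dropped to zero, so a cache that returns a block without retaining it under its eviction lock can hand the caller a dead reference.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Minimal stand-in for a reference-counted cached block (hypothetical, not HBase's ByteBuff). */
class RefCountedBlock {
    private final AtomicInteger refCnt = new AtomicInteger(1);

    /** Increment the count; reject blocks that have already been fully released. */
    RefCountedBlock retain() {
        int cnt;
        do {
            cnt = refCnt.get();
            if (cnt <= 0) {
                // This is the "refCnt must be positive" check tripping in the issue.
                throw new IllegalStateException("refCnt: " + cnt);
            }
        } while (!refCnt.compareAndSet(cnt, cnt + 1));
        return this;
    }

    /** Decrement; when the count hits zero the backing memory may be reclaimed. */
    boolean release() {
        return refCnt.decrementAndGet() == 0;
    }

    int refCnt() {
        return refCnt.get();
    }
}

public class RefCntDemo {
    public static void main(String[] args) {
        RefCountedBlock block = new RefCountedBlock();
        block.retain();      // cache hit: caller takes its own reference
        block.release();     // caller done
        block.release();     // cache evicts: refCnt -> 0
        try {
            block.retain();  // handing the block out now is the reported bug
            throw new AssertionError("retain on refCnt=0 must fail");
        } catch (IllegalStateException expected) {
            System.out.println("retain rejected at refCnt=" + block.refCnt());
        }
    }
}
```

The fix direction implied by the issue title is that getBlock must retain the block while it still provably has a positive count, i.e. before the reference can race with eviction.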
[jira] [Commented] (HBASE-21991) Fix MetaMetrics issues - [Race condition, Faulty remove logic], few improvements
[ https://issues.apache.org/jira/browse/HBASE-21991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848320#comment-16848320 ] Hudson commented on HBASE-21991: Results for branch branch-2.1 [build #1191 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/1191/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/1191//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/1191//JDK8_Nightly_Build_Report_(Hadoop2)/] (/) {color:green}+1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/1191//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. 
(/) {color:green}+1 client integration test{color} > Fix MetaMetrics issues - [Race condition, Faulty remove logic], few > improvements > > > Key: HBASE-21991 > URL: https://issues.apache.org/jira/browse/HBASE-21991 > Project: HBase > Issue Type: Bug > Components: Coprocessors, metrics >Reporter: Sakthi >Assignee: Sakthi >Priority: Major > Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.10, 2.3.0, 2.1.5 > > Attachments: hbase-21991.addendum.patch, > hbase-21991.branch-1.001.patch, hbase-21991.branch-1.002.patch, > hbase-21991.master.001.patch, hbase-21991.master.002.patch, > hbase-21991.master.003.patch, hbase-21991.master.004.patch, > hbase-21991.master.005.patch, hbase-21991.master.006.patch > > > Here is a list of the issues related to the MetaMetrics implementation: > +*Bugs*+: > # [_Lossy counting for top-k_] *Faulty remove logic of non-eligible meters*: > Under certain conditions, we might end up storing/exposing all the meters > rather than top-k-ish > # MetaMetrics can throw an NPE, resulting in the RS aborting, because of a > *Race Condition*. > +*Improvements*+: > # With a high number of regions in the cluster, exposing metrics for each > region blows up the JMX output from ~140 KB to 100+ MB depending on the number of > regions. It's better to use *lossy counting to maintain top-k for region > metrics* as well. > # As the lossy meters do not represent actual counts, I think it'll be > better to *rename the meters to include "lossy" in the name*. It would be > more informative while monitoring the metrics and there would be less > confusion between actual counts and lossy counts.
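The "lossy counting" referred to throughout this issue is the standard approximate-frequency technique for keeping top-k items in bounded memory: counts are kept in a table, and at each bucket boundary entries whose count cannot prove them frequent are pruned. A minimal Java sketch (not the MetaMetrics code itself; the class name, bucket width, and demo keys are illustrative):

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Lossy counting over a stream of string keys: approximate counts in bounded memory. */
class LossyCounter {
    private static final class Entry {
        long count;
        final long delta;  // max possible undercount when the key was (re)inserted
        Entry(long count, long delta) { this.count = count; this.delta = delta; }
    }

    private final Map<String, Entry> table = new HashMap<>();
    private final int bucketWidth;  // ceil(1/epsilon): smaller = more accurate, more memory
    private long itemsSeen = 0;
    private long currentBucket = 1;

    LossyCounter(int bucketWidth) { this.bucketWidth = bucketWidth; }

    void add(String item) {
        Entry e = table.get(item);
        if (e != null) {
            e.count++;
        } else {
            table.put(item, new Entry(1, currentBucket - 1));
        }
        if (++itemsSeen % bucketWidth == 0) {
            // Bucket boundary: drop entries that cannot be frequent. This pruning
            // step is the "remove logic of non-eligible meters" the bug is about --
            // if it never fires correctly, every meter is retained and exposed.
            Iterator<Map.Entry<String, Entry>> it = table.entrySet().iterator();
            while (it.hasNext()) {
                Entry v = it.next().getValue();
                if (v.count + v.delta <= currentBucket) {
                    it.remove();
                }
            }
            currentBucket++;
        }
    }

    /** Approximate count: 0 if pruned, otherwise within one bucket of the true count. */
    long estimate(String item) {
        Entry e = table.get(item);
        return e == null ? 0 : e.count;
    }

    int trackedItems() { return table.size(); }
}

public class LossyDemo {
    public static void main(String[] args) {
        LossyCounter lc = new LossyCounter(10);
        for (int i = 0; i < 100; i++) {
            lc.add("hot-region");  // one heavy hitter...
            lc.add("cold-" + i);   // ...interleaved with 100 one-off keys
        }
        // The hot key survives pruning; the one-off keys are discarded.
        System.out.println("hot ~" + lc.estimate("hot-region")
            + ", tracked=" + lc.trackedItems());
    }
}
```

This also illustrates the renaming point in the description: `estimate` is an approximation, so surfacing "lossy" in the meter name tells operators not to read it as an exact count.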
[jira] [Commented] (HBASE-21991) Fix MetaMetrics issues - [Race condition, Faulty remove logic], few improvements
[ https://issues.apache.org/jira/browse/HBASE-21991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848319#comment-16848319 ] Hudson commented on HBASE-21991: Results for branch branch-2.2 [build #287 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/287/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/287//General_Nightly_Build_Report/] (/) {color:green}+1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/287//JDK8_Nightly_Build_Report_(Hadoop2)/] (/) {color:green}+1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/287//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (x) {color:red}-1 client integration test{color} --Failed when running client tests on top of Hadoop 2. [see log for details|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/287//artifact/output-integration/hadoop-2.log]. 
(note that this means we didn't run on Hadoop 3) > Fix MetaMetrics issues - [Race condition, Faulty remove logic], few > improvements > > > Key: HBASE-21991 > URL: https://issues.apache.org/jira/browse/HBASE-21991 > Project: HBase > Issue Type: Bug > Components: Coprocessors, metrics >Reporter: Sakthi >Assignee: Sakthi >Priority: Major > Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.10, 2.3.0, 2.1.5 > > Attachments: hbase-21991.addendum.patch, > hbase-21991.branch-1.001.patch, hbase-21991.branch-1.002.patch, > hbase-21991.master.001.patch, hbase-21991.master.002.patch, > hbase-21991.master.003.patch, hbase-21991.master.004.patch, > hbase-21991.master.005.patch, hbase-21991.master.006.patch > > > Here is a list of the issues related to the MetaMetrics implementation: > +*Bugs*+: > # [_Lossy counting for top-k_] *Faulty remove logic of non-eligible meters*: > Under certain conditions, we might end up storing/exposing all the meters > rather than top-k-ish > # MetaMetrics can throw an NPE, resulting in the RS aborting, because of a > *Race Condition*. > +*Improvements*+: > # With a high number of regions in the cluster, exposing metrics for each > region blows up the JMX output from ~140 KB to 100+ MB depending on the number of > regions. It's better to use *lossy counting to maintain top-k for region > metrics* as well. > # As the lossy meters do not represent actual counts, I think it'll be > better to *rename the meters to include "lossy" in the name*. It would be > more informative while monitoring the metrics and there would be less > confusion between actual counts and lossy counts.
[jira] [Commented] (HBASE-21800) RegionServer aborted due to NPE from MetaTableMetrics coprocessor
[ https://issues.apache.org/jira/browse/HBASE-21800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848318#comment-16848318 ] Hudson commented on HBASE-21800: Results for branch branch-2.2 [build #287 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/287/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/287//General_Nightly_Build_Report/] (/) {color:green}+1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/287//JDK8_Nightly_Build_Report_(Hadoop2)/] (/) {color:green}+1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/287//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (x) {color:red}-1 client integration test{color} --Failed when running client tests on top of Hadoop 2. [see log for details|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/287//artifact/output-integration/hadoop-2.log]. 
(note that this means we didn't run on Hadoop 3) > RegionServer aborted due to NPE from MetaTableMetrics coprocessor > - > > Key: HBASE-21800 > URL: https://issues.apache.org/jira/browse/HBASE-21800 > Project: HBase > Issue Type: Bug > Components: Coprocessors, meta, metrics, Operability >Reporter: Sakthi >Assignee: Sakthi >Priority: Critical > Labels: Meta > Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.10, 2.3.0, 2.1.4 > > Attachments: hbase-21800.branch-1.001.patch, > hbase-21800.branch-1.002.patch, hbase-21800.branch-1.003.patch, > hbase-21800.branch-1.004.patch, hbase-21800.master.001.patch, > hbase-21800.master.002.patch, hbase-21800.master.003.patch > > > I was just playing around with the code, trying to capture "Top k" table metrics > from MetaMetrics, when I bumped into this issue. Though we are not currently > capturing "Top K" table metrics, we can still encounter this issue because of > the "Top k Clients" that is implemented using the LossyAlgo. > > RegionServer gets aborted due to an NPE from the MetaTableMetrics coprocessor. 
The > log looks somewhat like this: > {code:java} > 2019-01-28 23:31:10,311 ERROR > [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=16020] > coprocessor.CoprocessorHost: The coprocessor > org.apache.hadoop.hbase.coprocessor.MetaTableMetrics threw > java.lang.NullPointerException > java.lang.NullPointerException > at > org.apache.hadoop.hbase.coprocessor.MetaTableMetrics$ExampleRegionObserverMeta.markMeterIfPresent(MetaTableMetrics.java:123) > at > org.apache.hadoop.hbase.coprocessor.MetaTableMetrics$ExampleRegionObserverMeta.tableMetricRegisterAndMark2(MetaTableMetrics.java:233) > at > org.apache.hadoop.hbase.coprocessor.MetaTableMetrics$ExampleRegionObserverMeta.preGetOp(MetaTableMetrics.java:82) > at > org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$19.call(RegionCoprocessorHost.java:840) > at > org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$19.call(RegionCoprocessorHost.java:837) > at > org.apache.hadoop.hbase.coprocessor.CoprocessorHost$ObserverOperationWithoutResult.callObserver(CoprocessorHost.java:551) > at > org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(CoprocessorHost.java:625) > at > org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preGet(RegionCoprocessorHost.java:837) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2608) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2547) > at > org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304) > 2019-01-28 23:31:10,314 ERROR > [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=16020] > regionserver.HRegionServer: * ABORTING 
region server > 10.0.0.24,16020,1548747043814: The coprocessor > org.apache.hadoop.hbase.coprocessor.MetaTableMetrics threw > java.lang.NullPointerException * > java.lang.NullPointerException > at >
[jira] [Resolved] (HBASE-21991) Fix MetaMetrics issues - [Race condition, Faulty remove logic], few improvements
[ https://issues.apache.org/jira/browse/HBASE-21991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Somogyi resolved HBASE-21991. --- Resolution: Fixed Hadoop Flags: Incompatible change Fix Version/s: 2.1.5 Pushed addendum to branch-2.1 and main patch to branch-2.2. FYI [~zghaobac], [~stack] > Fix MetaMetrics issues - [Race condition, Faulty remove logic], few > improvements > > > Key: HBASE-21991 > URL: https://issues.apache.org/jira/browse/HBASE-21991 > Project: HBase > Issue Type: Bug > Components: Coprocessors, metrics >Reporter: Sakthi >Assignee: Sakthi >Priority: Major > Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.10, 2.3.0, 2.1.5 > > Attachments: hbase-21991.addendum.patch, > hbase-21991.branch-1.001.patch, hbase-21991.branch-1.002.patch, > hbase-21991.master.001.patch, hbase-21991.master.002.patch, > hbase-21991.master.003.patch, hbase-21991.master.004.patch, > hbase-21991.master.005.patch, hbase-21991.master.006.patch > > > Here is a list of the issues related to the MetaMetrics implementation: > +*Bugs*+: > # [_Lossy counting for top-k_] *Faulty remove logic of non-eligible meters*: > Under certain conditions, we might end up storing/exposing all the meters > rather than top-k-ish > # MetaMetrics can throw an NPE, resulting in the RS aborting, because of a > *Race Condition*. > +*Improvements*+: > # With a high number of regions in the cluster, exposing metrics for each > region blows up the JMX output from ~140 KB to 100+ MB depending on the number of > regions. It's better to use *lossy counting to maintain top-k for region > metrics* as well. > # As the lossy meters do not represent actual counts, I think it'll be > better to *rename the meters to include "lossy" in the name*. It would be > more informative while monitoring the metrics and there would be less > confusion between actual counts and lossy counts.
[jira] [Resolved] (HBASE-21800) RegionServer aborted due to NPE from MetaTableMetrics coprocessor
[ https://issues.apache.org/jira/browse/HBASE-21800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Somogyi resolved HBASE-21800. --- Resolution: Fixed Fix Version/s: 2.1.4 Pushed to branch-2.2. FYI [~zghaobac] > RegionServer aborted due to NPE from MetaTableMetrics coprocessor > - > > Key: HBASE-21800 > URL: https://issues.apache.org/jira/browse/HBASE-21800 > Project: HBase > Issue Type: Bug > Components: Coprocessors, meta, metrics, Operability >Reporter: Sakthi >Assignee: Sakthi >Priority: Critical > Labels: Meta > Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.10, 2.3.0, 2.1.4 > > Attachments: hbase-21800.branch-1.001.patch, > hbase-21800.branch-1.002.patch, hbase-21800.branch-1.003.patch, > hbase-21800.branch-1.004.patch, hbase-21800.master.001.patch, > hbase-21800.master.002.patch, hbase-21800.master.003.patch > > > I was just playing around with the code, trying to capture "Top k" table metrics > from MetaMetrics, when I bumped into this issue. Though we are not currently > capturing "Top K" table metrics, we can still encounter this issue because of > the "Top k Clients" that is implemented using the LossyAlgo. > > RegionServer gets aborted due to an NPE from the MetaTableMetrics coprocessor. 
The > log looks somewhat like this: > {code:java} > 2019-01-28 23:31:10,311 ERROR > [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=16020] > coprocessor.CoprocessorHost: The coprocessor > org.apache.hadoop.hbase.coprocessor.MetaTableMetrics threw > java.lang.NullPointerException > java.lang.NullPointerException > at > org.apache.hadoop.hbase.coprocessor.MetaTableMetrics$ExampleRegionObserverMeta.markMeterIfPresent(MetaTableMetrics.java:123) > at > org.apache.hadoop.hbase.coprocessor.MetaTableMetrics$ExampleRegionObserverMeta.tableMetricRegisterAndMark2(MetaTableMetrics.java:233) > at > org.apache.hadoop.hbase.coprocessor.MetaTableMetrics$ExampleRegionObserverMeta.preGetOp(MetaTableMetrics.java:82) > at > org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$19.call(RegionCoprocessorHost.java:840) > at > org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$19.call(RegionCoprocessorHost.java:837) > at > org.apache.hadoop.hbase.coprocessor.CoprocessorHost$ObserverOperationWithoutResult.callObserver(CoprocessorHost.java:551) > at > org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(CoprocessorHost.java:625) > at > org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preGet(RegionCoprocessorHost.java:837) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2608) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2547) > at > org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304) > 2019-01-28 23:31:10,314 ERROR > [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=16020] > regionserver.HRegionServer: * ABORTING 
region server > 10.0.0.24,16020,1548747043814: The coprocessor > org.apache.hadoop.hbase.coprocessor.MetaTableMetrics threw > java.lang.NullPointerException * > java.lang.NullPointerException > at > org.apache.hadoop.hbase.coprocessor.MetaTableMetrics$ExampleRegionObserverMeta.markMeterIfPresent(MetaTableMetrics.java:123) > at > org.apache.hadoop.hbase.coprocessor.MetaTableMetrics$ExampleRegionObserverMeta.tableMetricRegisterAndMark2(MetaTableMetrics.java:233) > at > org.apache.hadoop.hbase.coprocessor.MetaTableMetrics$ExampleRegionObserverMeta.preGetOp(MetaTableMetrics.java:82) > at > org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$19.call(RegionCoprocessorHost.java:840) > at > org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$19.call(RegionCoprocessorHost.java:837) > at > org.apache.hadoop.hbase.coprocessor.CoprocessorHost$ObserverOperationWithoutResult.callObserver(CoprocessorHost.java:551) > at > org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(CoprocessorHost.java:625) > at > org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preGet(RegionCoprocessorHost.java:837) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2608) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2547) > at >
[jira] [Reopened] (HBASE-21800) RegionServer aborted due to NPE from MetaTableMetrics coprocessor
[ https://issues.apache.org/jira/browse/HBASE-21800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Somogyi reopened HBASE-21800: --- Reopening to push the missing commit to branch-2.2. > RegionServer aborted due to NPE from MetaTableMetrics coprocessor > - > > Key: HBASE-21800 > URL: https://issues.apache.org/jira/browse/HBASE-21800 > Project: HBase > Issue Type: Bug > Components: Coprocessors, meta, metrics, Operability >Reporter: Sakthi >Assignee: Sakthi >Priority: Critical > Labels: Meta > Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.10, 2.3.0 > > Attachments: hbase-21800.branch-1.001.patch, > hbase-21800.branch-1.002.patch, hbase-21800.branch-1.003.patch, > hbase-21800.branch-1.004.patch, hbase-21800.master.001.patch, > hbase-21800.master.002.patch, hbase-21800.master.003.patch > > > I was just playing around with the code, trying to capture "Top k" table metrics > from MetaMetrics, when I bumped into this issue. Though we are not currently > capturing "Top K" table metrics, we can still encounter this issue because of > the "Top k Clients" that is implemented using the LossyAlgo. > > RegionServer gets aborted due to an NPE from the MetaTableMetrics coprocessor. 
The > log looks somewhat like this: > {code:java} > 2019-01-28 23:31:10,311 ERROR > [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=16020] > coprocessor.CoprocessorHost: The coprocessor > org.apache.hadoop.hbase.coprocessor.MetaTableMetrics threw > java.lang.NullPointerException > java.lang.NullPointerException > at > org.apache.hadoop.hbase.coprocessor.MetaTableMetrics$ExampleRegionObserverMeta.markMeterIfPresent(MetaTableMetrics.java:123) > at > org.apache.hadoop.hbase.coprocessor.MetaTableMetrics$ExampleRegionObserverMeta.tableMetricRegisterAndMark2(MetaTableMetrics.java:233) > at > org.apache.hadoop.hbase.coprocessor.MetaTableMetrics$ExampleRegionObserverMeta.preGetOp(MetaTableMetrics.java:82) > at > org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$19.call(RegionCoprocessorHost.java:840) > at > org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$19.call(RegionCoprocessorHost.java:837) > at > org.apache.hadoop.hbase.coprocessor.CoprocessorHost$ObserverOperationWithoutResult.callObserver(CoprocessorHost.java:551) > at > org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(CoprocessorHost.java:625) > at > org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preGet(RegionCoprocessorHost.java:837) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2608) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2547) > at > org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304) > 2019-01-28 23:31:10,314 ERROR > [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=16020] > regionserver.HRegionServer: * ABORTING 
region server > 10.0.0.24,16020,1548747043814: The coprocessor > org.apache.hadoop.hbase.coprocessor.MetaTableMetrics threw > java.lang.NullPointerException * > java.lang.NullPointerException > at > org.apache.hadoop.hbase.coprocessor.MetaTableMetrics$ExampleRegionObserverMeta.markMeterIfPresent(MetaTableMetrics.java:123) > at > org.apache.hadoop.hbase.coprocessor.MetaTableMetrics$ExampleRegionObserverMeta.tableMetricRegisterAndMark2(MetaTableMetrics.java:233) > at > org.apache.hadoop.hbase.coprocessor.MetaTableMetrics$ExampleRegionObserverMeta.preGetOp(MetaTableMetrics.java:82) > at > org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$19.call(RegionCoprocessorHost.java:840) > at > org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$19.call(RegionCoprocessorHost.java:837) > at > org.apache.hadoop.hbase.coprocessor.CoprocessorHost$ObserverOperationWithoutResult.callObserver(CoprocessorHost.java:551) > at > org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(CoprocessorHost.java:625) > at > org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preGet(RegionCoprocessorHost.java:837) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2608) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2547) > at >
[jira] [Reopened] (HBASE-21991) Fix MetaMetrics issues - [Race condition, Faulty remove logic], few improvements
[ https://issues.apache.org/jira/browse/HBASE-21991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Somogyi reopened HBASE-21991: --- Reopening to push the missing base patch to branch-2.2 and the addendum to branch-2.1. > Fix MetaMetrics issues - [Race condition, Faulty remove logic], few > improvements > > > Key: HBASE-21991 > URL: https://issues.apache.org/jira/browse/HBASE-21991 > Project: HBase > Issue Type: Bug > Components: Coprocessors, metrics >Reporter: Sakthi >Assignee: Sakthi >Priority: Major > Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.10, 2.3.0 > > Attachments: hbase-21991.addendum.patch, > hbase-21991.branch-1.001.patch, hbase-21991.branch-1.002.patch, > hbase-21991.master.001.patch, hbase-21991.master.002.patch, > hbase-21991.master.003.patch, hbase-21991.master.004.patch, > hbase-21991.master.005.patch, hbase-21991.master.006.patch > > > Here is a list of the issues related to the MetaMetrics implementation: > +*Bugs*+: > # [_Lossy counting for top-k_] *Faulty remove logic of non-eligible meters*: > Under certain conditions, we might end up storing/exposing all the meters > rather than top-k-ish > # MetaMetrics can throw an NPE, resulting in the RS aborting, because of a > *Race Condition*. > +*Improvements*+: > # With a high number of regions in the cluster, exposing metrics for each > region blows up the JMX output from ~140 KB to 100+ MB depending on the number of > regions. It's better to use *lossy counting to maintain top-k for region > metrics* as well. > # As the lossy meters do not represent actual counts, I think it'll be > better to *rename the meters to include "lossy" in the name*. It would be > more informative while monitoring the metrics and there would be less > confusion between actual counts and lossy counts.
[jira] [Commented] (HBASE-11062) htop
[ https://issues.apache.org/jira/browse/HBASE-11062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848258#comment-16848258 ] Toshihiro Suzuki commented on HBASE-11062: -- +1 for hbtop. [~elserj] {quote} You have a patch/PR for what you demo'ed (even with the LGPL dependency)? {quote} Not yet. I need to do some refactoring of the code and add some tests. Maybe I can create a PR for it next week or in two weeks. > htop > > > Key: HBASE-11062 > URL: https://issues.apache.org/jira/browse/HBASE-11062 > Project: HBase > Issue Type: New Feature > Components: hbase-operator-tools >Reporter: Andrew Purtell >Assignee: Toshihiro Suzuki >Priority: Major > > A top-like monitor could be useful for testing, debugging, operations of > clusters of moderate size, and possibly for diagnosing issues in large > clusters. > Consider a curses interface like the one presented by atop > (http://www.atoptool.nl/images/screenshots/genericw.png) - with aggregate > metrics collected over a monitoring interval in the upper portion of the > pane, and a listing of discrete measurements sorted and filtered by various > criteria in the bottom part of the pane. One might imagine a cluster overview > with cluster aggregate metrics above and a list of regionservers sorted by > utilization below; and a regionserver view with process metrics above and a > list of metrics by operation type below, or a list of client connections, or > a list of threads, sorted by utilization, throughput, or latency. > Generically 'htop' is taken but would be distinctive in the HBase context, a > utility org.apache.hadoop.hbase.HTop > No need necessarily for a curses interface. Could be an external monitor with > a web front end as has been discussed before. I do like the idea of a process > that runs in a terminal because I interact with dev and test HBase clusters > exclusively by SSH.
[jira] [Commented] (HBASE-21879) Read HFile's block to ByteBuffer directly instead of to byte for reducing young gc purpose
[ https://issues.apache.org/jira/browse/HBASE-21879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848202#comment-16848202 ] Hudson commented on HBASE-21879: Results for branch HBASE-21879 [build #113 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21879/113/]: (x) *{color:red}-1 overall{color}* details (if available): (x) {color:red}-1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21879/113//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21879/113//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21879/113//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Read HFile's block to ByteBuffer directly instead of to byte for reducing > young gc purpose > -- > > Key: HBASE-21879 > URL: https://issues.apache.org/jira/browse/HBASE-21879 > Project: HBase > Issue Type: Improvement >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Major > Fix For: 3.0.0, 2.3.0 > > Attachments: HBASE-21879.v1.patch, HBASE-21879.v1.patch, > QPS-latencies-before-HBASE-21879.png, gc-data-before-HBASE-21879.png > > > In HFileBlock#readBlockDataInternal, we have the following: > {code} > @VisibleForTesting > protected HFileBlock readBlockDataInternal(FSDataInputStream is, long offset, > long onDiskSizeWithHeaderL, boolean pread, boolean verifyChecksum, > boolean updateMetrics) > throws IOException { > // . > // TODO: Make this ByteBuffer-based. Will make it easier to go to HDFS with > BBPool (offheap). 
> byte [] onDiskBlock = new byte[onDiskSizeWithHeader + hdrSize]; > int nextBlockOnDiskSize = readAtOffset(is, onDiskBlock, preReadHeaderSize, > onDiskSizeWithHeader - preReadHeaderSize, true, offset + > preReadHeaderSize, pread); > if (headerBuf != null) { > // ... > } > // ... > } > {code} > In the read path, we still read the block from the HFile into an on-heap byte[], then > copy the on-heap byte[] to the offheap bucket cache asynchronously, and in my > 100% get performance test, I also observed some frequent young GCs. The > largest memory footprint in the young gen should be the on-heap block byte[]. > In fact, we can read an HFile's block into a ByteBuffer directly instead of into a > byte[] to reduce young GC pressure. We did not implement this before > because there was no ByteBuffer reading interface in the older HDFS client, but 2.7+ > supports it now, so I think we can fix this now. > Will provide a patch and some perf comparisons for this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
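The read path described above can be sketched with plain java.nio (a minimal sketch, not the actual HBase patch: `ReadableByteChannel` stands in for HDFS's `FSDataInputStream`, whose ByteBuffer read interface arrived in Hadoop 2.7, and the class name is made up for illustration):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

// Sketch: read a fixed-size "block" straight into a direct (off-heap)
// ByteBuffer instead of a temporary on-heap byte[], looping until the
// buffer is full, as a block reader with a known on-disk size would.
public class BlockRead {
  public static ByteBuffer readFully(ReadableByteChannel ch, int size) {
    ByteBuffer buf = ByteBuffer.allocateDirect(size); // off-heap, no young-gen garbage
    try {
      while (buf.hasRemaining()) {
        if (ch.read(buf) < 0) {
          throw new IOException("EOF before " + size + " bytes were read");
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    buf.flip(); // make the filled region readable
    return buf;
  }
}
```

The loop matters because a channel read may return fewer bytes than requested; the on-heap version in the quoted code gets the same guarantee from `readAtOffset`.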
[jira] [Commented] (HBASE-21512) Introduce an AsyncClusterConnection and replace the usage of ClusterConnection
[ https://issues.apache.org/jira/browse/HBASE-21512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848190#comment-16848190 ] Hudson commented on HBASE-21512: Results for branch HBASE-21512 [build #241 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21512/241/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21512/241//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21512/241//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21512/241//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (x) {color:red}-1 client integration test{color} --Failed when running client tests on top of Hadoop 2. [see log for details|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21512/241//artifact/output-integration/hadoop-2.log]. (note that this means we didn't run on Hadoop 3) > Introduce an AsyncClusterConnection and replace the usage of ClusterConnection > -- > > Key: HBASE-21512 > URL: https://issues.apache.org/jira/browse/HBASE-21512 > Project: HBase > Issue Type: Umbrella >Reporter: Duo Zhang >Priority: Major > Fix For: 3.0.0 > > > At least for the RSProcedureDispatcher, with CompletableFuture we do not need > to set a delay and use a thread pool any more, which could reduce the > resource usage and also the latency. > Once this is done, I think we can remove the ClusterConnection completely, > and start to rewrite the old sync client based on the async client, which > could reduce the code base a lot for our client. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22455) Split TestReplicationStatus
[ https://issues.apache.org/jira/browse/HBASE-22455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848162#comment-16848162 ] Hudson commented on HBASE-22455: Results for branch branch-2 [build #1916 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1916/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1916//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1916//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1916//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (x) {color:red}-1 client integration test{color} --Failed when running client tests on top of Hadoop 2. [see log for details|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1916//artifact/output-integration/hadoop-2.log]. (note that this means we didn't run on Hadoop 3) > Split TestReplicationStatus > --- > > Key: HBASE-22455 > URL: https://issues.apache.org/jira/browse/HBASE-22455 > Project: HBase > Issue Type: Improvement > Components: Replication, test >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.3.0 > > > The test is a bit strange, we restart the cluster every time when running a > test method, and even more, we always shutdown the mini clusters and then > restart them in the beginning of each test method... > Let's just make one class for each test method. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21512) Introduce an AsyncClusterConnection and replace the usage of ClusterConnection
[ https://issues.apache.org/jira/browse/HBASE-21512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848155#comment-16848155 ] Hudson commented on HBASE-21512: Results for branch HBASE-21512 [build #240 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21512/240/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21512/240//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21512/240//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21512/240//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (x) {color:red}-1 client integration test{color} --Failed when running client tests on top of Hadoop 2. [see log for details|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21512/240//artifact/output-integration/hadoop-2.log]. (note that this means we didn't run on Hadoop 3) > Introduce an AsyncClusterConnection and replace the usage of ClusterConnection > -- > > Key: HBASE-21512 > URL: https://issues.apache.org/jira/browse/HBASE-21512 > Project: HBase > Issue Type: Umbrella >Reporter: Duo Zhang >Priority: Major > Fix For: 3.0.0 > > > At least for the RSProcedureDispatcher, with CompletableFuture we do not need > to set a delay and use a thread pool any more, which could reduce the > resource usage and also the latency. > Once this is done, I think we can remove the ClusterConnection completely, > and start to rewrite the old sync client based on the async client, which > could reduce the code base a lot for our client. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22455) Split TestReplicationStatus
[ https://issues.apache.org/jira/browse/HBASE-22455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848151#comment-16848151 ] Hudson commented on HBASE-22455: Results for branch master [build #1035 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/1035/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/master/1035//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/master/1035//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/master/1035//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (x) {color:red}-1 client integration test{color} --Failed when running client tests on top of Hadoop 2. [see log for details|https://builds.apache.org/job/HBase%20Nightly/job/master/1035//artifact/output-integration/hadoop-2.log]. (note that this means we didn't run on Hadoop 3) > Split TestReplicationStatus > --- > > Key: HBASE-22455 > URL: https://issues.apache.org/jira/browse/HBASE-22455 > Project: HBase > Issue Type: Improvement > Components: Replication, test >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.3.0 > > > The test is a bit strange, we restart the cluster every time when running a > test method, and even more, we always shutdown the mini clusters and then > restart them in the beginning of each test method... > Let's just make one class for each test method. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work started] (HBASE-22470) Corrupt Surefire test reports
[ https://issues.apache.org/jira/browse/HBASE-22470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-22470 started by Peter Somogyi. - > Corrupt Surefire test reports > - > > Key: HBASE-22470 > URL: https://issues.apache.org/jira/browse/HBASE-22470 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 3.0.0, 2.2.0, 2.1.5 >Reporter: Peter Somogyi >Assignee: Peter Somogyi >Priority: Minor > Attachments: > TEST-org.apache.hadoop.hbase.replication.TestMasterReplication.xml > > > Jenkins is not able to read surefire test reports occasionally because the > generated XML file is corrupted. In this case Jenkins shows the following > error message: > TEST-org.apache.hadoop.hbase.replication.TestMasterReplication.xml.[failed-to-read] > https://builds.apache.org/view/H-L/view/HBase/job/HBase%20Nightly/job/branch-2.1/1176/testReport/junit/TEST-org.apache.hadoop.hbase.replication.TestMasterReplication/xml/_failed_to_read_/ > {noformat} > Failed to read test report file > /home/jenkins/jenkins-slave/workspace/HBase_Nightly_branch-2.1/output-jdk8-hadoop3/archiver/hbase-server/target/surefire-reports/TEST-org.apache.hadoop.hbase.replication.TestMasterReplication.xml > org.dom4j.DocumentException: Error on line 86 of document : XML document > structures must start and end within the same entity. Nested exception: XML > document structures must start and end within the same entity.{noformat} > The specific XML file is not complete, however, the output file for the test > contains stdout and stderr output. 
> {noformat} > classname="org.apache.hadoop.hbase.replication.TestMasterReplication" > time="95.334"/> > classname="org.apache.hadoop.hbase.replication.TestMasterReplication" > time="26.5"/> > classname="org.apache.hadoop.hbase.replication.TestMasterReplication" > time="27.244"/> > classname="org.apache.hadoop.hbase.replication.TestMasterReplication" > time="46.921"/> > classname="org.apache.hadoop.hbase.replication.TestMasterReplication" > time="43.147"/> > classname="org.apache.hadoop.hbase.replication.TestMasterReplication" > time="11.119"/> > classname="org.apache.hadoop.hbase.replication.TestMasterReplication" > time="44.022"> > type="java.lang.AssertionError">java.lang.AssertionError: Waited too much > time for bulkloaded data replication. Current count=200, expected count=600 > at > org.apache.hadoop.hbase.replication.TestMasterReplication.wait(TestMasterReplication.java:641) > at > org.apache.hadoop.hbase.replication.TestMasterReplication.loadAndValidateHFileReplication(TestMasterReplication.java:631) > at > org.apache.hadoop.hbase.replication.TestMasterReplication.testHFileMultiSlaveReplication(TestMasterReplication.java:371) > >
[jira] [Commented] (HBASE-22470) Corrupt Surefire test reports
[ https://issues.apache.org/jira/browse/HBASE-22470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848092#comment-16848092 ] Peter Somogyi commented on HBASE-22470: --- The current surefire version is 2.22.0 that HBase inherits from ASF parent pom. Surefire 2.22.1 had some bugfixes around XML reports and outputs (SUREFIRE-1559, SUREFIRE-1579, SUREFIRE-1561), but it is possible that test prints some special characters that surefire is not able to write to the XML or the size of the output is too big (7.9MB in this case). > Corrupt Surefire test reports > - > > Key: HBASE-22470 > URL: https://issues.apache.org/jira/browse/HBASE-22470 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 3.0.0, 2.2.0, 2.1.5 >Reporter: Peter Somogyi >Assignee: Peter Somogyi >Priority: Minor > Attachments: > TEST-org.apache.hadoop.hbase.replication.TestMasterReplication.xml > > > Jenkins is not able to read surefire test reports occasionally because the > generated XML file is corrupted. In this case Jenkins shows the following > error message: > TEST-org.apache.hadoop.hbase.replication.TestMasterReplication.xml.[failed-to-read] > https://builds.apache.org/view/H-L/view/HBase/job/HBase%20Nightly/job/branch-2.1/1176/testReport/junit/TEST-org.apache.hadoop.hbase.replication.TestMasterReplication/xml/_failed_to_read_/ > {noformat} > Failed to read test report file > /home/jenkins/jenkins-slave/workspace/HBase_Nightly_branch-2.1/output-jdk8-hadoop3/archiver/hbase-server/target/surefire-reports/TEST-org.apache.hadoop.hbase.replication.TestMasterReplication.xml > org.dom4j.DocumentException: Error on line 86 of document : XML document > structures must start and end within the same entity. Nested exception: XML > document structures must start and end within the same entity.{noformat} > The specific XML file is not complete, however, the output file for the test > contains stdout and stderr output. 
> {noformat} > classname="org.apache.hadoop.hbase.replication.TestMasterReplication" > time="95.334"/> > classname="org.apache.hadoop.hbase.replication.TestMasterReplication" > time="26.5"/> > classname="org.apache.hadoop.hbase.replication.TestMasterReplication" > time="27.244"/> > classname="org.apache.hadoop.hbase.replication.TestMasterReplication" > time="46.921"/> > classname="org.apache.hadoop.hbase.replication.TestMasterReplication" > time="43.147"/> > classname="org.apache.hadoop.hbase.replication.TestMasterReplication" > time="11.119"/> > classname="org.apache.hadoop.hbase.replication.TestMasterReplication" > time="44.022"> > type="java.lang.AssertionError">java.lang.AssertionError: Waited too much > time for bulkloaded data replication. Current count=200, expected count=600 > at > org.apache.hadoop.hbase.replication.TestMasterReplication.wait(TestMasterReplication.java:641) > at > org.apache.hadoop.hbase.replication.TestMasterReplication.loadAndValidateHFileReplication(TestMasterReplication.java:631) > at > org.apache.hadoop.hbase.replication.TestMasterReplication.testHFileMultiSlaveReplication(TestMasterReplication.java:371) > >
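A truncated report like the one attached fails XML well-formedness for exactly the reason dom4j states. A pre-archiving probe could flag such files before Jenkins trips over them (a hedged sketch, not part of the HBase build; the class name is hypothetical):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

// Sketch: check whether a surefire report parses as well-formed XML.
// A report cut off mid-element (as when the forked JVM dies or the
// writer fails on oversized output) throws during parsing.
public class SurefireReportCheck {
  public static boolean isWellFormed(String xml) {
    try {
      DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
      return true;
    } catch (Exception e) { // SAXException on truncation, same root cause Jenkins reports
      return false;
    }
  }
}
```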
[jira] [Updated] (HBASE-22470) Corrupt Surefire test reports
[ https://issues.apache.org/jira/browse/HBASE-22470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Somogyi updated HBASE-22470: -- Attachment: TEST-org.apache.hadoop.hbase.replication.TestMasterReplication.xml > Corrupt Surefire test reports > - > > Key: HBASE-22470 > URL: https://issues.apache.org/jira/browse/HBASE-22470 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 3.0.0, 2.2.0, 2.1.5 >Reporter: Peter Somogyi >Assignee: Peter Somogyi >Priority: Minor > Attachments: > TEST-org.apache.hadoop.hbase.replication.TestMasterReplication.xml > > > Jenkins is not able to read surefire test reports occasionally because the > generated XML file is corrupted. In this case Jenkins shows the following > error message: > TEST-org.apache.hadoop.hbase.replication.TestMasterReplication.xml.[failed-to-read] > https://builds.apache.org/view/H-L/view/HBase/job/HBase%20Nightly/job/branch-2.1/1176/testReport/junit/TEST-org.apache.hadoop.hbase.replication.TestMasterReplication/xml/_failed_to_read_/ > {noformat} > Failed to read test report file > /home/jenkins/jenkins-slave/workspace/HBase_Nightly_branch-2.1/output-jdk8-hadoop3/archiver/hbase-server/target/surefire-reports/TEST-org.apache.hadoop.hbase.replication.TestMasterReplication.xml > org.dom4j.DocumentException: Error on line 86 of document : XML document > structures must start and end within the same entity. Nested exception: XML > document structures must start and end within the same entity.{noformat} > The specific XML file is not complete, however, the output file for the test > contains stdout and stderr output. 
> {noformat} > classname="org.apache.hadoop.hbase.replication.TestMasterReplication" > time="95.334"/> > classname="org.apache.hadoop.hbase.replication.TestMasterReplication" > time="26.5"/> > classname="org.apache.hadoop.hbase.replication.TestMasterReplication" > time="27.244"/> > classname="org.apache.hadoop.hbase.replication.TestMasterReplication" > time="46.921"/> > classname="org.apache.hadoop.hbase.replication.TestMasterReplication" > time="43.147"/> > classname="org.apache.hadoop.hbase.replication.TestMasterReplication" > time="11.119"/> > classname="org.apache.hadoop.hbase.replication.TestMasterReplication" > time="44.022"> > type="java.lang.AssertionError">java.lang.AssertionError: Waited too much > time for bulkloaded data replication. Current count=200, expected count=600 > at > org.apache.hadoop.hbase.replication.TestMasterReplication.wait(TestMasterReplication.java:641) > at > org.apache.hadoop.hbase.replication.TestMasterReplication.loadAndValidateHFileReplication(TestMasterReplication.java:631) > at > org.apache.hadoop.hbase.replication.TestMasterReplication.testHFileMultiSlaveReplication(TestMasterReplication.java:371) > >
[jira] [Created] (HBASE-22470) Corrupt Surefire test reports
Peter Somogyi created HBASE-22470: - Summary: Corrupt Surefire test reports Key: HBASE-22470 URL: https://issues.apache.org/jira/browse/HBASE-22470 Project: HBase Issue Type: Bug Components: test Affects Versions: 3.0.0, 2.2.0, 2.1.5 Reporter: Peter Somogyi Assignee: Peter Somogyi Jenkins is not able to read surefire test reports occasionally because the generated XML file is corrupted. In this case Jenkins shows the following error message: TEST-org.apache.hadoop.hbase.replication.TestMasterReplication.xml.[failed-to-read] https://builds.apache.org/view/H-L/view/HBase/job/HBase%20Nightly/job/branch-2.1/1176/testReport/junit/TEST-org.apache.hadoop.hbase.replication.TestMasterReplication/xml/_failed_to_read_/ {noformat} Failed to read test report file /home/jenkins/jenkins-slave/workspace/HBase_Nightly_branch-2.1/output-jdk8-hadoop3/archiver/hbase-server/target/surefire-reports/TEST-org.apache.hadoop.hbase.replication.TestMasterReplication.xml org.dom4j.DocumentException: Error on line 86 of document : XML document structures must start and end within the same entity. Nested exception: XML document structures must start and end within the same entity.{noformat} The specific XML file is not complete, however, the output file for the test contains stdout and stderr output. {noformat} java.lang.AssertionError: Waited too much time for bulkloaded data replication. Current count=200, expected count=600 at org.apache.hadoop.hbase.replication.TestMasterReplication.wait(TestMasterReplication.java:641) at org.apache.hadoop.hbase.replication.TestMasterReplication.loadAndValidateHFileReplication(TestMasterReplication.java:631) at org.apache.hadoop.hbase.replication.TestMasterReplication.testHFileMultiSlaveReplication(TestMasterReplication.java:371)
[jira] [Commented] (HBASE-22346) scanner priorities/deadline units are invalid for non-huge scanners
[ https://issues.apache.org/jira/browse/HBASE-22346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848074#comment-16848074 ] Hudson commented on HBASE-22346: Results for branch HBASE-22346 [build #20 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-22346/20/]: (x) *{color:red}-1 overall{color}* details (if available): (x) {color:red}-1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-22346/20//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-22346/20//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-22346/20//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > scanner priorities/deadline units are invalid for non-huge scanners > --- > > Key: HBASE-22346 > URL: https://issues.apache.org/jira/browse/HBASE-22346 > Project: HBase > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin >Priority: Major > Attachments: HBASE-22346.01.patch, HBASE-22346.patch > > > I was looking at using the priority (deadline) queue for scanner requests; > what I see is that AnnotationReadingPriorityFunction, the only impl of the > deadline function available, implements getDeadline as sqrt of the number of > next() calls, from HBASE-10993. > However, CallPriorityComparator.compare, its only caller, adds that > "deadline" value to the callA.getReceiveTime() in milliseconds... > That results in some sort of a meaningless value that I assume only make > sense "by coincidence" for telling apart broad and specific classes of > scanners... 
in practice the next-call count must be in the thousands before it becomes > meaningful versus even small differences in receive time. > When there's contention from many scanners, e.g. small scanners for meta, or > just users creating tons of scanners to the point where requests queue up, > the actual deadline is not accounted for and the priority function itself is > meaningless... In fact, as queueing increases it becomes worse, because > receive-time differences grow. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
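The unit mismatch described above can be made concrete with a toy calculation (a hedged illustration of the comparator's arithmetic, not the actual HBase classes):

```java
// Illustrates the mismatch: a sqrt(nextCalls) "deadline" (dimensionless)
// is added to a receive time measured in milliseconds, so a one-second
// difference in arrival offsets even a million next() calls.
public class DeadlineUnits {
  static long effectivePriority(long receiveTimeMs, long nextCalls) {
    return receiveTimeMs + (long) Math.sqrt(nextCalls);
  }
}
```

A scanner that has issued 1,000,000 next() calls gets a penalty of only sqrt(1,000,000) = 1,000, i.e. the same weight as arriving one second later in the queue, which is why the deadline only separates broad classes of scanners rather than expressing a real deadline.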
[jira] [Resolved] (HBASE-22455) Split TestReplicationStatus
[ https://issues.apache.org/jira/browse/HBASE-22455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang resolved HBASE-22455. --- Resolution: Fixed Hadoop Flags: Reviewed Pushed to master and branch-2. Thanks [~openinx] for reviewing. > Split TestReplicationStatus > --- > > Key: HBASE-22455 > URL: https://issues.apache.org/jira/browse/HBASE-22455 > Project: HBase > Issue Type: Improvement > Components: Replication, test >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.3.0 > > > The test is a bit strange, we restart the cluster every time when running a > test method, and even more, we always shutdown the mini clusters and then > restart them in the beginning of each test method... > Let's just make one class for each test method. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
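The split described above can be sketched as a shared base class plus one class per former test method (a hedged sketch with hypothetical names, not the actual classes merged in the PR):

```java
// Sketch of the one-class-per-test-method split: shared bootstrap lives in a
// base class, and each former test method becomes its own class, so every
// test class starts from a fresh "cluster" without mid-test restarts.
abstract class ReplicationStatusTestBase {
  protected boolean clusterRunning;
  protected void startCluster() { clusterRunning = true; } // stand-in for mini-cluster start
}

// Hypothetical subclass corresponding to one former test method.
class TestReplicationStatusAfterRestart extends ReplicationStatusTestBase {
  boolean run() {
    startCluster(); // fresh cluster per class, no restart inside the method
    return clusterRunning;
  }
}
```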
[jira] [Updated] (HBASE-22455) Split TestReplicationStatus
[ https://issues.apache.org/jira/browse/HBASE-22455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang updated HBASE-22455: -- Fix Version/s: 2.3.0 3.0.0 > Split TestReplicationStatus > --- > > Key: HBASE-22455 > URL: https://issues.apache.org/jira/browse/HBASE-22455 > Project: HBase > Issue Type: Improvement >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.3.0 > > > The test is a bit strange, we restart the cluster every time when running a > test method, and even more, we always shutdown the mini clusters and then > restart them in the beginning of each test method... > Let's just make one class for each test method. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-22455) Split TestReplicationStatus
[ https://issues.apache.org/jira/browse/HBASE-22455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang updated HBASE-22455: -- Component/s: test Replication > Split TestReplicationStatus > --- > > Key: HBASE-22455 > URL: https://issues.apache.org/jira/browse/HBASE-22455 > Project: HBase > Issue Type: Improvement > Components: Replication, test >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.3.0 > > > The test is a bit strange, we restart the cluster every time when running a > test method, and even more, we always shutdown the mini clusters and then > restart them in the beginning of each test method... > Let's just make one class for each test method. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [hbase] Apache9 merged pull request #249: HBASE-22455 Split TestReplicationStatus
Apache9 merged pull request #249: HBASE-22455 Split TestReplicationStatus URL: https://github.com/apache/hbase/pull/249 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services