[jira] [Commented] (HBASE-22471) Our nightly jobs for master and branch-2 are still using hadoop-2.7.1 in integration test

2019-05-25 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848345#comment-16848345
 ] 

Duo Zhang commented on HBASE-22471:
---

[~busbey] I haven't found a way to clean the workspace for our nightly jobs...

> Our nightly jobs for master and branch-2 are still using hadoop-2.7.1 in 
> integration test
> -
>
> Key: HBASE-22471
> URL: https://issues.apache.org/jira/browse/HBASE-22471
> Project: HBase
>  Issue Type: Bug
>  Components: build
>Reporter: Duo Zhang
>Priority: Major
>
> We use ls to pick up the hadoop 2 jars, so the problem may be that stale 
> 2.7.1 jars have been sitting in the workspace for a long time. We need to 
> clean the workspace.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-22471) Our nightly jobs for master and branch-2 are still using hadoop-2.7.1 in integration test

2019-05-25 Thread Duo Zhang (JIRA)
Duo Zhang created HBASE-22471:
-

 Summary: Our nightly jobs for master and branch-2 are still using 
hadoop-2.7.1 in integration test
 Key: HBASE-22471
 URL: https://issues.apache.org/jira/browse/HBASE-22471
 Project: HBase
  Issue Type: Bug
  Components: build
Reporter: Duo Zhang


We use ls to pick up the hadoop 2 jars, so the problem may be that stale 2.7.1 
jars have been sitting in the workspace for a long time. We need to clean the 
workspace.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-22422) Retain an ByteBuff with refCnt=0 when getBlock from LRUCache

2019-05-25 Thread Zheng Hu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu updated HBASE-22422:
-
Status: Patch Available  (was: Open)

> Retain an ByteBuff with refCnt=0 when getBlock from LRUCache
> 
>
> Key: HBASE-22422
> URL: https://issues.apache.org/jira/browse/HBASE-22422
> Project: HBase
>  Issue Type: Sub-task
>  Components: BlockCache
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Attachments: 0001-debug2.patch, 0001-debug2.patch, 0001-debug2.patch, 
> 0001-debug3.patch, 0001-debug4.patch, 
> HBASE-22422-qps-after-fix-the-zero-retain-bug.png, 
> HBASE-22422.HBASE-21879.v01.patch, HBASE-22422.HBASE-21879.v02.patch, 
> LRUBlockCache-getBlock.png, debug.patch, 
> failed-to-check-positive-on-web-ui.png, image-2019-05-15-12-00-03-641.png
>
>
> After running a YCSB scan/get benchmark in our XiaoMi cluster, we found that 
> the get QPS dropped from 25000/s to hundreds per second in a cluster with 
> five nodes.
> After enabling the debug log on the YCSB client side, I found the following 
> stacktrace; see 
> https://issues.apache.org/jira/secure/attachment/12968745/image-2019-05-15-12-00-03-641.png.
>  
> After looking into the stacktrace, I can confirm that the zero-refCnt block 
> is an intermediate index block; see http://hbase.apache.org/images/hfilev2.png
> Need a patch to fix this.
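
For context, the fix direction here is to make the cache retain the block before handing it out. Below is a minimal sketch of that reference-counting idea with simplified, illustrative types; it is not the actual HBase ByteBuff/LRUBlockCache code.

{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

public class RetainOnGetSketch {
  static final class RefCountedBlock {
    private final AtomicInteger refCnt = new AtomicInteger(1);

    /** Increment the reference count; fail if the block was already released. */
    RefCountedBlock retain() {
      if (refCnt.incrementAndGet() <= 1) { // 0 -> 1 means we revived a freed block
        refCnt.decrementAndGet();
        throw new IllegalStateException("refCnt was 0; block already released");
      }
      return this;
    }

    void release() {
      if (refCnt.decrementAndGet() == 0) {
        // here the backing buffer would be returned to the pool
      }
    }
  }

  private final ConcurrentMap<String, RefCountedBlock> cache = new ConcurrentHashMap<>();

  /**
   * The crux of the bug: getBlock must retain() the block before returning it,
   * otherwise an eviction racing with this get can drop refCnt to 0 while the
   * caller still holds a reference to a block whose buffer may be recycled.
   */
  public RefCountedBlock getBlock(String key) {
    RefCountedBlock b = cache.get(key);
    return b == null ? null : b.retain();
  }
}
{code}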



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-22422) Retain an ByteBuff with refCnt=0 when getBlock from LRUCache

2019-05-25 Thread Zheng Hu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu updated HBASE-22422:
-
Attachment: HBASE-22422.HBASE-21879.v02.patch

> Retain an ByteBuff with refCnt=0 when getBlock from LRUCache
> 
>
> Key: HBASE-22422
> URL: https://issues.apache.org/jira/browse/HBASE-22422
> Project: HBase
>  Issue Type: Sub-task
>  Components: BlockCache
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Attachments: 0001-debug2.patch, 0001-debug2.patch, 0001-debug2.patch, 
> 0001-debug3.patch, 0001-debug4.patch, 
> HBASE-22422-qps-after-fix-the-zero-retain-bug.png, 
> HBASE-22422.HBASE-21879.v01.patch, HBASE-22422.HBASE-21879.v02.patch, 
> LRUBlockCache-getBlock.png, debug.patch, 
> failed-to-check-positive-on-web-ui.png, image-2019-05-15-12-00-03-641.png
>
>
> After running a YCSB scan/get benchmark in our XiaoMi cluster, we found that 
> the get QPS dropped from 25000/s to hundreds per second in a cluster with 
> five nodes.
> After enabling the debug log on the YCSB client side, I found the following 
> stacktrace; see 
> https://issues.apache.org/jira/secure/attachment/12968745/image-2019-05-15-12-00-03-641.png.
>  
> After looking into the stacktrace, I can confirm that the zero-refCnt block 
> is an intermediate index block; see http://hbase.apache.org/images/hfilev2.png
> Need a patch to fix this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21991) Fix MetaMetrics issues - [Race condition, Faulty remove logic], few improvements

2019-05-25 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848320#comment-16848320
 ] 

Hudson commented on HBASE-21991:


Results for branch branch-2.1
[build #1191 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/1191/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/1191//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/1191//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/1191//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Fix MetaMetrics issues - [Race condition, Faulty remove logic], few 
> improvements
> 
>
> Key: HBASE-21991
> URL: https://issues.apache.org/jira/browse/HBASE-21991
> Project: HBase
>  Issue Type: Bug
>  Components: Coprocessors, metrics
>Reporter: Sakthi
>Assignee: Sakthi
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.10, 2.3.0, 2.1.5
>
> Attachments: hbase-21991.addendum.patch, 
> hbase-21991.branch-1.001.patch, hbase-21991.branch-1.002.patch, 
> hbase-21991.master.001.patch, hbase-21991.master.002.patch, 
> hbase-21991.master.003.patch, hbase-21991.master.004.patch, 
> hbase-21991.master.005.patch, hbase-21991.master.006.patch
>
>
> Here is a list of the issues related to the MetaMetrics implementation:
> +*Bugs*+:
>  # [_Lossy counting for top-k_] *Faulty remove logic for non-eligible 
> meters*: under certain conditions, we might end up storing/exposing all the 
> meters rather than roughly the top k.
>  # MetaMetrics can throw an NPE because of a *race condition*, which results 
> in the RS aborting.
> +*Improvements*+:
>  # With a high number of regions in the cluster, exposing metrics for each 
> region blows up the JMX output from ~140 KB to 100+ MB, depending on the 
> number of regions. It's better to use *lossy counting to maintain top-k for 
> region metrics* as well.
>  # As the lossy meters do not represent actual counts, I think it would be 
> better to *rename the meters to include "lossy" in the name*. That would be 
> more informative when monitoring the metrics, and there would be less 
> confusion between actual counts and lossy counts.
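
For reference, the lossy counting these meters rely on works roughly as follows. This is a minimal Manku-Motwani-style sketch with illustrative names, not the HBase MetaTableMetrics implementation; the periodic sweep is the "remove logic" the first bug refers to.

{code:java}
import java.util.HashMap;
import java.util.Map;

public class LossyCounterSketch {
  private static final class Counter {
    long count;
    final long delta; // maximum possible undercount at insertion time
    Counter(long count, long delta) { this.count = count; this.delta = delta; }
  }

  private final Map<String, Counter> counters = new HashMap<>();
  private final long bucketWidth; // w = ceil(1 / epsilon)
  private long seen = 0;          // total items observed
  private long bucket = 1;        // current bucket id

  public LossyCounterSketch(double epsilon) {
    this.bucketWidth = (long) Math.ceil(1.0 / epsilon);
  }

  public void add(String key) {
    Counter c = counters.get(key);
    if (c == null) {
      counters.put(key, new Counter(1, bucket - 1));
    } else {
      c.count++;
    }
    if (++seen % bucketWidth == 0) {
      // Sweep at each bucket boundary: evict entries whose upper-bound count
      // can no longer belong to the (approximate) top set. Getting this
      // condition wrong is how the registry can end up keeping *all* meters.
      counters.entrySet().removeIf(e -> e.getValue().count + e.getValue().delta <= bucket);
      bucket++;
    }
  }

  /** Lossy estimate; may undercount by at most epsilon * seen. */
  public long estimate(String key) {
    Counter c = counters.get(key);
    return c == null ? 0 : c.count;
  }
}
{code}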



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21991) Fix MetaMetrics issues - [Race condition, Faulty remove logic], few improvements

2019-05-25 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848319#comment-16848319
 ] 

Hudson commented on HBASE-21991:


Results for branch branch-2.2
[build #287 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/287/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/287//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/287//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/287//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(x) {color:red}-1 client integration test{color}
--Failed when running client tests on top of Hadoop 2. [see log for 
details|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/287//artifact/output-integration/hadoop-2.log].
 (note that this means we didn't run on Hadoop 3)


> Fix MetaMetrics issues - [Race condition, Faulty remove logic], few 
> improvements
> 
>
> Key: HBASE-21991
> URL: https://issues.apache.org/jira/browse/HBASE-21991
> Project: HBase
>  Issue Type: Bug
>  Components: Coprocessors, metrics
>Reporter: Sakthi
>Assignee: Sakthi
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.10, 2.3.0, 2.1.5
>
> Attachments: hbase-21991.addendum.patch, 
> hbase-21991.branch-1.001.patch, hbase-21991.branch-1.002.patch, 
> hbase-21991.master.001.patch, hbase-21991.master.002.patch, 
> hbase-21991.master.003.patch, hbase-21991.master.004.patch, 
> hbase-21991.master.005.patch, hbase-21991.master.006.patch
>
>
> Here is a list of the issues related to the MetaMetrics implementation:
> +*Bugs*+:
>  # [_Lossy counting for top-k_] *Faulty remove logic for non-eligible 
> meters*: under certain conditions, we might end up storing/exposing all the 
> meters rather than roughly the top k.
>  # MetaMetrics can throw an NPE because of a *race condition*, which results 
> in the RS aborting.
> +*Improvements*+:
>  # With a high number of regions in the cluster, exposing metrics for each 
> region blows up the JMX output from ~140 KB to 100+ MB, depending on the 
> number of regions. It's better to use *lossy counting to maintain top-k for 
> region metrics* as well.
>  # As the lossy meters do not represent actual counts, I think it would be 
> better to *rename the meters to include "lossy" in the name*. That would be 
> more informative when monitoring the metrics, and there would be less 
> confusion between actual counts and lossy counts.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21800) RegionServer aborted due to NPE from MetaTableMetrics coprocessor

2019-05-25 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848318#comment-16848318
 ] 

Hudson commented on HBASE-21800:


Results for branch branch-2.2
[build #287 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/287/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/287//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/287//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/287//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(x) {color:red}-1 client integration test{color}
--Failed when running client tests on top of Hadoop 2. [see log for 
details|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/287//artifact/output-integration/hadoop-2.log].
 (note that this means we didn't run on Hadoop 3)


> RegionServer aborted due to NPE from MetaTableMetrics coprocessor
> -
>
> Key: HBASE-21800
> URL: https://issues.apache.org/jira/browse/HBASE-21800
> Project: HBase
>  Issue Type: Bug
>  Components: Coprocessors, meta, metrics, Operability
>Reporter: Sakthi
>Assignee: Sakthi
>Priority: Critical
>  Labels: Meta
> Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.10, 2.3.0, 2.1.4
>
> Attachments: hbase-21800.branch-1.001.patch, 
> hbase-21800.branch-1.002.patch, hbase-21800.branch-1.003.patch, 
> hbase-21800.branch-1.004.patch, hbase-21800.master.001.patch, 
> hbase-21800.master.002.patch, hbase-21800.master.003.patch
>
>
> I was just playing around with the code, trying to capture "top k" table 
> metrics from MetaMetrics, when I bumped into this issue. Though we are not 
> currently capturing "top k" table metrics, we can still hit this issue 
> because of the "top k clients" metric, which is implemented using the lossy 
> counting algorithm.
>  
> The RegionServer gets aborted due to an NPE from the MetaTableMetrics 
> coprocessor. The log looks somewhat like this:
> {code:java}
> 2019-01-28 23:31:10,311 ERROR 
> [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=16020] 
> coprocessor.CoprocessorHost: The coprocessor 
> org.apache.hadoop.hbase.coprocessor.MetaTableMetrics threw 
> java.lang.NullPointerException
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hbase.coprocessor.MetaTableMetrics$ExampleRegionObserverMeta.markMeterIfPresent(MetaTableMetrics.java:123)
>   at 
> org.apache.hadoop.hbase.coprocessor.MetaTableMetrics$ExampleRegionObserverMeta.tableMetricRegisterAndMark2(MetaTableMetrics.java:233)
>   at 
> org.apache.hadoop.hbase.coprocessor.MetaTableMetrics$ExampleRegionObserverMeta.preGetOp(MetaTableMetrics.java:82)
>   at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$19.call(RegionCoprocessorHost.java:840)
>   at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$19.call(RegionCoprocessorHost.java:837)
>   at 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost$ObserverOperationWithoutResult.callObserver(CoprocessorHost.java:551)
>   at 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(CoprocessorHost.java:625)
>   at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preGet(RegionCoprocessorHost.java:837)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2608)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2547)
>   at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> 2019-01-28 23:31:10,314 ERROR 
> [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=16020] 
> regionserver.HRegionServer: * ABORTING region server 
> 10.0.0.24,16020,1548747043814: The coprocessor 
> org.apache.hadoop.hbase.coprocessor.MetaTableMetrics threw 
> java.lang.NullPointerException *
> java.lang.NullPointerException
>   at 
> 
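
For context, the race is a classic check-then-act pattern on the meter registry. A minimal sketch with hypothetical names follows; it is not the actual MetaTableMetrics code, just the shape of the problem and the usual fix.

{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.LongAdder;

public class MeterRegistrySketch {
  private final ConcurrentMap<String, LongAdder> meters = new ConcurrentHashMap<>();

  // Racy check-then-act: between containsKey() and get(), a concurrent
  // lossy-counting sweep may remove the meter, so get() returns null and the
  // increment throws the kind of NPE seen in the log above.
  public void markRacy(String name) {
    if (meters.containsKey(name)) {
      meters.get(name).increment(); // can throw NullPointerException
    }
  }

  // Safe: create or fetch the meter atomically and hold the reference.
  public void markSafe(String name) {
    meters.computeIfAbsent(name, k -> new LongAdder()).increment();
  }
}
{code}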

[jira] [Resolved] (HBASE-21991) Fix MetaMetrics issues - [Race condition, Faulty remove logic], few improvements

2019-05-25 Thread Peter Somogyi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Somogyi resolved HBASE-21991.
---
   Resolution: Fixed
 Hadoop Flags: Incompatible change
Fix Version/s: 2.1.5

Pushed addendum to branch-2.1 and main patch to branch-2.2.

FYI [~zghaobac], [~stack]

> Fix MetaMetrics issues - [Race condition, Faulty remove logic], few 
> improvements
> 
>
> Key: HBASE-21991
> URL: https://issues.apache.org/jira/browse/HBASE-21991
> Project: HBase
>  Issue Type: Bug
>  Components: Coprocessors, metrics
>Reporter: Sakthi
>Assignee: Sakthi
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.10, 2.3.0, 2.1.5
>
> Attachments: hbase-21991.addendum.patch, 
> hbase-21991.branch-1.001.patch, hbase-21991.branch-1.002.patch, 
> hbase-21991.master.001.patch, hbase-21991.master.002.patch, 
> hbase-21991.master.003.patch, hbase-21991.master.004.patch, 
> hbase-21991.master.005.patch, hbase-21991.master.006.patch
>
>
> Here is a list of the issues related to the MetaMetrics implementation:
> +*Bugs*+:
>  # [_Lossy counting for top-k_] *Faulty remove logic for non-eligible 
> meters*: under certain conditions, we might end up storing/exposing all the 
> meters rather than roughly the top k.
>  # MetaMetrics can throw an NPE because of a *race condition*, which results 
> in the RS aborting.
> +*Improvements*+:
>  # With a high number of regions in the cluster, exposing metrics for each 
> region blows up the JMX output from ~140 KB to 100+ MB, depending on the 
> number of regions. It's better to use *lossy counting to maintain top-k for 
> region metrics* as well.
>  # As the lossy meters do not represent actual counts, I think it would be 
> better to *rename the meters to include "lossy" in the name*. That would be 
> more informative when monitoring the metrics, and there would be less 
> confusion between actual counts and lossy counts.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (HBASE-21800) RegionServer aborted due to NPE from MetaTableMetrics coprocessor

2019-05-25 Thread Peter Somogyi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Somogyi resolved HBASE-21800.
---
   Resolution: Fixed
Fix Version/s: 2.1.4

Pushed to branch-2.2.

FYI [~zghaobac]

> RegionServer aborted due to NPE from MetaTableMetrics coprocessor
> -
>
> Key: HBASE-21800
> URL: https://issues.apache.org/jira/browse/HBASE-21800
> Project: HBase
>  Issue Type: Bug
>  Components: Coprocessors, meta, metrics, Operability
>Reporter: Sakthi
>Assignee: Sakthi
>Priority: Critical
>  Labels: Meta
> Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.10, 2.3.0, 2.1.4
>
> Attachments: hbase-21800.branch-1.001.patch, 
> hbase-21800.branch-1.002.patch, hbase-21800.branch-1.003.patch, 
> hbase-21800.branch-1.004.patch, hbase-21800.master.001.patch, 
> hbase-21800.master.002.patch, hbase-21800.master.003.patch
>
>
> I was just playing around with the code, trying to capture "top k" table 
> metrics from MetaMetrics, when I bumped into this issue. Though we are not 
> currently capturing "top k" table metrics, we can still hit this issue 
> because of the "top k clients" metric, which is implemented using the lossy 
> counting algorithm.
>  
> The RegionServer gets aborted due to an NPE from the MetaTableMetrics 
> coprocessor. The log looks somewhat like this:
> {code:java}
> 2019-01-28 23:31:10,311 ERROR 
> [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=16020] 
> coprocessor.CoprocessorHost: The coprocessor 
> org.apache.hadoop.hbase.coprocessor.MetaTableMetrics threw 
> java.lang.NullPointerException
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hbase.coprocessor.MetaTableMetrics$ExampleRegionObserverMeta.markMeterIfPresent(MetaTableMetrics.java:123)
>   at 
> org.apache.hadoop.hbase.coprocessor.MetaTableMetrics$ExampleRegionObserverMeta.tableMetricRegisterAndMark2(MetaTableMetrics.java:233)
>   at 
> org.apache.hadoop.hbase.coprocessor.MetaTableMetrics$ExampleRegionObserverMeta.preGetOp(MetaTableMetrics.java:82)
>   at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$19.call(RegionCoprocessorHost.java:840)
>   at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$19.call(RegionCoprocessorHost.java:837)
>   at 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost$ObserverOperationWithoutResult.callObserver(CoprocessorHost.java:551)
>   at 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(CoprocessorHost.java:625)
>   at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preGet(RegionCoprocessorHost.java:837)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2608)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2547)
>   at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> 2019-01-28 23:31:10,314 ERROR 
> [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=16020] 
> regionserver.HRegionServer: * ABORTING region server 
> 10.0.0.24,16020,1548747043814: The coprocessor 
> org.apache.hadoop.hbase.coprocessor.MetaTableMetrics threw 
> java.lang.NullPointerException *
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hbase.coprocessor.MetaTableMetrics$ExampleRegionObserverMeta.markMeterIfPresent(MetaTableMetrics.java:123)
>   at 
> org.apache.hadoop.hbase.coprocessor.MetaTableMetrics$ExampleRegionObserverMeta.tableMetricRegisterAndMark2(MetaTableMetrics.java:233)
>   at 
> org.apache.hadoop.hbase.coprocessor.MetaTableMetrics$ExampleRegionObserverMeta.preGetOp(MetaTableMetrics.java:82)
>   at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$19.call(RegionCoprocessorHost.java:840)
>   at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$19.call(RegionCoprocessorHost.java:837)
>   at 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost$ObserverOperationWithoutResult.callObserver(CoprocessorHost.java:551)
>   at 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(CoprocessorHost.java:625)
>   at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preGet(RegionCoprocessorHost.java:837)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2608)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2547)
>   at 
> 

[jira] [Reopened] (HBASE-21800) RegionServer aborted due to NPE from MetaTableMetrics coprocessor

2019-05-25 Thread Peter Somogyi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Somogyi reopened HBASE-21800:
---

Reopening to push the missing commit to branch-2.2.

> RegionServer aborted due to NPE from MetaTableMetrics coprocessor
> -
>
> Key: HBASE-21800
> URL: https://issues.apache.org/jira/browse/HBASE-21800
> Project: HBase
>  Issue Type: Bug
>  Components: Coprocessors, meta, metrics, Operability
>Reporter: Sakthi
>Assignee: Sakthi
>Priority: Critical
>  Labels: Meta
> Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.10, 2.3.0
>
> Attachments: hbase-21800.branch-1.001.patch, 
> hbase-21800.branch-1.002.patch, hbase-21800.branch-1.003.patch, 
> hbase-21800.branch-1.004.patch, hbase-21800.master.001.patch, 
> hbase-21800.master.002.patch, hbase-21800.master.003.patch
>
>
> I was just playing around with the code, trying to capture "top k" table 
> metrics from MetaMetrics, when I bumped into this issue. Though we are not 
> currently capturing "top k" table metrics, we can still hit this issue 
> because of the "top k clients" metric, which is implemented using the lossy 
> counting algorithm.
>  
> The RegionServer gets aborted due to an NPE from the MetaTableMetrics 
> coprocessor. The log looks somewhat like this:
> {code:java}
> 2019-01-28 23:31:10,311 ERROR 
> [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=16020] 
> coprocessor.CoprocessorHost: The coprocessor 
> org.apache.hadoop.hbase.coprocessor.MetaTableMetrics threw 
> java.lang.NullPointerException
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hbase.coprocessor.MetaTableMetrics$ExampleRegionObserverMeta.markMeterIfPresent(MetaTableMetrics.java:123)
>   at 
> org.apache.hadoop.hbase.coprocessor.MetaTableMetrics$ExampleRegionObserverMeta.tableMetricRegisterAndMark2(MetaTableMetrics.java:233)
>   at 
> org.apache.hadoop.hbase.coprocessor.MetaTableMetrics$ExampleRegionObserverMeta.preGetOp(MetaTableMetrics.java:82)
>   at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$19.call(RegionCoprocessorHost.java:840)
>   at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$19.call(RegionCoprocessorHost.java:837)
>   at 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost$ObserverOperationWithoutResult.callObserver(CoprocessorHost.java:551)
>   at 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(CoprocessorHost.java:625)
>   at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preGet(RegionCoprocessorHost.java:837)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2608)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2547)
>   at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> 2019-01-28 23:31:10,314 ERROR 
> [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=16020] 
> regionserver.HRegionServer: * ABORTING region server 
> 10.0.0.24,16020,1548747043814: The coprocessor 
> org.apache.hadoop.hbase.coprocessor.MetaTableMetrics threw 
> java.lang.NullPointerException *
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hbase.coprocessor.MetaTableMetrics$ExampleRegionObserverMeta.markMeterIfPresent(MetaTableMetrics.java:123)
>   at 
> org.apache.hadoop.hbase.coprocessor.MetaTableMetrics$ExampleRegionObserverMeta.tableMetricRegisterAndMark2(MetaTableMetrics.java:233)
>   at 
> org.apache.hadoop.hbase.coprocessor.MetaTableMetrics$ExampleRegionObserverMeta.preGetOp(MetaTableMetrics.java:82)
>   at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$19.call(RegionCoprocessorHost.java:840)
>   at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$19.call(RegionCoprocessorHost.java:837)
>   at 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost$ObserverOperationWithoutResult.callObserver(CoprocessorHost.java:551)
>   at 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(CoprocessorHost.java:625)
>   at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preGet(RegionCoprocessorHost.java:837)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2608)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2547)
>   at 
> 

[jira] [Reopened] (HBASE-21991) Fix MetaMetrics issues - [Race condition, Faulty remove logic], few improvements

2019-05-25 Thread Peter Somogyi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Somogyi reopened HBASE-21991:
---

Reopening to push the missing base patch to branch-2.2 and the addendum to branch-2.1.

> Fix MetaMetrics issues - [Race condition, Faulty remove logic], few 
> improvements
> 
>
> Key: HBASE-21991
> URL: https://issues.apache.org/jira/browse/HBASE-21991
> Project: HBase
>  Issue Type: Bug
>  Components: Coprocessors, metrics
>Reporter: Sakthi
>Assignee: Sakthi
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.10, 2.3.0
>
> Attachments: hbase-21991.addendum.patch, 
> hbase-21991.branch-1.001.patch, hbase-21991.branch-1.002.patch, 
> hbase-21991.master.001.patch, hbase-21991.master.002.patch, 
> hbase-21991.master.003.patch, hbase-21991.master.004.patch, 
> hbase-21991.master.005.patch, hbase-21991.master.006.patch
>
>
> Here is a list of the issues related to the MetaMetrics implementation:
> +*Bugs*+:
>  # [_Lossy counting for top-k_] *Faulty remove logic for non-eligible 
> meters*: under certain conditions, we might end up storing/exposing all the 
> meters rather than roughly the top k.
>  # MetaMetrics can throw an NPE because of a *race condition*, which results 
> in the RS aborting.
> +*Improvements*+:
>  # With a high number of regions in the cluster, exposing metrics for each 
> region blows up the JMX output from ~140 KB to 100+ MB, depending on the 
> number of regions. It's better to use *lossy counting to maintain top-k for 
> region metrics* as well.
>  # As the lossy meters do not represent actual counts, I think it would be 
> better to *rename the meters to include "lossy" in the name*. That would be 
> more informative when monitoring the metrics, and there would be less 
> confusion between actual counts and lossy counts.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-11062) htop

2019-05-25 Thread Toshihiro Suzuki (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-11062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848258#comment-16848258
 ] 

Toshihiro Suzuki commented on HBASE-11062:
--

+1 for hbtop.

[~elserj]
{quote}
You have a patch/PR for what you demo'ed (even with the LGPL dependency)?
{quote}
Not yet. I need to do some refactoring of the code and add some tests. I can 
probably create a PR for it next week or the week after.

> htop
> 
>
> Key: HBASE-11062
> URL: https://issues.apache.org/jira/browse/HBASE-11062
> Project: HBase
>  Issue Type: New Feature
>  Components: hbase-operator-tools
>Reporter: Andrew Purtell
>Assignee: Toshihiro Suzuki
>Priority: Major
>
> A top-like monitor could be useful for testing, debugging, operations of 
> clusters of moderate size, and possibly for diagnosing issues in large 
> clusters.
> Consider a curses interface like the one presented by atop 
> (http://www.atoptool.nl/images/screenshots/genericw.png) - with aggregate 
> metrics collected over a monitoring interval in the upper portion of the 
> pane, and a listing of discrete measurements sorted and filtered by various 
> criteria in the bottom part of the pane. One might imagine a cluster overview 
> with cluster aggregate metrics above and a list of regionservers sorted by 
> utilization below; and a regionserver view with process metrics above and a 
> list of metrics by operation type below, or a list of client connections, or 
> a list of threads, sorted by utilization, throughput, or latency. 
> Generically, 'htop' is taken, but it would be distinctive in the HBase 
> context as a utility org.apache.hadoop.hbase.HTop.
> No need necessarily for a curses interface. Could be an external monitor with 
> a web front end as has been discussed before. I do like the idea of a process 
> that runs in a terminal because I interact with dev and test HBase clusters 
> exclusively by SSH. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21879) Read HFile's block to ByteBuffer directly instead of to byte for reducing young gc purpose

2019-05-25 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848202#comment-16848202
 ] 

Hudson commented on HBASE-21879:


Results for branch HBASE-21879
[build #113 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21879/113/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21879/113//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21879/113//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21879/113//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Read HFile's block to ByteBuffer directly instead of to byte for reducing 
> young gc purpose
> --
>
> Key: HBASE-21879
> URL: https://issues.apache.org/jira/browse/HBASE-21879
> Project: HBase
>  Issue Type: Improvement
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Fix For: 3.0.0, 2.3.0
>
> Attachments: HBASE-21879.v1.patch, HBASE-21879.v1.patch, 
> QPS-latencies-before-HBASE-21879.png, gc-data-before-HBASE-21879.png
>
>
> In HFileBlock#readBlockDataInternal,  we have the following: 
> {code}
> @VisibleForTesting
> protected HFileBlock readBlockDataInternal(FSDataInputStream is, long offset,
> long onDiskSizeWithHeaderL, boolean pread, boolean verifyChecksum, 
> boolean updateMetrics)
>  throws IOException {
>  // ...
>   // TODO: Make this ByteBuffer-based. Will make it easier to go to HDFS with 
> BBPool (offheap).
>   byte [] onDiskBlock = new byte[onDiskSizeWithHeader + hdrSize];
>   int nextBlockOnDiskSize = readAtOffset(is, onDiskBlock, preReadHeaderSize,
>   onDiskSizeWithHeader - preReadHeaderSize, true, offset + 
> preReadHeaderSize, pread);
>   if (headerBuf != null) {
> // ...
>   }
>   // ...
>  }
> {code}
> In the read path, we still read the block from the HFile into an on-heap 
> byte[], then copy the on-heap byte[] to the off-heap bucket cache 
> asynchronously. In my 100% get performance test, I also observed frequent 
> young GCs; the largest memory footprint in the young gen should be the 
> on-heap block byte[].
> In fact, we can read the HFile's block into a ByteBuffer directly instead of 
> into a byte[] to reduce young GC pressure. We did not implement this before 
> because the older HDFS client had no ByteBuffer reading interface, but 2.7+ 
> supports it now, so I think we can fix this.
> Will provide a patch and some perf comparisons for this.
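
The ByteBuffer read path mentioned above exists on FSDataInputStream (via the ByteBufferReadable interface, which the HDFS client has supported since 2.7+). A minimal sketch of the idea follows; buffer pooling, the header/checksum logic of the real readBlockDataInternal, and error handling are omitted, and the class and method names are illustrative.

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import org.apache.hadoop.fs.FSDataInputStream;

public final class ByteBufferBlockReader {
  static ByteBuffer readBlock(FSDataInputStream is, long offset, int sizeWithHeader)
      throws IOException {
    // In the real fix this buffer would come from an off-heap pool rather
    // than a fresh allocation.
    ByteBuffer block = ByteBuffer.allocateDirect(sizeWithHeader);
    is.seek(offset);
    while (block.hasRemaining()) {
      // Reads directly into the ByteBuffer, skipping the intermediate
      // on-heap byte[] that causes the young-gen churn described above.
      if (is.read(block) < 0) {
        throw new IOException("Premature EOF at offset " + offset);
      }
    }
    block.flip();
    return block;
  }
}
{code}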



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21512) Introduce an AsyncClusterConnection and replace the usage of ClusterConnection

2019-05-25 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848190#comment-16848190
 ] 

Hudson commented on HBASE-21512:


Results for branch HBASE-21512
[build #241 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21512/241/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21512/241//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21512/241//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21512/241//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(x) {color:red}-1 client integration test{color}
--Failed when running client tests on top of Hadoop 2. [see log for 
details|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21512/241//artifact/output-integration/hadoop-2.log].
 (note that this means we didn't run on Hadoop 3)


> Introduce an AsyncClusterConnection and replace the usage of ClusterConnection
> --
>
> Key: HBASE-21512
> URL: https://issues.apache.org/jira/browse/HBASE-21512
> Project: HBase
>  Issue Type: Umbrella
>Reporter: Duo Zhang
>Priority: Major
> Fix For: 3.0.0
>
>
> At least for the RSProcedureDispatcher, with CompletableFuture we no longer 
> need to set a delay and use a thread pool, which could reduce resource usage 
> and also latency.
> Once this is done, I think we can remove ClusterConnection completely and 
> start to rewrite the old sync client on top of the async client, which could 
> shrink our client code base a lot.
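
To illustrate the CompletableFuture point: with an async stub, completion or failure drives the next step through a callback, so there is no need to park calls in a delayed-retry thread pool. A minimal sketch with hypothetical names, not the actual RSProcedureDispatcher API:

{code:java}
import java.util.concurrent.CompletableFuture;

public final class DispatcherSketch {
  interface AsyncRegionServerStub {
    CompletableFuture<Void> executeProcedures(Object request);
  }

  void dispatch(AsyncRegionServerStub stub, Object request) {
    stub.executeProcedures(request).whenComplete((v, err) -> {
      if (err != null) {
        // Real code would apply backoff before retrying; the point is that
        // the retry is scheduled from the completion callback, not from a
        // thread pool that polls with a fixed delay.
        dispatch(stub, request);
      } else {
        // mark the remote procedure as complete
      }
    });
  }
}
{code}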



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-22455) Split TestReplicationStatus

2019-05-25 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848162#comment-16848162
 ] 

Hudson commented on HBASE-22455:


Results for branch branch-2
[build #1916 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1916/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1916//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1916//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1916//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(x) {color:red}-1 client integration test{color}
--Failed when running client tests on top of Hadoop 2. [see log for 
details|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1916//artifact/output-integration/hadoop-2.log].
 (note that this means we didn't run on Hadoop 3)


> Split TestReplicationStatus
> ---
>
> Key: HBASE-22455
> URL: https://issues.apache.org/jira/browse/HBASE-22455
> Project: HBase
>  Issue Type: Improvement
>  Components: Replication, test
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.3.0
>
>
> The test is a bit strange: we restart the cluster every time we run a test 
> method, and what's more, we always shut down the mini clusters and then 
> restart them at the beginning of each test method...
> Let's just make one class per test method.
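
A sketch of the proposed shape, with illustrative class names: the expensive mini-cluster setup lives in a base class, and each former test method becomes its own class, so JUnit naturally performs the setup exactly once per method.

{code:java}
import org.junit.BeforeClass;
import org.junit.Test;

abstract class ReplicationStatusTestBase {
  @BeforeClass
  public static void setUpCluster() throws Exception {
    // start the mini clusters once for the single test in each subclass
  }
}

public class TestReplicationStatusAfterSourceStop extends ReplicationStatusTestBase {
  @Test
  public void test() throws Exception {
    // body of the old per-method test; no manual cluster restart needed
  }
}
{code}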



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21512) Introduce an AsyncClusterConnection and replace the usage of ClusterConnection

2019-05-25 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848155#comment-16848155
 ] 

Hudson commented on HBASE-21512:


Results for branch HBASE-21512
[build #240 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21512/240/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21512/240//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21512/240//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21512/240//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(x) {color:red}-1 client integration test{color}
--Failed when running client tests on top of Hadoop 2. [see log for 
details|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21512/240//artifact/output-integration/hadoop-2.log].
 (note that this means we didn't run on Hadoop 3)


> Introduce an AsyncClusterConnection and replace the usage of ClusterConnection
> --
>
> Key: HBASE-21512
> URL: https://issues.apache.org/jira/browse/HBASE-21512
> Project: HBase
>  Issue Type: Umbrella
>Reporter: Duo Zhang
>Priority: Major
> Fix For: 3.0.0
>
>
> At least for the RSProcedureDispatcher, with CompletableFuture we no longer 
> need to set a delay and use a thread pool, which could reduce resource usage 
> and also latency.
> Once this is done, I think we can remove ClusterConnection completely and 
> start to rewrite the old sync client on top of the async client, which could 
> shrink our client code base a lot.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-22455) Split TestReplicationStatus

2019-05-25 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848151#comment-16848151
 ] 

Hudson commented on HBASE-22455:


Results for branch master
[build #1035 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/1035/]: (x) 
*{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/1035//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/1035//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/1035//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(x) {color:red}-1 client integration test{color}
--Failed when running client tests on top of Hadoop 2. [see log for 
details|https://builds.apache.org/job/HBase%20Nightly/job/master/1035//artifact/output-integration/hadoop-2.log].
 (note that this means we didn't run on Hadoop 3)


> Split TestReplicationStatus
> ---
>
> Key: HBASE-22455
> URL: https://issues.apache.org/jira/browse/HBASE-22455
> Project: HBase
>  Issue Type: Improvement
>  Components: Replication, test
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.3.0
>
>
> The test is a bit strange: we restart the cluster every time we run a test 
> method, and what's more, we always shut down the mini clusters and then 
> restart them at the beginning of each test method...
> Let's just make one class per test method.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work started] (HBASE-22470) Corrupt Surefire test reports

2019-05-25 Thread Peter Somogyi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-22470 started by Peter Somogyi.
-
> Corrupt Surefire test reports
> -
>
> Key: HBASE-22470
> URL: https://issues.apache.org/jira/browse/HBASE-22470
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.0, 2.2.0, 2.1.5
>Reporter: Peter Somogyi
>Assignee: Peter Somogyi
>Priority: Minor
> Attachments: 
> TEST-org.apache.hadoop.hbase.replication.TestMasterReplication.xml
>
>
> Occasionally Jenkins is not able to read Surefire test reports because the 
> generated XML file is corrupted. In this case Jenkins shows the following 
> error message:
> TEST-org.apache.hadoop.hbase.replication.TestMasterReplication.xml.[failed-to-read]
> https://builds.apache.org/view/H-L/view/HBase/job/HBase%20Nightly/job/branch-2.1/1176/testReport/junit/TEST-org.apache.hadoop.hbase.replication.TestMasterReplication/xml/_failed_to_read_/
> {noformat}
> Failed to read test report file 
> /home/jenkins/jenkins-slave/workspace/HBase_Nightly_branch-2.1/output-jdk8-hadoop3/archiver/hbase-server/target/surefire-reports/TEST-org.apache.hadoop.hbase.replication.TestMasterReplication.xml
> org.dom4j.DocumentException: Error on line 86 of document  : XML document 
> structures must start and end within the same entity. Nested exception: XML 
> document structures must start and end within the same entity.{noformat}
> The specific XML file is incomplete; however, the output file for the test 
> does contain the stdout and stderr output.
> {noformat}
> <testcase name="..." classname="org.apache.hadoop.hbase.replication.TestMasterReplication" 
> time="95.334"/>
> <testcase name="..." classname="org.apache.hadoop.hbase.replication.TestMasterReplication" 
> time="26.5"/>
> <testcase name="..." classname="org.apache.hadoop.hbase.replication.TestMasterReplication" 
> time="27.244"/>
> <testcase name="..." classname="org.apache.hadoop.hbase.replication.TestMasterReplication" 
> time="46.921"/>
> <testcase name="..." classname="org.apache.hadoop.hbase.replication.TestMasterReplication" 
> time="43.147"/>
> <testcase name="..." classname="org.apache.hadoop.hbase.replication.TestMasterReplication" 
> time="11.119"/>
> <testcase name="..." classname="org.apache.hadoop.hbase.replication.TestMasterReplication" 
> time="44.022">
> <failure message="..." type="java.lang.AssertionError">java.lang.AssertionError: Waited too much 
> time for bulkloaded data replication. Current count=200, expected count=600
> at 
> org.apache.hadoop.hbase.replication.TestMasterReplication.wait(TestMasterReplication.java:641)
> at 
> org.apache.hadoop.hbase.replication.TestMasterReplication.loadAndValidateHFileReplication(TestMasterReplication.java:631)
> at 
> org.apache.hadoop.hbase.replication.TestMasterReplication.testHFileMultiSlaveReplication(TestMasterReplication.java:371)
> 
> 

[jira] [Commented] (HBASE-22470) Corrupt Surefire test reports

2019-05-25 Thread Peter Somogyi (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848092#comment-16848092
 ] 

Peter Somogyi commented on HBASE-22470:
---

The current Surefire version is 2.22.0, which HBase inherits from the ASF parent 
pom. Surefire 2.22.1 had some bugfixes around XML reports and outputs 
(SUREFIRE-1559, SUREFIRE-1579, SUREFIRE-1561), but it is also possible that the 
test prints some special characters that Surefire is not able to write to the 
XML, or that the size of the output is too big (7.9 MB in this case).
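
To illustrate the special-character theory: XML 1.0 cannot represent certain control characters at all, so a test that writes one to stdout can yield a report that the XML parser rejects. A tiny hypothetical example:

{code:java}
public class ControlCharExample {
  public static void main(String[] args) {
    // U+0000 (NUL) is illegal in XML 1.0 even when escaped, so a report
    // writer that copies captured output verbatim produces unparseable XML.
    System.out.println("before \u0000 after");
  }
}
{code}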

> Corrupt Surefire test reports
> -
>
> Key: HBASE-22470
> URL: https://issues.apache.org/jira/browse/HBASE-22470
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.0, 2.2.0, 2.1.5
>Reporter: Peter Somogyi
>Assignee: Peter Somogyi
>Priority: Minor
> Attachments: 
> TEST-org.apache.hadoop.hbase.replication.TestMasterReplication.xml
>
>
> Occasionally Jenkins is not able to read Surefire test reports because the 
> generated XML file is corrupted. In this case Jenkins shows the following 
> error message:
> TEST-org.apache.hadoop.hbase.replication.TestMasterReplication.xml.[failed-to-read]
> https://builds.apache.org/view/H-L/view/HBase/job/HBase%20Nightly/job/branch-2.1/1176/testReport/junit/TEST-org.apache.hadoop.hbase.replication.TestMasterReplication/xml/_failed_to_read_/
> {noformat}
> Failed to read test report file 
> /home/jenkins/jenkins-slave/workspace/HBase_Nightly_branch-2.1/output-jdk8-hadoop3/archiver/hbase-server/target/surefire-reports/TEST-org.apache.hadoop.hbase.replication.TestMasterReplication.xml
> org.dom4j.DocumentException: Error on line 86 of document  : XML document 
> structures must start and end within the same entity. Nested exception: XML 
> document structures must start and end within the same entity.{noformat}
> The specific XML file is incomplete; however, the output file for the test 
> does contain the stdout and stderr output.
> {noformat}
> <testcase name="..." classname="org.apache.hadoop.hbase.replication.TestMasterReplication" 
> time="95.334"/>
> <testcase name="..." classname="org.apache.hadoop.hbase.replication.TestMasterReplication" 
> time="26.5"/>
> <testcase name="..." classname="org.apache.hadoop.hbase.replication.TestMasterReplication" 
> time="27.244"/>
> <testcase name="..." classname="org.apache.hadoop.hbase.replication.TestMasterReplication" 
> time="46.921"/>
> <testcase name="..." classname="org.apache.hadoop.hbase.replication.TestMasterReplication" 
> time="43.147"/>
> <testcase name="..." classname="org.apache.hadoop.hbase.replication.TestMasterReplication" 
> time="11.119"/>
> <testcase name="..." classname="org.apache.hadoop.hbase.replication.TestMasterReplication" 
> time="44.022">
> <failure message="..." type="java.lang.AssertionError">java.lang.AssertionError: Waited too much 
> time for bulkloaded data replication. Current count=200, expected count=600
> at 
> org.apache.hadoop.hbase.replication.TestMasterReplication.wait(TestMasterReplication.java:641)
> at 
> org.apache.hadoop.hbase.replication.TestMasterReplication.loadAndValidateHFileReplication(TestMasterReplication.java:631)
> at 
> org.apache.hadoop.hbase.replication.TestMasterReplication.testHFileMultiSlaveReplication(TestMasterReplication.java:371)
> 
> 

[jira] [Updated] (HBASE-22470) Corrupt Surefire test reports

2019-05-25 Thread Peter Somogyi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Somogyi updated HBASE-22470:
--
Attachment: 
TEST-org.apache.hadoop.hbase.replication.TestMasterReplication.xml

> Corrupt Surefire test reports
> -
>
> Key: HBASE-22470
> URL: https://issues.apache.org/jira/browse/HBASE-22470
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.0, 2.2.0, 2.1.5
>Reporter: Peter Somogyi
>Assignee: Peter Somogyi
>Priority: Minor
> Attachments: 
> TEST-org.apache.hadoop.hbase.replication.TestMasterReplication.xml
>
>
> Occasionally Jenkins is not able to read Surefire test reports because the 
> generated XML file is corrupted. In this case Jenkins shows the following 
> error message:
> TEST-org.apache.hadoop.hbase.replication.TestMasterReplication.xml.[failed-to-read]
> https://builds.apache.org/view/H-L/view/HBase/job/HBase%20Nightly/job/branch-2.1/1176/testReport/junit/TEST-org.apache.hadoop.hbase.replication.TestMasterReplication/xml/_failed_to_read_/
> {noformat}
> Failed to read test report file 
> /home/jenkins/jenkins-slave/workspace/HBase_Nightly_branch-2.1/output-jdk8-hadoop3/archiver/hbase-server/target/surefire-reports/TEST-org.apache.hadoop.hbase.replication.TestMasterReplication.xml
> org.dom4j.DocumentException: Error on line 86 of document  : XML document 
> structures must start and end within the same entity. Nested exception: XML 
> document structures must start and end within the same entity.{noformat}
> The specific XML file is incomplete; however, the output file for the test 
> does contain the stdout and stderr output.
> {noformat}
> <testcase name="..." classname="org.apache.hadoop.hbase.replication.TestMasterReplication" 
> time="95.334"/>
> <testcase name="..." classname="org.apache.hadoop.hbase.replication.TestMasterReplication" 
> time="26.5"/>
> <testcase name="..." classname="org.apache.hadoop.hbase.replication.TestMasterReplication" 
> time="27.244"/>
> <testcase name="..." classname="org.apache.hadoop.hbase.replication.TestMasterReplication" 
> time="46.921"/>
> <testcase name="..." classname="org.apache.hadoop.hbase.replication.TestMasterReplication" 
> time="43.147"/>
> <testcase name="..." classname="org.apache.hadoop.hbase.replication.TestMasterReplication" 
> time="11.119"/>
> <testcase name="..." classname="org.apache.hadoop.hbase.replication.TestMasterReplication" 
> time="44.022">
> <failure message="..." type="java.lang.AssertionError">java.lang.AssertionError: Waited too much 
> time for bulkloaded data replication. Current count=200, expected count=600
> at 
> org.apache.hadoop.hbase.replication.TestMasterReplication.wait(TestMasterReplication.java:641)
> at 
> org.apache.hadoop.hbase.replication.TestMasterReplication.loadAndValidateHFileReplication(TestMasterReplication.java:631)
> at 
> org.apache.hadoop.hbase.replication.TestMasterReplication.testHFileMultiSlaveReplication(TestMasterReplication.java:371)
> 
> 

[jira] [Created] (HBASE-22470) Corrupt Surefire test reports

2019-05-25 Thread Peter Somogyi (JIRA)
Peter Somogyi created HBASE-22470:
-

 Summary: Corrupt Surefire test reports
 Key: HBASE-22470
 URL: https://issues.apache.org/jira/browse/HBASE-22470
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 3.0.0, 2.2.0, 2.1.5
Reporter: Peter Somogyi
Assignee: Peter Somogyi


Occasionally Jenkins is not able to read Surefire test reports because the 
generated XML file is corrupted. In this case Jenkins shows the following error 
message:

TEST-org.apache.hadoop.hbase.replication.TestMasterReplication.xml.[failed-to-read]

https://builds.apache.org/view/H-L/view/HBase/job/HBase%20Nightly/job/branch-2.1/1176/testReport/junit/TEST-org.apache.hadoop.hbase.replication.TestMasterReplication/xml/_failed_to_read_/

{noformat}
Failed to read test report file 
/home/jenkins/jenkins-slave/workspace/HBase_Nightly_branch-2.1/output-jdk8-hadoop3/archiver/hbase-server/target/surefire-reports/TEST-org.apache.hadoop.hbase.replication.TestMasterReplication.xml
org.dom4j.DocumentException: Error on line 86 of document  : XML document 
structures must start and end within the same entity. Nested exception: XML 
document structures must start and end within the same entity.{noformat}
The specific XML file is incomplete; however, the output file for the test 
does contain the stdout and stderr output.
{noformat}
<testcase name="..." classname="org.apache.hadoop.hbase.replication.TestMasterReplication" time="95.334"/>
<testcase name="..." classname="org.apache.hadoop.hbase.replication.TestMasterReplication" time="26.5"/>
<testcase name="..." classname="org.apache.hadoop.hbase.replication.TestMasterReplication" time="27.244"/>
<testcase name="..." classname="org.apache.hadoop.hbase.replication.TestMasterReplication" time="46.921"/>
<testcase name="..." classname="org.apache.hadoop.hbase.replication.TestMasterReplication" time="43.147"/>
<testcase name="..." classname="org.apache.hadoop.hbase.replication.TestMasterReplication" time="11.119"/>
<testcase name="..." classname="org.apache.hadoop.hbase.replication.TestMasterReplication" time="44.022">
<failure message="..." type="java.lang.AssertionError">java.lang.AssertionError: Waited too much time 
for bulkloaded data replication. Current count=200, expected count=600
at 
org.apache.hadoop.hbase.replication.TestMasterReplication.wait(TestMasterReplication.java:641)
at 
org.apache.hadoop.hbase.replication.TestMasterReplication.loadAndValidateHFileReplication(TestMasterReplication.java:631)
at 
org.apache.hadoop.hbase.replication.TestMasterReplication.testHFileMultiSlaveReplication(TestMasterReplication.java:371)



[jira] [Commented] (HBASE-22346) scanner priorities/deadline units are invalid for non-huge scanners

2019-05-25 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848074#comment-16848074
 ] 

Hudson commented on HBASE-22346:


Results for branch HBASE-22346
[build #20 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-22346/20/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-22346/20//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-22346/20//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-22346/20//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> scanner priorities/deadline units are invalid for non-huge scanners
> ---
>
> Key: HBASE-22346
> URL: https://issues.apache.org/jira/browse/HBASE-22346
> Project: HBase
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HBASE-22346.01.patch, HBASE-22346.patch
>
>
> I was looking at using the priority (deadline) queue for scanner requests; 
> what I see is that AnnotationReadingPriorityFunction, the only available 
> impl of the deadline function, implements getDeadline as the sqrt of the 
> number of next() calls, from HBASE-10993.
> However, CallPriorityComparator.compare, its only caller, adds that 
> "deadline" value to callA.getReceiveTime(), which is in milliseconds...
> That results in a more or less meaningless value that I assume only makes 
> sense "by coincidence" for telling apart broad and specific classes of 
> scanners... in practice, next() calls must number in the 1000s before the 
> term becomes meaningful versus small differences in receive time.
> When there's contention from many scanners, e.g. small scanners for meta, or 
> just users creating tons of scanners to the point where requests queue up, 
> the actual deadline is not accounted for and the priority function itself is 
> meaningless... In fact, as queueing increases, it gets worse, because 
> receive-time differences grow.
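
A back-of-the-envelope illustration of the unit mismatch (simplified; not the actual CallPriorityComparator code):

{code:java}
public class DeadlineUnitsExample {
  public static void main(String[] args) {
    long receiveTimeA = 0;    // ms: call A arrived first
    long receiveTimeB = 150;  // ms: call B arrived 150 ms later
    long nextCallsA = 10_000; // A is a fairly long-running scanner

    // "deadline" = sqrt(number of next() calls), a unitless value that gets
    // added to a millisecond timestamp:
    double priorityA = receiveTimeA + Math.sqrt(nextCallsA); // 0 + 100 = 100
    double priorityB = receiveTimeB + Math.sqrt(0);          // 150 + 0 = 150

    // A is still scheduled first despite 10,000 next() calls: the sqrt term
    // only dominates once it outgrows ordinary receive-time gaps, i.e. when
    // next() calls number in the thousands.
    System.out.println(priorityA < priorityB); // true
  }
}
{code}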



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (HBASE-22455) Split TestReplicationStatus

2019-05-25 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang resolved HBASE-22455.
---
  Resolution: Fixed
Hadoop Flags: Reviewed

Pushed to master and branch-2.

Thanks [~openinx] for reviewing.

> Split TestReplicationStatus
> ---
>
> Key: HBASE-22455
> URL: https://issues.apache.org/jira/browse/HBASE-22455
> Project: HBase
>  Issue Type: Improvement
>  Components: Replication, test
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.3.0
>
>
> The test is a bit strange: we restart the cluster every time we run a test 
> method, and what's more, we always shut down the mini clusters and then 
> restart them at the beginning of each test method...
> Let's just make one class per test method.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-22455) Split TestReplicationStatus

2019-05-25 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-22455:
--
Fix Version/s: 2.3.0
   3.0.0

> Split TestReplicationStatus
> ---
>
> Key: HBASE-22455
> URL: https://issues.apache.org/jira/browse/HBASE-22455
> Project: HBase
>  Issue Type: Improvement
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.3.0
>
>
> The test is a bit strange: we restart the cluster every time we run a test 
> method, and what's more, we always shut down the mini clusters and then 
> restart them at the beginning of each test method...
> Let's just make one class per test method.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-22455) Split TestReplicationStatus

2019-05-25 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-22455:
--
Component/s: test
 Replication

> Split TestReplicationStatus
> ---
>
> Key: HBASE-22455
> URL: https://issues.apache.org/jira/browse/HBASE-22455
> Project: HBase
>  Issue Type: Improvement
>  Components: Replication, test
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.3.0
>
>
> The test is a bit strange, we restart the cluster every time when running a 
> test method, and even more, we always shutdown the mini clusters and then 
> restart them in the beginning of each test method...
> Let's just make one class for each test method.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] [hbase] Apache9 merged pull request #249: HBASE-22455 Split TestReplicationStatus

2019-05-25 Thread GitBox
Apache9 merged pull request #249: HBASE-22455 Split TestReplicationStatus
URL: https://github.com/apache/hbase/pull/249
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services