[jira] [Commented] (HBASE-21564) race condition in WAL rolling resulting in size-based rolling getting stuck
[ https://issues.apache.org/jira/browse/HBASE-21564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712419#comment-16712419 ] Hadoop QA commented on HBASE-21564:
---
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 24s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 2s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 14s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 18s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 3m 44s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 30s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 44s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 16s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 1m 49s{color} | {color:red} hbase-server generated 6 new + 182 unchanged - 6 fixed = 188 total (was 188) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 3s{color} | {color:red} hbase-server: The patch generated 6 new + 1 unchanged - 0 fixed = 7 total (was 1) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 3m 50s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 8m 15s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}148m 12s{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 53m 5s{color} | {color:red} hbase-backup in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 1m 0s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}241m 35s{color} | {color:black} {color} |
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.master.replication.TestTransitPeerSyncReplicationStateProcedureRetry |
| | hadoop.hbase.fs.TestBlockReorderMultiBlocks |
| | hadoop.hbase.client.replication.TestReplicationAdmin |
| | hadoop.hbase.replication.TestSyncReplicationStandBy |
| | hadoop.hbase.replication.TestReplicationSmallTestsSync |
| | hadoop.hbase.replication.TestSerialSyncReplication |
| | hadoop.hbase.replication.TestReplicationChangingPeerRegionservers |
| | hadoop.hbase.replication.TestReplicationSmallTests |
| | hadoop.hbase.client.TestAsyncClusterAdminApi |
| | hadoop.hbase.replication.TestAddToSerialReplicationPeer |
| | hadoop.hbase.replication.
[jira] [Updated] (HBASE-21553) schedLock not released in MasterProcedureScheduler
[ https://issues.apache.org/jira/browse/HBASE-21553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xu Cang updated HBASE-21553: Attachment: HBASE-21553-branch-1.001.patch Status: Patch Available (was: Open) > schedLock not released in MasterProcedureScheduler > -- > > Key: HBASE-21553 > URL: https://issues.apache.org/jira/browse/HBASE-21553 > Project: HBase > Issue Type: Improvement >Reporter: Xu Cang >Assignee: Xu Cang >Priority: Major > Attachments: HBASE-21553-branch-1.001.patch > > > https://github.com/apache/hbase/blob/branch-1/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/MasterProcedureScheduler.java#L749 > As shown above, we didn't unlock schedLock, which can cause a deadlock. > Besides this, there are other places in this class that handle schedLock.unlock > in a risky manner. I'd like to move them to finally blocks to improve the > robustness of lock handling. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
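The fix proposed above — moving every `schedLock.unlock` into a finally block — can be sketched as follows. This is an illustrative toy example, not the actual MasterProcedureScheduler code; the class and method names are hypothetical:

```java
import java.util.concurrent.locks.ReentrantLock;

// Releasing the lock in a finally block guarantees it is freed even when
// the guarded section returns early or throws, so a later caller cannot
// deadlock waiting on a lock that was never released.
public class LockDemo {
    private static final ReentrantLock schedLock = new ReentrantLock();

    static boolean doSchedWork(boolean fail) {
        schedLock.lock();
        try {
            if (fail) {
                // Without the finally block below, this exception path
                // would leave schedLock held forever.
                throw new IllegalStateException("simulated failure");
            }
            return true;
        } finally {
            schedLock.unlock(); // always runs, on every exit path
        }
    }

    public static void main(String[] args) {
        try {
            doSchedWork(true);
        } catch (IllegalStateException expected) {
            // the lock was still released by the finally block
        }
        // a second acquisition succeeds instead of deadlocking
        System.out.println(doSchedWork(false)); // prints "true"
    }
}
```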
[jira] [Resolved] (HBASE-21566) Release notes and changes for 2.0.4RC0 and 2.1.2RC0
[ https://issues.apache.org/jira/browse/HBASE-21566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-21566. --- Resolution: Fixed Fix Version/s: 2.1.2 Pushed the appropriate changes and releasenotes on branch-2.0 and branch-2.1. > Release notes and changes for 2.0.4RC0 and 2.1.2RC0 > --- > > Key: HBASE-21566 > URL: https://issues.apache.org/jira/browse/HBASE-21566 > Project: HBase > Issue Type: Sub-task > Components: release >Reporter: stack >Assignee: stack >Priority: Major > Fix For: 2.1.2, 2.0.4 > > > $ ./release-doc-maker/releasedocmaker.py -p HBASE --fileversions -v 2.0.4 -l > --sortorder=newer --skip-credits > $ ./release-doc-maker/releasedocmaker.py -p HBASE --fileversions -v 2.1.2 -l > --sortorder=newer --skip-credits > ... using yetus tagged 0.8.0 > ...then carefully stitched the product into the current CHANGES.md and > RELEASENOTES.md files being careful to preserve markdown header ABOVE the > apache license else the .md files won't render as markdown -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21566) Release notes and changes for 2.0.4RC0 and 2.1.2RC0
[ https://issues.apache.org/jira/browse/HBASE-21566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-21566: -- Summary: Release notes and changes for 2.0.4RC0 and 2.1.2RC0 (was: Release notes and changes for 2.0.4RC0 and 2.1.1RC0) > Release notes and changes for 2.0.4RC0 and 2.1.2RC0 > --- > > Key: HBASE-21566 > URL: https://issues.apache.org/jira/browse/HBASE-21566 > Project: HBase > Issue Type: Sub-task > Components: release >Reporter: stack >Assignee: stack >Priority: Major > Fix For: 2.0.4 > > > $ ./release-doc-maker/releasedocmaker.py -p HBASE --fileversions -v 2.0.4 -l > --sortorder=newer --skip-credits > $ ./release-doc-maker/releasedocmaker.py -p HBASE --fileversions -v 2.1.1 -l > --sortorder=newer --skip-credits > ... using yetus tagged 0.8.0 > ...then carefully stitched the product into the current CHANGES.md and > RELEASENOTES.md files being careful to preserve markdown header ABOVE the > apache license else the .md files won't render as markdown -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21566) Release notes and changes for 2.0.4RC0 and 2.1.2RC0
[ https://issues.apache.org/jira/browse/HBASE-21566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-21566: -- Description: $ ./release-doc-maker/releasedocmaker.py -p HBASE --fileversions -v 2.0.4 -l --sortorder=newer --skip-credits $ ./release-doc-maker/releasedocmaker.py -p HBASE --fileversions -v 2.1.2 -l --sortorder=newer --skip-credits ... using yetus tagged 0.8.0 ...then carefully stitched the product into the current CHANGES.md and RELEASENOTES.md files being careful to preserve markdown header ABOVE the apache license else the .md files won't render as markdown was: $ ./release-doc-maker/releasedocmaker.py -p HBASE --fileversions -v 2.0.4 -l --sortorder=newer --skip-credits $ ./release-doc-maker/releasedocmaker.py -p HBASE --fileversions -v 2.1.1 -l --sortorder=newer --skip-credits ... using yetus tagged 0.8.0 ...then carefully stitched the product into the current CHANGES.md and RELEASENOTES.md files being careful to preserve markdown header ABOVE the apache license else the .md files won't render as markdown > Release notes and changes for 2.0.4RC0 and 2.1.2RC0 > --- > > Key: HBASE-21566 > URL: https://issues.apache.org/jira/browse/HBASE-21566 > Project: HBase > Issue Type: Sub-task > Components: release >Reporter: stack >Assignee: stack >Priority: Major > Fix For: 2.0.4 > > > $ ./release-doc-maker/releasedocmaker.py -p HBASE --fileversions -v 2.0.4 -l > --sortorder=newer --skip-credits > $ ./release-doc-maker/releasedocmaker.py -p HBASE --fileversions -v 2.1.2 -l > --sortorder=newer --skip-credits > ... using yetus tagged 0.8.0 > ...then carefully stitched the product into the current CHANGES.md and > RELEASENOTES.md files being careful to preserve markdown header ABOVE the > apache license else the .md files won't render as markdown -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21566) Release notes and changes for 2.0.4RC0 and 2.1.1RC0
stack created HBASE-21566: - Summary: Release notes and changes for 2.0.4RC0 and 2.1.1RC0 Key: HBASE-21566 URL: https://issues.apache.org/jira/browse/HBASE-21566 Project: HBase Issue Type: Sub-task Components: release Reporter: stack Assignee: stack Fix For: 2.0.4 $ ./release-doc-maker/releasedocmaker.py -p HBASE --fileversions -v 2.0.4 -l --sortorder=newer --skip-credits $ ./release-doc-maker/releasedocmaker.py -p HBASE --fileversions -v 2.1.1 -l --sortorder=newer --skip-credits ... using yetus tagged 0.8.0 ...then carefully stitched the product into the current CHANGES.md and RELEASENOTES.md files being careful to preserve markdown header ABOVE the apache license else the .md files won't render as markdown -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21554) Show replication endpoint classname for replication peer on master web UI
[ https://issues.apache.org/jira/browse/HBASE-21554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-21554: --- Attachment: HBASE-21554.branch-2.001.patch > Show replication endpoint classname for replication peer on master web UI > - > > Key: HBASE-21554 > URL: https://issues.apache.org/jira/browse/HBASE-21554 > Project: HBase > Issue Type: Improvement >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Minor > Attachments: HBASE-21554.branch-2.001.patch, > HBASE-21554.master.001.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21554) Show replication endpoint classname for replication peer on master web UI
[ https://issues.apache.org/jira/browse/HBASE-21554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712358#comment-16712358 ] Zheng Hu commented on HBASE-21554: -- +1 > Show replication endpoint classname for replication peer on master web UI > - > > Key: HBASE-21554 > URL: https://issues.apache.org/jira/browse/HBASE-21554 > Project: HBase > Issue Type: Improvement >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Minor > Attachments: HBASE-21554.master.001.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21413) Empty meta log doesn't get split when restart whole cluster
[ https://issues.apache.org/jira/browse/HBASE-21413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-21413: -- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Pushed to branch-2.0+. Andrew made an issue for backport to branch-1. Pushing so I can work on an RC. Thanks [~allan163] > Empty meta log doesn't get split when restart whole cluster > --- > > Key: HBASE-21413 > URL: https://issues.apache.org/jira/browse/HBASE-21413 > Project: HBase > Issue Type: Improvement >Affects Versions: 2.1.1, 2.0.2 >Reporter: Jingyun Tian >Assignee: Allan Yang >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.2, 2.0.4 > > Attachments: HBASE-21413.branch-2.1.001.patch, > HBASE-21413.branch-2.1.002.patch, Screenshot from 2018-10-31 18-11-02.png, > Screenshot from 2018-10-31 18-11-11.png > > > After I restart the whole cluster, there is a splitting directory that still exists on > hdfs. Then I found there is only an empty meta wal file in it. I'll dig into > this later. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21413) Empty meta log doesn't get split when restart whole cluster
[ https://issues.apache.org/jira/browse/HBASE-21413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-21413: -- Fix Version/s: 2.2.0 > Empty meta log doesn't get split when restart whole cluster > --- > > Key: HBASE-21413 > URL: https://issues.apache.org/jira/browse/HBASE-21413 > Project: HBase > Issue Type: Improvement >Affects Versions: 2.1.1, 2.0.2 >Reporter: Jingyun Tian >Assignee: Allan Yang >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.2, 2.0.4 > > Attachments: HBASE-21413.branch-2.1.001.patch, > HBASE-21413.branch-2.1.002.patch, Screenshot from 2018-10-31 18-11-02.png, > Screenshot from 2018-10-31 18-11-11.png > > > After I restart the whole cluster, there is a splitting directory that still exists on > hdfs. Then I found there is only an empty meta wal file in it. I'll dig into > this later. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21559) The RestoreSnapshotFromClientTestBase related UT are flaky
[ https://issues.apache.org/jira/browse/HBASE-21559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-21559: -- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Resolving so can put up an RC. Can open new issue if still a prob. > The RestoreSnapshotFromClientTestBase related UT are flaky > -- > > Key: HBASE-21559 > URL: https://issues.apache.org/jira/browse/HBASE-21559 > Project: HBase > Issue Type: Bug >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.2, 2.0.4 > > Attachments: HBASE-21559.v1.patch, HBASE-21559.v2.patch, > TEST-org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions.xml, > > org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions-output.txt, > > org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions.txt > > > The related UT are: > * TestRestoreSnapshotFromClientAfterSplittingRegions > * TestRestoreSnapshotFromClientWithRegionReplicas > * TestMobRestoreSnapshotFromClientAfterSplittingRegions > I guess the main problem is: a dead lock between SplitTableRegionProcedure > and SnapshotProcedure.. > Attached logs from the failed UT. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21559) The RestoreSnapshotFromClientTestBase related UT are flaky
[ https://issues.apache.org/jira/browse/HBASE-21559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712338#comment-16712338 ] stack commented on HBASE-21559: --- Nice fix [~openinx]. Running it locally, I can't make it hang anymore. Thanks. > The RestoreSnapshotFromClientTestBase related UT are flaky > -- > > Key: HBASE-21559 > URL: https://issues.apache.org/jira/browse/HBASE-21559 > Project: HBase > Issue Type: Bug >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.2, 2.0.4 > > Attachments: HBASE-21559.v1.patch, HBASE-21559.v2.patch, > TEST-org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions.xml, > > org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions-output.txt, > > org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions.txt > > > The related UT are: > * TestRestoreSnapshotFromClientAfterSplittingRegions > * TestRestoreSnapshotFromClientWithRegionReplicas > * TestMobRestoreSnapshotFromClientAfterSplittingRegions > I guess the main problem is: a deadlock between SplitTableRegionProcedure > and SnapshotProcedure. > Attached logs from the failed UT. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21563) HBase Get Encounters java.lang.IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/HBASE-21563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712337#comment-16712337 ] ramkrishna.s.vasudevan commented on HBASE-21563: I think we have seen similar issues earlier with these encoders? I don't remember if they got fixed. > HBase Get Encounters java.lang.IndexOutOfBoundsException > > > Key: HBASE-21563 > URL: https://issues.apache.org/jira/browse/HBASE-21563 > Project: HBase > Issue Type: Bug > Components: HFile >Affects Versions: 1.2.0 >Reporter: William Shen >Priority: Major > Attachments: 67a04bc049be4f58afecdcc0a3ba62ca.tar.gz > > > We've recently encountered an issue retrieving data from our HBase cluster, and > have not had much luck troubleshooting it. We narrowed down our issue > to a single GET, which appears to be caused by FastDiffDeltaEncoder.java > running into java.lang.IndexOutOfBoundsException. > Perhaps there is a bug in a corner case of FastDiffDeltaEncoder? > We are running 1.2.0-cdh5.9.2, and the GET in question is: > {noformat} > hbase(main):004:0> get 'qa2.ADGROUPS', > "\x05\x80\x00\x00\x00\x00\x1F\x54\x9C\x80\x00\x00\x00\x00\x1C\x7D\x45\x00\x04\x80\x00\x00\x00\x00\x1D\x0F\x19\x80\x00\x00\x00\x00\x4A\x64\x6F\x80\x00\x00\x00\x01\xD9\xDB\xCE" > COLUMN CELL > > > ERROR: java.io.IOException > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2215) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:109) > at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:185) > at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:165) > Caused by: java.lang.IndexOutOfBoundsException > at java.nio.Buffer.checkBounds(Buffer.java:567) > at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:149) > at > org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder$1.decode(FastDiffDeltaEncoder.java:465) > at > org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder$1.decodeNext(FastDiffDeltaEncoder.java:516) > at > 
org.apache.hadoop.hbase.io.encoding.BufferedDataBlockEncoder$BufferedEncodedSeeker.next(BufferedDataBlockEncoder.java:618) > at > org.apache.hadoop.hbase.io.hfile.HFileReaderV2$EncodedScannerV2.next(HFileReaderV2.java:1277) > at > org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:180) > at > org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:108) > at > org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:588) > at > org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5706) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:5865) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5643) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:5620) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:5606) > at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6801) > at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6779) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2029) > at > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33644) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2170) > ... 
3 more {noformat} > Likewise, running {{ hbase hfile -f -p }} on the specific hfile, a subset of > kv pairs were printed until the program hits the following exception and > crashes: > {noformat} > Exception in thread "main" java.lang.RuntimeException: Unknown code 65 > at org.apache.hadoop.hbase.KeyValue$Type.codeToType(KeyValue.java:259) > at org.apache.hadoop.hbase.KeyValue.keyToString(KeyValue.java:1246) > at > org.apache.hadoop.hbase.io.encoding.BufferedDataBlockEncoder$ClonedSeekerState.toString(BufferedDataBlockEncoder.java:506) > at java.lang.String.valueOf(String.java:2994) > at java.lang.StringBuilder.append(StringBuilder.java:131) > at > org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.scanKeysValues(HFilePrettyPrinter.java:382) > at > org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.processFile(HFilePrettyPrinter.java:316) > at > org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.run(HFilePrettyPrinter.java:255) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at > org.apache.hadoop.hbase.io.hfile.HFilePre
[jira] [Resolved] (HBASE-21562) TestRestoreSnapshotFromClientAfterSplittingRegions and related tests are flakey
[ https://issues.apache.org/jira/browse/HBASE-21562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-21562. --- Resolution: Duplicate Resolve. Duplicate of HBASE-21559. Thanks [~Apache9] > TestRestoreSnapshotFromClientAfterSplittingRegions and related tests are > flakey > --- > > Key: HBASE-21562 > URL: https://issues.apache.org/jira/browse/HBASE-21562 > Project: HBase > Issue Type: Bug > Components: test >Reporter: stack >Priority: Major > Fix For: 2.1.2 > > > Fails 60% of the time on GCE runs. Messes up our nightlies for branch-2.1 and > branch-2.0 at least. > Looking, it's a bit tough figuring out what is going on. The test asks us to split > regions. The split starts then hangs. Last thing reported is: > 2018-12-06 10:20:30,823 INFO [PEWorker-16] > procedure.MasterProcedureScheduler(741): Took xlock for pid=174, > state=RUNNABLE:SPLIT_TABLE_REGION_PREPARE; SplitTableRegionProcedure > table=testRestoreSnapshotAfterSplittingRegions_1__regionReplication_3_-1544120421990, > parent=034bb3ebb3f9a7442f927caacdda5354, > daughterA=fbe392ca659b3913181d05ac4fb19b4c, > daughterB=3646ac333722af33c32e6f3428d23f95 > ... then all we get is that the worker is stuck. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21559) The RestoreSnapshotFromClientTestBase related UT are flaky
[ https://issues.apache.org/jira/browse/HBASE-21559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-21559: -- Fix Version/s: (was: 2.0.5) 2.2.0 > The RestoreSnapshotFromClientTestBase related UT are flaky > -- > > Key: HBASE-21559 > URL: https://issues.apache.org/jira/browse/HBASE-21559 > Project: HBase > Issue Type: Bug >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.2, 2.0.4 > > Attachments: HBASE-21559.v1.patch, HBASE-21559.v2.patch, > TEST-org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions.xml, > > org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions-output.txt, > > org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions.txt > > > The related UT are: > * TestRestoreSnapshotFromClientAfterSplittingRegions > * TestRestoreSnapshotFromClientWithRegionReplicas > * TestMobRestoreSnapshotFromClientAfterSplittingRegions > I guess the main problem is: a dead lock between SplitTableRegionProcedure > and SnapshotProcedure.. > Attached logs from the failed UT. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21549) Add shell command for serial replication peer
[ https://issues.apache.org/jira/browse/HBASE-21549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-21549: --- Attachment: HBASE-21549.branch-2.001.patch > Add shell command for serial replication peer > - > > Key: HBASE-21549 > URL: https://issues.apache.org/jira/browse/HBASE-21549 > Project: HBase > Issue Type: Improvement >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Major > Attachments: HBASE-21549.branch-2.001.patch, > HBASE-21549.master.001.patch, HBASE-21549.master.002.patch, > HBASE-21549.master.003.patch > > > add_peer support add a serial replication peer directly. > set_peer_serial support change a replication peer's serial flag. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21559) The RestoreSnapshotFromClientTestBase related UT are flaky
[ https://issues.apache.org/jira/browse/HBASE-21559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712323#comment-16712323 ] Hudson commented on HBASE-21559: Results for branch branch-2.1 [build #664 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/664/]: (/) *{color:green}+1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/664//General_Nightly_Build_Report/] (/) {color:green}+1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/664//JDK8_Nightly_Build_Report_(Hadoop2)/] (/) {color:green}+1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/664//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. 
(/) {color:green}+1 client integration test{color} > The RestoreSnapshotFromClientTestBase related UT are flaky > -- > > Key: HBASE-21559 > URL: https://issues.apache.org/jira/browse/HBASE-21559 > Project: HBase > Issue Type: Bug >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Major > Fix For: 3.0.0, 2.1.2, 2.0.4, 2.0.5 > > Attachments: HBASE-21559.v1.patch, HBASE-21559.v2.patch, > TEST-org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions.xml, > > org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions-output.txt, > > org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions.txt > > > The related UT are: > * TestRestoreSnapshotFromClientAfterSplittingRegions > * TestRestoreSnapshotFromClientWithRegionReplicas > * TestMobRestoreSnapshotFromClientAfterSplittingRegions > I guess the main problem is: a dead lock between SplitTableRegionProcedure > and SnapshotProcedure.. > Attached logs from the failed UT. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21564) race condition in WAL rolling resulting in size-based rolling getting stuck
[ https://issues.apache.org/jira/browse/HBASE-21564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712307#comment-16712307 ] Duo Zhang commented on HBASE-21564: --- Thanks for the nice finding. Can apply the fix for now. And I think we should redesign the log roller; at least we should use different threads for different WALs... > race condition in WAL rolling resulting in size-based rolling getting stuck > --- > > Key: HBASE-21564 > URL: https://issues.apache.org/jira/browse/HBASE-21564 > Project: HBase > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin >Priority: Major > Attachments: HBASE-21564.master.001.patch > > > Manifests at least with AsyncFsWriter. > There's a window after LogRoller replaces the writer in the WAL, but before > it sets the rollLog boolean to false in the finally, where the WAL class can > request another log roll (it can happen in particular when the logs are > getting archived in the LogRoller thread, and there's high write volume > causing the logs to roll quickly). > LogRoller will blindly reset the rollLog flag in the finally and "forget" about > this request. > AsyncWAL in turn never requests it again because its own rollRequested field > is set and it expects a callback. Logs don't get rolled until a periodic roll > is triggered after that. > The acknowledgment of roll requests by LogRoller should be atomic. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
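The race described in the issue body — a roll request arriving between the writer swap and the blind `rollLog = false` reset in the finally block — can be sketched with a minimal toy model. The names below are hypothetical, not the actual LogRoller/AsyncFSWAL code; the point is that consuming the flag with a compare-and-set cannot lose a request that arrives mid-roll:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Toy model of atomic roll-request acknowledgment. A blind set(false)
// in a finally block after the roll finishes would wipe out a request
// made while the roll was in progress; a CAS-based consume only clears
// the request it actually observed.
public class RollFlagDemo {
    private final AtomicBoolean rollLog = new AtomicBoolean(false);

    /** WAL side: ask the roller for a roll. */
    void requestRoll() {
        rollLog.set(true);
    }

    /** Roller side: atomically consume one pending request. A request
     *  arriving after this CAS stays pending for the next loop pass. */
    boolean consumeRequest() {
        return rollLog.compareAndSet(true, false);
    }

    boolean pending() {
        return rollLog.get();
    }

    public static void main(String[] args) {
        RollFlagDemo roller = new RollFlagDemo();
        roller.requestRoll();
        roller.consumeRequest(); // roller starts rolling for this request
        roller.requestRoll();    // WAL asks again while the roll is in flight
        // The buggy pattern would now reset the flag in a finally block and
        // lose the second request; with CAS consumption it is still visible:
        System.out.println(roller.pending()); // prints "true"
    }
}
```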
[jira] [Created] (HBASE-21565) Delete dead server from dead server list too early leads to concurrent Server Crash Procedures(SCP) for a same server
Jingyun Tian created HBASE-21565: Summary: Delete dead server from dead server list too early leads to concurrent Server Crash Procedures(SCP) for a same server Key: HBASE-21565 URL: https://issues.apache.org/jira/browse/HBASE-21565 Project: HBase Issue Type: Bug Reporter: Jingyun Tian Assignee: Jingyun Tian There are 2 kinds of SCP that can be scheduled for the same server during cluster restart: one is triggered by ZK session timeout, the other when a new server reporting in causes the stale one to be failed over. The only barrier between these 2 kinds of SCP is the check of whether the server is in the dead server list. {code} if (this.deadservers.isDeadServer(serverName)) { LOG.warn("Expiration called on {} but crash processing already in progress", serverName); return false; } {code} But the problem is that when the master finishes initialization, it will delete all stale servers from the dead server list. Thus when the SCP for the ZK session timeout comes in, the barrier is already removed. Here are the logs showing how this problem occurs. {code} 2018-12-07,11:42:37,589 INFO org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Start pid=9, state=RUNNABLE:SERVER_CRASH_START, hasLock=true; ServerCrashProcedure server=c4-hadoop-tst-st27.bj,29100,1544153846859, splitWal=true, meta=false 2018-12-07,11:42:58,007 INFO org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Start pid=444, state=RUNNABLE:SERVER_CRASH_START, hasLock=true; ServerCrashProcedure server=c4-hadoop-tst-st27.bj,29100,1544153846859, splitWal=true, meta=false {code} Now we can see two SCP are scheduled for the same server. But the first procedure finishes after the second SCP starts. {code} 2018-12-07,11:43:08,038 INFO org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=9, state=SUCCESS, hasLock=false; ServerCrashProcedure server=c4-hadoop-tst-st27.bj,29100,1544153846859, splitWal=true, meta=false in 30.5340sec {code} Thus it leads to the problem that regions will be assigned twice. 
{code} 2018-12-07,12:16:33,039 WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager: rit=OPEN, location=c4-hadoop-tst-st28.bj,29100,1544154149607, table=test_failover, region=459b3130b40caf3b8f3e1421766f4089 reported OPEN on server=c4-hadoop-tst-st29.bj,29100,1544154149615 but state has otherwise {code} And here we can see the server is removed from the dead server list before the second SCP starts. {code} 2018-12-07,11:42:44,938 DEBUG org.apache.hadoop.hbase.master.DeadServer: Removed c4-hadoop-tst-st27.bj,29100,1544153846859 ; numProcessing=3 {code} Thus we should not delete the dead server from the dead server list immediately. A patch to fix this problem will be uploaded later. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
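The failure sequence above can be modeled with a small sketch. This is a toy illustration with hypothetical names, not the real DeadServer/ServerCrashProcedure code: the dead-server set is the only barrier against scheduling a second crash procedure for the same server, so removing the entry before the late expiration arrives re-opens the race.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of the dead-server barrier. expireServer() stands in for the
// master's server-expiration path that schedules an SCP.
public class DeadServerDemo {
    private final Set<String> deadServers = ConcurrentHashMap.newKeySet();

    /** @return true if a (simulated) SCP would be scheduled. */
    boolean expireServer(String serverName) {
        if (deadServers.contains(serverName)) {
            // barrier holds: crash processing already in progress
            return false;
        }
        deadServers.add(serverName);
        return true;
    }

    /** Simulates the premature cleanup of stale servers at master init. */
    void removeFromDeadList(String serverName) {
        deadServers.remove(serverName);
    }

    public static void main(String[] args) {
        DeadServerDemo master = new DeadServerDemo();
        String server = "host1,29100,1544153846859";
        master.expireServer(server);       // first SCP (e.g. ZK session timeout)
        master.removeFromDeadList(server); // stale entry cleared too early
        // the barrier is gone, so a duplicate SCP is scheduled:
        System.out.println(master.expireServer(server)); // prints "true"
    }
}
```

Keeping the entry in the set until crash processing actually completes (rather than at master-init cleanup) is what makes the second `expireServer` call return false.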
[jira] [Commented] (HBASE-21559) The RestoreSnapshotFromClientTestBase related UT are flaky
[ https://issues.apache.org/jira/browse/HBASE-21559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712311#comment-16712311 ] Hudson commented on HBASE-21559: Results for branch branch-2.0 [build #1143 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1143/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1143//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1143//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1143//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. > The RestoreSnapshotFromClientTestBase related UT are flaky > -- > > Key: HBASE-21559 > URL: https://issues.apache.org/jira/browse/HBASE-21559 > Project: HBase > Issue Type: Bug >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Major > Fix For: 3.0.0, 2.1.2, 2.0.4, 2.0.5 > > Attachments: HBASE-21559.v1.patch, HBASE-21559.v2.patch, > TEST-org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions.xml, > > org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions-output.txt, > > org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions.txt > > > The related UT are: > * TestRestoreSnapshotFromClientAfterSplittingRegions > * TestRestoreSnapshotFromClientWithRegionReplicas > * TestMobRestoreSnapshotFromClientAfterSplittingRegions > I guess the main problem is: a dead lock between SplitTableRegionProcedure > and SnapshotProcedure.. 
> Attached logs from the failed UT. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21564) race condition in WAL rolling resulting in size-based rolling getting stuck
[ https://issues.apache.org/jira/browse/HBASE-21564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712302#comment-16712302 ] stack commented on HBASE-21564: --- bq. reaching target size causes all WALs to roll... all WALs ... you mean the user-space WAL and meta WAL? If so, that sounds wrong. Unintentional. This looks like a gnarly bug. Good find. You can repro it [~sershe]? Thanks. > race condition in WAL rolling resulting in size-based rolling getting stuck > --- > > Key: HBASE-21564 > URL: https://issues.apache.org/jira/browse/HBASE-21564 > Project: HBase > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin >Priority: Major > Attachments: HBASE-21564.master.001.patch > > > Manifests at least with AsyncFsWriter. > There's a window after LogRoller replaces the writer in the WAL, but before > it sets the rollLog boolean to false in the finally, where the WAL class can > request another log roll (it can happen in particular when the logs are > getting archived in the LogRoller thread, and there's high write volume > causing the logs to roll quickly). > LogRoller will blindly reset the rollLog flag in finally and "forget" about > this request. > AsyncWAL in turn never requests it again because its own rollRequested field > is set and it expects a callback. Logs don't get rolled until a periodic roll > is triggered after that. > The acknowledgment of roll requests by LogRoller should be atomic. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21564) race condition in WAL rolling resulting in size-based rolling getting stuck
[ https://issues.apache.org/jira/browse/HBASE-21564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712290#comment-16712290 ] Sergey Shelukhin commented on HBASE-21564: -- [~stack] do you remember why on WAL reaching target size causes all WALs to roll (in normal non-multi-wal case, only meta wal will be affected)? See LogRoller walNeedsToRoll map before this patch - in normal case, the value is set to true for a particular WAL when requesting a WAL roll based on size, but when actually rolling WALs in run() it's not used as a filter but merely as a value for "force" flag and all WALs are rolled. It seems like a random thing to do, esp. if using multi-wal. > race condition in WAL rolling resulting in size-based rolling getting stuck > --- > > Key: HBASE-21564 > URL: https://issues.apache.org/jira/browse/HBASE-21564 > Project: HBase > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin >Priority: Major > Attachments: HBASE-21564.master.001.patch > > > Manifests at least with AsyncFsWriter. > There's a window after LogRoller replaces the writer in the WAL, but before > it sets the rollLog boolean to false in the finally, where the WAL class can > request another log roll (it can happen in particular when the logs are > getting archived in the LogRoller thread, and there's high write volume > causing the logs to roll quickly). > LogRoller will blindly reset the rollLog flag in finally and "forget" about > this request. > AsyncWAL in turn never requests it again because its own rollRequested field > is set and it expects a callback. Logs don't get rolled until a periodic roll > is triggered after that. > The acknowledgment of roll requests by LogRoller should be atomic. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21564) race condition in WAL rolling resulting in size-based rolling getting stuck
[ https://issues.apache.org/jira/browse/HBASE-21564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712283#comment-16712283 ] Sergey Shelukhin commented on HBASE-21564: -- This patch should fix the issue (I'll test on an internal repro); it also changes log roll request to rely on future-s instead of looping (since with roll requests arriving frequently, the way the wait... is implemented it may never return because some log will always be rolling based on size) > race condition in WAL rolling resulting in size-based rolling getting stuck > --- > > Key: HBASE-21564 > URL: https://issues.apache.org/jira/browse/HBASE-21564 > Project: HBase > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin >Priority: Major > Attachments: HBASE-21564.master.001.patch > > > Manifests at least with AsyncFsWriter. > There's a window after LogRoller replaces the writer in the WAL, but before > it sets the rollLog boolean to false in the finally, where the WAL class can > request another log roll (it can happen in particular when the logs are > getting archived in the LogRoller thread, and there's high write volume > causing the logs to roll quickly). > LogRoller will blindly reset the rollLog flag in finally and "forget" about > this request. > AsyncWAL in turn never requests it again because its own rollRequested field > is set and it expects a callback. Logs don't get rolled until a periodic roll > is triggered after that. > The acknowledgment of roll requests by LogRoller should be atomic. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21564) race condition in WAL rolling
[ https://issues.apache.org/jira/browse/HBASE-21564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HBASE-21564: - Attachment: HBASE-21564.master.001.patch > race condition in WAL rolling > - > > Key: HBASE-21564 > URL: https://issues.apache.org/jira/browse/HBASE-21564 > Project: HBase > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin >Priority: Major > Attachments: HBASE-21564.master.001.patch > > > Manifests at least with AsyncFsWriter. > There's a window after LogRoller replaces the writer in the WAL, but before > it sets the rollLog boolean to false in the finally, where the WAL class can > request another log roll (it can happen in particular when the logs are > getting archived in the LogRoller thread, and there's high write volume > causing the logs to roll quickly). > LogRoller will blindly reset the rollLog flag in finally and "forget" about > this request. > AsyncWAL in turn never requests it again because its own rollRequested field > is set and it expects a callback. Logs don't get rolled until a periodic roll > is triggered after that. > The acknowledgment of roll requests by LogRoller should be atomic. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21564) race condition in WAL rolling resulting in size-based rolling getting stuck
[ https://issues.apache.org/jira/browse/HBASE-21564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HBASE-21564: - Summary: race condition in WAL rolling resulting in size-based rolling getting stuck (was: race condition in WAL rolling) > race condition in WAL rolling resulting in size-based rolling getting stuck > --- > > Key: HBASE-21564 > URL: https://issues.apache.org/jira/browse/HBASE-21564 > Project: HBase > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin >Priority: Major > Attachments: HBASE-21564.master.001.patch > > > Manifests at least with AsyncFsWriter. > There's a window after LogRoller replaces the writer in the WAL, but before > it sets the rollLog boolean to false in the finally, where the WAL class can > request another log roll (it can happen in particular when the logs are > getting archived in the LogRoller thread, and there's high write volume > causing the logs to roll quickly). > LogRoller will blindly reset the rollLog flag in finally and "forget" about > this request. > AsyncWAL in turn never requests it again because its own rollRequested field > is set and it expects a callback. Logs don't get rolled until a periodic roll > is triggered after that. > The acknowledgment of roll requests by LogRoller should be atomic. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21564) race condition in WAL rolling
[ https://issues.apache.org/jira/browse/HBASE-21564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HBASE-21564: - Status: Patch Available (was: Open) > race condition in WAL rolling > - > > Key: HBASE-21564 > URL: https://issues.apache.org/jira/browse/HBASE-21564 > Project: HBase > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin >Priority: Major > Attachments: HBASE-21564.master.001.patch > > > Manifests at least with AsyncFsWriter. > There's a window after LogRoller replaces the writer in the WAL, but before > it sets the rollLog boolean to false in the finally, where the WAL class can > request another log roll (it can happen in particular when the logs are > getting archived in the LogRoller thread, and there's high write volume > causing the logs to roll quickly). > LogRoller will blindly reset the rollLog flag in finally and "forget" about > this request. > AsyncWAL in turn never requests it again because its own rollRequested field > is set and it expects a callback. Logs don't get rolled until a periodic roll > is triggered after that. > The acknowledgment of roll requests by LogRoller should be atomic. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21564) race condition in WAL rolling
Sergey Shelukhin created HBASE-21564: Summary: race condition in WAL rolling Key: HBASE-21564 URL: https://issues.apache.org/jira/browse/HBASE-21564 Project: HBase Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Manifests at least with AsyncFsWriter. There's a window after LogRoller replaces the writer in the WAL, but before it sets the rollLog boolean to false in the finally, where the WAL class can request another log roll (it can happen in particular when the logs are getting archived in the LogRoller thread, and there's high write volume causing the logs to roll quickly). LogRoller will blindly reset the rollLog flag in finally and "forget" about this request. AsyncWAL in turn never requests it again because its own rollRequested field is set and it expects a callback. Logs don't get rolled until a periodic roll is triggered after that. The acknowledgment of roll requests by LogRoller should be atomic. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
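The lost-request race in the description can be sketched as follows. This is a hedged, hypothetical model (not the real LogRoller/WAL code): the buggy pattern resets the flag in a finally block after rolling, wiping out any request that arrived mid-roll, while acknowledging the request atomically before rolling preserves it, which is the "atomic acknowledgment" the issue asks for.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of the roll-request race, not the actual HBase classes.
public class RollFlagSketch {
    final AtomicBoolean rollLog = new AtomicBoolean(false);

    void requestRoll() {
        rollLog.set(true);
    }

    // Buggy pattern: check the flag, roll, then blindly reset in finally.
    // A request arriving while the roll is in progress is forgotten.
    void buggyRunOnce(Runnable duringRoll) {
        if (!rollLog.get()) {
            return;
        }
        try {
            duringRoll.run(); // writer replaced; a new roll request may arrive here
        } finally {
            rollLog.set(false); // wipes out the mid-roll request
        }
    }

    // Fixed pattern: acknowledge the request atomically BEFORE rolling, so a
    // request arriving mid-roll re-sets the flag and is seen next iteration.
    void fixedRunOnce(Runnable duringRoll) {
        if (!rollLog.compareAndSet(true, false)) {
            return;
        }
        duringRoll.run();
    }

    public static void main(String[] args) {
        RollFlagSketch buggy = new RollFlagSketch();
        buggy.requestRoll();
        buggy.buggyRunOnce(buggy::requestRoll);
        System.out.println("buggy roller pending request: " + buggy.rollLog.get());

        RollFlagSketch fixed = new RollFlagSketch();
        fixed.requestRoll();
        fixed.fixedRunOnce(fixed::requestRoll);
        System.out.println("fixed roller pending request: " + fixed.rollLog.get());
    }
}
```

With the buggy roller the mid-roll request vanishes, matching the symptom where size-based rolling stalls until the periodic roll fires; with the CAS-first roller the request survives to the next loop iteration.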
[jira] [Updated] (HBASE-21514) Refactor CacheConfig
[ https://issues.apache.org/jira/browse/HBASE-21514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-21514: --- Description: # move the global cache instances from CacheConfig to BlockCacheFactory. Only keep config stuff in CacheConfig. # Move block cache to HRegionServer's member variable. One rs has one block cache. was: # move the global cache instances from CacheConfig to BlockCacheFactory. Only keep config stuff in CacheConfig. # Move block cache to HRegionServer's member variable. One rs has one block cache. # Still keep GLOBAL_BLOCK_CACHE_INSTANCE in BlockCacheFactory. As there are some unit tests which don't start a mini cluster. But want to use block cache, too. > Refactor CacheConfig > > > Key: HBASE-21514 > URL: https://issues.apache.org/jira/browse/HBASE-21514 > Project: HBase > Issue Type: Improvement >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Major > Fix For: 3.0.0 > > Attachments: HBASE-21514.master.001.patch, > HBASE-21514.master.002.patch, HBASE-21514.master.003.patch, > HBASE-21514.master.004.patch, HBASE-21514.master.005.patch, > HBASE-21514.master.006.patch, HBASE-21514.master.007.patch, > HBASE-21514.master.008.patch, HBASE-21514.master.009.patch > > > # move the global cache instances from CacheConfig to BlockCacheFactory. Only > keep config stuff in CacheConfig. > # Move block cache to HRegionServer's member variable. One rs has one block > cache. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
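The refactor direction in the description can be sketched as follows; a hedged illustration with hypothetical class bodies (only BlockCacheFactory, CacheConfig, and HRegionServer are names from the issue): cache construction moves into a factory, and each region server owns its single cache instance rather than reaching for a global static.

```java
// Hypothetical sketch of the HBASE-21514 structure; bodies are illustrative.
public class BlockCacheSketch {
    public interface BlockCache {}
    public static final class LruBlockCache implements BlockCache {}

    // Construction logic lives in the factory; CacheConfig keeps config only.
    public static final class BlockCacheFactory {
        public static BlockCache createBlockCache() {
            return new LruBlockCache();
        }
    }

    // One region server owns one block cache as a member variable.
    public static final class RegionServer {
        private final BlockCache blockCache = BlockCacheFactory.createBlockCache();
        public BlockCache getBlockCache() {
            return blockCache;
        }
    }

    public static void main(String[] args) {
        RegionServer rs1 = new RegionServer();
        RegionServer rs2 = new RegionServer();
        System.out.println("stable per-RS cache: " + (rs1.getBlockCache() == rs1.getBlockCache()));
        System.out.println("no shared global cache: " + (rs1.getBlockCache() != rs2.getBlockCache()));
    }
}
```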
[jira] [Commented] (HBASE-21514) Refactor CacheConfig
[ https://issues.apache.org/jira/browse/HBASE-21514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712259#comment-16712259 ] Guanghao Zhang commented on HBASE-21514: The javac warning was not introduced by this patch. > Refactor CacheConfig > > > Key: HBASE-21514 > URL: https://issues.apache.org/jira/browse/HBASE-21514 > Project: HBase > Issue Type: Improvement >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Major > Fix For: 3.0.0 > > Attachments: HBASE-21514.master.001.patch, > HBASE-21514.master.002.patch, HBASE-21514.master.003.patch, > HBASE-21514.master.004.patch, HBASE-21514.master.005.patch, > HBASE-21514.master.006.patch, HBASE-21514.master.007.patch, > HBASE-21514.master.008.patch, HBASE-21514.master.009.patch > > > # move the global cache instances from CacheConfig to BlockCacheFactory. Only > keep config stuff in CacheConfig. > # Move block cache to HRegionServer's member variable. One rs has one block > cache. > # Still keep GLOBAL_BLOCK_CACHE_INSTANCE in BlockCacheFactory. As there are > some unit tests which don't start a mini cluster. But want to use block > cache, too. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21413) Empty meta log doesn't get split when restart whole cluster
[ https://issues.apache.org/jira/browse/HBASE-21413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712258#comment-16712258 ] Allan Yang commented on HBASE-21413: [~apurtell], sure, I can do the backport > Empty meta log doesn't get split when restart whole cluster > --- > > Key: HBASE-21413 > URL: https://issues.apache.org/jira/browse/HBASE-21413 > Project: HBase > Issue Type: Improvement >Affects Versions: 2.1.1, 2.0.2 >Reporter: Jingyun Tian >Assignee: Allan Yang >Priority: Major > Fix For: 3.0.0, 2.1.2, 2.0.4 > > Attachments: HBASE-21413.branch-2.1.001.patch, > HBASE-21413.branch-2.1.002.patch, Screenshot from 2018-10-31 18-11-02.png, > Screenshot from 2018-10-31 18-11-11.png > > > After I restart whole cluster, there is a splitting directory still exists on > hdfs. Then I found there is only an empty meta wal file in it. I'll dig into > this later. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21551) Memory leak when use scan with STREAM at server side
[ https://issues.apache.org/jira/browse/HBASE-21551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712243#comment-16712243 ] Zheng Hu commented on HBASE-21551: -- Thanks [~busbey] for the release note. > Memory leak when use scan with STREAM at server side > > > Key: HBASE-21551 > URL: https://issues.apache.org/jira/browse/HBASE-21551 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Blocker > Fix For: 3.0.0, 2.2.0, 2.1.2, 2.0.4 > > Attachments: HBASE-21551.v1.patch, HBASE-21551.v2.patch, > HBASE-21551.v3.patch, heap-dump.jpg > > > We open the RegionServerScanner with STREAM as following: > {code} > RegionScannerImpl#initializeScanners > |---> HStore#getScanner > |--> StoreScanner() > |---> > StoreFileScanner#getScannersForStoreFiles > |--> > HStoreFile#getStreamScanner #1 > {code} > In #1, we put the StoreFileReader into a concurrent hash map streamReaders, > but not remove the StreamReader from streamReaders until closing the store > file. > So if we scan with stream with so many times, the streamReaders hash map > will be exploded. we can see the heap dump in the attached heap-dump.jpg. > I found this bug, because when i benchmark the scan performance by using YCSB > in a cluster (heap size of RS is 50g), the Rs was easy to occur a long time > full gc ( ~ 110 sec) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
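The leak pattern in the description above can be sketched as follows; a minimal hypothetical model (not the real StoreFileReader/HStoreFile code): each STREAM scan registers its reader in a shared concurrent map, and the bug is that nothing removes the entry until the store file itself is closed, so repeated scans grow the map without bound.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the streamReaders leak; names are illustrative.
public class StreamReaderRegistry {
    private final Set<Object> streamReaders = ConcurrentHashMap.newKeySet();

    // Each STREAM scan registers a reader. Without the remove-on-close below,
    // entries accumulate for the lifetime of the store file.
    public class Reader implements AutoCloseable {
        public Reader() {
            streamReaders.add(this);
        }
        @Override
        public void close() {
            streamReaders.remove(this); // the deregistration step the bug was missing
        }
    }

    public int openReaders() {
        return streamReaders.size();
    }

    public static void main(String[] args) {
        StreamReaderRegistry registry = new StreamReaderRegistry();
        try (Reader r = registry.new Reader()) {
            System.out.println("open readers during scan: " + registry.openReaders());
        }
        System.out.println("open readers after close: " + registry.openReaders());
    }
}
```

With deregistration tied to the reader's own close, the map size returns to zero after each scan instead of growing until full GC pauses appear.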
[jira] [Commented] (HBASE-21563) HBase Get Encounters java.lang.IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/HBASE-21563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712241#comment-16712241 ] Zheng Hu commented on HBASE-21563: -- Let me take a look. > HBase Get Encounters java.lang.IndexOutOfBoundsException > > > Key: HBASE-21563 > URL: https://issues.apache.org/jira/browse/HBASE-21563 > Project: HBase > Issue Type: Bug > Components: HFile >Affects Versions: 1.2.0 >Reporter: William Shen >Priority: Major > Attachments: 67a04bc049be4f58afecdcc0a3ba62ca.tar.gz > > > We've recently encountered issue retrieving data from our HBase cluster, and > have not had much luck troubleshooting the issue. We narrowed down our issue > to a single GET, which appears to be caused by FastDiffDeltaEncoder.java > running into java.lang.IndexOutOfBoundsException. > Perhaps there is a bug on a corner case for FastDiffDeltaEncoder? > We are running 1.2.0-cdh5.9.2, and the GET in question is: > {noformat} > hbase(main):004:0> get 'qa2.ADGROUPS', > "\x05\x80\x00\x00\x00\x00\x1F\x54\x9C\x80\x00\x00\x00\x00\x1C\x7D\x45\x00\x04\x80\x00\x00\x00\x00\x1D\x0F\x19\x80\x00\x00\x00\x00\x4A\x64\x6F\x80\x00\x00\x00\x01\xD9\xDB\xCE" > COLUMNCELL > > > ERROR: java.io.IOException > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2215) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:109) > at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:185) > at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:165) > Caused by: java.lang.IndexOutOfBoundsException > at java.nio.Buffer.checkBounds(Buffer.java:567) > at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:149) > at > org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder$1.decode(FastDiffDeltaEncoder.java:465) > at > org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder$1.decodeNext(FastDiffDeltaEncoder.java:516) > at > 
org.apache.hadoop.hbase.io.encoding.BufferedDataBlockEncoder$BufferedEncodedSeeker.next(BufferedDataBlockEncoder.java:618) > at > org.apache.hadoop.hbase.io.hfile.HFileReaderV2$EncodedScannerV2.next(HFileReaderV2.java:1277) > at > org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:180) > at > org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:108) > at > org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:588) > at > org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5706) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:5865) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5643) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:5620) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:5606) > at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6801) > at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6779) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2029) > at > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33644) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2170) > ... 
3 more {noformat} > Likewise, running {{ hbase hfile -f -p }} on the specific hfile, a subset of > kv pairs were printed until the program hits the following exception and > crashes: > {noformat} > Exception in thread "main" java.lang.RuntimeException: Unknown code 65 > at org.apache.hadoop.hbase.KeyValue$Type.codeToType(KeyValue.java:259) > at org.apache.hadoop.hbase.KeyValue.keyToString(KeyValue.java:1246) > at > org.apache.hadoop.hbase.io.encoding.BufferedDataBlockEncoder$ClonedSeekerState.toString(BufferedDataBlockEncoder.java:506) > at java.lang.String.valueOf(String.java:2994) > at java.lang.StringBuilder.append(StringBuilder.java:131) > at > org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.scanKeysValues(HFilePrettyPrinter.java:382) > at > org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.processFile(HFilePrettyPrinter.java:316) > at > org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.run(HFilePrettyPrinter.java:255) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at > org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.main(HFilePrettyPrinter.java:677) > {noformat} > I have attached the HFile related to this issue for
[jira] [Resolved] (HBASE-21551) Memory leak when use scan with STREAM at server side
[ https://issues.apache.org/jira/browse/HBASE-21551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey resolved HBASE-21551. - Resolution: Fixed Release Note: ### Summary HBase clusters will experience Region Server failures from out-of-memory errors caused by a leak when any of the following occur: * User initiates Scan operations set to use the STREAM reading type * User initiates Scan operations set to use the default reading type that read more than 4 * the block size of column families involved in the scan (e.g. by default 4*64KiB) * Compactions run ### Root cause When there are long-running scans, the Region Server process attempts to optimize access by using a different API geared towards sequential access. Due to an error in HBASE-20704 for HBase 2.0+, the Region Server fails to release related resources when those scans finish. That same optimization path is always used for the HBase internal file compaction process. ### Workaround Impact for this error can be minimized by setting the config value "hbase.storescanner.pread.max.bytes" to MAX_INT to avoid the optimization for default user scans. Clients should also be checked to ensure they do not pass the STREAM read type to the Scan API. This will have a severe impact on performance for long scans. Compactions always use this sequential optimized reading mechanism, so downstream users will need to periodically restart Region Server roles after compactions have happened. 
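The workaround in the release note can be expressed as an hbase-site.xml fragment. This is a sketch under the note's own assumptions (the property name comes from the release note; verify it against the documentation for your HBase version before applying, and note the stated performance cost for long scans):

```xml
<!-- Workaround sketch: raise the pread-to-stream switch-over threshold so
     default user scans avoid the leaking STREAM path.
     2147483647 is Integer.MAX_VALUE, i.e. the MAX_INT the release note names. -->
<property>
  <name>hbase.storescanner.pread.max.bytes</name>
  <value>2147483647</value>
</property>
```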
> Memory leak when use scan with STREAM at server side > > > Key: HBASE-21551 > URL: https://issues.apache.org/jira/browse/HBASE-21551 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Blocker > Fix For: 3.0.0, 2.2.0, 2.1.2, 2.0.4 > > Attachments: HBASE-21551.v1.patch, HBASE-21551.v2.patch, > HBASE-21551.v3.patch, heap-dump.jpg > > > We open the RegionServerScanner with STREAM as following: > {code} > RegionScannerImpl#initializeScanners > |---> HStore#getScanner > |--> StoreScanner() > |---> > StoreFileScanner#getScannersForStoreFiles > |--> > HStoreFile#getStreamScanner #1 > {code} > In #1, we put the StoreFileReader into a concurrent hash map streamReaders, > but not remove the StreamReader from streamReaders until closing the store > file. > So if we scan with stream with so many times, the streamReaders hash map > will be exploded. we can see the heap dump in the attached heap-dump.jpg. > I found this bug, because when i benchmark the scan performance by using YCSB > in a cluster (heap size of RS is 50g), the Rs was easy to occur a long time > full gc ( ~ 110 sec) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (HBASE-21551) Memory leak when use scan with STREAM at server side
[ https://issues.apache.org/jira/browse/HBASE-21551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey reopened HBASE-21551: - reopening so I can add a release note > Memory leak when use scan with STREAM at server side > > > Key: HBASE-21551 > URL: https://issues.apache.org/jira/browse/HBASE-21551 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Blocker > Fix For: 3.0.0, 2.2.0, 2.1.2, 2.0.4 > > Attachments: HBASE-21551.v1.patch, HBASE-21551.v2.patch, > HBASE-21551.v3.patch, heap-dump.jpg > > > We open the RegionServerScanner with STREAM as following: > {code} > RegionScannerImpl#initializeScanners > |---> HStore#getScanner > |--> StoreScanner() > |---> > StoreFileScanner#getScannersForStoreFiles > |--> > HStoreFile#getStreamScanner #1 > {code} > In #1, we put the StoreFileReader into a concurrent hash map streamReaders, > but not remove the StreamReader from streamReaders until closing the store > file. > So if we scan with stream with so many times, the streamReaders hash map > will be exploded. we can see the heap dump in the attached heap-dump.jpg. > I found this bug, because when i benchmark the scan performance by using YCSB > in a cluster (heap size of RS is 50g), the Rs was easy to occur a long time > full gc ( ~ 110 sec) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21559) The RestoreSnapshotFromClientTestBase related UT are flaky
[ https://issues.apache.org/jira/browse/HBASE-21559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712174#comment-16712174 ] Duo Zhang commented on HBASE-21559: --- Pushed to branch-2.0+. Let's see how it works. > The RestoreSnapshotFromClientTestBase related UT are flaky > -- > > Key: HBASE-21559 > URL: https://issues.apache.org/jira/browse/HBASE-21559 > Project: HBase > Issue Type: Bug >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Major > Fix For: 3.0.0, 2.1.2, 2.0.4, 2.0.5 > > Attachments: HBASE-21559.v1.patch, HBASE-21559.v2.patch, > TEST-org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions.xml, > > org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions-output.txt, > > org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions.txt > > > The related UT are: > * TestRestoreSnapshotFromClientAfterSplittingRegions > * TestRestoreSnapshotFromClientWithRegionReplicas > * TestMobRestoreSnapshotFromClientAfterSplittingRegions > I guess the main problem is: a dead lock between SplitTableRegionProcedure > and SnapshotProcedure.. > Attached logs from the failed UT. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21562) TestRestoreSnapshotFromClientAfterSplittingRegions and related tests are flakey
[ https://issues.apache.org/jira/browse/HBASE-21562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712166#comment-16712166 ] Duo Zhang commented on HBASE-21562: --- HBASE-21559? > TestRestoreSnapshotFromClientAfterSplittingRegions and related tests are > flakey > --- > > Key: HBASE-21562 > URL: https://issues.apache.org/jira/browse/HBASE-21562 > Project: HBase > Issue Type: Bug > Components: test >Reporter: stack >Priority: Major > Fix For: 2.1.2 > > > Fails 60% of the time on GCE runs. Messes up our nightlies for branch-2.1 and > branch-2.0 at least. > Looking its a bit tough figuring what is going on. Test asks us split > regions. The split starts then hangs. Last thing reported is: > 2018-12-06 10:20:30,823 INFO [PEWorker-16] > procedure.MasterProcedureScheduler(741): Took xlock for pid=174, > state=RUNNABLE:SPLIT_TABLE_REGION_PREPARE; SplitTableRegionProcedure > table=testRestoreSnapshotAfterSplittingRegions_1__regionReplication_3_-1544120421990, > parent=034bb3ebb3f9a7442f927caacdda5354, > daughterA=fbe392ca659b3913181d05ac4fb19b4c, > daughterB=3646ac333722af33c32e6f3428d23f95 > ... then all we get is that the worker is stuck. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21414) StoreFileSize growth rate metric
[ https://issues.apache.org/jira/browse/HBASE-21414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HBASE-21414: - Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Committed to master. Thanks for the patch! > StoreFileSize growth rate metric > > > Key: HBASE-21414 > URL: https://issues.apache.org/jira/browse/HBASE-21414 > Project: HBase > Issue Type: Improvement > Components: metrics, monitoring >Reporter: Tommy Li >Assignee: Tommy Li >Priority: Minor > Fix For: 3.0.0 > > Attachments: HBASE-21414.master.001.patch, > HBASE-21414.master.002.patch, HBASE-21414.master.003.patch > > > A metric on the growth rate of storefile sizes would be nice to have as a way > of monitoring traffic patterns. I know you can get the same insight from > graphing the delta on the storeFileSize metric, but not all metrics > visualization tools support that -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21414) StoreFileSize growth rate metric
[ https://issues.apache.org/jira/browse/HBASE-21414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712047#comment-16712047 ] Hadoop QA commented on HBASE-21414: --- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 58s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 22s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 29s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 3m 47s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 25s{color} | {color:blue} hbase-hadoop2-compat in master has 18 extant Findbugs warnings. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 57s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 12s{color} | {color:green} The patch passed checkstyle in hbase-hadoop-compat {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 12s{color} | {color:green} The patch passed checkstyle in hbase-hadoop2-compat {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 3s{color} | {color:green} hbase-server: The patch generated 0 new + 3 unchanged - 2 fixed = 3 total (was 5) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 3m 49s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 8m 10s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 24s{color} | {color:green} hbase-hadoop-compat in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 28s{color} | {color:green} hbase-hadoop2-compat in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green}122m 14s{color} | {color:green} hbase-server in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 1m 15s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}165m 0s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b | | JIRA Issue | HBASE-21414 | | JIRA Patch URL | https://issues.apache.org/jira
[jira] [Commented] (HBASE-21563) HBase Get Encounters java.lang.IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/HBASE-21563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712031#comment-16712031 ] stack commented on HBASE-21563: --- [~openinx] Is this like your HBASE-21379? In a different location? Different encoding? > HBase Get Encounters java.lang.IndexOutOfBoundsException > > > Key: HBASE-21563 > URL: https://issues.apache.org/jira/browse/HBASE-21563 > Project: HBase > Issue Type: Bug > Components: HFile >Affects Versions: 1.2.0 >Reporter: William Shen >Priority: Major > Attachments: 67a04bc049be4f58afecdcc0a3ba62ca.tar.gz > > > We've recently encountered an issue retrieving data from our HBase cluster, and > have not had much luck troubleshooting it. We narrowed down our issue > to a single GET, which appears to be caused by FastDiffDeltaEncoder.java > running into java.lang.IndexOutOfBoundsException. > Perhaps there is a bug in a corner case of FastDiffDeltaEncoder? > We are running 1.2.0-cdh5.9.2, and the GET in question is: > {noformat} > hbase(main):004:0> get 'qa2.ADGROUPS', > "\x05\x80\x00\x00\x00\x00\x1F\x54\x9C\x80\x00\x00\x00\x00\x1C\x7D\x45\x00\x04\x80\x00\x00\x00\x00\x1D\x0F\x19\x80\x00\x00\x00\x00\x4A\x64\x6F\x80\x00\x00\x00\x01\xD9\xDB\xCE" > COLUMNCELL > > > ERROR: java.io.IOException > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2215) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:109) > at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:185) > at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:165) > Caused by: java.lang.IndexOutOfBoundsException > at java.nio.Buffer.checkBounds(Buffer.java:567) > at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:149) > at > org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder$1.decode(FastDiffDeltaEncoder.java:465) > at > org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder$1.decodeNext(FastDiffDeltaEncoder.java:516) > at > 
org.apache.hadoop.hbase.io.encoding.BufferedDataBlockEncoder$BufferedEncodedSeeker.next(BufferedDataBlockEncoder.java:618) > at > org.apache.hadoop.hbase.io.hfile.HFileReaderV2$EncodedScannerV2.next(HFileReaderV2.java:1277) > at > org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:180) > at > org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:108) > at > org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:588) > at > org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5706) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:5865) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5643) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:5620) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:5606) > at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6801) > at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6779) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2029) > at > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33644) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2170) > ... 
3 more {noformat} > Likewise, running {{ hbase hfile -f -p }} on the specific hfile, a subset of > kv pairs were printed until the program hits the following exception and > crashes: > {noformat} > Exception in thread "main" java.lang.RuntimeException: Unknown code 65 > at org.apache.hadoop.hbase.KeyValue$Type.codeToType(KeyValue.java:259) > at org.apache.hadoop.hbase.KeyValue.keyToString(KeyValue.java:1246) > at > org.apache.hadoop.hbase.io.encoding.BufferedDataBlockEncoder$ClonedSeekerState.toString(BufferedDataBlockEncoder.java:506) > at java.lang.String.valueOf(String.java:2994) > at java.lang.StringBuilder.append(StringBuilder.java:131) > at > org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.scanKeysValues(HFilePrettyPrinter.java:382) > at > org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.processFile(HFilePrettyPrinter.java:316) > at > org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.run(HFilePrettyPrinter.java:255) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at > org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.main(HFilePrettyPrinter.java:677) > {noformat}
[jira] [Created] (HBASE-21563) HBase Get Encounters java.lang.IndexOutOfBoundsException
William Shen created HBASE-21563: Summary: HBase Get Encounters java.lang.IndexOutOfBoundsException Key: HBASE-21563 URL: https://issues.apache.org/jira/browse/HBASE-21563 Project: HBase Issue Type: Bug Components: HFile Affects Versions: 1.2.0 Reporter: William Shen Attachments: 67a04bc049be4f58afecdcc0a3ba62ca.tar.gz We've recently encountered an issue retrieving data from our HBase cluster, and have not had much luck troubleshooting it. We narrowed down our issue to a single GET, which appears to be caused by FastDiffDeltaEncoder.java running into java.lang.IndexOutOfBoundsException. Perhaps there is a bug in a corner case of FastDiffDeltaEncoder? We are running 1.2.0-cdh5.9.2; the failing GET and the full stack traces are quoted in the comment above. I have attached the HFile related to this issue for debugging.
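The IndexOutOfBoundsException in both traces originates in java.nio.Buffer.checkBounds, which fires when a bulk get() is asked to copy more bytes than the destination array can hold — consistent with the decoder computing a bogus length from a corrupt or mis-encoded block. The JDK behavior can be shown in isolation (a generic illustration, not the HBase decode path):

```java
import java.nio.ByteBuffer;

public class BufferBounds {
    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(16); // plenty of bytes remaining
        byte[] dest = new byte[4];                // but only a 4-byte destination
        try {
            // Requesting 8 bytes into a 4-byte array fails the
            // Buffer.checkBounds test, matching the stack trace above.
            buf.get(dest, 0, 8);
            System.out.println("unexpected success");
        } catch (IndexOutOfBoundsException e) {
            System.out.println("IndexOutOfBoundsException, as in the report");
        }
    }
}
```

(Had the destination been large enough but the buffer short, the JDK would instead throw BufferUnderflowException — so the exception type itself points at a bad decoded length, not a short block.)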
[jira] [Commented] (HBASE-21559) The RestoreSnapshotFromClientTestBase related UT are flaky
[ https://issues.apache.org/jira/browse/HBASE-21559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711921#comment-16711921 ] Hadoop QA commented on HBASE-21559: --- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange} 0m 0s{color} | {color:orange} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 10s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 25s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 19s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 48s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 35s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 38s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 44s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 10m 34s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green}225m 27s{color} | {color:green} hbase-server in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}271m 1s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b | | JIRA Issue | HBASE-21559 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12950848/HBASE-21559.v2.patch | | Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux daa900739133 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / 12e75a8a63 | | maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC3 | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/15211/testReport/ | | Max. process+thread count | 5021 (vs. ulimit of 1) | | modules | C: hbase-server U: hbase-server | | Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/15
[jira] [Created] (HBASE-21562) TestRestoreSnapshotFromClientAfterSplittingRegions and related tests are flakey
stack created HBASE-21562: - Summary: TestRestoreSnapshotFromClientAfterSplittingRegions and related tests are flakey Key: HBASE-21562 URL: https://issues.apache.org/jira/browse/HBASE-21562 Project: HBase Issue Type: Bug Components: test Reporter: stack Fix For: 2.1.2 Fails 60% of the time on GCE runs. Messes up our nightlies for branch-2.1 and branch-2.0 at least. Looking at it, it's a bit tough figuring out what is going on. The test asks us to split regions. The split starts, then hangs. The last thing reported is: 2018-12-06 10:20:30,823 INFO [PEWorker-16] procedure.MasterProcedureScheduler(741): Took xlock for pid=174, state=RUNNABLE:SPLIT_TABLE_REGION_PREPARE; SplitTableRegionProcedure table=testRestoreSnapshotAfterSplittingRegions_1__regionReplication_3_-1544120421990, parent=034bb3ebb3f9a7442f927caacdda5354, daughterA=fbe392ca659b3913181d05ac4fb19b4c, daughterB=3646ac333722af33c32e6f3428d23f95 ... then all we get is that the worker is stuck.
[jira] [Commented] (HBASE-21453) Convert ReadOnlyZKClient to DEBUG instead of INFO
[ https://issues.apache.org/jira/browse/HBASE-21453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711889#comment-16711889 ] Peter Somogyi commented on HBASE-21453: --- You're right that this Jira is really focused on ReadOnlyZKClient class. Let's have this as it is now and have a follow up issue for other zookeeper related log messages. +1 on this patch > Convert ReadOnlyZKClient to DEBUG instead of INFO > - > > Key: HBASE-21453 > URL: https://issues.apache.org/jira/browse/HBASE-21453 > Project: HBase > Issue Type: Bug > Components: logging, Zookeeper >Reporter: stack >Assignee: Sakthi >Priority: Major > Attachments: hbase-21453.master.001.patch > > > Running commands in spark-shell, this is what it looks like on each > invocation: > {code} > scala> val count = rdd.count() > 2018-11-07 21:01:46,026 INFO [Executor task launch worker for task 1] > zookeeper.ReadOnlyZKClient: Connect 0x18f3d868 to localhost:2181 with session > timeout=9ms, retries 30, retry interval 1000ms, keepAlive=6ms > 2018-11-07 21:01:46,027 INFO [ReadOnlyZKClient-localhost:2181@0x18f3d868] > zookeeper.ZooKeeper: Initiating client connection, > connectString=localhost:2181 sessionTimeout=9 > watcher=org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$$Lambda$20/1362339879@743dab9f > 2018-11-07 21:01:46,030 INFO > [ReadOnlyZKClient-localhost:2181@0x18f3d868-SendThread(localhost:2181)] > zookeeper.ClientCnxn: Opening socket connection to server > localhost/127.0.0.1:2181. 
Will not attempt to authenticate using SASL > (unknown error) > 2018-11-07 21:01:46,031 INFO > [ReadOnlyZKClient-localhost:2181@0x18f3d868-SendThread(localhost:2181)] > zookeeper.ClientCnxn: Socket connection established to > localhost/127.0.0.1:2181, initiating session > 2018-11-07 21:01:46,033 INFO > [ReadOnlyZKClient-localhost:2181@0x18f3d868-SendThread(localhost:2181)] > zookeeper.ClientCnxn: Session establishment complete on server > localhost/127.0.0.1:2181, sessionid = 0x166f1b283080005, negotiated timeout = > 4 > 2018-11-07 21:01:46,035 INFO [Executor task launch worker for task 1] > mapreduce.TableInputFormatBase: Input split length: 0 bytes. > [Stage 1:> (0 + 1) / > 1]2018-11-07 21:01:48,074 INFO [Executor task launch worker for task 1] > zookeeper.ReadOnlyZKClient: Close zookeeper connection 0x18f3d868 to > localhost:2181 > 2018-11-07 21:01:48,075 INFO [ReadOnlyZKClient-localhost:2181@0x18f3d868] > zookeeper.ZooKeeper: Session: 0x166f1b283080005 closed > 2018-11-07 21:01:48,076 INFO [ReadOnlyZKClient > -localhost:2181@0x18f3d868-EventThread] zookeeper.ClientCnxn: EventThread > shut down for session: 0x166f1b283080005 > count: Long = 10 > {code} > Let me shut down the ReadOnlyZKClient log level.
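The patch demotes the logger calls in the class itself, but operators on existing releases can get a similar effect from configuration alone. A sketch for a log4j 1.x properties file (this assumes the stock log4j setup that HBase clients ship with; the logger names are taken from the output above):

```properties
# Silence per-invocation INFO chatter from HBase's read-only ZK client
# and from the ZooKeeper client classes it drives.
log4j.logger.org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient=WARN
log4j.logger.org.apache.zookeeper=WARN
```

This hides all INFO from those packages, whereas the patch keeps the messages available at DEBUG for troubleshooting.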
[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711870#comment-16711870 ] stack commented on HBASE-15560: --- Scheduled this on 1.5 too (smile). > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 2.0.0 >Reporter: Ben Manes >Assignee: Ben Manes >Priority: Major > Fix For: 3.0.0, 1.5.0, 2.2.0 > > Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > bc.hit.count, bc.miss.count, branch-1.tinylfu.txt, gets, run_ycsb_c.sh, > run_ycsb_loading.sh, tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, through the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real-world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby tracking the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies, with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lackluster > result is because a synthetic Zipfian distribution is used, on which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]).
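The admission rule described in the issue (compare the candidate's sketch frequency with the eviction victim's, keep the higher, and periodically halve counters to age the history) can be sketched as follows. This is a toy illustration, not Caffeine's code: exact per-key counters stand in for the compact count-min sketch a real W-TinyLFU implementation uses.

```java
import java.util.HashMap;
import java.util.Map;

// Toy TinyLFU admission filter: exact counters replace the count-min
// sketch used by real implementations such as Caffeine.
public class TinyLfuSketch {
    private final Map<String, Integer> freq = new HashMap<>();
    private final int sampleSize; // aging period, in recorded events
    private int events = 0;

    public TinyLfuSketch(int sampleSize) { this.sampleSize = sampleSize; }

    public void record(String key) {
        freq.merge(key, 1, Integer::sum);
        if (++events >= sampleSize) {  // periodic aging: halve every counter
            freq.replaceAll((k, v) -> v / 2);
            freq.values().removeIf(v -> v == 0);
            events = 0;
        }
    }

    public int frequency(String key) { return freq.getOrDefault(key, 0); }

    // Admission: the candidate displaces the victim only if it has been
    // seen more often historically.
    public boolean admit(String candidate, String victim) {
        return frequency(candidate) > frequency(victim);
    }

    public static void main(String[] args) {
        TinyLfuSketch sketch = new TinyLfuSketch(100);
        for (int i = 0; i < 5; i++) sketch.record("hot");
        sketch.record("cold");
        System.out.println(sketch.admit("hot", "cold"));  // true
        System.out.println(sketch.admit("cold", "hot"));  // false
    }
}
```

The halving step is what gives the policy its "aging": a once-hot key that stops being accessed loses half its weight each period, so it cannot hold a cache slot forever — the resilience property the description says plain SLRU lacks.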
[jira] [Updated] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-15560: -- Fix Version/s: 1.5.0
[jira] [Updated] (HBASE-21414) StoreFileSize growth rate metric
[ https://issues.apache.org/jira/browse/HBASE-21414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tommy Li updated HBASE-21414: - Attachment: HBASE-21414.master.003.patch
[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711856#comment-16711856 ] Andrew Purtell commented on HBASE-15560: We could make it default in 3.0 for sure, possibly in 2.2 with a big fat release note? Because it is configurable an operator could switch away if they notice a problem after an upgrade. Although I think we might have a debate on compatibility semantics if done in a minor. I am winding down some internal stuff at work and will have more time to work on open source very soon, with the intent to branch for 1.5 and make a series of 1.5 releases. For what it's worth we could try tiny-LFU as default in 1.5 should a branch-1 patch be made available and committed prior to starting that. Expecting to start the 1.5 stuff next month, January 2019. Part of the release work for a new minor would be a lot more perf testing than usual, although with the usual set of crappy tools (PE, YCSB, etc.) > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 2.0.0 >Reporter: Ben Manes >Assignee: Ben Manes >Priority: Major > Fix For: 3.0.0, 2.2.0 > > Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > bc.hit.count, bc.miss.count, branch-1.tinylfu.txt, gets, run_ycsb_c.sh, > run_ycsb_loading.sh, tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. 
This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. 
> The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
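The admission scheme described in the quoted writeup (a counting sketch whose counters are periodically halved for aging, used to decide between the new candidate and the SLRU eviction victim) can be sketched roughly as follows. This is a hedged illustration of the W-TinyLFU idea only; the class and method names here are invented and are not the actual Caffeine or HBase code.

```java
import java.util.Random;

// Illustrative W-TinyLFU-style frequency sketch: a count-min sketch
// records access frequencies, all counters are halved periodically to
// age out stale history, and admission keeps whichever of the
// candidate/victim pair has the higher estimated frequency.
class FrequencySketch {
  private final int[][] counts;      // depth x width count-min sketch
  private final int[] seeds;         // one odd hash seed per row
  private final int resetThreshold;  // halve counters after this many adds
  private int additions;

  FrequencySketch(int depth, int width, int resetThreshold) {
    this.counts = new int[depth][width];
    this.seeds = new int[depth];
    Random r = new Random(42);
    for (int i = 0; i < depth; i++) seeds[i] = r.nextInt() | 1;
    this.resetThreshold = resetThreshold;
  }

  void increment(int hash) {
    for (int i = 0; i < counts.length; i++) {
      counts[i][index(hash, i)]++;
    }
    if (++additions >= resetThreshold) {
      halve();                       // periodic aging step
      additions /= 2;
    }
  }

  int estimate(int hash) {
    int min = Integer.MAX_VALUE;     // count-min: smallest row count
    for (int i = 0; i < counts.length; i++) {
      min = Math.min(min, counts[i][index(hash, i)]);
    }
    return min;
  }

  // TinyLFU admission: admit the candidate only if it is estimated to
  // be accessed more frequently than the cache's eviction victim.
  boolean admit(int candidateHash, int victimHash) {
    return estimate(candidateHash) > estimate(victimHash);
  }

  private void halve() {
    for (int[] row : counts)
      for (int j = 0; j < row.length; j++) row[j] >>>= 1;
  }

  private int index(int hash, int row) {
    int h = hash * seeds[row];
    return (h & 0x7fffffff) % counts[row].length;
  }
}
```

The real implementations use 4-bit packed counters and a small LRU "window" in front of the sketch; this sketch omits both for brevity.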
[jira] [Commented] (HBASE-21413) Empty meta log doesn't get split when restart whole cluster
[ https://issues.apache.org/jira/browse/HBASE-21413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711815#comment-16711815 ] Andrew Purtell commented on HBASE-21413: This is an issue for branch-1 too, but the patch as-is can't be applied; it includes Java 8 constructs in the test. I will open a subtask for the backport. No need to take it up if you don't want to [~allan163], although a backport would certainly be appreciated. > Empty meta log doesn't get split when restart whole cluster > --- > > Key: HBASE-21413 > URL: https://issues.apache.org/jira/browse/HBASE-21413 > Project: HBase > Issue Type: Improvement >Affects Versions: 2.1.1, 2.0.2 >Reporter: Jingyun Tian >Assignee: Allan Yang >Priority: Major > Fix For: 3.0.0, 2.1.2, 2.0.4 > > Attachments: HBASE-21413.branch-2.1.001.patch, > HBASE-21413.branch-2.1.002.patch, Screenshot from 2018-10-31 18-11-02.png, > Screenshot from 2018-10-31 18-11-11.png > > > After I restarted the whole cluster, a splitting directory still exists on > HDFS. Then I found there is only an empty meta WAL file in it. I'll dig into > this later. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711823#comment-16711823 ] stack commented on HBASE-15560: --- Thanks [~apurtell]. Was just hoping it was better in most cases so we would just enable it as default. Was trying to avoid adding code and options that might go unexercised. Let's see if we get an uptake on our call for a volunteer? If nought, can commit (I've scheduled this against 2.2/3.0 so it will at least get consideration before we make those releases). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21561) Backport HBASE-21413 (Empty meta log doesn't get split when restart whole cluster) to branch-1
Andrew Purtell created HBASE-21561: -- Summary: Backport HBASE-21413 (Empty meta log doesn't get split when restart whole cluster) to branch-1 Key: HBASE-21561 URL: https://issues.apache.org/jira/browse/HBASE-21561 Project: HBase Issue Type: Sub-task Reporter: Andrew Purtell Fix For: 1.5.0, 1.3.3, 1.4.10 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (HBASE-21553) schedLock not released in MasterProcedureScheduler
[ https://issues.apache.org/jira/browse/HBASE-21553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell reassigned HBASE-21553: -- Assignee: Xu Cang > schedLock not released in MasterProcedureScheduler > -- > > Key: HBASE-21553 > URL: https://issues.apache.org/jira/browse/HBASE-21553 > Project: HBase > Issue Type: Improvement >Reporter: Xu Cang >Assignee: Xu Cang >Priority: Major > > https://github.com/apache/hbase/blob/branch-1/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/MasterProcedureScheduler.java#L749 > As shown above, we didn't unlock schedLock, which can cause deadlock. > Besides this, there are other places where this class handles schedLock.unlock > in a risky manner. I'd like to move them to a finally block to improve the > robustness of handling locks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
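The fix described in the quoted report (releasing schedLock in a finally block) is the standard try/finally locking idiom. A minimal schematic follows; the class and field names are invented for illustration and are not the actual MasterProcedureScheduler code.

```java
import java.util.concurrent.locks.ReentrantLock;

// Schematic of the proposed fix: every path that acquires the lock
// releases it in a finally block, so an early return or an exception
// cannot leak the lock and deadlock later callers.
class Scheduler {
  private final ReentrantLock schedLock = new ReentrantLock();
  private int queued;

  // Risky shape (the bug pattern in the report):
  //   schedLock.lock();
  //   if (queued == 0) return -1;   // early return: lock never released!
  //   ...
  //   schedLock.unlock();

  int poll() {
    schedLock.lock();
    try {
      if (queued == 0) {
        return -1;                   // finally still runs: lock released
      }
      return queued--;
    } finally {
      schedLock.unlock();
    }
  }

  void add() {
    schedLock.lock();
    try {
      queued++;
    } finally {
      schedLock.unlock();
    }
  }

  boolean isLocked() {
    return schedLock.isLocked();
  }
}
```

The key property is that `finally` executes on every exit path, including the early `return -1`, which is exactly the path the original code leaked the lock on.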
[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711804#comment-16711804 ] Andrew Purtell commented on HBASE-15560: I came to say the feature is additive (modulo changes to blockcache to enable the tiny-LFU policy to be an optional feature) and optional, so why not put it in and allow people to try it out at their option. However then I see above [~stack] wants it to be default out of a well-intentioned goal to hold down further growth of the state space of our optional configurations. Unfortunately the reason for the growth over time of our suite of configuration options is the IMHO unresolvable tension between the desire to ship new and beneficial features to the user community and the desire of others to acquire bug fixes from upgrades without taking on default-on changes that might destabilize current operations. There is no way to resolve this tension so over time the suite of optional configurations for a mature product grows. I think that is fine. So why not commit this and let people try it out at their option? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21553) schedLock not released in MasterProcedureScheduler
[ https://issues.apache.org/jira/browse/HBASE-21553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711805#comment-16711805 ] Xu Cang commented on HBASE-21553: - Yes. Will upload a patch today. [~apurtell] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21505) Several inconsistencies on information reported for Replication Sources by hbase shell status 'replication' command.
[ https://issues.apache.org/jira/browse/HBASE-21505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711798#comment-16711798 ] Hadoop QA commented on HBASE-21505: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 27s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 40s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 7s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 20s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 44s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 30s{color} | {color:blue} hbase-hadoop2-compat in master has 18 extant Findbugs warnings. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 5s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 6m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 8s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 30s{color} | {color:red} hbase-server: The patch generated 3 new + 85 unchanged - 3 fixed = 88 total (was 88) {color} | | {color:red}-1{color} | {color:red} rubocop {color} | {color:red} 0m 9s{color} | {color:red} The patch generated 55 new + 405 unchanged - 9 fixed = 460 total (was 414) {color} | | {color:orange}-0{color} | {color:orange} ruby-lint {color} | {color:orange} 0m 4s{color} | {color:orange} The patch generated 3 new + 748 unchanged - 1 fixed = 751 total (was 749) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 56s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 10m 31s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. 
{color} | | {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 3m 0s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 43s{color} | {color:red} hbase-server generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 54s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 34s{color} | {color:green} hbase-protocol-shaded in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 27s{color} | {color:green} hbase-hadoop-compat in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 31s{color} | {color:green} hbase-hadoop2-compat in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 22s{color} | {color:green} hbase-protocol
[jira] [Commented] (HBASE-21553) schedLock not released in MasterProcedureScheduler
[ https://issues.apache.org/jira/browse/HBASE-21553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711781#comment-16711781 ] Andrew Purtell commented on HBASE-21553: Are you planning to provide a patch [~xucang]? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21283) Add new shell command 'rit' for listing regions in transition
[ https://issues.apache.org/jira/browse/HBASE-21283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey updated HBASE-21283: Component/s: Operability > Add new shell command 'rit' for listing regions in transition > - > > Key: HBASE-21283 > URL: https://issues.apache.org/jira/browse/HBASE-21283 > Project: HBase > Issue Type: Improvement > Components: Operability, shell >Reporter: Andrew Purtell >Assignee: Andrew Purtell >Priority: Minor > Fix For: 3.0.0, 1.5.0, 2.2.0 > > Attachments: HBASE-21283-branch-1.patch, HBASE-21283-branch-1.patch, > HBASE-21283-branch-1.patch, HBASE-21283.patch, HBASE-21283.patch, > HBASE-21283.patch > > > The 'status' shell command shows regions in transition but sometimes an > operator may want to retrieve a simple list of regions in transition. Here's > a patch that adds a new 'rit' command to the TOOLS group that does just that. > No test, because it seems hard to mock RITs from the ruby test code, but I > have run TestShell and it passes, so the command is verified to meet minimum > requirements, like help text, and manually verified with branch-1 (shell in > branch-2 and up doesn't return until TransitRegionProcedure has completed so > by that time no RIT): > {noformat} > HBase Shell > Use "help" to get list of supported commands. > Use "exit" to quit this interactive shell. > Version 1.5.0-SNAPSHOT, r9bb6d2fa8b760f16cd046657240ebd4ad91cb6de, Mon Oct 8 > 21:05:50 UTC 2018 > hbase(main):001:0> help 'rit' > List all regions in transition. > Examples: > hbase> rit > hbase(main):002:0> create ... > 0 row(s) in 2.5150 seconds > => Hbase::Table - IntegrationTestBigLinkedList > hbase(main):003:0> rit > 0 row(s) in 0.0340 seconds > hbase(main):004:0> unassign '56f0c38c81ae453d19906ce156a2d6a1' > 0 row(s) in 0.0540 seconds > hbase(main):005:0> rit > IntegrationTestBigLinkedList,L\xCC\xCC\xCC\xCC\xCC\xCC\xCB,1539117183224.56f0c38c81ae453d19906ce156a2d6a1. 
> state=PENDING_CLOSE, ts=Tue Oct 09 20:33:34 UTC 2018 (0s ago), server=null > > > > 1 row(s) in 0.0170 seconds > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21505) Several inconsistencies on information reported for Replication Sources by hbase shell status 'replication' command.
[ https://issues.apache.org/jira/browse/HBASE-21505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil updated HBASE-21505: - Status: Patch Available (was: In Progress) > Several inconsistencies on information reported for Replication Sources by > hbase shell status 'replication' command. > > > Key: HBASE-21505 > URL: https://issues.apache.org/jira/browse/HBASE-21505 > Project: HBase > Issue Type: Bug >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Major > Attachments: > 0001-HBASE-21505-initial-version-for-more-detailed-report.patch, > HBASE-21505-master.001.patch, HBASE-21505-master.002.patch > > > While reviewing the hbase shell status 'replication' command, I noticed the > following issues related to the replication source section: > 1) TimeStampsOfLastShippedOp keeps getting updated and increasing even when > no new edits were added to the source, so nothing was really shipped. Test steps > performed: > 1.1) Source cluster with only one table targeted to replication; > 1.2) Added a new row, confirmed the row appeared in the target cluster; > 1.3) Issued status 'replication' command in source, TimeStampsOfLastShippedOp > shows current timestamp T1. > 1.4) Waited 30 seconds, no new data added to source. Issued status > 'replication' command, now shows timestamp T2. > 2) When replication is stuck due to some connectivity issues or target > unavailability, if new edits are added in the source, the reported AgeOfLastShippedOp > wrongly shows the same value as "Replication Lag". This is incorrect; > AgeOfLastShippedOp should not change until there's indeed another edit > shipped to the target. Test steps performed: > 2.1) Source cluster with only one table targeted to replication; > 2.2) Stopped target cluster RS; > 2.3) Put a new row on source. Running the status 'replication' command does show > the lag increasing. TimeStampsOfLastShippedOp seems correct also, no further > updates as described in bullet #1 above. 
> 2.4) AgeOfLastShippedOp keeps increasing together with Replication Lag, even > though there's no new edit shipped to target: > {noformat} > ... > SOURCE: PeerID=1, AgeOfLastShippedOp=5581, SizeOfLogQueue=1, > TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=5581 > ... > ... > SOURCE: PeerID=1, AgeOfLastShippedOp=8586, SizeOfLogQueue=1, > TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=8586 > ... > {noformat} > 3) AgeOfLastShippedOp gets set to 0 even when a given edit had taken some > time before it finally got shipped to the target. Test steps performed: > 3.1) Source cluster with only one table targeted to replication; > 3.2) Stopped target cluster RS; > 3.3) Put a new row on source. > 3.4) AgeOfLastShippedOp keeps increasing together with Replication Lag, even > though there's no new edit shipped to target: > {noformat} > T1: > ... > SOURCE: PeerID=1, AgeOfLastShippedOp=5581, SizeOfLogQueue=1, > TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=5581 > ... > T2: > ... > SOURCE: PeerID=1, AgeOfLastShippedOp=8586, SizeOfLogQueue=1, > TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=8586 > ... > {noformat} > 3.5) Restarted the target cluster RS and verified the new row appeared there. No > new edit added, but the status 'replication' command reports AgeOfLastShippedOp > as 0, while it should be the diff between the time it concluded shipping at > the target and the time it was added in the source: > {noformat} > SOURCE: PeerID=1, AgeOfLastShippedOp=0, SizeOfLogQueue=1, > TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=0 > {noformat} > 4) When replication is stuck due to some connectivity issues or target > unavailability, if the RS is restarted, once the recovered queue source is started, > TimeStampsOfLastShippedOp is set to the initial Java date (Thu Jan 01 01:00:00 > GMT 1970, for example), thus "Replication Lag" also gives a completely > inaccurate value. 
> Tests performed: > 4.1) Source cluster with only one table targeted to replication; > 4.2) Stopped target cluster RS; > 4.3) Put a new row on source, restarted RS on source, waited a few seconds for > the recovery queue source to start up, then it gives: > {noformat} > SOURCE: PeerID=1, AgeOfLastShippedOp=0, SizeOfLogQueue=1, > TimeStampsOfLastShippedOp=Thu Jan 01 01:00:00 GMT 1970, Replication > Lag=9223372036854775807 > {noformat} > Also, we should report status for all running sources; the current output format > gives the impression there’s only one, even when there are recovery queues, > for instance. > Here is a list of ideas on how the command should report under different > states of replication: > a) Source started, target stopped, no ed
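The metric semantics argued for in the report above can be sketched as follows: AgeOfLastShippedOp is frozen at the moment of shipping (ship time minus edit time) and does not drift afterwards, while Replication Lag grows only while an unshipped edit is pending. This is a hedged illustration only; all names are invented and this is not the actual ReplicationSource metrics code.

```java
// Illustrative semantics for the two replication metrics discussed above.
class SourceMetrics {
  private long lastShippedAgeMs = 0;      // frozen at the moment of shipping
  private long oldestPendingEditTs = -1;  // -1 means nothing is pending

  void onEditAppended(long editTsMs) {
    if (oldestPendingEditTs < 0) {
      oldestPendingEditTs = editTsMs;     // remember the oldest unshipped edit
    }
  }

  void onBatchShipped(long editTsMs, long shipTsMs) {
    lastShippedAgeMs = shipTsMs - editTsMs; // age = ship time minus edit time
    oldestPendingEditTs = -1;               // queue drained in this sketch
  }

  long ageOfLastShippedOp() {
    return lastShippedAgeMs;              // does NOT keep growing while stuck
  }

  long replicationLag(long nowMs) {
    // grows only while an edit is pending; 0 when fully caught up
    return oldestPendingEditTs < 0 ? 0 : nowMs - oldestPendingEditTs;
  }
}
```

Under these semantics, bullets 2 and 3 above are resolved (age stops tracking lag, and a late ship reports its true age), and a recovered queue with no pending edit would report lag 0 instead of Long.MAX_VALUE.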
[jira] [Commented] (HBASE-21526) Use AsyncClusterConnection in ServerManager for getRsAdmin
[ https://issues.apache.org/jira/browse/HBASE-21526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711660#comment-16711660 ] Hadoop QA commented on HBASE-21526: --- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange} 0m 0s{color} | {color:orange} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} HBASE-21512 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 25s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 9s{color} | {color:green} HBASE-21512 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 53s{color} | {color:green} HBASE-21512 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 0s{color} | {color:green} HBASE-21512 passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 3m 48s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 31s{color} | {color:green} HBASE-21512 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 9s{color} | {color:green} HBASE-21512 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s{color} | {color:green} The patch passed checkstyle in hbase-common {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 29s{color} | {color:green} The patch passed checkstyle in hbase-client {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 7s{color} | {color:green} hbase-server: The patch generated 0 new + 167 unchanged - 4 fixed = 167 total (was 171) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 3m 47s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 8m 18s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 9s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 42s{color} | {color:green} hbase-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 14s{color} | {color:green} hbase-client in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green}125m 8s{color} | {color:green} hbase-server in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 1m 16s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}177m 43s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yet
[jira] [Updated] (HBASE-21505) Several inconsistencies on information reported for Replication Sources by hbase shell status 'replication' command.
[ https://issues.apache.org/jira/browse/HBASE-21505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil updated HBASE-21505: - Status: In Progress (was: Patch Available) > Several inconsistencies on information reported for Replication Sources by > hbase shell status 'replication' command. > > > Key: HBASE-21505 > URL: https://issues.apache.org/jira/browse/HBASE-21505 > Project: HBase > Issue Type: Bug >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Major > Attachments: > 0001-HBASE-21505-initial-version-for-more-detailed-report.patch, > HBASE-21505-master.001.patch, HBASE-21505-master.002.patch > > > While reviewing hbase shell status 'replication' command, noticed the > following issues related to replication source section: > 1) TimeStampsOfLastShippedOp keeps getting updated and increasing even when > no new edits were added to source, so nothing was really shipped. Test steps > performed: > 1.1) Source cluster with only one table targeted to replication; > 1.2) Added a new row, confirmed the row appeared in Target cluster; > 1.3) Issued status 'replication' command in source, TimeStampsOfLastShippedOp > shows current timestamp T1. > 1.4) Waited 30 seconds, no new data added to source. Issued status > 'replication' command, now shows timestamp T2. > 2) When replication is stuck due some connectivity issues or target > unavailability, if new edits are added in source, reported AgeOfLastShippedOp > is wrongly showing same value as "Replication Lag". This is incorrect, > AgeOfLastShippedOp should not change until there's indeed another edit > shipped to target. Test steps performed: > 2.1) Source cluster with only one table targeted to replication; > 2.2) Stopped target cluster RS; > 2.3) Put a new row on source. Running status 'replication' command does show > lag increasing. TimeStampsOfLastShippedOp seems correct also, no further > updates as described on bullet #1 above. 
> 2.4) AgeOfLastShippedOp keeps increasing together with Replication Lag, even > though there's no new edit shipped to target: > {noformat} > ... > SOURCE: PeerID=1, AgeOfLastShippedOp=5581, SizeOfLogQueue=1, > TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=5581 > ... > ... > SOURCE: PeerID=1, AgeOfLastShippedOp=8586, SizeOfLogQueue=1, > TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=8586 > ... > {noformat} > 3) AgeOfLastShippedOp gets set to 0 even when a given edit had taken some > time before it finally got shipped to target. Test steps performed: > 3.1) Source cluster with only one table targeted to replication; > 3.2) Stopped target cluster RS; > 3.3) Put a new row on source. > 3.4) AgeOfLastShippedOp keeps increasing together with Replication Lag, even > though there's no new edit shipped to target: > {noformat} > T1: > ... > SOURCE: PeerID=1, AgeOfLastShippedOp=5581, SizeOfLogQueue=1, > TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=5581 > ... > T2: > ... > SOURCE: PeerID=1, AgeOfLastShippedOp=8586, SizeOfLogQueue=1, > TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=8586 > ... > {noformat} > 3.5) Restarted target cluster RS and verified the new row appeared there. No > new edit added, but status 'replication' command reports AgeOfLastShippedOp > as 0, while it should be the diff between the time it concluded shipping at > target and the time it was added in source: > {noformat} > SOURCE: PeerID=1, AgeOfLastShippedOp=0, SizeOfLogQueue=1, > TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=0 > {noformat} > 4) When replication is stuck due to some connectivity issues or target > unavailability, if RS is restarted, once the recovered queue source is started, > TimeStampsOfLastShippedOp is set to the initial java date (Thu Jan 01 01:00:00 > GMT 1970, for example), thus "Replication Lag" also gives a completely > inaccurate value. 
> Tests performed: > 4.1) Source cluster with only one table targeted to replication; > 4.2) Stopped target cluster RS; > 4.3) Put a new row on source, restarted RS on source, waited a few seconds for > the recovery queue source to start up, then it gives: > {noformat} > SOURCE: PeerID=1, AgeOfLastShippedOp=0, SizeOfLogQueue=1, > TimeStampsOfLastShippedOp=Thu Jan 01 01:00:00 GMT 1970, Replication > Lag=9223372036854775807 > {noformat} > Also, we should report status for all running sources; the current output format > gives the impression there's only one, even when there are recovery queues, > for instance. > Here is a list of ideas on how the command should report under different > states of replication: > a) Source started, target stopped, no ed
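All four symptoms come down to when the source updates its shipped-edit metrics. Below is a minimal model of the semantics argued for above — hypothetical class and method names, not the actual HBase ReplicationSource code: the age and timestamp move only when an edit really ships, the age is fixed at shipment time rather than tracking the lag, and an unknown last-shipped timestamp is reported as unknown instead of being computed against the 1970 epoch.

```java
// Hypothetical model of the metric semantics argued for in this ticket;
// illustrative only, not the actual HBase ReplicationSource implementation.
public class ReplicationMetricsModel {
    public static final long UNKNOWN = -1L;

    private long lastShippedTs = UNKNOWN;  // wall clock of the last real shipment
    private long ageOfLastShippedOp = 0L;  // edit age at the moment it shipped

    // Called only when an edit actually reaches the target (addresses #1 and
    // #3: the timestamp and the age move here, and nowhere else).
    public void onShipped(long editCreateTs, long shipTs) {
        lastShippedTs = shipTs;
        ageOfLastShippedOp = shipTs - editCreateTs;  // fixed value, never reset to 0
    }

    // Addresses #2: the age does not keep growing while nothing new ships.
    public long getAgeOfLastShippedOp() {
        return ageOfLastShippedOp;
    }

    // Addresses #4: an unknown last-shipped timestamp yields "unknown"
    // rather than a lag measured from the epoch (Long.MAX_VALUE in the log).
    public long getReplicationLag(long now, boolean editsPending) {
        if (!editsPending) {
            return 0L;  // fully caught up
        }
        return lastShippedTs == UNKNOWN ? UNKNOWN : now - lastShippedTs;
    }
}
```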
[jira] [Commented] (HBASE-21217) Revisit the executeProcedure method for open/close region
[ https://issues.apache.org/jira/browse/HBASE-21217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711632#comment-16711632 ] Pankaj Kumar commented on HBASE-21217: -- Got it [~allan163]... thanks for the Jira pointer. > Revisit the executeProcedure method for open/close region > - > > Key: HBASE-21217 > URL: https://issues.apache.org/jira/browse/HBASE-21217 > Project: HBase > Issue Type: Sub-task > Components: amv2, proc-v2 >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Critical > Fix For: 3.0.0, 2.2.0 > > Attachments: HBASE-21217-v1.patch, HBASE-21217-v2.patch, > HBASE-21217.patch > > > Currently we just call openRegion and closeRegion directly, which is a bit > buggy. For example, in order to not fail all the open region requests while > there is only one failure, we will catch the exception and set a flag in the > return value. But for the executeProcedures call, the return value will be > ignored, and we expect the openRegion method will always call > reportRegionStateTransition to report the failure, but in fact it does not... > And after HBASE-20881, we can confirm that the race could happen, where we > send a close request to a region which is opening (HBASE-21199), and vice > versa. So I think here we need to revisit the implementation of > executeProcedures to make it more stable. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
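The return-value problem described in the issue can be sketched as follows — illustrative names, not the real HBase signatures: a failure recorded only in an ignored return value never reaches the master, while pushing every outcome through reportRegionStateTransition does.

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch of the failure-reporting gap described above; the method
// names echo the discussion but the signatures are illustrative.
public class ExecuteProceduresSketch {
    public final List<String> reportedToMaster = new ArrayList<>();

    public void reportRegionStateTransition(String msg) {
        reportedToMaster.add(msg);
    }

    // Buggy shape: the failure lives only in a return value that the
    // executeProcedures caller ignores, so the master never hears about it.
    public boolean openRegionReturningFlag(String region, boolean fail) {
        if (fail) {
            return false;  // silently dropped by the caller
        }
        reportRegionStateTransition("OPENED " + region);
        return true;
    }

    // Revisited shape: every outcome, success or failure, goes back to the
    // master through reportRegionStateTransition.
    public void openRegionAlwaysReporting(String region, boolean fail) {
        reportRegionStateTransition((fail ? "FAILED_OPEN " : "OPENED ") + region);
    }
}
```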
[jira] [Commented] (HBASE-21559) The RestoreSnapshotFromClientTestBase related UT are flaky
[ https://issues.apache.org/jira/browse/HBASE-21559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711604#comment-16711604 ] Hadoop QA commented on HBASE-21559: --- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange} 0m 0s{color} | {color:orange} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 33s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 58s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 15s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 13s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 2s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 7s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 9m 27s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green}131m 44s{color} | {color:green} hbase-server in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}171m 4s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b | | JIRA Issue | HBASE-21559 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12950827/HBASE-21559.v1.patch | | Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux 60403a0126a1 4.4.0-139-generic #165~14.04.1-Ubuntu SMP Wed Oct 31 10:55:11 UTC 2018 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / 12e75a8a63 | | maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC3 | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/15208/testReport/ | | Max. process+thread count | 4432 (vs. ulimit of 1) | | modules | C: hbase-server U: hbase-server | | Console output | https://builds.apache.org/job/PreCommit-HBASE
[jira] [Commented] (HBASE-21512) Introduce an AsyncClusterConnection and replace the usage of ClusterConnection
[ https://issues.apache.org/jira/browse/HBASE-21512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711569#comment-16711569 ] Hudson commented on HBASE-21512: Results for branch HBASE-21512 [build #8 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21512/8/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21512/8//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21512/8//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21512/8//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Introduce an AsyncClusterConnection and replace the usage of ClusterConnection > -- > > Key: HBASE-21512 > URL: https://issues.apache.org/jira/browse/HBASE-21512 > Project: HBase > Issue Type: Umbrella >Reporter: Duo Zhang >Priority: Major > Fix For: 3.0.0 > > > At least for the RSProcedureDispatcher, with CompletableFuture we do not need > to set a delay and use a thread pool any more, which could reduce the > resource usage and also the latency. > Once this is done, I think we can remove the ClusterConnection completely, > and start to rewrite the old sync client based on the async client, which > could reduce the code base a lot for our client. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
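The RSProcedureDispatcher point above can be illustrated with a hedged retry sketch (hypothetical helper, Java 9+ `CompletableFuture.delayedExecutor`): chaining on a CompletableFuture lets a retry fire after a delay without a thread parked in a dedicated pool for each pending call.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hedged sketch of the dispatch pattern this umbrella argues for; names
// are illustrative, not the real RSProcedureDispatcher API.
public class AsyncRetrySketch {
    public static <T> CompletableFuture<T> withRetries(
            Supplier<CompletableFuture<T>> call, int retriesLeft, long delayMs) {
        CompletableFuture<T> result = new CompletableFuture<>();
        call.get().whenComplete((v, err) -> {
            if (err == null) {
                result.complete(v);
            } else if (retriesLeft == 0) {
                result.completeExceptionally(err);
            } else {
                // No thread sleeps here; the shared scheduler fires the retry.
                Executor delayed =
                    CompletableFuture.delayedExecutor(delayMs, TimeUnit.MILLISECONDS);
                delayed.execute(() ->
                    withRetries(call, retriesLeft - 1, delayMs)
                        .whenComplete((v2, err2) -> {
                            if (err2 == null) result.complete(v2);
                            else result.completeExceptionally(err2);
                        }));
            }
        });
        return result;
    }
}
```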
[jira] [Commented] (HBASE-21559) The RestoreSnapshotFromClientTestBase related UT are flaky
[ https://issues.apache.org/jira/browse/HBASE-21559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711538#comment-16711538 ] Zheng Hu commented on HBASE-21559: -- bq. But at t8, the TakeSnapshotHandler is already in the map right? Thinking about the above case again, there should be no problem if we move the v != null && v.isFinished() check out of the computeIfPresent, because the status of v only transforms from not finished to finished. If we get a not-finished state, then the STRP won't proceed, so it's OK even if someone changes it from not finished to finished. bq. so the problem here is that the state should be volatile. The state is volatile now. > The RestoreSnapshotFromClientTestBase related UT are flaky > -- > > Key: HBASE-21559 > URL: https://issues.apache.org/jira/browse/HBASE-21559 > Project: HBase > Issue Type: Bug >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Major > Fix For: 3.0.0, 2.1.2, 2.0.4, 2.0.5 > > Attachments: HBASE-21559.v1.patch, > TEST-org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions.xml, > > org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions-output.txt, > > org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions.txt > > > The related UT are: > * TestRestoreSnapshotFromClientAfterSplittingRegions > * TestRestoreSnapshotFromClientWithRegionReplicas > * TestMobRestoreSnapshotFromClientAfterSplittingRegions > I guess the main problem is: a dead lock between SplitTableRegionProcedure > and SnapshotProcedure.. > Attached logs from the failed UT. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
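A stand-in model of the check being discussed (the real classes are SnapshotManager and TakeSnapshotHandler; these names are illustrative): because the handler's state only ever moves from not-finished to finished and the field is volatile, the check can safely run outside the map's per-key lock.

```java
import java.util.concurrent.ConcurrentHashMap;

// Stand-in model of the check under discussion; illustrative names, not
// the actual SnapshotManager / TakeSnapshotHandler code.
public class SnapshotCheckSketch {
    public static class Handler {
        // One-way transition: false -> true, never back. Declaring it
        // volatile is what makes the out-of-lock read safe.
        public volatile boolean finished = false;
    }

    public final ConcurrentHashMap<String, Handler> handlers = new ConcurrentHashMap<>();

    // Check pulled out of computeIfPresent: a transition racing with the
    // read can only make us see "still taking a snapshot" a moment longer
    // than necessary, which merely delays the split procedure -- the safe
    // direction.
    public boolean isTakingSnapshot(String table) {
        Handler h = handlers.get(table);
        return h != null && !h.finished;
    }
}
```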
[jira] [Updated] (HBASE-21559) The RestoreSnapshotFromClientTestBase related UT are flaky
[ https://issues.apache.org/jira/browse/HBASE-21559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Hu updated HBASE-21559: - Attachment: HBASE-21559.v2.patch > The RestoreSnapshotFromClientTestBase related UT are flaky > -- > > Key: HBASE-21559 > URL: https://issues.apache.org/jira/browse/HBASE-21559 > Project: HBase > Issue Type: Bug >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Major > Fix For: 3.0.0, 2.1.2, 2.0.4, 2.0.5 > > Attachments: HBASE-21559.v1.patch, HBASE-21559.v2.patch, > TEST-org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions.xml, > > org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions-output.txt, > > org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions.txt > > > The related UT are: > * TestRestoreSnapshotFromClientAfterSplittingRegions > * TestRestoreSnapshotFromClientWithRegionReplicas > * TestMobRestoreSnapshotFromClientAfterSplittingRegions > I guess the main problem is: a dead lock between SplitTableRegionProcedure > and SnapshotProcedure.. > Attached logs from the failed UT. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21559) The RestoreSnapshotFromClientTestBase related UT are flaky
[ https://issues.apache.org/jira/browse/HBASE-21559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711506#comment-16711506 ] Duo Zhang commented on HBASE-21559: --- But at t8, the TakeSnapshotHandler is already in the map right? We will not hold the map lock when changing its state, so the problem here is that the state should be volatile. > The RestoreSnapshotFromClientTestBase related UT are flaky > -- > > Key: HBASE-21559 > URL: https://issues.apache.org/jira/browse/HBASE-21559 > Project: HBase > Issue Type: Bug >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Major > Fix For: 3.0.0, 2.1.2, 2.0.4, 2.0.5 > > Attachments: HBASE-21559.v1.patch, > TEST-org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions.xml, > > org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions-output.txt, > > org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions.txt > > > The related UT are: > * TestRestoreSnapshotFromClientAfterSplittingRegions > * TestRestoreSnapshotFromClientWithRegionReplicas > * TestMobRestoreSnapshotFromClientAfterSplittingRegions > I guess the main problem is: a dead lock between SplitTableRegionProcedure > and SnapshotProcedure.. > Attached logs from the failed UT. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (HBASE-21559) The RestoreSnapshotFromClientTestBase related UT are flaky
[ https://issues.apache.org/jira/browse/HBASE-21559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711503#comment-16711503 ] Zheng Hu edited comment on HBASE-21559 at 12/6/18 2:25 PM: --- bq. But they will not hold the map lock when modifying right? Assume the case: t1. start snapshot t2. hold the table x-lock t3. release the table x-lock; t4. downgrade to slock because table is enabled; t5. start snapshot on RS... t6. SplitTableRegionProcedure (STRP) submitted . t7. STRP hold the table s-lock t8. STRP check isTakingSnapshot . Then at t8, the SnapshotManager may update the status of handler at any time , I think. was (Author: openinx): bq. But they will not hold the map lock when modifying right? Assume the case: t1. start snapshot t2. hold the table x-lock t3. rease the table x-lock; t4. downgrade to slock because table is enabled; t5. start snapshot on RS... t6. SplitTableRegionProcedure start . t7. STRP hold the table s-lock t8. check isTakingSnapshot . Then at t8, the SnapshotManager may update the status of handler at any time , I think. > The RestoreSnapshotFromClientTestBase related UT are flaky > -- > > Key: HBASE-21559 > URL: https://issues.apache.org/jira/browse/HBASE-21559 > Project: HBase > Issue Type: Bug >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Major > Fix For: 3.0.0, 2.1.2, 2.0.4, 2.0.5 > > Attachments: HBASE-21559.v1.patch, > TEST-org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions.xml, > > org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions-output.txt, > > org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions.txt > > > The related UT are: > * TestRestoreSnapshotFromClientAfterSplittingRegions > * TestRestoreSnapshotFromClientWithRegionReplicas > * TestMobRestoreSnapshotFromClientAfterSplittingRegions > I guess the main problem is: a dead lock between SplitTableRegionProcedure > and SnapshotProcedure.. 
> Attached logs from the failed UT. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21559) The RestoreSnapshotFromClientTestBase related UT are flaky
[ https://issues.apache.org/jira/browse/HBASE-21559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711503#comment-16711503 ] Zheng Hu commented on HBASE-21559: -- bq. But they will not hold the map lock when modifying right? Assume the case: t1. start snapshot t2. hold the table x-lock t3. release the table x-lock; t4. downgrade to s-lock because the table is enabled; t5. start snapshot on RS... t6. SplitTableRegionProcedure starts. t7. STRP holds the table s-lock t8. check isTakingSnapshot. Then at t8, the SnapshotManager may update the status of the handler at any time, I think. > The RestoreSnapshotFromClientTestBase related UT are flaky > -- > > Key: HBASE-21559 > URL: https://issues.apache.org/jira/browse/HBASE-21559 > Project: HBase > Issue Type: Bug >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Major > Fix For: 3.0.0, 2.1.2, 2.0.4, 2.0.5 > > Attachments: HBASE-21559.v1.patch, > TEST-org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions.xml, > > org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions-output.txt, > > org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions.txt > > > The related UT are: > * TestRestoreSnapshotFromClientAfterSplittingRegions > * TestRestoreSnapshotFromClientWithRegionReplicas > * TestMobRestoreSnapshotFromClientAfterSplittingRegions > I guess the main problem is: a dead lock between SplitTableRegionProcedure > and SnapshotProcedure.. > Attached logs from the failed UT. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21505) Several inconsistencies on information reported for Replication Sources by hbase shell status 'replication' command.
[ https://issues.apache.org/jira/browse/HBASE-21505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711499#comment-16711499 ] Wellington Chevreuil commented on HBASE-21505: -- Attaching a new patch version. The previous one had some unused imports that caused compilation errors on the build; not sure why those didn't happen locally for me. There were also some uncaught checkstyle violations, now addressed. > Several inconsistencies on information reported for Replication Sources by > hbase shell status 'replication' command. > > > Key: HBASE-21505 > URL: https://issues.apache.org/jira/browse/HBASE-21505 > Project: HBase > Issue Type: Bug >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Major > Attachments: > 0001-HBASE-21505-initial-version-for-more-detailed-report.patch, > HBASE-21505-master.001.patch, HBASE-21505-master.002.patch > > > While reviewing the hbase shell status 'replication' command, I noticed the > following issues related to the replication source section: > 1) TimeStampsOfLastShippedOp keeps getting updated and increasing even when > no new edits were added to source, so nothing was really shipped. Test steps > performed: > 1.1) Source cluster with only one table targeted to replication; > 1.2) Added a new row, confirmed the row appeared in Target cluster; > 1.3) Issued status 'replication' command in source, TimeStampsOfLastShippedOp > shows current timestamp T1. > 1.4) Waited 30 seconds, no new data added to source. Issued status > 'replication' command, now shows timestamp T2. > 2) When replication is stuck due to some connectivity issues or target > unavailability, if new edits are added in source, reported AgeOfLastShippedOp > wrongly shows the same value as "Replication Lag". This is incorrect; > AgeOfLastShippedOp should not change until there's indeed another edit > shipped to target. 
Test steps performed: > 2.1) Source cluster with only one table targeted to replication; > 2.2) Stopped target cluster RS; > 2.3) Put a new row on source. Running status 'replication' command does show > lag increasing. TimeStampsOfLastShippedOp seems correct also, no further > updates as described on bullet #1 above. > 2.4) AgeOfLastShippedOp keeps increasing together with Replication Lag, even > though there's no new edit shipped to target: > {noformat} > ... > SOURCE: PeerID=1, AgeOfLastShippedOp=5581, SizeOfLogQueue=1, > TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=5581 > ... > ... > SOURCE: PeerID=1, AgeOfLastShippedOp=8586, SizeOfLogQueue=1, > TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=8586 > ... > {noformat} > 3) AgeOfLastShippedOp gets set to 0 even when a given edit had taken some > time before it got finally shipped to target. Test steps performed: > 3.1) Source cluster with only one table targeted to replication; > 3.2) Stopped target cluster RS; > 3.3) Put a new row on source. > 3.4) AgeOfLastShippedOp keeps increasing together with Replication Lag, even > though there's no new edit shipped to target: > {noformat} > T1: > ... > SOURCE: PeerID=1, AgeOfLastShippedOp=5581, SizeOfLogQueue=1, > TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=5581 > ... > T2: > ... > SOURCE: PeerID=1, AgeOfLastShippedOp=8586, SizeOfLogQueue=1, > TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=8586 > ... > {noformat} > 3.5) Restart target cluster RS and verified the new row appeared there. 
No > new edit added, but status 'replication' command reports AgeOfLastShippedOp > as 0, while it should be the diff between the time it concluded shipping at > target and the time it was added in source: > {noformat} > SOURCE: PeerID=1, AgeOfLastShippedOp=0, SizeOfLogQueue=1, > TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=0 > {noformat} > 4) When replication is stuck due to some connectivity issues or target > unavailability, if RS is restarted, once the recovered queue source is started, > TimeStampsOfLastShippedOp is set to the initial java date (Thu Jan 01 01:00:00 > GMT 1970, for example), thus "Replication Lag" also gives a completely > inaccurate value. > Tests performed: > 4.1) Source cluster with only one table targeted to replication; > 4.2) Stopped target cluster RS; > 4.3) Put a new row on source, restarted RS on source, waited a few seconds for > the recovery queue source to start up, then it gives: > {noformat} > SOURCE: PeerID=1, AgeOfLastShippedOp=0, SizeOfLogQueue=1, > TimeStampsOfLastShippedOp=Thu Jan 01 01:00:00 GMT 1970, Replication > Lag=9223372036854775807 > {noformat} > Also, we should report status for all running sources; the current output format
[jira] [Updated] (HBASE-21505) Several inconsistencies on information reported for Replication Sources by hbase shell status 'replication' command.
[ https://issues.apache.org/jira/browse/HBASE-21505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil updated HBASE-21505: - Attachment: HBASE-21505-master.002.patch > Several inconsistencies on information reported for Replication Sources by > hbase shell status 'replication' command. > > > Key: HBASE-21505 > URL: https://issues.apache.org/jira/browse/HBASE-21505 > Project: HBase > Issue Type: Bug >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Major > Attachments: > 0001-HBASE-21505-initial-version-for-more-detailed-report.patch, > HBASE-21505-master.001.patch, HBASE-21505-master.002.patch > > > While reviewing the hbase shell status 'replication' command, I noticed the > following issues related to the replication source section: > 1) TimeStampsOfLastShippedOp keeps getting updated and increasing even when > no new edits were added to source, so nothing was really shipped. Test steps > performed: > 1.1) Source cluster with only one table targeted to replication; > 1.2) Added a new row, confirmed the row appeared in Target cluster; > 1.3) Issued status 'replication' command in source, TimeStampsOfLastShippedOp > shows current timestamp T1. > 1.4) Waited 30 seconds, no new data added to source. Issued status > 'replication' command, now shows timestamp T2. > 2) When replication is stuck due to some connectivity issues or target > unavailability, if new edits are added in source, reported AgeOfLastShippedOp > wrongly shows the same value as "Replication Lag". This is incorrect; > AgeOfLastShippedOp should not change until there's indeed another edit > shipped to target. Test steps performed: > 2.1) Source cluster with only one table targeted to replication; > 2.2) Stopped target cluster RS; > 2.3) Put a new row on source. Running status 'replication' command does show > lag increasing. TimeStampsOfLastShippedOp seems correct also, no further > updates as described on bullet #1 above. 
> 2.4) AgeOfLastShippedOp keeps increasing together with Replication Lag, even > though there's no new edit shipped to target: > {noformat} > ... > SOURCE: PeerID=1, AgeOfLastShippedOp=5581, SizeOfLogQueue=1, > TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=5581 > ... > ... > SOURCE: PeerID=1, AgeOfLastShippedOp=8586, SizeOfLogQueue=1, > TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=8586 > ... > {noformat} > 3) AgeOfLastShippedOp gets set to 0 even when a given edit had taken some > time before it finally got shipped to target. Test steps performed: > 3.1) Source cluster with only one table targeted to replication; > 3.2) Stopped target cluster RS; > 3.3) Put a new row on source. > 3.4) AgeOfLastShippedOp keeps increasing together with Replication Lag, even > though there's no new edit shipped to target: > {noformat} > T1: > ... > SOURCE: PeerID=1, AgeOfLastShippedOp=5581, SizeOfLogQueue=1, > TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=5581 > ... > T2: > ... > SOURCE: PeerID=1, AgeOfLastShippedOp=8586, SizeOfLogQueue=1, > TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=8586 > ... > {noformat} > 3.5) Restarted target cluster RS and verified the new row appeared there. No > new edit added, but status 'replication' command reports AgeOfLastShippedOp > as 0, while it should be the diff between the time it concluded shipping at > target and the time it was added in source: > {noformat} > SOURCE: PeerID=1, AgeOfLastShippedOp=0, SizeOfLogQueue=1, > TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=0 > {noformat} > 4) When replication is stuck due to some connectivity issues or target > unavailability, if RS is restarted, once the recovered queue source is started, > TimeStampsOfLastShippedOp is set to the initial java date (Thu Jan 01 01:00:00 > GMT 1970, for example), thus "Replication Lag" also gives a completely > inaccurate value. 
> Tests performed: > 4.1) Source cluster with only one table targeted to replication; > 4.2) Stopped target cluster RS; > 4.3) Put a new row on source, restarted RS on source, waited a few seconds for > the recovery queue source to start up, then it gives: > {noformat} > SOURCE: PeerID=1, AgeOfLastShippedOp=0, SizeOfLogQueue=1, > TimeStampsOfLastShippedOp=Thu Jan 01 01:00:00 GMT 1970, Replication > Lag=9223372036854775807 > {noformat} > Also, we should report status for all running sources; the current output format > gives the impression there's only one, even when there are recovery queues, > for instance. > Here is a list of ideas on how the command should report under different > states of replication: > a) Source started, target stopped, no edits
[jira] [Commented] (HBASE-21217) Revisit the executeProcedure method for open/close region
[ https://issues.apache.org/jira/browse/HBASE-21217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711498#comment-16711498 ] Allan Yang commented on HBASE-21217: We have HBASE-21237 for branch-2.0/2.1. [~pankaj2461] > Revisit the executeProcedure method for open/close region > - > > Key: HBASE-21217 > URL: https://issues.apache.org/jira/browse/HBASE-21217 > Project: HBase > Issue Type: Sub-task > Components: amv2, proc-v2 >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Critical > Fix For: 3.0.0, 2.2.0 > > Attachments: HBASE-21217-v1.patch, HBASE-21217-v2.patch, > HBASE-21217.patch > > > Currently we just call openRegion and closeRegion directly, which is a bit > buggy. For example, in order to not fail all the open region requests while > there is only one failure, we will catch the exception and set a flag in the > return value. But for the executeProcedures call, the return value will be > ignored, and we expect the openRegion method will always call > reportRegionStateTransition to report the failure, but in fact it does not... > And after HBASE-20881, we can confirm that the race could happen, where we > send a close request to a region which is opening (HBASE-21199), and vice > versa. So I think here we need to revisit the implementation of > executeProcedures to make it more stable. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-19830) [AMv2] RPCs while holding (Region) Locks (to update hbase:meta with region state)
[ https://issues.apache.org/jira/browse/HBASE-19830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pankaj Kumar updated HBASE-19830: - Component/s: amv2 > [AMv2] RPCs while holding (Region) Locks (to update hbase:meta with region > state) > - > > Key: HBASE-19830 > URL: https://issues.apache.org/jira/browse/HBASE-19830 > Project: HBase > Issue Type: Bug > Components: amv2 >Reporter: stack >Assignee: stack >Priority: Major > > Do we have to? It's a problem if we want Master to host regions and it's just a > problem anyways. See HBASE-19828 for scenarios mostly around cluster shutdown > where the order in which processes go down is hard to control and it happens > that a server-hosted-client is trying to rpc a missing hbase:meta. > Eventually, we'll time out (server-hosted-clients have their retries upped), > but that's bad for MTTR and unit tests. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21549) Add shell command for serial replication peer
[ https://issues.apache.org/jira/browse/HBASE-21549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711472#comment-16711472 ] Peter Somogyi commented on HBASE-21549: --- Thanks for addressing my review comments. +1 > Add shell command for serial replication peer > - > > Key: HBASE-21549 > URL: https://issues.apache.org/jira/browse/HBASE-21549 > Project: HBase > Issue Type: Improvement >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Major > Attachments: HBASE-21549.master.001.patch, > HBASE-21549.master.002.patch, HBASE-21549.master.003.patch > > > add_peer supports adding a serial replication peer directly. > set_peer_serial supports changing a replication peer's serial flag. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
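As a usage sketch of the two commands this issue adds — the syntax is inferred from the issue summary, and the ZooKeeper cluster key is a placeholder:

```ruby
# Hedged hbase shell usage sketch; 'zk1:2181:/hbase' is a placeholder cluster key.
# Create a peer that replicates serially from the start:
add_peer '1', CLUSTER_KEY => 'zk1:2181:/hbase', SERIAL => true
# Flip an existing peer's serial flag:
set_peer_serial '1', true
```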
[jira] [Commented] (HBASE-21217) Revisit the executeProcedure method for open/close region
[ https://issues.apache.org/jira/browse/HBASE-21217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711455#comment-16711455 ] Pankaj Kumar commented on HBASE-21217: -- Do you mean this bug is not applicable to branch-2.0/2.1? Pardon me if I'm wrong. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21559) The RestoreSnapshotFromClientTestBase related UT are flaky
[ https://issues.apache.org/jira/browse/HBASE-21559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711436#comment-16711436 ] Zheng Hu commented on HBASE-21559: -- bq. Do we really need to test the isFinished under the map lock? Just get it out and check? I thought it better to do this, in case we get a handler and someone updates the handler's status at that very moment. > The RestoreSnapshotFromClientTestBase related UT are flaky > -- > > Key: HBASE-21559 > URL: https://issues.apache.org/jira/browse/HBASE-21559 > Project: HBase > Issue Type: Bug > Reporter: Zheng Hu > Assignee: Zheng Hu > Priority: Major > Fix For: 3.0.0, 2.1.2, 2.0.4, 2.0.5 > > Attachments: HBASE-21559.v1.patch, TEST-org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions.xml, org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions-output.txt, org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions.txt > > > The related UT are: > * TestRestoreSnapshotFromClientAfterSplittingRegions > * TestRestoreSnapshotFromClientWithRegionReplicas > * TestMobRestoreSnapshotFromClientAfterSplittingRegions > I guess the main problem is a deadlock between SplitTableRegionProcedure and SnapshotProcedure. > Attached logs from the failed UT. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
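The question being debated above (check a handler's status while holding the lock that guards the handler map, or fetch the handler under the lock and check it afterwards) can be sketched as follows. The names are illustrative; the real SnapshotManager code differs.

```java
import java.util.HashMap;
import java.util.Map;

// Two ways of answering "is this handler finished?" when a lock guards the
// map of handlers but the handler's own status can change concurrently.
class HandlerCheckSketch {
    static class Handler {
        volatile boolean finished;
        boolean isFinished() { return finished; }
    }

    private final Map<String, Handler> handlers = new HashMap<>();
    private final Object mapLock = new Object();

    void register(String name, Handler h) {
        synchronized (mapLock) { handlers.put(name, h); }
    }

    // Option A: check the status while holding the map lock. Guarantees the
    // lookup and the check see a consistent moment relative to anyone else
    // who takes mapLock, at the cost of holding the lock longer (which is
    // what raises the deadlock concern in this thread).
    boolean isFinishedUnderLock(String name) {
        synchronized (mapLock) {
            Handler h = handlers.get(name);
            return h != null && h.isFinished();
        }
    }

    // Option B: copy the handler out under the lock, check it outside. The
    // answer can be stale by the time it is used, but the lock is held only
    // for the map lookup.
    boolean isFinishedOutsideLock(String name) {
        Handler h;
        synchronized (mapLock) { h = handlers.get(name); }
        return h != null && h.isFinished();
    }
}
```

Zheng Hu's position corresponds to Option A; Duo Zhang's suggestion is Option B, on the grounds that the status is not modified under the map lock anyway.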
[jira] [Commented] (HBASE-21559) The RestoreSnapshotFromClientTestBase related UT are flaky
[ https://issues.apache.org/jira/browse/HBASE-21559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711430#comment-16711430 ] Duo Zhang commented on HBASE-21559: --- Do we really need to test the isFinished under the map lock? Just get it out and check? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21217) Revisit the executeProcedure method for open/close region
[ https://issues.apache.org/jira/browse/HBASE-21217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711441#comment-16711441 ] Duo Zhang commented on HBASE-21217: --- For branch-2.1/branch-2.0, we will just call openRegion and closeRegion directly, IIRC. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21559) The RestoreSnapshotFromClientTestBase related UT are flaky
[ https://issues.apache.org/jira/browse/HBASE-21559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711439#comment-16711439 ] Duo Zhang commented on HBASE-21559: --- But they will not hold the map lock when modifying right? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21526) Use AsyncClusterConnection in ServerManager for getRsAdmin
[ https://issues.apache.org/jira/browse/HBASE-21526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang updated HBASE-21526: -- Attachment: HBASE-21526-HBASE-21512-v2.patch > Use AsyncClusterConnection in ServerManager for getRsAdmin > -- > > Key: HBASE-21526 > URL: https://issues.apache.org/jira/browse/HBASE-21526 > Project: HBase > Issue Type: Sub-task >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Attachments: HBASE-21526-HBASE-21512-v1.patch, > HBASE-21526-HBASE-21512-v2.patch, HBASE-21526-HBASE-21512.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21217) Revisit the executeProcedure method for open/close region
[ https://issues.apache.org/jira/browse/HBASE-21217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711438#comment-16711438 ] Pankaj Kumar commented on HBASE-21217: -- Ping [~allan163], any plan to backport this fix to the 2.0/2.1 branches? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21559) The RestoreSnapshotFromClientTestBase related UT are flaky
[ https://issues.apache.org/jira/browse/HBASE-21559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711429#comment-16711429 ] Zheng Hu commented on HBASE-21559: -- BTW, I think we can move the snapshot feature from procedure.v1 to procedure.v2 in the future. So I assigned HBASE-14413 to myself. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-14413) Procedure V2 - Snapshot V2
[ https://issues.apache.org/jira/browse/HBASE-14413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711422#comment-16711422 ] Zheng Hu commented on HBASE-14413: -- Not in progress. Let me move this forward. Hope you don't mind. [~vrodionov], [~mbertozzi] > Procedure V2 - Snapshot V2 > -- > > Key: HBASE-14413 > URL: https://issues.apache.org/jira/browse/HBASE-14413 > Project: HBase > Issue Type: Sub-task > Reporter: Vladimir Rodionov > Assignee: Zheng Hu > Priority: Major > > We need a new implementation of the snapshot feature that is more robust and performant. Ideally, it will work with multiple tables as well. The possible areas of improvement: > # It must be flushless. Coordinated memstore flushes across a cluster are bad. > # The verification phase must be distributed, done in parallel, and not on the Master. > In theory, the only info we need to record a snapshot of a table is: the list of WAL files, the list of HFiles, and the max sequence id of an edit which has been flushed per Region. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21559) The RestoreSnapshotFromClientTestBase related UT are flaky
[ https://issues.apache.org/jira/browse/HBASE-21559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711425#comment-16711425 ] Zheng Hu commented on HBASE-21559: -- Ran the UT on my local machine 5 times; seems OK now. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21560) Return a new TableDescriptor for MasterObserver#preModifyTable to allow coprocessor modify the TableDescriptor
[ https://issues.apache.org/jira/browse/HBASE-21560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711421#comment-16711421 ] Duo Zhang commented on HBASE-21560: --- +1. > Return a new TableDescriptor for MasterObserver#preModifyTable to allow > coprocessor modify the TableDescriptor > -- > > Key: HBASE-21560 > URL: https://issues.apache.org/jira/browse/HBASE-21560 > Project: HBase > Issue Type: Improvement > Reporter: Guanghao Zhang > Priority: Major > > Same as HBASE-21550. The TableDescriptor is immutable in 2.0+, but in our use case the coprocessor may change the TableDescriptor in preModifyTable; that was allowed before 2.0. For 2.0+, we can return a new TableDescriptor from MasterObserver#preModifyTable to allow this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
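The pattern proposed in HBASE-21560 (a hook that cannot mutate an immutable descriptor and therefore returns a replacement) can be sketched minimally. `TableDescriptorSketch` and its `withMaxVersions` copy method below are hypothetical stand-ins, not the real HBase `TableDescriptor` API.

```java
// With an immutable descriptor, a pre-hook that wants to change the table
// schema must build and return a new instance; the framework then uses the
// returned descriptor instead of the one it passed in.
class PreModifySketch {
    static final class TableDescriptorSketch {
        final String name;
        final int maxVersions;
        TableDescriptorSketch(String name, int maxVersions) {
            this.name = name;
            this.maxVersions = maxVersions;
        }
        // Copy-on-write style modification: never mutate, always copy.
        TableDescriptorSketch withMaxVersions(int v) {
            return new TableDescriptorSketch(name, v);
        }
    }

    // A coprocessor-style hook: returns either the descriptor unchanged or
    // a replacement the caller should use.
    static TableDescriptorSketch preModifyTable(TableDescriptorSketch requested) {
        if (requested.maxVersions > 10) {
            return requested.withMaxVersions(10); // cap via a new descriptor
        }
        return requested;
    }
}
```

Returning the (possibly same) instance keeps the hook signature simple: callers always use the return value, and a no-op hook just echoes its argument.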
[jira] [Assigned] (HBASE-14413) Procedure V2 - Snapshot V2
[ https://issues.apache.org/jira/browse/HBASE-14413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Hu reassigned HBASE-14413: Assignee: Zheng Hu -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21514) Refactor CacheConfig
[ https://issues.apache.org/jira/browse/HBASE-21514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711416#comment-16711416 ] Hadoop QA commented on HBASE-21514: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 30 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 59s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 49s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 20s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 3m 48s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 57s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 48s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 1m 48s{color} | {color:red} hbase-server generated 4 new + 184 unchanged - 4 fixed = 188 total (was 188) {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 18s{color} | {color:green} hbase-server: The patch generated 0 new + 868 unchanged - 58 fixed = 868 total (was 926) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 3m 51s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 8m 35s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 6s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green}127m 46s{color} | {color:green} hbase-server in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 27s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}164m 35s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b | | JIRA Issue | HBASE-21514 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12950812/HBASE-21514.master.009.patch | | Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux d903706a348b 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / 12e75a8a63 | | maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC3 | | javac | https://builds.apache.org/job/PreCommit-HBASE-Build/15207/artifact/patchprocess/diff-compile-javac-hbase-server.txt | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/15207/testReport/ | | Max. process+thread count | 4906 (vs. ulimit of 1
[jira] [Updated] (HBASE-21559) The RestoreSnapshotFromClientTestBase related UT are flaky
[ https://issues.apache.org/jira/browse/HBASE-21559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Hu updated HBASE-21559: - Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21559) The RestoreSnapshotFromClientTestBase related UT are flaky
[ https://issues.apache.org/jira/browse/HBASE-21559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711397#comment-16711397 ] Zheng Hu commented on HBASE-21559: -- Currently, the SnapshotManager grabs the object lock in many methods. This is a very coarse way of locking. I think we should change the locking in SnapshotManager: not just synchronize on the big SnapshotManager object, but use a more concrete lock (to avoid deadlock). Anyway, let me fix this deadlock first, so I've uploaded patch v1. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
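The locking change suggested in this comment (stop synchronizing every method on the one big manager object and guard each piece of state with its own narrow lock) looks roughly like the sketch below. Both classes are illustrative only, not the real SnapshotManager.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Coarse vs. fine-grained locking. With the coarse style every method
// competes for the same monitor, so unrelated operations serialize and the
// monitor can participate in lock-ordering deadlocks with other subsystems.
class SnapshotLockSketch {
    static class CoarseManager {
        private final Map<String, String> snapshots = new HashMap<>();
        synchronized void start(String name) { snapshots.put(name, "RUNNING"); }
        synchronized boolean isRunning(String name) { return "RUNNING".equals(snapshots.get(name)); }
    }

    // Finer style: a dedicated lock that guards only the snapshot map, so a
    // caller never holds the whole manager while doing unrelated work.
    static class FineManager {
        private final ReentrantLock snapshotLock = new ReentrantLock();
        private final Map<String, String> snapshots = new HashMap<>();
        void start(String name) {
            snapshotLock.lock();
            try { snapshots.put(name, "RUNNING"); } finally { snapshotLock.unlock(); }
        }
        boolean isRunning(String name) {
            snapshotLock.lock();
            try { return "RUNNING".equals(snapshots.get(name)); } finally { snapshotLock.unlock(); }
        }
    }
}
```

The fine-grained version also makes the critical sections visible at each call site, which helps when auditing for the kind of SplitTableRegionProcedure/SnapshotProcedure deadlock described in this issue.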
[jira] [Updated] (HBASE-21559) The RestoreSnapshotFromClientTestBase related UT are flaky
[ https://issues.apache.org/jira/browse/HBASE-21559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Hu updated HBASE-21559: - Attachment: HBASE-21559.v1.patch -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21550) Add a new method preCreateTableRegionInfos for MasterObserver which allows CPs to modify the TableDescriptor
[ https://issues.apache.org/jira/browse/HBASE-21550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711380#comment-16711380 ] Hudson commented on HBASE-21550: Results for branch branch-2 [build #1542 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1542/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1542//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1542//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1542//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Add a new method preCreateTableRegionInfos for MasterObserver which allows > CPs to modify the TableDescriptor > > > Key: HBASE-21550 > URL: https://issues.apache.org/jira/browse/HBASE-21550 > Project: HBase > Issue Type: Bug > Components: Coprocessors >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0 > > Attachments: HBASE-21550.patch > > > Before 2.0, we will pass a HTableDescriptor and the CPs can modify the schema > of a table, but now we will pass a TableDescriptor, which is immutable. I > think it is correct to pass an immutable instance here, but we should have a > return value for this method to allow CPs to return a new TableDescriptor. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21551) Memory leak when use scan with STREAM at server side
[ https://issues.apache.org/jira/browse/HBASE-21551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711382#comment-16711382 ] Hudson commented on HBASE-21551: Results for branch branch-2 [build #1542 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1542/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1542//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1542//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1542//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Memory leak when use scan with STREAM at server side > > > Key: HBASE-21551 > URL: https://issues.apache.org/jira/browse/HBASE-21551 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Blocker > Fix For: 3.0.0, 2.2.0, 2.1.2, 2.0.4 > > Attachments: HBASE-21551.v1.patch, HBASE-21551.v2.patch, > HBASE-21551.v3.patch, heap-dump.jpg > > > We open the RegionServerScanner with STREAM as following: > {code} > RegionScannerImpl#initializeScanners > |---> HStore#getScanner > |--> StoreScanner() > |---> > StoreFileScanner#getScannersForStoreFiles > |--> > HStoreFile#getStreamScanner #1 > {code} > In #1, we put the StoreFileReader into a concurrent hash map streamReaders, > but not remove the StreamReader from streamReaders until closing the store > file. 
> So if we run stream scans many times, the streamReaders hash map > will grow without bound. We can see the heap dump in the attached heap-dump.jpg. > I found this bug because, when I benchmarked scan performance using YCSB > in a cluster (RS heap size of 50g), the RS easily ran into long full GCs > (~110 sec). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
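The leak pattern described in HBASE-21551 (an entry added to a map per stream scan but removed only when the whole store file closes) can be sketched as below. Names are illustrative stand-ins, not the actual HBase classes; the fix direction is simply to remove the entry when the individual reader is released.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the leak: every stream scan registers a reader in a map on the
// store file. If the entry is removed only when the file itself is closed,
// repeated scans grow the map without bound, which is what showed up as the
// exploding streamReaders map in the heap dump.
class StreamReaderLeakSketch {
    static class StoreFileSketch {
        private final Map<Integer, Object> streamReaders = new ConcurrentHashMap<>();
        private int nextId = 0; // single-threaded here; a sketch, not production code

        // Each stream scan registers a reader and gets a handle back.
        int openStreamReader() {
            int id = nextId++;
            streamReaders.put(id, new Object());
            return id;
        }

        // Leak fix: release the reader as soon as its scan finishes rather
        // than waiting for the store file itself to be closed.
        void closeStreamReader(int id) {
            streamReaders.remove(id);
        }

        int liveReaders() {
            return streamReaders.size();
        }
    }
}
```

With the per-scan removal in place, the map size tracks the number of scans currently in flight instead of the number of scans ever run.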
[jira] [Commented] (HBASE-21534) TestAssignmentManager is flakey
[ https://issues.apache.org/jira/browse/HBASE-21534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711381#comment-16711381 ] Hudson commented on HBASE-21534: Results for branch branch-2 [build #1542 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1542/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1542//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1542//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1542//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > TestAssignmentManager is flakey > --- > > Key: HBASE-21534 > URL: https://issues.apache.org/jira/browse/HBASE-21534 > Project: HBase > Issue Type: Task > Components: test > Reporter: Duo Zhang > Assignee: Duo Zhang > Priority: Major > Fix For: 3.0.0, 2.2.0 > > Attachments: HBASE-21534-addendum-v1.patch, HBASE-21534-addendum.patch, HBASE-21534.patch > > > See this in the output and then the test hangs: {noformat} > 2018-11-29 20:47:50,061 WARN [MockRSProcedureDispatcher-pool5-t10] > assignment.AssignmentManager(894): The region server localhost,102,1 is > already dead, skip reportRegionStateTransition call > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21557) Set version to 2.0.4 on branch-2.0 so can cut an RC
[ https://issues.apache.org/jira/browse/HBASE-21557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711336#comment-16711336 ] Hudson commented on HBASE-21557: Results for branch branch-2.0 [build #1141 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1141/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1141//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1141//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1141//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. > Set version to 2.0.4 on branch-2.0 so can cut an RC > --- > > Key: HBASE-21557 > URL: https://issues.apache.org/jira/browse/HBASE-21557 > Project: HBase > Issue Type: Sub-task > Components: release >Reporter: stack >Assignee: stack >Priority: Major > Fix For: 2.0.4 > > > $ mvn clean org.codehaus.mojo:versions-maven-plugin:2.5:set -DnewVersion=2.0.4 > $ find . -name pom.xml -exec git add {} \; > $ git commit ... -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21126) Add ability for HBase Canary to ignore a configurable number of ZooKeeper down nodes
[ https://issues.apache.org/jira/browse/HBASE-21126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711337#comment-16711337 ] Hudson commented on HBASE-21126: Results for branch branch-2.0 [build #1141 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1141/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1141//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1141//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1141//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. > Add ability for HBase Canary to ignore a configurable number of ZooKeeper > down nodes > > > Key: HBASE-21126 > URL: https://issues.apache.org/jira/browse/HBASE-21126 > Project: HBase > Issue Type: Improvement > Components: canary, Zookeeper >Affects Versions: 1.0.0, 3.0.0, 2.0.0 >Reporter: David Manning >Assignee: David Manning >Priority: Minor > Fix For: 3.0.0, 1.5.0, 2.2.0 > > Attachments: HBASE-21126.branch-1.001.patch, > HBASE-21126.master.001.patch, HBASE-21126.master.002.patch, > HBASE-21126.master.003.patch, zookeeperCanaryLocalTestValidation.txt > > Original Estimate: 48h > Remaining Estimate: 48h > > When running org.apache.hadoop.hbase.tool.Canary with args -zookeeper > -treatFailureAsError, the Canary will try to get a znode from each ZooKeeper > server in the ensemble. If any server is unavailable or unresponsive, the > canary will exit with a failure code. 
> If we use the Canary to gauge server health, and alert accordingly, this can > be too strict. For example, in a 5-node ZooKeeper cluster, having one node > down is safe and expected in rolling upgrades/patches. > This is a request to allow the Canary to take another parameter > {code:java} > -permittedZookeeperFailures {code} > If N=1, in the 5-node ZooKeeper ensemble example, then the Canary will still > pass if 4 ZooKeeper nodes are reachable, but fail if 3 or fewer are reachable. > (This is my first Jira posting... sorry if I messed anything up.) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
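The pass/fail rule requested above (tolerate up to N unreachable ensemble members) reduces to a simple threshold check. The sketch below is a hypothetical helper for illustration, not the actual Canary code; the class and method names are assumptions:

```java
public class ZkCanaryCheck {
    /**
     * Returns true if the canary should pass: the number of unreachable
     * ZooKeeper servers must not exceed the permitted failure budget
     * (the proposed -permittedZookeeperFailures value).
     */
    static boolean canaryPasses(int ensembleSize, int reachable, int permittedFailures) {
        int down = ensembleSize - reachable;
        return down <= permittedFailures;
    }

    public static void main(String[] args) {
        // 5-node ensemble with -permittedZookeeperFailures=1, as in the example above.
        System.out.println(canaryPasses(5, 4, 1)); // true: one node down is tolerated
        System.out.println(canaryPasses(5, 3, 1)); // false: two nodes down exceeds the budget
    }
}
```

With the default budget of 0 this reduces to the current strict behavior, so the parameter is backward compatible.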
[jira] [Commented] (HBASE-21126) Add ability for HBase Canary to ignore a configurable number of ZooKeeper down nodes
[ https://issues.apache.org/jira/browse/HBASE-21126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711304#comment-16711304 ] Hudson commented on HBASE-21126: Results for branch branch-2.1 [build #662 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/662/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/662//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/662//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/662//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Add ability for HBase Canary to ignore a configurable number of ZooKeeper > down nodes > > > Key: HBASE-21126 > URL: https://issues.apache.org/jira/browse/HBASE-21126 > Project: HBase > Issue Type: Improvement > Components: canary, Zookeeper >Affects Versions: 1.0.0, 3.0.0, 2.0.0 >Reporter: David Manning >Assignee: David Manning >Priority: Minor > Fix For: 3.0.0, 1.5.0, 2.2.0 > > Attachments: HBASE-21126.branch-1.001.patch, > HBASE-21126.master.001.patch, HBASE-21126.master.002.patch, > HBASE-21126.master.003.patch, zookeeperCanaryLocalTestValidation.txt > > Original Estimate: 48h > Remaining Estimate: 48h > > When running org.apache.hadoop.hbase.tool.Canary with args -zookeeper > -treatFailureAsError, the Canary will try to get a znode from each ZooKeeper > server in the ensemble. If any server is unavailable or unresponsive, the > canary will exit with a failure code. 
> If we use the Canary to gauge server health, and alert accordingly, this can > be too strict. For example, in a 5-node ZooKeeper cluster, having one node > down is safe and expected in rolling upgrades/patches. > This is a request to allow the Canary to take another parameter > {code:java} > -permittedZookeeperFailures {code} > If N=1, in the 5-node ZooKeeper ensemble example, then the Canary will still > pass if 4 ZooKeeper nodes are reachable, but fail if 3 or fewer are reachable. > (This is my first Jira posting... sorry if I messed anything up.) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21551) Memory leak when use scan with STREAM at server side
[ https://issues.apache.org/jira/browse/HBASE-21551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711302#comment-16711302 ] Hudson commented on HBASE-21551: Results for branch branch-2.1 [build #662 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/662/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/662//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/662//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/662//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Memory leak when use scan with STREAM at server side > > > Key: HBASE-21551 > URL: https://issues.apache.org/jira/browse/HBASE-21551 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Blocker > Fix For: 3.0.0, 2.2.0, 2.1.2, 2.0.4 > > Attachments: HBASE-21551.v1.patch, HBASE-21551.v2.patch, > HBASE-21551.v3.patch, heap-dump.jpg > > > We open the RegionServerScanner with STREAM as following: > {code} > RegionScannerImpl#initializeScanners > |---> HStore#getScanner > |--> StoreScanner() > |---> > StoreFileScanner#getScannersForStoreFiles > |--> > HStoreFile#getStreamScanner #1 > {code} > In #1, we put the StoreFileReader into a concurrent hash map streamReaders, > but not remove the StreamReader from streamReaders until closing the store > file. 
> So if we scan with STREAM many times, the streamReaders hash map > will grow without bound; we can see this in the attached heap-dump.jpg. > I found this bug while benchmarking scan performance with YCSB > on a cluster (RS heap size 50g): the RS was prone to long > full GC pauses (~110 sec). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
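The leak pattern described in the report, and the shape of the fix (release the reader when the scan finishes rather than when the store file closes), can be illustrated with a minimal stand-alone sketch. `StreamReaderCache`, `Reader`, and the method names here are illustrative stand-ins, not HBase's actual classes:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class StreamReaderCache {
    // Stand-in for StoreFileReader; the real class lives in hbase-server.
    static class Reader {}

    private final Map<Long, Reader> streamReaders = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong();

    /**
     * Leaky pattern: every STREAM scan registers a reader in the map. If
     * entries are only cleared when the store file itself closes, repeated
     * scans accumulate readers indefinitely.
     */
    long openStreamReader() {
        long id = nextId.getAndIncrement();
        streamReaders.put(id, new Reader());
        return id;
    }

    /** The fix: release the reader when the scanner closes. */
    void closeStreamReader(long id) {
        streamReaders.remove(id);
    }

    int cachedReaders() {
        return streamReaders.size();
    }

    public static void main(String[] args) {
        StreamReaderCache cache = new StreamReaderCache();
        long a = cache.openStreamReader();
        long b = cache.openStreamReader();
        System.out.println(cache.cachedReaders()); // 2: both scans hold readers
        cache.closeStreamReader(a);
        cache.closeStreamReader(b);
        System.out.println(cache.cachedReaders()); // 0: readers released per scan
    }
}
```

Without the per-scan removal, each entry pins its reader (and any buffers it holds) for the lifetime of the store file, which matches the unbounded heap growth seen in the dump.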
[jira] [Commented] (HBASE-21558) Set version to 2.1.2 on branch-2.1 so can cut an RC
[ https://issues.apache.org/jira/browse/HBASE-21558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711303#comment-16711303 ] Hudson commented on HBASE-21558: Results for branch branch-2.1 [build #662 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/662/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/662//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/662//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/662//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Set version to 2.1.2 on branch-2.1 so can cut an RC > --- > > Key: HBASE-21558 > URL: https://issues.apache.org/jira/browse/HBASE-21558 > Project: HBase > Issue Type: Sub-task > Components: release >Reporter: stack >Assignee: stack >Priority: Major > > mvn clean org.codehaus.mojo:versions-maven-plugin:2.5:set -DnewVersion=2.1.2 > $ find . -name pom.xml -exec git add {} \; > $ git commit ... -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21551) Memory leak when use scan with STREAM at server side
[ https://issues.apache.org/jira/browse/HBASE-21551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711284#comment-16711284 ] Hudson commented on HBASE-21551: Results for branch master [build #648 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/648/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/master/648//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/master/648//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/master/648//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Memory leak when use scan with STREAM at server side > > > Key: HBASE-21551 > URL: https://issues.apache.org/jira/browse/HBASE-21551 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Blocker > Fix For: 3.0.0, 2.2.0, 2.1.2, 2.0.4 > > Attachments: HBASE-21551.v1.patch, HBASE-21551.v2.patch, > HBASE-21551.v3.patch, heap-dump.jpg > > > We open the RegionServerScanner with STREAM as following: > {code} > RegionScannerImpl#initializeScanners > |---> HStore#getScanner > |--> StoreScanner() > |---> > StoreFileScanner#getScannersForStoreFiles > |--> > HStoreFile#getStreamScanner #1 > {code} > In #1, we put the StoreFileReader into a concurrent hash map streamReaders, > but not remove the StreamReader from streamReaders until closing the store > file. > So if we scan with stream with so many times, the streamReaders hash map > will be exploded. 
we can see the heap dump in the attached heap-dump.jpg. > I found this bug while benchmarking scan performance with YCSB > on a cluster (RS heap size 50g): the RS was prone to long > full GC pauses (~110 sec). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21550) Add a new method preCreateTableRegionInfos for MasterObserver which allows CPs to modify the TableDescriptor
[ https://issues.apache.org/jira/browse/HBASE-21550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711283#comment-16711283 ] Hudson commented on HBASE-21550: Results for branch master [build #648 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/648/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/master/648//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/master/648//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/master/648//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Add a new method preCreateTableRegionInfos for MasterObserver which allows > CPs to modify the TableDescriptor > > > Key: HBASE-21550 > URL: https://issues.apache.org/jira/browse/HBASE-21550 > Project: HBase > Issue Type: Bug > Components: Coprocessors >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0 > > Attachments: HBASE-21550.patch > > > Before 2.0, we will pass a HTableDescriptor and the CPs can modify the schema > of a table, but now we will pass a TableDescriptor, which is immutable. I > think it is correct to pass an immutable instance here, but we should have a > return value for this method to allow CPs to return a new TableDescriptor. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
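The API change proposed in HBASE-21550 above (immutable input, new instance as the return value) can be sketched as follows. The types here are simplified stand-ins for HBase's TableDescriptor and MasterObserver, not the real signatures:

```java
public class ObserverSketch {
    /** Simplified stand-in for HBase's immutable TableDescriptor. */
    static final class TableDescriptor {
        final String name;
        final int maxVersions;
        TableDescriptor(String name, int maxVersions) {
            this.name = name;
            this.maxVersions = maxVersions;
        }
    }

    /**
     * Hypothetical hook shape: since the descriptor is immutable, the CP
     * cannot mutate it in place, so the hook returns a (possibly new)
     * descriptor that the master uses going forward.
     */
    interface MasterObserver {
        TableDescriptor preCreateTableRegionInfos(TableDescriptor desc);
    }

    /** Runs a CP that bumps maxVersions by returning a fresh descriptor. */
    static int demo() {
        MasterObserver cp = d -> new TableDescriptor(d.name, 3);
        TableDescriptor out = cp.preCreateTableRegionInfos(new TableDescriptor("t1", 1));
        return out.maxVersions;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 3: the CP replaced the descriptor
    }
}
```

A CP that wants no change simply returns its argument unchanged, so existing observers keep working with a one-line implementation.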
[jira] [Commented] (HBASE-21464) Splitting blocked with meta NSRE during split transaction
[ https://issues.apache.org/jira/browse/HBASE-21464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711256#comment-16711256 ] Hudson commented on HBASE-21464: Results for branch branch-1.4 [build #577 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/577/]: (x) *{color:red}-1 overall{color}* details (if available): (x) {color:red}-1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/577//General_Nightly_Build_Report/] (x) {color:red}-1 jdk7 checks{color} -- For more information [see jdk7 report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/577//JDK7_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/577//JDK8_Nightly_Build_Report_(Hadoop2)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. > Splitting blocked with meta NSRE during split transaction > - > > Key: HBASE-21464 > URL: https://issues.apache.org/jira/browse/HBASE-21464 > Project: HBase > Issue Type: Bug >Affects Versions: 1.5.0, 1.4.3, 1.4.4, 1.4.5, 1.4.6, 1.4.8, 1.4.7 >Reporter: Andrew Purtell >Assignee: Andrew Purtell >Priority: Blocker > Fix For: 1.5.0, 1.4.9 > > Attachments: HBASE-21464-branch-1.patch, HBASE-21464-branch-1.patch, > HBASE-21464-branch-1.patch, HBASE-21464-branch-1.patch > > > Splitting is blocked during split transaction. 
The split worker is trying to > update meta but isn't able to relocate it after NSRE: > {noformat} > 2018-11-09 17:50:45,277 INFO > [regionserver/ip-172-31-5-92.us-west-2.compute.internal/172.31.5.92:8120-splits-1541785709434] > client.RpcRetryingCaller: Call exception, tries=13, retries=350, > started=88590 ms ago, cancelled=false, > msg=org.apache.hadoop.hbase.NotServingRegionException: Region hbase:meta,,1 > is not online on ip-172-31-13-83.us-west-2.compute.internal,8120,1541785618832 > at > org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3088) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1271) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:2198) > at > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36617) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2396) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:297) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:277)row > 'test,,1541785709452.5ba6596f0050c2dab969d152829227c6.44' on table > 'hbase:meta' at region=hbase:meta,1.1588230740, > hostname=ip-172-31-15-225.us-west-2.compute.internal,8120,1541785640586, > seqNum=0{noformat} > Clients, in this case YCSB, are hung with part of the keyspace missing: > {noformat} > 2018-11-09 17:51:06,033 DEBUG [hconnection-0x5739e567-shared--pool1-t165] > client.ConnectionManager$HConnectionImplementation: locateRegionInMeta > parentTable=hbase:meta, metaLocation=, attempt=14 of 35 failed; retrying > after sleep of 20158 because: No server address listed in hbase:meta for > region > test,user307326104267982763,1541785754600.ef90030b05cb02305b75e9bfbc3ee081. 
> containing row user3301635648728421323{noformat} > Balancing cannot run indefinitely because the split transaction is stuck > {noformat} > 2018-11-09 17:49:55,478 DEBUG > [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=8100] master.HMaster: > Not running balancer because 3 region(s) in transition: > [{ef90030b05cb02305b75e9bfbc3ee081 state=SPLITTING_NEW, ts=1541785754606, > server=ip-172-31-5-92.us-west-2.compute.internal,8120,1541785626417}, > {5ba6596f0050c2dab969d152829227c6 state=SPLITTING, ts=1541785754606, > server=ip-172-31-5-92.us-west-2.compute{noformat} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21464) Splitting blocked with meta NSRE during split transaction
[ https://issues.apache.org/jira/browse/HBASE-21464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711253#comment-16711253 ] Hudson commented on HBASE-21464: Results for branch branch-1 [build #580 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/580/]: (x) *{color:red}-1 overall{color}* details (if available): (x) {color:red}-1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/580//General_Nightly_Build_Report/] (x) {color:red}-1 jdk7 checks{color} -- For more information [see jdk7 report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/580//JDK7_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/580//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 source release artifact{color} -- See build output for details. > Splitting blocked with meta NSRE during split transaction > - > > Key: HBASE-21464 > URL: https://issues.apache.org/jira/browse/HBASE-21464 > Project: HBase > Issue Type: Bug >Affects Versions: 1.5.0, 1.4.3, 1.4.4, 1.4.5, 1.4.6, 1.4.8, 1.4.7 >Reporter: Andrew Purtell >Assignee: Andrew Purtell >Priority: Blocker > Fix For: 1.5.0, 1.4.9 > > Attachments: HBASE-21464-branch-1.patch, HBASE-21464-branch-1.patch, > HBASE-21464-branch-1.patch, HBASE-21464-branch-1.patch > > > Splitting is blocked during split transaction. 
The split worker is trying to > update meta but isn't able to relocate it after NSRE: > {noformat} > 2018-11-09 17:50:45,277 INFO > [regionserver/ip-172-31-5-92.us-west-2.compute.internal/172.31.5.92:8120-splits-1541785709434] > client.RpcRetryingCaller: Call exception, tries=13, retries=350, > started=88590 ms ago, cancelled=false, > msg=org.apache.hadoop.hbase.NotServingRegionException: Region hbase:meta,,1 > is not online on ip-172-31-13-83.us-west-2.compute.internal,8120,1541785618832 > at > org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3088) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1271) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:2198) > at > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36617) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2396) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:297) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:277)row > 'test,,1541785709452.5ba6596f0050c2dab969d152829227c6.44' on table > 'hbase:meta' at region=hbase:meta,1.1588230740, > hostname=ip-172-31-15-225.us-west-2.compute.internal,8120,1541785640586, > seqNum=0{noformat} > Clients, in this case YCSB, are hung with part of the keyspace missing: > {noformat} > 2018-11-09 17:51:06,033 DEBUG [hconnection-0x5739e567-shared--pool1-t165] > client.ConnectionManager$HConnectionImplementation: locateRegionInMeta > parentTable=hbase:meta, metaLocation=, attempt=14 of 35 failed; retrying > after sleep of 20158 because: No server address listed in hbase:meta for > region > test,user307326104267982763,1541785754600.ef90030b05cb02305b75e9bfbc3ee081. 
> containing row user3301635648728421323{noformat} > Balancing cannot run indefinitely because the split transaction is stuck > {noformat} > 2018-11-09 17:49:55,478 DEBUG > [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=8100] master.HMaster: > Not running balancer because 3 region(s) in transition: > [{ef90030b05cb02305b75e9bfbc3ee081 state=SPLITTING_NEW, ts=1541785754606, > server=ip-172-31-5-92.us-west-2.compute.internal,8120,1541785626417}, > {5ba6596f0050c2dab969d152829227c6 state=SPLITTING, ts=1541785754606, > server=ip-172-31-5-92.us-west-2.compute{noformat} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)