[jira] [Updated] (HBASE-14108) Procedure V2 - Administrative Task: provide an API to abort a procedure

2015-09-03 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-14108:
---
Attachment: (was: HBASE-14108.v2-master.patch)

> Procedure V2 - Administrative Task: provide an API to abort a procedure
> ---
>
> Key: HBASE-14108
> URL: https://issues.apache.org/jira/browse/HBASE-14108
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Affects Versions: 2.0.0, 1.3.0
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Fix For: 2.0.0, 1.3.0
>
> Attachments: HBASE-14108.v1-branch-1.patch, 
> HBASE-14108.v1-master.patch, HBASE-14108.v2-master.patch
>
>
> With Procedure-V2 in production since the HBase 1.1 release, there is a need to 
> abort a procedure (e.g. a long-running procedure that gets stuck somewhere and 
> blocks others).  The command could be issued either from the shell or the Web UI.
> This task tracks the work to provide an API to abort a procedure (either 
> roll it back or simply quit).  This API could be used either from the shell or the Web UI.
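
A minimal client-side usage sketch, assuming the Admin#abortProcedure(procId, mayInterruptIfRunning) signature this change introduces; the procedure id below is a placeholder:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class AbortProcedureSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection connection = ConnectionFactory.createConnection(conf);
         Admin admin = connection.getAdmin()) {
      long procId = 42L; // hypothetical id of a stuck procedure
      // abortProcedure(procId, mayInterruptIfRunning) as proposed by this change
      boolean aborted = admin.abortProcedure(procId, true);
      System.out.println("aborted=" + aborted);
    }
  }
}
{code}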



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14359) HTable#close will hang forever if unchecked error/exception thrown in AsyncProcess#sendMultiAction

2015-09-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729264#comment-14729264
 ] 

Hadoop QA commented on HBASE-14359:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12753973/HBASE-14359-master-branch1-v1.patch
  against master branch at commit 3341f13e71a25bf3f8eb5a6a57ce330b3d8a3495.
  ATTACHMENT ID: 12753973

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

 {color:red}-1 core zombie tests{color}.  There are 1 zombie test(s): 

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15404//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15404//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15404//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15404//console

This message is automatically generated.

> HTable#close will hang forever if unchecked error/exception thrown in 
> AsyncProcess#sendMultiAction
> --
>
> Key: HBASE-14359
> URL: https://issues.apache.org/jira/browse/HBASE-14359
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.14, 1.1.2
>Reporter: Yu Li
>Assignee: Victor Xu
> Attachments: HBASE-14359-0.98-v1.patch, 
> HBASE-14359-branch-1-v1.patch, HBASE-14359-master-branch1-v1.patch, 
> HBASE-14359-master-v1.patch
>
>
> Currently in AsyncProcess#sendMultiAction, we only catch 
> RejectedExecutionException and let other errors/exceptions go, which means 
> decTaskCounter is never invoked. Meanwhile, the recommended way to use HTable is 
> to close the table in a finally clause, and HTable#close will call 
> flushCommits and wait until all tasks are done.
> The problem is that when an unchecked error/exception like OutOfMemoryError is 
> thrown, taskSent will never equal taskDone, so AsyncProcess#waitUntilDone will 
> never return. In particular, if autoflush is set and there is thus no data to flush 
> during table close, there will be no RPC call, so rpcTimeOut will not break the 
> call, and the thread will wait there forever.
> In our production environment, the unchecked error we observed was 
> "java.lang.OutOfMemoryError: unable to create new native thread", and we 
> observed the client thread hang for hours.
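
To make the failure mode concrete, here is a small self-contained sketch (the names are illustrative, not HBase's): if the catch around the submit handles only RejectedExecutionException, any other unchecked throwable leaves the in-flight counter too high and a waitUntilDone()-style loop never returns.

{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

public class TaskCounterSketch {
  private final AtomicLong tasksInFlight = new AtomicLong();
  private final ExecutorService pool = Executors.newFixedThreadPool(2);

  void submit(Runnable work) {
    tasksInFlight.incrementAndGet();
    try {
      pool.submit(() -> {
        try { work.run(); } finally { tasksInFlight.decrementAndGet(); }
      });
    } catch (Throwable t) {            // not only RejectedExecutionException
      tasksInFlight.decrementAndGet(); // otherwise waitUntilDone() spins forever
      throw t;
    }
  }

  void waitUntilDone() throws InterruptedException {
    while (tasksInFlight.get() > 0) {
      Thread.sleep(10);
    }
  }
}
{code}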



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14108) Procedure V2 - Administrative Task: provide an API to abort a procedure

2015-09-03 Thread Stephen Yuan Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-14108:
---
Attachment: HBASE-14108.v2-master.patch

> Procedure V2 - Administrative Task: provide an API to abort a procedure
> ---
>
> Key: HBASE-14108
> URL: https://issues.apache.org/jira/browse/HBASE-14108
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Affects Versions: 2.0.0, 1.3.0
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Fix For: 2.0.0, 1.3.0
>
> Attachments: HBASE-14108.v1-branch-1.patch, 
> HBASE-14108.v1-master.patch, HBASE-14108.v2-master.patch, 
> HBASE-14108.v2-master.patch
>
>
> With Procedure-V2 in production since the HBase 1.1 release, there is a need to 
> abort a procedure (e.g. a long-running procedure that gets stuck somewhere and 
> blocks others).  The command could be issued either from the shell or the Web UI.
> This task tracks the work to provide an API to abort a procedure (either 
> roll it back or simply quit).  This API could be used either from the shell or the Web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14359) HTable#close will hang forever if unchecked error/exception thrown in AsyncProcess#sendMultiAction

2015-09-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729236#comment-14729236
 ] 

Hadoop QA commented on HBASE-14359:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12753980/HBASE-14359-branch-1-v1.patch
  against branch-1 branch at commit 3341f13e71a25bf3f8eb5a6a57ce330b3d8a3495.
  ATTACHMENT ID: 12753980

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

 {color:red}-1 core zombie tests{color}.  There are 5 zombie test(s):   
at 
org.apache.hadoop.hbase.security.access.TestAccessControlFilter.testQualifierAccess(TestAccessControlFilter.java:96)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15405//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15405//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15405//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15405//console

This message is automatically generated.

> HTable#close will hang forever if unchecked error/exception thrown in 
> AsyncProcess#sendMultiAction
> --
>
> Key: HBASE-14359
> URL: https://issues.apache.org/jira/browse/HBASE-14359
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.14, 1.1.2
>Reporter: Yu Li
>Assignee: Victor Xu
> Attachments: HBASE-14359-0.98-v1.patch, 
> HBASE-14359-branch-1-v1.patch, HBASE-14359-master-branch1-v1.patch, 
> HBASE-14359-master-v1.patch
>
>
> Currently in AsyncProcess#sendMultiAction, we only catch 
> RejectedExecutionException and let other errors/exceptions go, which means 
> decTaskCounter is never invoked. Meanwhile, the recommended way to use HTable is 
> to close the table in a finally clause, and HTable#close will call 
> flushCommits and wait until all tasks are done.
> The problem is that when an unchecked error/exception like OutOfMemoryError is 
> thrown, taskSent will never equal taskDone, so AsyncProcess#waitUntilDone will 
> never return. In particular, if autoflush is set and there is thus no data to flush 
> during table close, there will be no RPC call, so rpcTimeOut will not break the 
> call, and the thread will wait there forever.
> In our production environment, the unchecked error we observed was 
> "java.lang.OutOfMemoryError: unable to create new native thread", and we 
> observed the client thread hang for hours.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14108) Procedure V2 - Administrative Task: provide an API to abort a procedure

2015-09-03 Thread Stephen Yuan Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729083#comment-14729083
 ] 

Stephen Yuan Jiang commented on HBASE-14108:


No test failures, and the zombie tests are unrelated to this change (which is an 
isolated change).

> Procedure V2 - Administrative Task: provide an API to abort a procedure
> ---
>
> Key: HBASE-14108
> URL: https://issues.apache.org/jira/browse/HBASE-14108
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Affects Versions: 2.0.0, 1.3.0
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Attachments: HBASE-14108.v1-branch-1.patch, 
> HBASE-14108.v1-master.patch, HBASE-14108.v2-master.patch
>
>
> With Procedure-V2 in production since the HBase 1.1 release, there is a need to 
> abort a procedure (e.g. a long-running procedure that gets stuck somewhere and 
> blocks others).  The command could be issued either from the shell or the Web UI.
> This task tracks the work to provide an API to abort a procedure (either 
> roll it back or simply quit).  This API could be used either from the shell or the Web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-03 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729538#comment-14729538
 ] 

stack commented on HBASE-14317:
---

Ran on a small cluster (1B ITBLL with monkeys and confirmed all data there). 
Checked logs. No hangs or complaints related to this patch. Just the usual 
complaints about slow HDFS, including stuff like this:

2015-09-02 23:56:52,790 WARN  
[regionserver/c2023.halxg.cloudera.com/10.20.84.29:16020.logRoller] 
hdfs.DFSClient: Slow waitForAckedSeqno took 2577ms (threshold=20ms)

Also dfs client complaints and exceptions... but nothing from RS or related to 
WAL.

Looking at the failed test, on the one hand, the lease was just robbed on all 
WALs out from under the cluster. Let me make sure the failure is because of the 
stricter semantics and not some other byproduct. Looking at it, we should be 
able to ride over the HDFS restart. Will be back.


> Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
> -
>
> Key: HBASE-14317
> URL: https://issues.apache.org/jira/browse/HBASE-14317
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.1.1
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.0.0, 1.2.0, 1.0.3, 1.1.3
>
> Attachments: 14317.test.txt, 14317v10.txt, 14317v11.txt, 
> 14317v12.txt, 14317v13.txt, 14317v5.branch-1.2.txt, 14317v5.txt, 14317v9.txt, 
> HBASE-14317-v1.patch, HBASE-14317-v2.patch, HBASE-14317-v3.patch, 
> HBASE-14317-v4.patch, HBASE-14317.patch, [Java] RS stuck on WAL sync to a 
> dead DN - Pastebin.com.html, append-only-test.patch, raw.php, repro.txt, 
> san_dump.txt, subset.of.rs.log
>
>
> hbase-1.1.1 and hadoop-2.7.1
> We try to roll logs because we can't append (see HDFS-8960), but we get stuck. 
> See the attached thread dump and associated log. What is interesting is that 
> syncers are waiting to take syncs to run and at the same time we want to flush, so 
> we are waiting on a safe point, but there seems to be nothing in our ring 
> buffer; did we go to roll the log and not add a safe-point sync to clear out 
> the ring buffer?
> Needs a bit of study. Try to reproduce.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-14361) Investigate unused connection objects

2015-09-03 Thread Nick Dimiduk (JIRA)
Nick Dimiduk created HBASE-14361:


 Summary: Investigate unused connection objects
 Key: HBASE-14361
 URL: https://issues.apache.org/jira/browse/HBASE-14361
 Project: HBase
  Issue Type: Task
  Components: Client
Reporter: Nick Dimiduk


Over on HBASE-12911 I have a patch that registers Connection instances with the 
metrics system. In both standalone server and tll client applications, I was 
surprised to see multiple connection objects showing up that are unused. These 
are pretty heavy objects, including lots of client threads for the batch pool. 
We should track these down and remove them -- if they're not some kind of 
phantom artifacts of my WIP patch over there.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14344) Add timeouts to TestHttpServerLifecycle

2015-09-03 Thread Matteo Bertozzi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-14344:

   Resolution: Fixed
Fix Version/s: 1.1.3
   1.0.3
   Status: Resolved  (was: Patch Available)

> Add timeouts to TestHttpServerLifecycle
> ---
>
> Key: HBASE-14344
> URL: https://issues.apache.org/jira/browse/HBASE-14344
> Project: HBase
>  Issue Type: Test
>  Components: test
>Affects Versions: 2.0.0, 1.2.0
>Reporter: Matteo Bertozzi
>Assignee: Matteo Bertozzi
>Priority: Minor
> Fix For: 2.0.0, 1.2.0, 1.0.3, 1.1.3
>
> Attachments: HBASE-14344-v0.patch
>
>
> I got TestHttpServerLifecycle hanging a couple of times on my run. 
> Simple patch to add a timeout to the tests and avoid hanging the build. 
> (I haven't looked at them yet to see what the source of the problem was.)
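
For reference, the two usual JUnit 4 ways to bound a test's runtime; this is only a sketch of the approach, not the attached HBASE-14344-v0.patch:

{code}
import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.Timeout;

public class TimeoutSketch {
  // One rule that bounds every test method in the class.
  @Rule
  public final Timeout globalTimeout = Timeout.seconds(60);

  // Or a per-test bound in milliseconds.
  @Test(timeout = 60000)
  public void testServerLifecycle() throws Exception {
    // test body elided
  }
}
{code}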



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14359) HTable#close will hang forever if unchecked error/exception thrown in AsyncProcess#sendMultiAction

2015-09-03 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729636#comment-14729636
 ] 

Yu Li commented on HBASE-14359:
---

This online issue has been bothering us for a week, and now it's a relief from the 
suffering. Hope it can cure the zombie tests here also. :-)

> HTable#close will hang forever if unchecked error/exception thrown in 
> AsyncProcess#sendMultiAction
> --
>
> Key: HBASE-14359
> URL: https://issues.apache.org/jira/browse/HBASE-14359
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.14, 1.1.2
>Reporter: Yu Li
>Assignee: Victor Xu
> Attachments: HBASE-14359-0.98-v1.patch, 
> HBASE-14359-branch-1-v1.patch, HBASE-14359-master-branch1-v1.patch, 
> HBASE-14359-master-v1.patch
>
>
> Currently in AsyncProcess#sendMultiAction, we only catch 
> RejectedExecutionException and let other errors/exceptions go, which means 
> decTaskCounter is never invoked. Meanwhile, the recommended way to use HTable is 
> to close the table in a finally clause, and HTable#close will call 
> flushCommits and wait until all tasks are done.
> The problem is that when an unchecked error/exception like OutOfMemoryError is 
> thrown, taskSent will never equal taskDone, so AsyncProcess#waitUntilDone will 
> never return. In particular, if autoflush is set and there is thus no data to flush 
> during table close, there will be no RPC call, so rpcTimeOut will not break the 
> call, and the thread will wait there forever.
> In our production environment, the unchecked error we observed was 
> "java.lang.OutOfMemoryError: unable to create new native thread", and we 
> observed the client thread hang for hours.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14314) Metrics for block cache should take region replicas into account

2015-09-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729591#comment-14729591
 ] 

Hadoop QA commented on HBASE-14314:
---

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12754011/14314-v5.txt
  against master branch at commit 3341f13e71a25bf3f8eb5a6a57ce330b3d8a3495.
  ATTACHMENT ID: 12754011

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15406//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15406//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15406//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15406//console

This message is automatically generated.

> Metrics for block cache should take region replicas into account
> 
>
> Key: HBASE-14314
> URL: https://issues.apache.org/jira/browse/HBASE-14314
> Project: HBase
>  Issue Type: Improvement
>  Components: metrics, regionserver
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14314-v1.txt, 14314-v2.txt, 14314-v3.txt, 14314-v4.txt, 
> 14314-v5.txt
>
>
> Currently, metrics for the block cache are aggregates in the sense that they don't 
> distinguish primary from secondary/tertiary replicas.
> This JIRA separates the block cache metrics for the primary region replica from 
> the aggregate.
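
As a rough sketch of the split described above (not the attached patch): the aggregate counters stay as they are, and an extra primary-only counter is bumped when the block belongs to a primary (default) region replica.

{code}
import java.util.concurrent.atomic.LongAdder;

class BlockCacheMetricsSketch {
  private final LongAdder hitCount = new LongAdder();        // aggregate, as today
  private final LongAdder primaryHitCount = new LongAdder(); // new primary-replica-only view

  void onCacheHit(boolean isPrimaryReplica) {
    hitCount.increment();
    if (isPrimaryReplica) {  // caller decides this, e.g. from the region's replica id
      primaryHitCount.increment();
    }
  }
}
{code}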



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-03 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729578#comment-14729578
 ] 

stack commented on HBASE-14317:
---

bq. Brilliant!

Smile. This is how it was working. I just broke it by not allowing 
syncs-after-failed-appends. Sorry if I gave the wrong impression. Smile.

> Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
> -
>
> Key: HBASE-14317
> URL: https://issues.apache.org/jira/browse/HBASE-14317
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.1.1
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.0.0, 1.2.0, 1.0.3, 1.1.3
>
> Attachments: 14317.test.txt, 14317v10.txt, 14317v11.txt, 
> 14317v12.txt, 14317v13.txt, 14317v5.branch-1.2.txt, 14317v5.txt, 14317v9.txt, 
> HBASE-14317-v1.patch, HBASE-14317-v2.patch, HBASE-14317-v3.patch, 
> HBASE-14317-v4.patch, HBASE-14317.patch, [Java] RS stuck on WAL sync to a 
> dead DN - Pastebin.com.html, append-only-test.patch, raw.php, repro.txt, 
> san_dump.txt, subset.of.rs.log
>
>
> hbase-1.1.1 and hadoop-2.7.1
> We try to roll logs because we can't append (see HDFS-8960), but we get stuck. 
> See the attached thread dump and associated log. What is interesting is that 
> syncers are waiting to take syncs to run and at the same time we want to flush, so 
> we are waiting on a safe point, but there seems to be nothing in our ring 
> buffer; did we go to roll the log and not add a safe-point sync to clear out 
> the ring buffer?
> Needs a bit of study. Try to reproduce.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14359) HTable#close will hang forever if unchecked error/exception thrown in AsyncProcess#sendMultiAction

2015-09-03 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729582#comment-14729582
 ] 

Nick Dimiduk commented on HBASE-14359:
--

Nice bit of sleuthing here [~victorunique]. Could be a reason our tests hang 
over on builds.apache.org when resources get tight.

> HTable#close will hang forever if unchecked error/exception thrown in 
> AsyncProcess#sendMultiAction
> --
>
> Key: HBASE-14359
> URL: https://issues.apache.org/jira/browse/HBASE-14359
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.14, 1.1.2
>Reporter: Yu Li
>Assignee: Victor Xu
> Attachments: HBASE-14359-0.98-v1.patch, 
> HBASE-14359-branch-1-v1.patch, HBASE-14359-master-branch1-v1.patch, 
> HBASE-14359-master-v1.patch
>
>
> Currently in AsyncProcess#sendMultiAction, we only catch 
> RejectedExecutionException and let other errors/exceptions go, which means 
> decTaskCounter is never invoked. Meanwhile, the recommended way to use HTable is 
> to close the table in a finally clause, and HTable#close will call 
> flushCommits and wait until all tasks are done.
> The problem is that when an unchecked error/exception like OutOfMemoryError is 
> thrown, taskSent will never equal taskDone, so AsyncProcess#waitUntilDone will 
> never return. In particular, if autoflush is set and there is thus no data to flush 
> during table close, there will be no RPC call, so rpcTimeOut will not break the 
> call, and the thread will wait there forever.
> In our production environment, the unchecked error we observed was 
> "java.lang.OutOfMemoryError: unable to create new native thread", and we 
> observed the client thread hang for hours.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14261) Enhance Chaos Monkey framework by adding zookeeper and datanode fault injections.

2015-09-03 Thread Matteo Bertozzi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729543#comment-14729543
 ] 

Matteo Bertozzi commented on HBASE-14261:
-

+1

wrong message, you can fix it on commit
{code}
+++ b/hbase-server/src/test/java/org/apache/hadoop/hbase/MiniHBaseCluster.java
+  @Override
+  public void stopDataNode(ServerName serverName) throws IOException {
+    LOG.warn("Aborting datanodes on mini cluster is not supported");
+  }
{code}

> Enhance Chaos Monkey framework by adding zookeeper and datanode fault 
> injections.
> -
>
> Key: HBASE-14261
> URL: https://issues.apache.org/jira/browse/HBASE-14261
> Project: HBase
>  Issue Type: Improvement
>Reporter: Srikanth Srungarapu
>Assignee: Srikanth Srungarapu
> Attachments: HBASE-14261-branch-1.patch, 
> HBASE-14261-branch-1_v3.patch, HBASE-14261-branch-1_v4.patch, 
> HBASE-14261.branch-1_v2.patch, HBASE-14261.patch
>
>
> One of the shortcomings of the existing ChaosMonkey framework is the lack of fault 
> injections for HBase dependencies like ZooKeeper, HDFS, etc. This patch 
> attempts to partially solve this problem by adding datanode and ZK node fault 
> injections.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-03 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729572#comment-14729572
 ] 

Nick Dimiduk commented on HBASE-14317:
--

bq. Looking at it, we should be able to ride over the HDFS restart.

Brilliant!

> Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
> -
>
> Key: HBASE-14317
> URL: https://issues.apache.org/jira/browse/HBASE-14317
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.1.1
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.0.0, 1.2.0, 1.0.3, 1.1.3
>
> Attachments: 14317.test.txt, 14317v10.txt, 14317v11.txt, 
> 14317v12.txt, 14317v13.txt, 14317v5.branch-1.2.txt, 14317v5.txt, 14317v9.txt, 
> HBASE-14317-v1.patch, HBASE-14317-v2.patch, HBASE-14317-v3.patch, 
> HBASE-14317-v4.patch, HBASE-14317.patch, [Java] RS stuck on WAL sync to a 
> dead DN - Pastebin.com.html, append-only-test.patch, raw.php, repro.txt, 
> san_dump.txt, subset.of.rs.log
>
>
> hbase-1.1.1 and hadoop-2.7.1
> We try to roll logs because we can't append (see HDFS-8960), but we get stuck. 
> See the attached thread dump and associated log. What is interesting is that 
> syncers are waiting to take syncs to run and at the same time we want to flush, so 
> we are waiting on a safe point, but there seems to be nothing in our ring 
> buffer; did we go to roll the log and not add a safe-point sync to clear out 
> the ring buffer?
> Needs a bit of study. Try to reproduce.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14359) HTable#close will hang forever if unchecked error/exception thrown in AsyncProcess#sendMultiAction

2015-09-03 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729590#comment-14729590
 ] 

stack commented on HBASE-14359:
---

bq. Could be a reason our tests hang over on builds.apache.org when resources 
get tight.

For sure, we were seeing cases of OOME "unable to create new native thread" lately. Some 
of this has been ameliorated by our spinning up fewer threads when testing, but there is 
still work to do.

> HTable#close will hang forever if unchecked error/exception thrown in 
> AsyncProcess#sendMultiAction
> --
>
> Key: HBASE-14359
> URL: https://issues.apache.org/jira/browse/HBASE-14359
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.14, 1.1.2
>Reporter: Yu Li
>Assignee: Victor Xu
> Attachments: HBASE-14359-0.98-v1.patch, 
> HBASE-14359-branch-1-v1.patch, HBASE-14359-master-branch1-v1.patch, 
> HBASE-14359-master-v1.patch
>
>
> Currently in AsyncProcess#sendMultiAction, we only catch 
> RejectedExecutionException and let other errors/exceptions go, which means 
> decTaskCounter is never invoked. Meanwhile, the recommended way to use HTable is 
> to close the table in a finally clause, and HTable#close will call 
> flushCommits and wait until all tasks are done.
> The problem is that when an unchecked error/exception like OutOfMemoryError is 
> thrown, taskSent will never equal taskDone, so AsyncProcess#waitUntilDone will 
> never return. In particular, if autoflush is set and there is thus no data to flush 
> during table close, there will be no RPC call, so rpcTimeOut will not break the 
> call, and the thread will wait there forever.
> In our production environment, the unchecked error we observed was 
> "java.lang.OutOfMemoryError: unable to create new native thread", and we 
> observed the client thread hang for hours.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-14360) Client GC log path is not computed

2015-09-03 Thread Nick Dimiduk (JIRA)
Nick Dimiduk created HBASE-14360:


 Summary: Client GC log path is not computed
 Key: HBASE-14360
 URL: https://issues.apache.org/jira/browse/HBASE-14360
 Project: HBase
  Issue Type: Bug
  Components: scripts
Reporter: Nick Dimiduk
Priority: Minor


Looking for GC logs on the client side, I noticed the nice work from HBASE-7817 
that gives us the settings: just uncomment and run. Giving this a try with ltt, it 
looks like {{}} is not replaced according to the comments. It seems 
this work is done by {{bin/hbase-daemon.sh}}, not {{bin/hbase}}. The result is that 
my ltt run produced a file {{.0}} in {{$(pwd)}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14359) HTable#close will hang forever if unchecked error/exception thrown in AsyncProcess#sendMultiAction

2015-09-03 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729391#comment-14729391
 ] 

Andrew Purtell commented on HBASE-14359:


lgtm

Committing shortly after running some local tests unless objection

> HTable#close will hang forever if unchecked error/exception thrown in 
> AsyncProcess#sendMultiAction
> --
>
> Key: HBASE-14359
> URL: https://issues.apache.org/jira/browse/HBASE-14359
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.14, 1.1.2
>Reporter: Yu Li
>Assignee: Victor Xu
> Attachments: HBASE-14359-0.98-v1.patch, 
> HBASE-14359-branch-1-v1.patch, HBASE-14359-master-branch1-v1.patch, 
> HBASE-14359-master-v1.patch
>
>
> Currently in AsyncProcess#sendMultiAction, we only catch 
> RejectedExecutionException and let other errors/exceptions go, which means 
> decTaskCounter is never invoked. Meanwhile, the recommended way to use HTable is 
> to close the table in a finally clause, and HTable#close will call 
> flushCommits and wait until all tasks are done.
> The problem is that when an unchecked error/exception like OutOfMemoryError is 
> thrown, taskSent will never equal taskDone, so AsyncProcess#waitUntilDone will 
> never return. In particular, if autoflush is set and there is thus no data to flush 
> during table close, there will be no RPC call, so rpcTimeOut will not break the 
> call, and the thread will wait there forever.
> In our production environment, the unchecked error we observed was 
> "java.lang.OutOfMemoryError: unable to create new native thread", and we 
> observed the client thread hang for hours.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14108) Procedure V2 - Administrative Task: provide an API to abort a procedure

2015-09-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729386#comment-14729386
 ] 

Hudson commented on HBASE-14108:


SUCCESS: Integrated in HBase-1.3-IT #129 (See 
[https://builds.apache.org/job/HBase-1.3-IT/129/])
HBASE-14108 Procedure V2 - Administrative Task: provide an API to abort a 
procedure (Stephen Yuan Jiang) (syuanjiangdev: rev 
90b8a3c894e211a8b0e5236f1872fc1576a455b2)
* 
hbase-procedure/src/test/java/org/apache/hadoop/hbase/procedure2/TestProcedureRecovery.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionManager.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterRpcServices.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestAdmin2.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java
* 
hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/ProcedureExecutor.java
* 
hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/MasterProtos.java
* hbase-protocol/src/main/protobuf/Master.proto
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestProcedureAdmin.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterServices.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/Admin.java


> Procedure V2 - Administrative Task: provide an API to abort a procedure
> ---
>
> Key: HBASE-14108
> URL: https://issues.apache.org/jira/browse/HBASE-14108
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Affects Versions: 2.0.0, 1.3.0
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Fix For: 2.0.0, 1.3.0
>
> Attachments: HBASE-14108.v1-branch-1.patch, 
> HBASE-14108.v1-master.patch, HBASE-14108.v2-master.patch
>
>
> With Procedure-V2 in production since the HBase 1.1 release, there is a need to 
> abort a procedure (e.g. a long-running procedure that gets stuck somewhere and 
> blocks others).  The command could be issued either from the shell or the Web UI.
> This task tracks the work to provide an API to abort a procedure (either 
> roll it back or simply quit).  This API could be used either from the shell or the Web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14108) Procedure V2 - Administrative Task: provide an API to abort a procedure

2015-09-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729375#comment-14729375
 ] 

Hudson commented on HBASE-14108:


FAILURE: Integrated in HBase-1.3 #146 (See 
[https://builds.apache.org/job/HBase-1.3/146/])
HBASE-14108 Procedure V2 - Administrative Task: provide an API to abort a 
procedure (Stephen Yuan Jiang) (syuanjiangdev: rev 
90b8a3c894e211a8b0e5236f1872fc1576a455b2)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterRpcServices.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestProcedureAdmin.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterServices.java
* 
hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/ProcedureExecutor.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java
* 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionManager.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestAdmin2.java
* 
hbase-procedure/src/test/java/org/apache/hadoop/hbase/procedure2/TestProcedureRecovery.java
* 
hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/MasterProtos.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/Admin.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
* hbase-protocol/src/main/protobuf/Master.proto


> Procedure V2 - Administrative Task: provide an API to abort a procedure
> ---
>
> Key: HBASE-14108
> URL: https://issues.apache.org/jira/browse/HBASE-14108
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Affects Versions: 2.0.0, 1.3.0
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Fix For: 2.0.0, 1.3.0
>
> Attachments: HBASE-14108.v1-branch-1.patch, 
> HBASE-14108.v1-master.patch, HBASE-14108.v2-master.patch
>
>
> With Procedure-V2 in production since the HBase 1.1 release, there is a need to 
> abort a procedure (e.g. a long-running procedure that gets stuck somewhere and 
> blocks others).  The command could be issued either from the shell or the Web UI.
> This task tracks the work to provide an API to abort a procedure (either 
> roll it back or simply quit).  This API could be used either from the shell or the Web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14314) Metrics for block cache should take region replicas into account

2015-09-03 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-14314:
---
Attachment: 14314-v5.txt

Patch v5 addresses the checkstyle warning.

> Metrics for block cache should take region replicas into account
> 
>
> Key: HBASE-14314
> URL: https://issues.apache.org/jira/browse/HBASE-14314
> Project: HBase
>  Issue Type: Improvement
>  Components: metrics, regionserver
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14314-v1.txt, 14314-v2.txt, 14314-v3.txt, 14314-v4.txt, 
> 14314-v5.txt
>
>
> Currently, metrics for the block cache are aggregates in the sense that they don't 
> distinguish primary from secondary/tertiary replicas.
> This JIRA separates the block cache metrics for the primary region replica from 
> the aggregate.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14190) Assign system tables ahead of user region assignment

2015-09-03 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729812#comment-14729812
 ] 

Ted Yu commented on HBASE-14190:


The master log has been analyzed, but there is no conclusion as of now.

> Assign system tables ahead of user region assignment
> 
>
> Key: HBASE-14190
> URL: https://issues.apache.org/jira/browse/HBASE-14190
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Critical
> Attachments: 14190-system-wal-v1.txt, 14190-v12.4.txt, 14190-v12.txt
>
>
> Currently the namespace table region is assigned like user regions.
> I spent several hours working with a customer where the master couldn't finish 
> initialization.
> Even though the master was restarted quite a few times, it went down with the 
> following:
> {code}
> 2015-08-05 17:16:57,530 FATAL [hdpmaster1:6.activeMasterManager] 
> master.HMaster: Master server abort: loaded coprocessors are: []
> 2015-08-05 17:16:57,530 FATAL [hdpmaster1:6.activeMasterManager] 
> master.HMaster: Unhandled exception. Starting shutdown.
> java.io.IOException: Timedout 30ms waiting for namespace table to be 
> assigned
>   at 
> org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:104)
>   at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:985)
>   at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:779)
>   at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:182)
>   at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1646)
>   at java.lang.Thread.run(Thread.java:744)
> {code}
> During previous run(s), namespace table was created, hence leaving an entry 
> in hbase:meta.
> The following if block in TableNamespaceManager#start() was skipped:
> {code}
> if (!MetaTableAccessor.tableExists(masterServices.getConnection(),
>   TableName.NAMESPACE_TABLE_NAME)) {
> {code}
> TableNamespaceManager#start() spins, waiting for namespace region to be 
> assigned.
> There was an issue with the master assigning user regions.
> We tried issuing the 'assign' command from the hbase shell, which didn't work because 
> of the following check in MasterRpcServices#assignRegion():
> {code}
>   master.checkInitialized();
> {code}
> This scenario can be avoided if we assign hbase:namespace table after 
> hbase:meta is assigned but before user table region assignment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14190) Assign system tables ahead of user region assignment

2015-09-03 Thread Jerry He (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729772#comment-14729772
 ] 

Jerry He commented on HBASE-14190:
--

Hi, Ted

A late question.  What caused the namespace region not to be assigned 
within the 30ms timeout in your original case?

> Assign system tables ahead of user region assignment
> 
>
> Key: HBASE-14190
> URL: https://issues.apache.org/jira/browse/HBASE-14190
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Critical
> Attachments: 14190-system-wal-v1.txt, 14190-v12.4.txt, 14190-v12.txt
>
>
> Currently the namespace table region is assigned like user regions.
> I spent several hours working with a customer where the master couldn't finish 
> initialization.
> Even though the master was restarted quite a few times, it went down with the 
> following:
> {code}
> 2015-08-05 17:16:57,530 FATAL [hdpmaster1:6.activeMasterManager] 
> master.HMaster: Master server abort: loaded coprocessors are: []
> 2015-08-05 17:16:57,530 FATAL [hdpmaster1:6.activeMasterManager] 
> master.HMaster: Unhandled exception. Starting shutdown.
> java.io.IOException: Timedout 30ms waiting for namespace table to be 
> assigned
>   at 
> org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:104)
>   at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:985)
>   at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:779)
>   at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:182)
>   at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1646)
>   at java.lang.Thread.run(Thread.java:744)
> {code}
> During previous run(s), namespace table was created, hence leaving an entry 
> in hbase:meta.
> The following if block in TableNamespaceManager#start() was skipped:
> {code}
> if (!MetaTableAccessor.tableExists(masterServices.getConnection(),
>   TableName.NAMESPACE_TABLE_NAME)) {
> {code}
> TableNamespaceManager#start() spins, waiting for namespace region to be 
> assigned.
> There was an issue with the master assigning user regions.
> We tried issuing the 'assign' command from the hbase shell, which didn't work because 
> of the following check in MasterRpcServices#assignRegion():
> {code}
>   master.checkInitialized();
> {code}
> This scenario can be avoided if we assign hbase:namespace table after 
> hbase:meta is assigned but before user table region assignment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14215) Default cost used for PrimaryRegionCountSkewCostFunction is not sufficient

2015-09-03 Thread Biju Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729693#comment-14729693
 ] 

Biju Nair commented on HBASE-14215:
---

Thanks [~enis] for your comments. Disabling rack awareness will enable the SLB to 
come up with a better plan even with a lower 
{{hbase.master.balancer.stochastic.primaryRegionCountCost}}. Will try to do 
some tests.

Given that potential candidates are generated randomly, one would assume that the 
"global optimum" will be attained over multiple candidate generations and there 
will be no "local optimum". No?

Since we included a new cost function for primary replica skew, would taking 
primary replicas into account in the candidate generator (maybe in 
{{RegionReplicaCandidateGenerator}}) help keep 
{{hbase.master.balancer.stochastic.primaryRegionCountCost}} lower?

> Default cost used for PrimaryRegionCountSkewCostFunction is not sufficient 
> ---
>
> Key: HBASE-14215
> URL: https://issues.apache.org/jira/browse/HBASE-14215
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer
>Reporter: Biju Nair
>Priority: Minor
> Attachments: 14215-v1.txt
>
>
> The current multiplier of 500 used in the stochastic balancer cost function 
> {{PrimaryRegionCountSkewCostFunction}} to calculate the cost of total 
> primary replica skew doesn't seem to be sufficient to prevent the skews 
> (refer to HBASE-14110). We would want the default cost to be a higher value so 
> that skew in primary region replicas carries a higher cost. The following is the 
> test result from setting the multiplier value to 1 (the same as the region 
> replica rack cost multiplier) on a 3-rack, 9-RS cluster, which seems to 
> get the balancer to distribute the primaries uniformly.
> *Initial Primary replica distribution - using the current multiplier* 
>  |r1n10|  102|
>  |r1n11|  85|
>  |r1n9|88|
>  |r2n10|  120|
>  |r2n11|  120|
>  |r2n9|   124|
>  |r3n10|  135|
>  |r3n11|  124|
>  |r3n9|129|
> *After long duration of read & writes - using current multiplier* 
> | r1n10|  102|
> | r1n11|  85|
> | r1n9|88|
> | r2n10|  120|
> | r2n11|  120|
> | r2n9 |   124|
> | r3n10|  135|
> | r3n11|  124|
> | r3n9|129|
> *After manual balancing*  
> | r1n10|  102|
> | r1n11|  85|
> | r1n9|88|
> | r2n10|  120|
> | r2n11|  120|
> | r2n9 |   124|
> | r3n10|  135|
> | r3n11|  124|
> | r3n9|129|
> *Increased multiplier for primaryRegionCountSkewCost to 1*
> | r1n10|  114|
> | r1n11 | 113|
> | r1n9 |   114|
> | r2n10|  114|
> | r2n11|  114|
> | r2n9 |   113|
> | r3n10|  115|
> | r3n11|  115|
> | r3n9 |   115 |
> Setting the {{PrimaryRegionCountSkewCostFunction}} multiplier value to 1 
> should help general HBase use.
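
For anyone repeating the experiment, the multiplier is an ordinary configuration property; a sketch of setting it programmatically (it would normally go into hbase-site.xml on the master, and the value below is only a placeholder):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class BalancerCostSketch {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Placeholder value only; the property is read by the stochastic load balancer.
    conf.setFloat("hbase.master.balancer.stochastic.primaryRegionCountCost", 1000f);
    System.out.println(conf.get("hbase.master.balancer.stochastic.primaryRegionCountCost"));
  }
}
{code}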



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-6617) ReplicationSourceManager should be able to track multiple WAL paths

2015-09-03 Thread Yu Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Li updated HBASE-6617:
-
Attachment: HBASE-6617_v9.patch

Updated the patch in sync with RB, also addressing the javadoc warning.

> ReplicationSourceManager should be able to track multiple WAL paths
> ---
>
> Key: HBASE-6617
> URL: https://issues.apache.org/jira/browse/HBASE-6617
> Project: HBase
>  Issue Type: Improvement
>  Components: Replication
>Reporter: Ted Yu
>Assignee: Yu Li
> Fix For: 2.0.0, 1.3.0
>
> Attachments: HBASE-6617.patch, HBASE-6617_v2.patch, 
> HBASE-6617_v3.patch, HBASE-6617_v4.patch, HBASE-6617_v7.patch, 
> HBASE-6617_v9.patch
>
>
> Currently ReplicationSourceManager uses logRolled() to receive notification 
> about a new HLog and remembers it in latestPath.
> When the region server has multiple-WAL support, we need to keep track of 
> multiple Paths in ReplicationSourceManager.
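
An illustration only (not the actual patch) of the data-structure change implied above: the single latestPath field becomes a map keyed by WAL group, so a roll of any one WAL updates only its own entry.

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.hadoop.fs.Path;

class LatestPathsSketch {
  // before: private Path latestPath;
  private final Map<String, Path> latestPaths = new ConcurrentHashMap<>();

  /** Called from the logRolled() notification; the group key is illustrative. */
  void onLogRoll(String walGroupId, Path newLog) {
    latestPaths.put(walGroupId, newLog);
  }
}
{code}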



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13823) Procedure V2: unnecessary operations on AssignmentManager#recoverTableInDisablingState() and recoverTableInEnablingState()

2015-09-03 Thread Matteo Bertozzi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-13823:

Attachment: HBASE-13823-v4.patch

> Procedure V2: unnecessary operations on 
> AssignmentManager#recoverTableInDisablingState() and 
> recoverTableInEnablingState()
> --
>
> Key: HBASE-13823
> URL: https://issues.apache.org/jira/browse/HBASE-13823
> Project: HBase
>  Issue Type: Sub-task
>  Components: master, proc-v2
>Affects Versions: 2.0.0, 1.2.0
>Reporter: Stephen Yuan Jiang
>Assignee: Matteo Bertozzi
> Attachments: HBASE-13823-v0.patch, HBASE-13823-v1.patch, 
> HBASE-13823-v2.patch, HBASE-13823-v3.patch, HBASE-13823-v4.patch
>
>
> AssignmentManager#recoverTableInDisablingState() and 
> AssignmentManager#recoverTableInEnablingState() try to complete unfinished 
> enable/disable table operations.  In the past, this was necessary, as a master 
> failure could leave a table in a bad state.  With HBASE-13211, enable/disable 
> operations are auto-recovered by the Procedure-V2 logic.  Those recovery 
> operations are no longer necessary: we can either remove them or 
> not replay enable/disable operations in the procedure queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14108) Procedure V2 - Administrative Task: provide an API to abort a procedure

2015-09-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14728553#comment-14728553
 ] 

Hudson commented on HBASE-14108:


FAILURE: Integrated in HBase-TRUNK #6775 (See 
[https://builds.apache.org/job/HBase-TRUNK/6775/])
HBASE-14108 Procedure V2 Administrative Task: provide an API to abort a 
procedure (Stephen Yuan Jiang) (syuanjiangdev: rev 
3341f13e71a25bf3f8eb5a6a57ce330b3d8a3495)
* 
hbase-procedure/src/test/java/org/apache/hadoop/hbase/procedure2/TestProcedureRecovery.java
* 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
* 
hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/ProcedureExecutor.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterRpcServices.java
* hbase-protocol/src/main/protobuf/Master.proto
* hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestAdmin2.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterServices.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestProcedureAdmin.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/Admin.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* 
hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/MasterProtos.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java


> Procedure V2 - Administrative Task: provide an API to abort a procedure
> ---
>
> Key: HBASE-14108
> URL: https://issues.apache.org/jira/browse/HBASE-14108
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Affects Versions: 2.0.0, 1.3.0
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Attachments: HBASE-14108.v1-branch-1.patch, 
> HBASE-14108.v1-master.patch, HBASE-14108.v2-master.patch
>
>
> With Procedure-V2 in production since the HBase 1.1 release, there is a need to 
> abort a procedure (e.g. a long-running procedure that gets stuck somewhere and 
> blocks others).  The command could be issued either from the shell or the Web UI.
> This task tracks the work to provide an API to abort a procedure (either 
> roll it back or simply quit).  This API could be used either from the shell or the Web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13153) enable bulkload to support replication

2015-09-03 Thread Bhupendra Kumar Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14728572#comment-14728572
 ] 

Bhupendra Kumar Jain commented on HBASE-13153:
--

Having the cluster id as part of the hfile metadata is really nice to have. This 
metadata can clearly indicate the source cluster.

But during replication, with this approach, the cluster id needs to be added to 
each hfile meta block. This will require rewriting each hfile meta block, so 
we think this will slow down the replication process compared to writing the cluster 
id in a ZK node.

Also, during the replication process, when the replication endpoint detects the cycle, 
it needs to refer to the hfile metadata. Consider the case where the hfile is in the 
archive; I think there is no meta information available for the archived file in the 
cache. This may take more time too. Please correct me if I got it wrong?

> enable bulkload to support replication
> --
>
> Key: HBASE-13153
> URL: https://issues.apache.org/jira/browse/HBASE-13153
> Project: HBase
>  Issue Type: New Feature
>  Components: Replication
>Reporter: sunhaitao
>Assignee: Ashish Singhi
> Fix For: 2.0.0
>
> Attachments: HBase Bulk Load Replication.pdf
>
>
> Currently we plan to use the HBase replication feature to deal with a disaster 
> tolerance scenario. But we encounter an issue: we use bulkload very 
> frequently, and because bulkload bypasses the write path and does not generate WAL, 
> the data will not be replicated to the backup cluster. It's inappropriate to 
> bulkload twice, on both the active cluster and the backup cluster. So I advise making some 
> modifications to the bulkload feature to enable bulkload to both the active cluster and 
> the backup cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-8015) Support for Namespaces

2015-09-03 Thread kartik (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14728581#comment-14728581
 ] 

kartik commented on HBASE-8015:
---

I would like to know if this JIRA enhancement also supports the four items below 
when using a namespace:

Cell Query (Single Value)
Cell or Row Query (Multiple Values)
Scanner Get Next
Scanner Creation


Regards,
Kartik Bhatnagar
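
For what it's worth, a sketch of how those operations look against a namespaced table with the current client API (the namespace, table, and column names are made up); once the table name is namespace-qualified, Gets and Scans work as usual:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class NamespaceReadSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("my_ns", "my_table"))) {
      // Cell query (single value): Get restricted to one column
      Get get = new Get(Bytes.toBytes("row1"));
      get.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"));
      Result single = table.get(get);
      // Row query (multiple values): Get without a column restriction
      Result row = table.get(new Get(Bytes.toBytes("row1")));
      // Scanner creation and "get next"
      try (ResultScanner scanner = table.getScanner(new Scan())) {
        Result next = scanner.next();
        System.out.println(single.isEmpty() + " " + row.isEmpty() + " " + (next != null));
      }
    }
  }
}
{code}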

> Support for Namespaces
> --
>
> Key: HBASE-8015
> URL: https://issues.apache.org/jira/browse/HBASE-8015
> Project: HBase
>  Issue Type: New Feature
>Reporter: Francis Liu
>Assignee: Francis Liu
> Attachments: HBASE-8015_draft_94.patch, Namespace Design.pdf
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14261) Enhance Chaos Monkey framework by adding zookeeper and datanode fault injections.

2015-09-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14728674#comment-14728674
 ] 

Hadoop QA commented on HBASE-14261:
---

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12753910/HBASE-14261.patch
  against master branch at commit 3341f13e71a25bf3f8eb5a6a57ce330b3d8a3495.
  ATTACHMENT ID: 12753910

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 33 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15400//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15400//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15400//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15400//console

This message is automatically generated.

> Enhance Chaos Monkey framework by adding zookeeper and datanode fault 
> injections.
> -
>
> Key: HBASE-14261
> URL: https://issues.apache.org/jira/browse/HBASE-14261
> Project: HBase
>  Issue Type: Improvement
>Reporter: Srikanth Srungarapu
>Assignee: Srikanth Srungarapu
> Attachments: HBASE-14261-branch-1.patch, 
> HBASE-14261-branch-1_v3.patch, HBASE-14261-branch-1_v4.patch, 
> HBASE-14261.branch-1_v2.patch, HBASE-14261.patch
>
>
> One of the shortcomings of existing ChaosMonkey framework is lack of fault 
> injections for hbase dependencies like zookeeper, hdfs etc. This patch 
> attempts to solve this problem partially by adding datanode and zk node fault 
> injections.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14728723#comment-14728723
 ] 

Hadoop QA commented on HBASE-14317:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12753923/14317v13.txt
  against master branch at commit 3341f13e71a25bf3f8eb5a6a57ce330b3d8a3495.
  ATTACHMENT ID: 12753923

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 15 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.regionserver.wal.TestLogRolling

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15403//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15403//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15403//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15403//console

This message is automatically generated.

> Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
> -
>
> Key: HBASE-14317
> URL: https://issues.apache.org/jira/browse/HBASE-14317
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.1.1
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.0.0, 1.2.0, 1.0.3, 1.1.3
>
> Attachments: 14317.test.txt, 14317v10.txt, 14317v11.txt, 
> 14317v12.txt, 14317v13.txt, 14317v5.branch-1.2.txt, 14317v5.txt, 14317v9.txt, 
> HBASE-14317-v1.patch, HBASE-14317-v2.patch, HBASE-14317-v3.patch, 
> HBASE-14317-v4.patch, HBASE-14317.patch, [Java] RS stuck on WAL sync to a 
> dead DN - Pastebin.com.html, append-only-test.patch, raw.php, repro.txt, 
> san_dump.txt, subset.of.rs.log
>
>
> hbase-1.1.1 and hadoop-2.7.1
> We try to roll logs because we can't append (see HDFS-8960) but we get stuck. 
> See the attached thread dump and associated log. What is interesting is that 
> syncers are waiting to take syncs to run, and at the same time we want to 
> flush, so we are waiting on a safe point, but there seems to be nothing in our 
> ring buffer; did we go to roll the log and not add a safe-point sync to clear 
> out the ring buffer?
> Needs a bit of study. Try to reproduce.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14108) Procedure V2 - Administrative Task: provide an API to abort a procedure

2015-09-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14728657#comment-14728657
 ] 

Hadoop QA commented on HBASE-14108:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12753911/HBASE-14108.v1-branch-1.patch
  against branch-1 branch at commit 3341f13e71a25bf3f8eb5a6a57ce330b3d8a3495.
  ATTACHMENT ID: 12753911

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 11 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+   * rpc AbortProcedure(.hbase.pb.AbortProcedureRequest) returns 
(.hbase.pb.AbortProcedureResponse);
+ * rpc AbortProcedure(.hbase.pb.AbortProcedureRequest) returns 
(.hbase.pb.AbortProcedureResponse);

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

 {color:red}-1 core zombie tests{color}.  There are 5 zombie test(s):   
at 
org.apache.hadoop.hbase.security.visibility.TestVisibilityLabelsWithACL.testScanForSuperUserWithFewerLabelAuths(TestVisibilityLabelsWithACL.java:147)
at 
org.apache.camel.component.jetty.jettyproducer.HttpJettyProducerRecipientListCustomThreadPoolTest.testRecipientList(HttpJettyProducerRecipientListCustomThreadPoolTest.java:40)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15402//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15402//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15402//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15402//console

This message is automatically generated.

> Procedure V2 - Administrative Task: provide an API to abort a procedure
> ---
>
> Key: HBASE-14108
> URL: https://issues.apache.org/jira/browse/HBASE-14108
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Affects Versions: 2.0.0, 1.3.0
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Attachments: HBASE-14108.v1-branch-1.patch, 
> HBASE-14108.v1-master.patch, HBASE-14108.v2-master.patch
>
>
> With Procedure-V2 in production since HBASE 1.1 release, there is a need to 
> abort a procedure (eg. for a long-running procedure that stucks somewhere and 
> blocks others).  The command could either from shell or Web UI.
> This task tracks the work to provide an API to abort a procedure (either 
> rollback or simply quit).  This API could be used either from shell or Web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14314) Metrics for block cache should take region replicas into account

2015-09-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14728676#comment-14728676
 ] 

Hadoop QA commented on HBASE-14314:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12753909/14314-v4.txt
  against master branch at commit 3341f13e71a25bf3f8eb5a6a57ce330b3d8a3495.
  ATTACHMENT ID: 12753909

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:red}-1 checkstyle{color}.  The applied patch generated 
1839 checkstyle errors (more than the master's current 1838 errors).

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15401//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15401//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15401//artifact/patchprocess/checkstyle-aggregate.html

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15401//console

This message is automatically generated.

> Metrics for block cache should take region replicas into account
> 
>
> Key: HBASE-14314
> URL: https://issues.apache.org/jira/browse/HBASE-14314
> Project: HBase
>  Issue Type: Improvement
>  Components: metrics, regionserver
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 14314-v1.txt, 14314-v2.txt, 14314-v3.txt, 14314-v4.txt
>
>
> Currently metrics for block cache are aggregates in the sense that they don't 
> distinguish primary from secondary / tertiary replicas.
> This JIRA separates the block cache metrics for primary region replica from 
> the aggregate.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13153) enable bulkload to support replication

2015-09-03 Thread Ashish Singhi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14728576#comment-14728576
 ] 

Ashish Singhi commented on HBASE-13153:
---

bq. This seems a bit fragile. What if client crashes, etc ?
If the client crashes (i.e. the source cluster RS goes down), the znode will not 
be removed. The replication for this znode will then be retried by the source 
cluster RS, so the same hfile will be loaded again in the peer cluster, but that 
will not cause any issue.

> enable bulkload to support replication
> --
>
> Key: HBASE-13153
> URL: https://issues.apache.org/jira/browse/HBASE-13153
> Project: HBase
>  Issue Type: New Feature
>  Components: Replication
>Reporter: sunhaitao
>Assignee: Ashish Singhi
> Fix For: 2.0.0
>
> Attachments: HBase Bulk Load Replication.pdf
>
>
> Currently we plan to use the HBase Replication feature to deal with a disaster 
> tolerance scenario. But we hit an issue: we use bulkload very frequently, and 
> because bulkload bypasses the write path it does not generate WAL, so the data 
> will not be replicated to the backup cluster. It's inappropriate to bulkload 
> twice, on both the active cluster and the backup cluster. So I advise making 
> some modification to the bulkload feature to enable bulkload to both the 
> active cluster and the backup cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-6721) RegionServer Group based Assignment

2015-09-03 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14728703#comment-14728703
 ] 

Elliott Clark commented on HBASE-6721:
--

That works great for me and would address the concerns I had.

> RegionServer Group based Assignment
> ---
>
> Key: HBASE-6721
> URL: https://issues.apache.org/jira/browse/HBASE-6721
> Project: HBase
>  Issue Type: New Feature
>Reporter: Francis Liu
>Assignee: Francis Liu
>  Labels: hbase-6721
> Attachments: 6721-master-webUI.patch, HBASE-6721 
> GroupBasedLoadBalancer Sequence Diagram.xml, HBASE-6721-DesigDoc.pdf, 
> HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, 
> HBASE-6721_0.98_2.patch, HBASE-6721_10.patch, HBASE-6721_11.patch, 
> HBASE-6721_12.patch, HBASE-6721_8.patch, HBASE-6721_9.patch, 
> HBASE-6721_9.patch, HBASE-6721_94.patch, HBASE-6721_94.patch, 
> HBASE-6721_94_2.patch, HBASE-6721_94_3.patch, HBASE-6721_94_3.patch, 
> HBASE-6721_94_4.patch, HBASE-6721_94_5.patch, HBASE-6721_94_6.patch, 
> HBASE-6721_94_7.patch, HBASE-6721_98_1.patch, HBASE-6721_98_2.patch, 
> HBASE-6721_hbase-6721_addendum.patch, HBASE-6721_trunk.patch, 
> HBASE-6721_trunk.patch, HBASE-6721_trunk.patch, HBASE-6721_trunk1.patch, 
> HBASE-6721_trunk2.patch, balanceCluster Sequence Diagram.svg, 
> immediateAssignments Sequence Diagram.svg, randomAssignment Sequence 
> Diagram.svg, retainAssignment Sequence Diagram.svg, roundRobinAssignment 
> Sequence Diagram.svg
>
>
> In multi-tenant deployments of HBase, it is likely that a RegionServer will 
> be serving out regions from a number of different tables owned by various 
> client applications. Being able to group a subset of running RegionServers 
> and assign specific tables to it, provides a client application a level of 
> isolation and resource allocation.
> The proposal essentially is to have an AssignmentManager which is aware of 
> RegionServer groups and assigns tables to region servers based on groupings. 
> Load balancing will occur on a per group basis as well. 
> This is essentially a simplification of the approach taken in HBASE-4120. See 
> attached document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-14358) Parent region is not removed from regionstates after a successful split

2015-09-03 Thread Esteban Gutierrez (JIRA)
Esteban Gutierrez created HBASE-14358:
-

 Summary: Parent region is not removed from regionstates after a 
successful split
 Key: HBASE-14358
 URL: https://issues.apache.org/jira/browse/HBASE-14358
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.2.1, 1.0.3, 1.1.3
Reporter: Esteban Gutierrez
Priority: Critical


Ran into this while trying to find out why region_mover.rb was not catching an 
exception after a region was split. Digging further I found that the problem is 
happening in the handling of the region state in the Master since we don't 
remove the old state after the split is successful:

{code}
2015-09-03 02:56:49,255 INFO org.apache.hadoop.hbase.master.AssignmentManager: 
Ignored moving region not assigned: {ENCODED => 
9a4930ed41dc7013d9956240e6f5c03e, NAME => 
'u,user3605,1432797255754.9a4930ed41dc7013d9956240e6f5c03e.', STARTKEY => 
'user3605', ENDKEY => 'user3723'}, {9a4930ed41dc7013d9956240e6f5c03e 
state=SPLIT, ts=1441273152561, 
server=a2209.halxg.cloudera.com,22101,1441243232790}
{code}

I don't think the problem is happening in the master branch, but I've been able 
to confirm that it is happening on branch-1 and branch-1.2 at least.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13942) HBase client stalls during region split when client threads > hbase.hconnection.threads.max

2015-09-03 Thread Mukund Murrali (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14728830#comment-14728830
 ] 

Mukund Murrali commented on HBASE-13942:


It is becoming increasingly difficult to work with 256 threads per cluster. Is 
it not possible to reduce this? We are afraid the same issue might crop up if we 
drastically reduce the number of threads. 
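
A minimal sketch of lowering that pool size from the client side, assuming the 
client can supply its own Configuration; the key comes from this issue's title 
and 64 is only an illustrative value:

{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

// Cap the shared hconnection pool at 64 threads instead of the default; whether a
// smaller pool avoids or merely narrows the stall described in this issue is the
// open question in the comment above.
public class SmallerPoolConnection {
  public static Connection create() throws IOException {
    Configuration conf = HBaseConfiguration.create();
    conf.setInt("hbase.hconnection.threads.max", 64);
    return ConnectionFactory.createConnection(conf);
  }
}
{code}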

> HBase client stalls during region split when client threads > 
> hbase.hconnection.threads.max
> ---
>
> Key: HBASE-13942
> URL: https://issues.apache.org/jira/browse/HBASE-13942
> Project: HBase
>  Issue Type: Bug
>  Components: Client, regionserver
>Reporter: Mukund Murrali
>
> Performing any operation using a single hconnection with client threads > 
> hbase.hconnection.threads.max causes the client to stall indefinitely during 
> the first region split. All the hconnection threads on the client side are 
> waiting with the following stack. 
> hconnection-0x648a83fd-shared--pool1-t8" daemon prio=10 
> tid=0x7f447c003800 nid=0x62ff waiting on condition [0x7f44c72f]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x0007d768bdf0> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
> at 
> java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:374)
> at 
> org.apache.hadoop.hbase.util.BoundedCompletionService.take(BoundedCompletionService.java:74)
> at 
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:174)
> at 
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:56)
> at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
> at 
> org.apache.hadoop.hbase.client.ClientSmallReversedScanner.next(ClientSmallReversedScanner.java:145)
> at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1200)
> at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1109)
> at 
> org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.findAllLocationsOrFail(AsyncProcess.java:916)
> at 
> org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.groupAndSendMultiAction(AsyncProcess.java:833)
> at 
> org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.resubmit(AsyncProcess.java:1156)
> at 
> org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.receiveMultiAction(AsyncProcess.java:1296)
> at 
> org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.access$1200(AsyncProcess.java:574)
> at 
> org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl$SingleServerRequestRunnable.run(AsyncProcess.java:716)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-14359) HTable#close will hang forever if unchecked error/exception thrown in AsyncProcess#sendMultiAction

2015-09-03 Thread Yu Li (JIRA)
Yu Li created HBASE-14359:
-

 Summary: HTable#close will hang forever if unchecked 
error/exception thrown in AsyncProcess#sendMultiAction
 Key: HBASE-14359
 URL: https://issues.apache.org/jira/browse/HBASE-14359
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.1.2, 0.98.14
Reporter: Yu Li
Assignee: Victor Xu


Currently in AsyncProcess#sendMultiAction, we only catch 
RejectedExecutionException and let other errors/exceptions go, which means 
decTaskCounter is never invoked. Meanwhile, the recommendation for using HTable 
is to close the table in the finally clause, and HTable#close will call 
flushCommits and wait until all tasks are done.

The problem is that when an unchecked error/exception like OutOfMemoryError is 
thrown, taskSent will never equal taskDone, so AsyncProcess#waitUntilDone will 
never return. In particular, if autoflush is set and there is thus no data to 
flush during table close, there is no rpc call, so rpcTimeOut will not break 
the call and the thread will wait there forever.

In our production env, the unchecked error we observed is 
"java.lang.OutOfMemoryError: unable to create new native thread", and we 
observed the client thread hang for hours.
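
To make the failure mode concrete, here is a minimal sketch (assumed names and 
simplified bookkeeping, not the actual AsyncProcess code and not the attached 
patch) of keeping the sent/done counters balanced even when submission throws 
something other than RejectedExecutionException:

{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

// Toy model of the bookkeeping described above: every submitted multi-action
// increments tasksInFlight, and waitUntilDone() loops until it drops back to zero.
// If submit() throws and the counter is not rolled back, waitUntilDone() hangs,
// which is the bug this issue describes.
public class SendMultiActionSketch {
  private final AtomicLong tasksInFlight = new AtomicLong();
  private final ExecutorService pool = Executors.newFixedThreadPool(4);

  public void sendMultiAction(final Runnable multiAction) {
    tasksInFlight.incrementAndGet();                  // the "taskSent" side
    try {
      pool.submit(new Runnable() {
        @Override
        public void run() {
          try {
            multiAction.run();
          } finally {
            taskDone();                               // normal completion path
          }
        }
      });
    } catch (Throwable t) {                           // not only RejectedExecutionException
      taskDone();                                     // roll back so waitUntilDone() can return
      throw new RuntimeException("multi-action submission failed", t);
    }
  }

  private void taskDone() {
    synchronized (tasksInFlight) {
      tasksInFlight.decrementAndGet();                // the "taskDone" side
      tasksInFlight.notifyAll();
    }
  }

  public void waitUntilDone() throws InterruptedException {
    synchronized (tasksInFlight) {
      while (tasksInFlight.get() > 0) {
        tasksInFlight.wait(100);
      }
    }
  }
}
{code}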



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14359) HTable#close will hang forever if unchecked error/exception thrown in AsyncProcess#sendMultiAction

2015-09-03 Thread Victor Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Victor Xu updated HBASE-14359:
--
Attachment: HBASE-14359-master-branch1-v1.patch
HBASE-14359-0.98-v1.patch

Submitted patches for master, branch-1.x and 0.98.

> HTable#close will hang forever if unchecked error/exception thrown in 
> AsyncProcess#sendMultiAction
> --
>
> Key: HBASE-14359
> URL: https://issues.apache.org/jira/browse/HBASE-14359
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.14, 1.1.2
>Reporter: Yu Li
>Assignee: Victor Xu
> Attachments: HBASE-14359-0.98-v1.patch, 
> HBASE-14359-master-branch1-v1.patch
>
>
> Currently in AsyncProcess#sendMultiAction, we only catch the 
> RejectedExecutionException and let other error/exception go, which will cause 
> decTaskCounter not invoked. Meanwhile, the recommendation for using HTable is 
> to close the table in the finally clause, and HTable#close will call 
> flushCommits and wait until all task done.
> The problem is when unchecked error/exception like OutOfMemoryError thrown, 
> taskSent will never be equal to taskDone, so AsyncProcess#waitUntilDone will 
> never return. Especially, if autoflush is set thus no data to flush during 
> table close, there would be no rpc call so rpcTimeOut will not break the 
> call, and thread will wait there forever.
> In our product env, the unchecked error we observed is 
> "java.lang.OutOfMemoryError: unable to create new native thread", and we 
> observed the client thread hang for hours



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14359) HTable#close will hang forever if unchecked error/exception thrown in AsyncProcess#sendMultiAction

2015-09-03 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-14359:
---
Status: Patch Available  (was: Open)

> HTable#close will hang forever if unchecked error/exception thrown in 
> AsyncProcess#sendMultiAction
> --
>
> Key: HBASE-14359
> URL: https://issues.apache.org/jira/browse/HBASE-14359
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.1.2, 0.98.14
>Reporter: Yu Li
>Assignee: Victor Xu
> Attachments: HBASE-14359-0.98-v1.patch, 
> HBASE-14359-master-branch1-v1.patch
>
>
> Currently in AsyncProcess#sendMultiAction, we only catch the 
> RejectedExecutionException and let other error/exception go, which will cause 
> decTaskCounter not invoked. Meanwhile, the recommendation for using HTable is 
> to close the table in the finally clause, and HTable#close will call 
> flushCommits and wait until all task done.
> The problem is when unchecked error/exception like OutOfMemoryError thrown, 
> taskSent will never be equal to taskDone, so AsyncProcess#waitUntilDone will 
> never return. Especially, if autoflush is set thus no data to flush during 
> table close, there would be no rpc call so rpcTimeOut will not break the 
> call, and thread will wait there forever.
> In our product env, the unchecked error we observed is 
> "java.lang.OutOfMemoryError: unable to create new native thread", and we 
> observed the client thread hang for hours



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14359) HTable#close will hang forever if unchecked error/exception thrown in AsyncProcess#sendMultiAction

2015-09-03 Thread Victor Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14728915#comment-14728915
 ] 

Victor Xu commented on HBASE-14359:
---

Adding the jstack output for this issue:
{noformat}
"Thread-14" daemon prio=10 tid=0x7fc9f7c0d800 nid=0x125b in Object.wait() 
[0x43acc000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at 
org.apache.hadoop.hbase.client.AsyncProcess.waitForNextTaskDone(AsyncProcess.java:988)
- locked <0x000788126080> (a java.util.concurrent.atomic.AtomicLong)
at 
org.apache.hadoop.hbase.client.AsyncProcess.waitForMaximumCurrentTasks(AsyncProcess.java:1014)
at 
org.apache.hadoop.hbase.client.AsyncProcess.waitUntilDone(AsyncProcess.java:1027)
at 
org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:1092)
- locked <0x000788126168> (a java.lang.Object)
at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1424)
at org.apache.hadoop.hbase.client.HTable.close(HTable.java:1461)
at 
com.alibaba.search.offline.sync.common.hbase.HBaseTable.returnHTable(HBaseTable.java:136)
at 
com.alibaba.search.offline.sync.common.hbase.HBaseTable.batchPut(HBaseTable.java:185)
at 
com.alibaba.search.offline.sync.common.hbase.HBaseTable.batchPut(HBaseTable.java:159)
at 
com.alibaba.search.offline.sync.sync.storage.HBaseStorageHandler.multiPut(HBaseStorageHandler.java:80)
at 
com.alibaba.search.offline.sync.sync.LoaderProcessor.doEveryTable(LoaderProcessor.java:130)
at 
com.alibaba.search.offline.sync.sync.LoaderProcessor.execute(LoaderProcessor.java:50)
at 
com.alibaba.search.offline.sync.sync.ProcessorRunner.perform(ProcessorRunner.java:58)
at 
com.alibaba.search.offline.sync.sync.DaemonWorker.daemonWork(DaemonWorker.java:51)
- locked <0x0007809c4e48> (a 
com.alibaba.search.offline.sync.sync.ProcessorRunner)
at 
com.alibaba.search.offline.sync.sync.DaemonWorker.run(DaemonWorker.java:31)
{noformat}
All user processes hang in the while loop of AsyncProcess#waitForNextTaskDone.

> HTable#close will hang forever if unchecked error/exception thrown in 
> AsyncProcess#sendMultiAction
> --
>
> Key: HBASE-14359
> URL: https://issues.apache.org/jira/browse/HBASE-14359
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.14, 1.1.2
>Reporter: Yu Li
>Assignee: Victor Xu
> Attachments: HBASE-14359-0.98-v1.patch, 
> HBASE-14359-master-branch1-v1.patch
>
>
> Currently in AsyncProcess#sendMultiAction, we only catch the 
> RejectedExecutionException and let other error/exception go, which will cause 
> decTaskCounter not invoked. Meanwhile, the recommendation for using HTable is 
> to close the table in the finally clause, and HTable#close will call 
> flushCommits and wait until all task done.
> The problem is when unchecked error/exception like OutOfMemoryError thrown, 
> taskSent will never be equal to taskDone, so AsyncProcess#waitUntilDone will 
> never return. Especially, if autoflush is set thus no data to flush during 
> table close, there would be no rpc call so rpcTimeOut will not break the 
> call, and thread will wait there forever.
> In our product env, the unchecked error we observed is 
> "java.lang.OutOfMemoryError: unable to create new native thread", and we 
> observed the client thread hang for hours



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14261) Enhance Chaos Monkey framework by adding zookeeper and datanode fault injections.

2015-09-03 Thread Srikanth Srungarapu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srikanth Srungarapu updated HBASE-14261:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 1.3.0
   2.0.0
   Status: Resolved  (was: Patch Available)

Thanks folks for the reviews.  Pushed to 1.3+ branches.

> Enhance Chaos Monkey framework by adding zookeeper and datanode fault 
> injections.
> -
>
> Key: HBASE-14261
> URL: https://issues.apache.org/jira/browse/HBASE-14261
> Project: HBase
>  Issue Type: Improvement
>Reporter: Srikanth Srungarapu
>Assignee: Srikanth Srungarapu
> Fix For: 2.0.0, 1.3.0
>
> Attachments: HBASE-14261-branch-1.patch, 
> HBASE-14261-branch-1_v3.patch, HBASE-14261-branch-1_v4.patch, 
> HBASE-14261.branch-1_v2.patch, HBASE-14261.patch
>
>
> One of the shortcomings of existing ChaosMonkey framework is lack of fault 
> injections for hbase dependencies like zookeeper, hdfs etc. This patch 
> attempts to solve this problem partially by adding datanode and zk node fault 
> injections.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14344) Add timeouts to TestHttpServerLifecycle

2015-09-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729937#comment-14729937
 ] 

Hudson commented on HBASE-14344:


FAILURE: Integrated in HBase-TRUNK #6776 (See 
[https://builds.apache.org/job/HBase-TRUNK/6776/])
HBASE-14344 Add timeouts to TestHttpServerLifecycle (matteo.bertozzi: rev 
5152ac0e208fd5f720734fb2abf3fae07b39c7e2)
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/http/TestHttpServerLifecycle.java


> Add timeouts to TestHttpServerLifecycle
> ---
>
> Key: HBASE-14344
> URL: https://issues.apache.org/jira/browse/HBASE-14344
> Project: HBase
>  Issue Type: Test
>  Components: test
>Affects Versions: 2.0.0, 1.2.0
>Reporter: Matteo Bertozzi
>Assignee: Matteo Bertozzi
>Priority: Minor
> Fix For: 2.0.0, 1.2.0, 1.0.3, 1.1.3
>
> Attachments: HBASE-14344-v0.patch
>
>
> I got TestHttpServerLifecycle hanging a couple of times on my run. 
> simple patch to add a timeout to the tests, and avoid the build to hang. 
> (i haven't looked at them yet to see what was the source problem)
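
For reference, the kind of change the description talks about is presumably just 
a per-test timeout attribute, along these lines (a minimal JUnit 4 sketch, not 
the actual TestHttpServerLifecycle code):

{code}
import static org.junit.Assert.assertTrue;

import org.junit.Test;

// Illustration of a JUnit 4 per-test timeout: the test fails (instead of hanging
// the build) if it runs longer than 60 seconds. Test name and body are made up.
public class TimeoutExampleTest {
  @Test(timeout = 60000)
  public void startedServerStopsWithinTheTimeout() throws Exception {
    // ... start the server, exercise it, stop it ...
    assertTrue(true);
  }
}
{code}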



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14344) Add timeouts to TestHttpServerLifecycle

2015-09-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1472#comment-1472
 ] 

Hudson commented on HBASE-14344:


SUCCESS: Integrated in HBase-1.3-IT #130 (See 
[https://builds.apache.org/job/HBase-1.3-IT/130/])
HBASE-14344 Add timeouts to TestHttpServerLifecycle (matteo.bertozzi: rev 
8a4aee60820650576e0d0058d3613692c508209f)
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/http/TestHttpServerLifecycle.java


> Add timeouts to TestHttpServerLifecycle
> ---
>
> Key: HBASE-14344
> URL: https://issues.apache.org/jira/browse/HBASE-14344
> Project: HBase
>  Issue Type: Test
>  Components: test
>Affects Versions: 2.0.0, 1.2.0
>Reporter: Matteo Bertozzi
>Assignee: Matteo Bertozzi
>Priority: Minor
> Fix For: 2.0.0, 1.2.0, 1.0.3, 1.1.3
>
> Attachments: HBASE-14344-v0.patch
>
>
> I got TestHttpServerLifecycle hanging a couple of times on my run. 
> simple patch to add a timeout to the tests, and avoid the build to hang. 
> (i haven't looked at them yet to see what was the source problem)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14261) Enhance Chaos Monkey framework by adding zookeeper and datanode fault injections.

2015-09-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729998#comment-14729998
 ] 

Hudson commented on HBASE-14261:


SUCCESS: Integrated in HBase-1.3-IT #130 (See 
[https://builds.apache.org/job/HBase-1.3-IT/130/])
HBASE-14261 Enhance Chaos Monkey framework by adding zookeeper and datanode 
fault injections. (ssrungarapu: rev 1717de65a49f0ae4885be29c888712010aaff506)
* 
hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/factories/ServerAndDependenciesKillingMonkeyFactory.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/MiniHBaseCluster.java
* 
hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/factories/MonkeyFactory.java
* hbase-it/src/test/java/org/apache/hadoop/hbase/DistributedHBaseCluster.java
* 
hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/actions/RestartRandomDataNodeAction.java
* hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/actions/Action.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseCluster.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKServerTool.java
* hbase-it/src/test/java/org/apache/hadoop/hbase/HBaseClusterManager.java
* 
hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/actions/RestartRandomZKNodeAction.java
* hbase-it/src/test/java/org/apache/hadoop/hbase/RESTApiClusterManager.java
* hbase-it/src/test/java/org/apache/hadoop/hbase/ClusterManager.java
* 
hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/actions/RestartActionBaseAction.java


> Enhance Chaos Monkey framework by adding zookeeper and datanode fault 
> injections.
> -
>
> Key: HBASE-14261
> URL: https://issues.apache.org/jira/browse/HBASE-14261
> Project: HBase
>  Issue Type: Improvement
>Reporter: Srikanth Srungarapu
>Assignee: Srikanth Srungarapu
> Fix For: 2.0.0, 1.3.0
>
> Attachments: HBASE-14261-branch-1.patch, 
> HBASE-14261-branch-1_v3.patch, HBASE-14261-branch-1_v4.patch, 
> HBASE-14261.branch-1_v2.patch, HBASE-14261.patch
>
>
> One of the shortcomings of existing ChaosMonkey framework is lack of fault 
> injections for hbase dependencies like zookeeper, hdfs etc. This patch 
> attempts to solve this problem partially by adding datanode and zk node fault 
> injections.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14268) Improve KeyLocker

2015-09-03 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730040#comment-14730040
 ] 

Ted Yu commented on HBASE-14268:


Patch v7 looks good.

Have you run the KeyLocker performance program on patch v7?

> Improve KeyLocker
> -
>
> Key: HBASE-14268
> URL: https://issues.apache.org/jira/browse/HBASE-14268
> Project: HBase
>  Issue Type: Improvement
>  Components: util
>Reporter: Hiroshi Ikeda
>Assignee: Hiroshi Ikeda
>Priority: Minor
> Attachments: 14268-V5.patch, HBASE-14268-V2.patch, 
> HBASE-14268-V3.patch, HBASE-14268-V4.patch, HBASE-14268-V5.patch, 
> HBASE-14268-V5.patch, HBASE-14268-V6.patch, HBASE-14268-V7.patch, 
> HBASE-14268-V7.patch, HBASE-14268.patch, KeyLockerPerformance.java
>
>
> 1. The implementation of {{KeyLocker}} uses atomic variables inside a 
> synchronized block, which doesn't make sense. Moreover, the logic inside the 
> synchronized block is not trivial, which hurts performance in a heavily 
> multi-threaded environment.
> 2. {{KeyLocker}} gives out an instance of {{ReentrantLock}} which is already 
> locked, but this doesn't follow the contract of {{ReentrantLock}} because you 
> are not allowed to freely invoke lock/unlock methods under that contract. 
> That introduces a potential risk; whenever you see a variable of type 
> {{ReentrantLock}}, you should pay attention to where the contained instance 
> comes from.
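
To make item 2 concrete, a toy sketch of the usage pattern being criticized 
(assumed shape, not HBase's actual KeyLocker implementation): the lock comes 
back already held, so the caller's only legal move is unlock() in a finally 
block.

{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Toy key-based locker; locks are never evicted in this simplified version.
public class SimpleKeyLocker<K> {
  private final ConcurrentHashMap<K, ReentrantLock> locks =
      new ConcurrentHashMap<K, ReentrantLock>();

  public ReentrantLock acquireLock(K key) {
    ReentrantLock lock = locks.get(key);
    if (lock == null) {
      ReentrantLock newLock = new ReentrantLock();
      ReentrantLock existing = locks.putIfAbsent(key, newLock);
      lock = (existing != null) ? existing : newLock;
    }
    lock.lock();                                   // returned in the locked state
    return lock;
  }

  public static void main(String[] args) {
    SimpleKeyLocker<String> locker = new SimpleKeyLocker<String>();
    ReentrantLock lock = locker.acquireLock("row-42");
    try {
      // per-key critical section
    } finally {
      lock.unlock();                               // never lock() it again yourself
    }
  }
}
{code}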



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-6028) Implement a cancel for in-progress compactions

2015-09-03 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730078#comment-14730078
 ] 

Nick Dimiduk commented on HBASE-6028:
-

I think having a way to interrupt active compactions would allow implementing a 
reasonable short-term solution to HBASE-11368. Any chance you have a patch 
[~esteban]?

> Implement a cancel for in-progress compactions
> --
>
> Key: HBASE-6028
> URL: https://issues.apache.org/jira/browse/HBASE-6028
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Derek Wollenstein
>Assignee: Esteban Gutierrez
>Priority: Minor
>  Labels: beginner
>
> Depending on current server load, it can be extremely expensive to run 
> periodic minor / major compactions.  It would be helpful to have a feature 
> where a user could use the shell or a client tool to explicitly cancel an 
> in-progress compaction.  This would allow a system to recover when too many 
> regions become eligible for compaction at once.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14272) Enforce major compaction on stores with KEEP_DELETED_CELLS=true

2015-09-03 Thread Vladimir Rodionov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Rodionov updated HBASE-14272:
--
Attachment: HBASE-14272-v2.patch

Patch v2 with unit test. [~te...@apache.org], could you take a look?

> Enforce major compaction on stores with KEEP_DELETED_CELLS=true
> ---
>
> Key: HBASE-14272
> URL: https://issues.apache.org/jira/browse/HBASE-14272
> Project: HBase
>  Issue Type: Bug
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-14272-v2.patch, HBASE-14272.patch
>
>
> Currently, if store has one (major compacted) file, the only case when major 
> compaction will be triggered for this file again - when locality is below 
> threshold, defined by *hbase.hstore.min.locality.to.skip.major.compact* or 
> TTL expired some cells. If file has locality greater than this threshold it 
> will never be major compacted until Store's TTL kicks in. For CF with 
> KEEP_DELETED_CELLS on, compaction must be enabled always (even for single 
> file), regardless of locality, when deleted cells are expired 
> (*hbase.hstore.time.to.purge.deletes*)
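
For context, a hedged sketch of the setup the description refers to (table and 
family names are placeholders, and this is not part of the attached patch): a 
column family with KEEP_DELETED_CELLS enabled. The purge interval itself, 
hbase.hstore.time.to.purge.deletes, is a server-side compaction setting normally 
configured on the region servers.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.KeepDeletedCells;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

// Create a table whose "cf" family keeps deleted cells until they are purged.
public class KeepDeletedCellsSetup {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();

    HColumnDescriptor family = new HColumnDescriptor("cf");
    family.setKeepDeletedCells(KeepDeletedCells.TRUE);

    HTableDescriptor table = new HTableDescriptor(TableName.valueOf("audit_table"));
    table.addFamily(family);

    try (Connection connection = ConnectionFactory.createConnection(conf);
         Admin admin = connection.getAdmin()) {
      admin.createTable(table);
    }
  }
}
{code}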



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14272) Enforce major compaction on stores with KEEP_DELETED_CELLS=true

2015-09-03 Thread Vladimir Rodionov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Rodionov updated HBASE-14272:
--
Status: Patch Available  (was: Open)

> Enforce major compaction on stores with KEEP_DELETED_CELLS=true
> ---
>
> Key: HBASE-14272
> URL: https://issues.apache.org/jira/browse/HBASE-14272
> Project: HBase
>  Issue Type: Bug
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-14272-v2.patch, HBASE-14272.patch
>
>
> Currently, if store has one (major compacted) file, the only case when major 
> compaction will be triggered for this file again - when locality is below 
> threshold, defined by *hbase.hstore.min.locality.to.skip.major.compact* or 
> TTL expired some cells. If file has locality greater than this threshold it 
> will never be major compacted until Store's TTL kicks in. For CF with 
> KEEP_DELETED_CELLS on, compaction must be enabled always (even for single 
> file), regardless of locality, when deleted cells are expired 
> (*hbase.hstore.time.to.purge.deletes*)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14272) Enforce major compaction on stores with KEEP_DELETED_CELLS=true

2015-09-03 Thread Vladimir Rodionov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Rodionov updated HBASE-14272:
--
Status: Open  (was: Patch Available)

> Enforce major compaction on stores with KEEP_DELETED_CELLS=true
> ---
>
> Key: HBASE-14272
> URL: https://issues.apache.org/jira/browse/HBASE-14272
> Project: HBase
>  Issue Type: Bug
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-14272-v2.patch, HBASE-14272.patch
>
>
> Currently, if store has one (major compacted) file, the only case when major 
> compaction will be triggered for this file again - when locality is below 
> threshold, defined by *hbase.hstore.min.locality.to.skip.major.compact* or 
> TTL expired some cells. If file has locality greater than this threshold it 
> will never be major compacted until Store's TTL kicks in. For CF with 
> KEEP_DELETED_CELLS on, compaction must be enabled always (even for single 
> file), regardless of locality, when deleted cells are expired 
> (*hbase.hstore.time.to.purge.deletes*)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-6721) RegionServer Group based Assignment

2015-09-03 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729969#comment-14729969
 ] 

Enis Soztutar commented on HBASE-6721:
--

Great. 
We could, in theory, define another Observer type (RSGroupObserver, etc.) as the 
endpoint that coprocessors implement, for example for learning about RS group 
operations. But that would be a coprocessor having its own coprocessors, which 
does not seem to be needed at the moment.


> RegionServer Group based Assignment
> ---
>
> Key: HBASE-6721
> URL: https://issues.apache.org/jira/browse/HBASE-6721
> Project: HBase
>  Issue Type: New Feature
>Reporter: Francis Liu
>Assignee: Francis Liu
>  Labels: hbase-6721
> Attachments: 6721-master-webUI.patch, HBASE-6721 
> GroupBasedLoadBalancer Sequence Diagram.xml, HBASE-6721-DesigDoc.pdf, 
> HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, 
> HBASE-6721_0.98_2.patch, HBASE-6721_10.patch, HBASE-6721_11.patch, 
> HBASE-6721_12.patch, HBASE-6721_8.patch, HBASE-6721_9.patch, 
> HBASE-6721_9.patch, HBASE-6721_94.patch, HBASE-6721_94.patch, 
> HBASE-6721_94_2.patch, HBASE-6721_94_3.patch, HBASE-6721_94_3.patch, 
> HBASE-6721_94_4.patch, HBASE-6721_94_5.patch, HBASE-6721_94_6.patch, 
> HBASE-6721_94_7.patch, HBASE-6721_98_1.patch, HBASE-6721_98_2.patch, 
> HBASE-6721_hbase-6721_addendum.patch, HBASE-6721_trunk.patch, 
> HBASE-6721_trunk.patch, HBASE-6721_trunk.patch, HBASE-6721_trunk1.patch, 
> HBASE-6721_trunk2.patch, balanceCluster Sequence Diagram.svg, 
> immediateAssignments Sequence Diagram.svg, randomAssignment Sequence 
> Diagram.svg, retainAssignment Sequence Diagram.svg, roundRobinAssignment 
> Sequence Diagram.svg
>
>
> In multi-tenant deployments of HBase, it is likely that a RegionServer will 
> be serving out regions from a number of different tables owned by various 
> client applications. Being able to group a subset of running RegionServers 
> and assign specific tables to it, provides a client application a level of 
> isolation and resource allocation.
> The proposal essentially is to have an AssignmentManager which is aware of 
> RegionServer groups and assigns tables to region servers based on groupings. 
> Load balancing will occur on a per group basis as well. 
> This is essentially a simplification of the approach taken in HBASE-4120. See 
> attached document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14344) Add timeouts to TestHttpServerLifecycle

2015-09-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729974#comment-14729974
 ] 

Hudson commented on HBASE-14344:


FAILURE: Integrated in HBase-1.1 #648 (See 
[https://builds.apache.org/job/HBase-1.1/648/])
HBASE-14344 Add timeouts to TestHttpServerLifecycle (matteo.bertozzi: rev 
f0f6e075fd782607efdd58c90ceb131b0f729040)
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/http/TestHttpServerLifecycle.java


> Add timeouts to TestHttpServerLifecycle
> ---
>
> Key: HBASE-14344
> URL: https://issues.apache.org/jira/browse/HBASE-14344
> Project: HBase
>  Issue Type: Test
>  Components: test
>Affects Versions: 2.0.0, 1.2.0
>Reporter: Matteo Bertozzi
>Assignee: Matteo Bertozzi
>Priority: Minor
> Fix For: 2.0.0, 1.2.0, 1.0.3, 1.1.3
>
> Attachments: HBASE-14344-v0.patch
>
>
> I got TestHttpServerLifecycle hanging a couple of times on my run. 
> simple patch to add a timeout to the tests, and avoid the build to hang. 
> (i haven't looked at them yet to see what was the source problem)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-11368) Multi-column family BulkLoad fails if compactions go on too long

2015-09-03 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730105#comment-14730105
 ] 

Nick Dimiduk commented on HBASE-11368:
--

Looks like HBASE-6028 wants to implement the meat of what I've proposed above. 
It also happens to be 2/3 of the work for HBASE-12446. Seems like good bang for 
the buck on this approach.

I was chatting with [~enis] and [~devaraj] about this offline. Another idea is 
that we can reduce the scope over which the read lock is held during compaction. 
In theory the compactor only needs a region read lock while deciding which files 
to compact and at the time of committing the compaction. We're protected from 
region close events because compactions check (between every Cell!) whether the 
store has been closed, and abort in that case. Is there another reason why we 
would want to hold the read lock for the entire duration of the compaction? 
[~stack] [~lhofhansl]?
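
A hedged sketch of that narrowed scope (selectFilesToCompact, rewriteFiles and 
commitCompaction are placeholder names, not HBase internals):

{code}
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Take the region read lock only around file selection and the commit, not around
// the long-running rewrite, so a pending bulk-load write lock is not starved.
class CompactionLockScopeSketch {
  private final ReadWriteLock regionLock = new ReentrantReadWriteLock();

  void compact() {
    List<String> files;
    regionLock.readLock().lock();
    try {
      files = selectFilesToCompact();          // short critical section
    } finally {
      regionLock.readLock().unlock();
    }

    // Long-running rewrite, done without the region lock; the real compactor would
    // also keep checking whether the store has been closed and abort if so.
    List<String> rewritten = rewriteFiles(files);

    regionLock.readLock().lock();
    try {
      commitCompaction(rewritten);             // short critical section again
    } finally {
      regionLock.readLock().unlock();
    }
  }

  private List<String> selectFilesToCompact() { return Arrays.asList("hfile-1", "hfile-2"); }
  private List<String> rewriteFiles(List<String> in) { return Arrays.asList("hfile-merged"); }
  private void commitCompaction(List<String> out) { }
}
{code}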

> Multi-column family BulkLoad fails if compactions go on too long
> 
>
> Key: HBASE-11368
> URL: https://issues.apache.org/jira/browse/HBASE-11368
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: Qiang Tian
> Attachments: hbase-11368-0.98.5.patch, hbase11368-master.patch, 
> key_stacktrace_hbase10882.TXT, performance_improvement_verification_98.5.patch
>
>
> Compactions take a read lock.  If a multi-column family region, before bulk 
> loading, we want to take a write lock on the region.  If the compaction takes 
> too long, the bulk load fails.
> Various recipes include:
> + Making smaller regions (lame)
> + [~victorunique] suggests major compacting just before bulk loading over in 
> HBASE-10882 as a work around.
> Does the compaction need a read lock for that long?  Does the bulk load need 
> a full write lock when multiple column families?  Can we fail more gracefully 
> at least?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-14362) org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky

2015-09-03 Thread Dima Spivak (JIRA)
Dima Spivak created HBASE-14362:
---

 Summary: 
org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super 
duper flaky
 Key: HBASE-14362
 URL: https://issues.apache.org/jira/browse/HBASE-14362
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 2.0.0
Reporter: Dima Spivak
Priority: Critical


[As seen in 
Jenkins|https://builds.apache.org/job/HBase-TRUNK/lastCompletedBuild/testReport/org.apache.hadoop.hbase.master.procedure/TestWALProcedureStoreOnHDFS/history/],
 this test has been super flaky and we should probably address it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14215) Default cost used for PrimaryRegionCountSkewCostFunction is not sufficient

2015-09-03 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729961#comment-14729961
 ] 

Enis Soztutar commented on HBASE-14215:
---

bq. Given that potential candidates are generated randomly, one would assume 
that "global optimum" will be attained with multiple candidate generations and 
there will be no "local optimum". No?
SLB works something like https://en.wikipedia.org/wiki/Gradient_descent, except 
that we do not generate all of the "candidates" that are neighbors of the current 
assignment plan. We randomly generate a new candidate plan, and we always accept 
the candidate if it reduces the cost. This greedy search is thus vulnerable to 
local minima. 
bq. As we included a new cost function for primary replication skew, will 
taking into account of primary replicas in the candidate generator (may be in 
RegionReplicaCandidateGenerator) can help keep 
hbase.master.balancer.stochastic.primaryRegionCountCost lower?
It might. RRCG has a code section which prefers to move a secondary region 
replica out rather than a primary. Maybe that is causing more primary region 
count skew. Do you want to try cutting it out, or try changing the candidate 
generator? I can take this on if you want (there is a way to simulate a cluster 
and assignment plans in a unit test env so that we can iterate quickly). 

{code}
  // we have found the primary id for the region to move. Now find the 
actual regionIndex
  // with the given primary, prefer to move the secondary region.
  for (int j = 0; j < regionsPerGroup.length; j++) {
int regionIndex = regionsPerGroup[j];
if (selectedPrimaryIndex == regionIndexToPrimaryIndex[regionIndex]) {
  // always move the secondary, not the primary
  if (selectedPrimaryIndex != regionIndex) {
return regionIndex;
  }
}
  }
{code}
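
For readers following along, a toy sketch of that accept-only-improvements loop 
(assumed generic types, not the real StochasticLoadBalancer code):

{code}
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Perturb the current plan with a random candidate generator and keep the
// candidate only when its cost goes down. Since no uphill move is ever accepted,
// the search can get stuck in a local minimum.
class GreedyBalanceSketch<P> {
  P balance(P initial,
            UnaryOperator<P> randomCandidateGenerator,
            Function<P, Double> costFunction,
            int maxSteps) {
    P current = initial;
    double currentCost = costFunction.apply(current);
    for (int step = 0; step < maxSteps; step++) {
      P candidate = randomCandidateGenerator.apply(current);
      double candidateCost = costFunction.apply(candidate);
      if (candidateCost < currentCost) {       // accept improvements only
        current = candidate;
        currentCost = candidateCost;
      }
    }
    return current;
  }
}
{code}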


> Default cost used for PrimaryRegionCountSkewCostFunction is not sufficient 
> ---
>
> Key: HBASE-14215
> URL: https://issues.apache.org/jira/browse/HBASE-14215
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer
>Reporter: Biju Nair
>Priority: Minor
> Attachments: 14215-v1.txt
>
>
> Current multiplier of 500 used in the stochastic balancer cost function 
> {{PrimaryRegionCountSkewCostFunction}} to calculate the cost of  total 
> primary replication skew doesn't seem to be sufficient to prevent the skews 
> (Refer HBASE-14110). We would want the default cost to be a higher value so 
> that skews in primary region replica has higher cost. The following is the 
> test result by setting the multiplier value to 1 (same as the region 
> replica rack cost multiplier) on a 3 Rack 9 RS node cluster which seems to 
> get the balancer distribute the primaries uniformly.
> *Initial Primary replica distribution - using the current multiplier* 
>  |r1n10|  102|
>  |r1n11|  85|
>  |r1n9|88|
>  |r2n10|  120|
>  |r2n11|  120|
>  |r2n9|   124|
>  |r3n10|  135|
>  |r3n11|  124|
>  |r3n9|129|
> *After long duration of read & writes - using current multiplier* 
> | r1n10|  102|
> | r1n11|  85|
> | r1n9|88|
> | r2n10|  120|
> | r2n11|  120|
> | r2n9 |   124|
> | r3n10|  135|
> | r3n11|  124|
> | r3n9|129|
> *After manual balancing*  
> | r1n10|  102|
> | r1n11|  85|
> | r1n9|88|
> | r2n10|  120|
> | r2n11|  120|
> | r2n9 |   124|
> | r3n10|  135|
> | r3n11|  124|
> | r3n9|129|
> *Increased multiplier for primaryRegionCountSkewCost to 1*
> | r1n10|  114|
> | r1n11 | 113|
> | r1n9 |   114|
> | r2n10|  114|
> | r2n11|  114|
> | r2n9 |   113|
> | r3n10|  115|
> | r3n11|  115|
> | r3n9 |   115 |
> Setting the {{PrimaryRegionCountSkewCostFunction}} multiplier value to 1 
> should help HBase general use.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14344) Add timeouts to TestHttpServerLifecycle

2015-09-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729971#comment-14729971
 ] 

Hudson commented on HBASE-14344:


FAILURE: Integrated in HBase-1.0 #1038 (See 
[https://builds.apache.org/job/HBase-1.0/1038/])
HBASE-14344 Add timeouts to TestHttpServerLifecycle (matteo.bertozzi: rev 
9b3e4bc89c36ca670ccfd6abb1aeaa85b16c70d9)
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/http/TestHttpServerLifecycle.java


> Add timeouts to TestHttpServerLifecycle
> ---
>
> Key: HBASE-14344
> URL: https://issues.apache.org/jira/browse/HBASE-14344
> Project: HBase
>  Issue Type: Test
>  Components: test
>Affects Versions: 2.0.0, 1.2.0
>Reporter: Matteo Bertozzi
>Assignee: Matteo Bertozzi
>Priority: Minor
> Fix For: 2.0.0, 1.2.0, 1.0.3, 1.1.3
>
> Attachments: HBASE-14344-v0.patch
>
>
> I got TestHttpServerLifecycle hanging a couple of times on my run. 
> simple patch to add a timeout to the tests, and avoid the build to hang. 
> (i haven't looked at them yet to see what was the source problem)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14344) Add timeouts to TestHttpServerLifecycle

2015-09-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730018#comment-14730018
 ] 

Hudson commented on HBASE-14344:


FAILURE: Integrated in HBase-1.2 #149 (See 
[https://builds.apache.org/job/HBase-1.2/149/])
HBASE-14344 Add timeouts to TestHttpServerLifecycle (matteo.bertozzi: rev 
df17a6949e45de78bd6dd297f9457be12a4d7edd)
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/http/TestHttpServerLifecycle.java


> Add timeouts to TestHttpServerLifecycle
> ---
>
> Key: HBASE-14344
> URL: https://issues.apache.org/jira/browse/HBASE-14344
> Project: HBase
>  Issue Type: Test
>  Components: test
>Affects Versions: 2.0.0, 1.2.0
>Reporter: Matteo Bertozzi
>Assignee: Matteo Bertozzi
>Priority: Minor
> Fix For: 2.0.0, 1.2.0, 1.0.3, 1.1.3
>
> Attachments: HBASE-14344-v0.patch
>
>
> I got TestHttpServerLifecycle hanging a couple of times on my run. 
> simple patch to add a timeout to the tests, and avoid the build to hang. 
> (i haven't looked at them yet to see what was the source problem)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12911) Client-side metrics

2015-09-03 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730048#comment-14730048
 ] 

stack commented on HBASE-12911:
---

I love this effort. 

Looking at the PNG, what is the "tag." prefix in client-side metrics.

What is batchPool*? If I hover over the metric, there is a description.

Can we count threads in client metrics?

So, I'm having trouble following the client-side JMX bean hierarchy. When do the 
rpc metrics show up? Will clients list each remote server they connect to?





> Client-side metrics
> ---
>
> Key: HBASE-12911
> URL: https://issues.apache.org/jira/browse/HBASE-12911
> Project: HBase
>  Issue Type: Brainstorming
>  Components: Client, Performance, Usability
>Reporter: Nick Dimiduk
>Assignee: Nick Dimiduk
> Fix For: 2.0.0, 1.3.0
>
> Attachments: 0001-HBASE-12911-Client-side-metrics.patch, 
> 0001-HBASE-12911-Client-side-metrics.patch, client metrics RS-Master.jpg, 
> client metrics client.jpg, connection attributes.jpg, ltt.jpg, standalone.jpg
>
>
> There's very little visibility into the hbase client. Folks who care to add 
> some kind of metrics collection end up wrapping Table method invocations with 
> {{System.currentTimeMillis()}}. For a crude example of this, have a look at 
> what I did in {{PerformanceEvaluation}} for exposing requests latencies up to 
> {{IntegrationTestRegionReplicaPerf}}. The client is quite complex, there's a 
> lot going on under the hood that is impossible to see right now without a 
> profiler. Being a crucial part of the performance of this distributed system, 
> we should have deeper visibility into the client's function.
> I'm not sure that wiring into the hadoop metrics system is the right choice 
> because the client is often embedded as a library in a user's application. We 
> should have integration with our metrics tools so that, i.e., a client 
> embedded in a coprocessor can report metrics through the usual RS channels, 
> or a client used in a MR job can do the same.
> I would propose an interface-based system with pluggable implementations. Out 
> of the box we'd include a hadoop-metrics implementation and one other, 
> possibly [dropwizard/metrics|https://github.com/dropwizard/metrics].
> Thoughts?
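
As an aside, the "wrap it with System.currentTimeMillis()" approach the 
description mentions would look roughly like this (a hedged sketch, not a 
proposed API):

{code}
import java.io.IOException;

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Crude per-call latency measurement done by hand around a Table invocation;
// this is exactly the kind of ad-hoc instrumentation the issue wants to replace.
public class CrudeGetTimer {
  public static Result timedGet(Table table, byte[] row) throws IOException {
    long start = System.currentTimeMillis();
    try {
      return table.get(new Get(row));
    } finally {
      long latencyMs = System.currentTimeMillis() - start;
      System.out.println("get(" + Bytes.toStringBinary(row) + ") took " + latencyMs + " ms");
    }
  }
}
{code}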



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14272) Enforce major compaction on stores with KEEP_DELETED_CELLS=true

2015-09-03 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730115#comment-14730115
 ] 

Ted Yu commented on HBASE-14272:


The test suite didn't run.

FYI

> Enforce major compaction on stores with KEEP_DELETED_CELLS=true
> ---
>
> Key: HBASE-14272
> URL: https://issues.apache.org/jira/browse/HBASE-14272
> Project: HBase
>  Issue Type: Bug
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-14272-v2.patch, HBASE-14272.patch
>
>
> Currently, if a store has one (major compacted) file, the only cases when major 
> compaction will be triggered for this file again are when locality is below the 
> threshold defined by *hbase.hstore.min.locality.to.skip.major.compact*, or when 
> TTL has expired some cells. If the file has locality greater than this threshold 
> it will never be major compacted until the Store's TTL kicks in. For a CF with 
> KEEP_DELETED_CELLS on, major compaction must always be enabled (even for a 
> single file), regardless of locality, once deleted cells have expired 
> (*hbase.hstore.time.to.purge.deletes*).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14272) Enforce major compaction on stores with KEEP_DELETED_CELLS=true

2015-09-03 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730121#comment-14730121
 ] 

Vladimir Rodionov commented on HBASE-14272:
---

[~te...@apache.org], I know. Something failed in the build system. What am I 
supposed to do? Re-submit the patch?

> Enforce major compaction on stores with KEEP_DELETED_CELLS=true
> ---
>
> Key: HBASE-14272
> URL: https://issues.apache.org/jira/browse/HBASE-14272
> Project: HBase
>  Issue Type: Bug
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-14272-v2.patch, HBASE-14272.patch
>
>
> Currently, if a store has one (major compacted) file, the only cases when major 
> compaction will be triggered for this file again are when locality is below the 
> threshold defined by *hbase.hstore.min.locality.to.skip.major.compact*, or when 
> TTL has expired some cells. If the file has locality greater than this threshold 
> it will never be major compacted until the Store's TTL kicks in. For a CF with 
> KEEP_DELETED_CELLS on, major compaction must always be enabled (even for a 
> single file), regardless of locality, once deleted cells have expired 
> (*hbase.hstore.time.to.purge.deletes*).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12911) Client-side metrics

2015-09-03 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730129#comment-14730129
 ] 

Nick Dimiduk commented on HBASE-12911:
--

Thanks a lot for taking a peek [~stack].

bq. Looking at the PNG, what is the "tag." prefix in client-side metrics?

I'm curious about that myself. As far as I can tell, that comes from some default 
in the MBean translation; they were there when I implemented a bean that didn't 
report anything at all. I guess it's helpful to know what host produced the 
metric in question?

bq. What is batchPool*? If I hover over the metric, there is a description.

See ConnectionImplementation#getCurrentBatchPool(). Far as I can tell, it's the 
thread pool used by the connection to service DML work on behalf of all 
Connection consumers. For instance, it's the thread pool that's passed down 
into AsyncProcess unless the user specifies their own pool at Table creation.

bq. Can we count threads in client metrics?

We can inspect the batch pool, as I've started here. Pools passed into a Table 
instance (as I mention above) wouldn't be a part of that. Maybe we can query 
the JVM for all threads that call themselves "HBase" and expose that? I'm not 
sure what you have in mind with this one.

bq. So, I'm having trouble following the client-side JMX bean hierarchy. When do 
the RPC metrics show up? Will clients list each remote server they connect to?

Yeah, I'm not exactly sure how this plays out on a real cluster. This is from a 
simple standalone run. The goal is to have a bean for each host the client is 
sending an RPC to. From the ltt snap, "192.168.1.10-60917" is the single RS 
endpoint (that would be - if I had real DNS) and 
"192.168.1.10-60915" is the master RPC endpoint.

My open questions are around expiring old hosts when they go away, and about 
aggregating this host-level information at the connection level (or if that's 
even useful, given the drastic difference between our various RPC call 
durations and sizes). We could also explore other aggregations, like per region 
or per table, but that requires a bit more unpacking of the IPC layer than I've 
tackled just yet.
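On the thread-counting question, here is a rough sketch of the numbers a ThreadPoolExecutor-backed batch pool can already report without scanning the whole JVM (illustrative only; this is not the code in the attached patch):

{code}
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public final class BatchPoolSnapshot {
  public static void main(String[] args) {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        4, 256, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());

    // Each getter below is a candidate gauge for a client-side metrics bean.
    System.out.println("active threads:    " + pool.getActiveCount());
    System.out.println("current pool size: " + pool.getPoolSize());
    System.out.println("largest pool size: " + pool.getLargestPoolSize());
    System.out.println("queued tasks:      " + pool.getQueue().size());

    pool.shutdown();
  }
}
{code}

Pools the user passes directly into a Table would still be invisible to this, which matches the caveat above.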

> Client-side metrics
> ---
>
> Key: HBASE-12911
> URL: https://issues.apache.org/jira/browse/HBASE-12911
> Project: HBase
>  Issue Type: Brainstorming
>  Components: Client, Performance, Usability
>Reporter: Nick Dimiduk
>Assignee: Nick Dimiduk
> Fix For: 2.0.0, 1.3.0
>
> Attachments: 0001-HBASE-12911-Client-side-metrics.patch, 
> 0001-HBASE-12911-Client-side-metrics.patch, client metrics RS-Master.jpg, 
> client metrics client.jpg, connection attributes.jpg, ltt.jpg, standalone.jpg
>
>
> There's very little visibility into the hbase client. Folks who care to add 
> some kind of metrics collection end up wrapping Table method invocations with 
> {{System.currentTimeMillis()}}. For a crude example of this, have a look at 
> what I did in {{PerformanceEvaluation}} for exposing request latencies up to 
> {{IntegrationTestRegionReplicaPerf}}. The client is quite complex; there's a 
> lot going on under the hood that is impossible to see right now without a 
> profiler. Since the client is a crucial part of this distributed system's 
> performance, we should have deeper visibility into its function.
> I'm not sure that wiring into the hadoop metrics system is the right choice 
> because the client is often embedded as a library in a user's application. We 
> should have integration with our metrics tools so that, e.g., a client 
> embedded in a coprocessor can report metrics through the usual RS channels, 
> or a client used in an MR job can do the same.
> I would propose an interface-based system with pluggable implementations. Out 
> of the box we'd include a hadoop-metrics implementation and one other, 
> possibly [dropwizard/metrics|https://github.com/dropwizard/metrics].
> Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14261) Enhance Chaos Monkey framework by adding zookeeper and datanode fault injections.

2015-09-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729936#comment-14729936
 ] 

Hudson commented on HBASE-14261:


FAILURE: Integrated in HBase-TRUNK #6776 (See 
[https://builds.apache.org/job/HBase-TRUNK/6776/])
HBASE-14261 Enhance Chaos Monkey framework by adding zookeeper and datanode 
fault injections. (ssrungarapu: rev e48991970d3d584db8716ceaf9b186c46dce34b4)
* hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseCluster.java
* 
hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/actions/RestartActionBaseAction.java
* hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/actions/Action.java
* hbase-it/src/test/java/org/apache/hadoop/hbase/ClusterManager.java
* 
hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/factories/ServerAndDependenciesKillingMonkeyFactory.java
* hbase-it/src/test/java/org/apache/hadoop/hbase/HBaseClusterManager.java
* hbase-it/src/test/java/org/apache/hadoop/hbase/DistributedHBaseCluster.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKServerTool.java
* 
hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/actions/RestartRandomZKNodeAction.java
* hbase-it/src/test/java/org/apache/hadoop/hbase/RESTApiClusterManager.java
* 
hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/factories/MonkeyFactory.java
* 
hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/actions/RestartRandomDataNodeAction.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/MiniHBaseCluster.java


> Enhance Chaos Monkey framework by adding zookeeper and datanode fault 
> injections.
> -
>
> Key: HBASE-14261
> URL: https://issues.apache.org/jira/browse/HBASE-14261
> Project: HBase
>  Issue Type: Improvement
>Reporter: Srikanth Srungarapu
>Assignee: Srikanth Srungarapu
> Fix For: 2.0.0, 1.3.0
>
> Attachments: HBASE-14261-branch-1.patch, 
> HBASE-14261-branch-1_v3.patch, HBASE-14261-branch-1_v4.patch, 
> HBASE-14261.branch-1_v2.patch, HBASE-14261.patch
>
>
> One of the shortcomings of the existing ChaosMonkey framework is the lack of 
> fault injections for hbase dependencies like zookeeper, hdfs, etc. This patch 
> attempts to solve this problem partially by adding datanode and zk node fault 
> injections.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14261) Enhance Chaos Monkey framework by adding zookeeper and datanode fault injections.

2015-09-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729947#comment-14729947
 ] 

Hudson commented on HBASE-14261:


FAILURE: Integrated in HBase-1.3 #147 (See 
[https://builds.apache.org/job/HBase-1.3/147/])
HBASE-14261 Enhance Chaos Monkey framework by adding zookeeper and datanode 
fault injections. (ssrungarapu: rev 1717de65a49f0ae4885be29c888712010aaff506)
* hbase-it/src/test/java/org/apache/hadoop/hbase/DistributedHBaseCluster.java
* 
hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/actions/RestartRandomZKNodeAction.java
* hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/actions/Action.java
* hbase-it/src/test/java/org/apache/hadoop/hbase/RESTApiClusterManager.java
* 
hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/actions/RestartActionBaseAction.java
* 
hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/actions/RestartRandomDataNodeAction.java
* 
hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/factories/MonkeyFactory.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/MiniHBaseCluster.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKServerTool.java
* hbase-it/src/test/java/org/apache/hadoop/hbase/HBaseClusterManager.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseCluster.java
* hbase-it/src/test/java/org/apache/hadoop/hbase/ClusterManager.java
* 
hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/factories/ServerAndDependenciesKillingMonkeyFactory.java


> Enhance Chaos Monkey framework by adding zookeeper and datanode fault 
> injections.
> -
>
> Key: HBASE-14261
> URL: https://issues.apache.org/jira/browse/HBASE-14261
> Project: HBase
>  Issue Type: Improvement
>Reporter: Srikanth Srungarapu
>Assignee: Srikanth Srungarapu
> Fix For: 2.0.0, 1.3.0
>
> Attachments: HBASE-14261-branch-1.patch, 
> HBASE-14261-branch-1_v3.patch, HBASE-14261-branch-1_v4.patch, 
> HBASE-14261.branch-1_v2.patch, HBASE-14261.patch
>
>
> One of the shortcomings of the existing ChaosMonkey framework is the lack of 
> fault injections for hbase dependencies like zookeeper, hdfs, etc. This patch 
> attempts to solve this problem partially by adding datanode and zk node fault 
> injections.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14344) Add timeouts to TestHttpServerLifecycle

2015-09-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729948#comment-14729948
 ] 

Hudson commented on HBASE-14344:


FAILURE: Integrated in HBase-1.3 #147 (See 
[https://builds.apache.org/job/HBase-1.3/147/])
HBASE-14344 Add timeouts to TestHttpServerLifecycle (matteo.bertozzi: rev 
8a4aee60820650576e0d0058d3613692c508209f)
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/http/TestHttpServerLifecycle.java


> Add timeouts to TestHttpServerLifecycle
> ---
>
> Key: HBASE-14344
> URL: https://issues.apache.org/jira/browse/HBASE-14344
> Project: HBase
>  Issue Type: Test
>  Components: test
>Affects Versions: 2.0.0, 1.2.0
>Reporter: Matteo Bertozzi
>Assignee: Matteo Bertozzi
>Priority: Minor
> Fix For: 2.0.0, 1.2.0, 1.0.3, 1.1.3
>
> Attachments: HBASE-14344-v0.patch
>
>
> I got TestHttpServerLifecycle hanging a couple of times on my run. 
> Simple patch to add a timeout to the tests and avoid hanging the build. 
> (I haven't looked at them yet to see what the source problem was.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-03 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-14317:
--
Attachment: 14317v14.txt

Fix TestLogRolling. The failure was because an append was failing but the sync was 
allowed to succeed.

> Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
> -
>
> Key: HBASE-14317
> URL: https://issues.apache.org/jira/browse/HBASE-14317
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.1.1
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.0.0, 1.2.0, 1.0.3, 1.1.3
>
> Attachments: 14317.test.txt, 14317v10.txt, 14317v11.txt, 
> 14317v12.txt, 14317v13.txt, 14317v14.txt, 14317v5.branch-1.2.txt, 
> 14317v5.txt, 14317v9.txt, HBASE-14317-v1.patch, HBASE-14317-v2.patch, 
> HBASE-14317-v3.patch, HBASE-14317-v4.patch, HBASE-14317.patch, [Java] RS 
> stuck on WAL sync to a dead DN - Pastebin.com.html, append-only-test.patch, 
> raw.php, repro.txt, san_dump.txt, subset.of.rs.log
>
>
> hbase-1.1.1 and hadoop-2.7.1
> We try to roll logs because can't append (See HDFS-8960) but we get stuck. 
> See attached thread dump and associated log. What is interesting is that 
> syncers are waiting to take syncs to run and at same time we want to flush so 
> we are waiting on a safe point but there seems to be nothing in our ring 
> buffer; did we go to roll log and not add safe point sync to clear out 
> ringbuffer?
> Needs a bit of study. Try to reproduce.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14261) Enhance Chaos Monkey framework by adding zookeeper and datanode fault injections.

2015-09-03 Thread Srikanth Srungarapu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729978#comment-14729978
 ] 

Srikanth Srungarapu commented on HBASE-14261:
-

Can I also backport this to other 1.x branches?

> Enhance Chaos Monkey framework by adding zookeeper and datanode fault 
> injections.
> -
>
> Key: HBASE-14261
> URL: https://issues.apache.org/jira/browse/HBASE-14261
> Project: HBase
>  Issue Type: Improvement
>Reporter: Srikanth Srungarapu
>Assignee: Srikanth Srungarapu
> Fix For: 2.0.0, 1.3.0
>
> Attachments: HBASE-14261-branch-1.patch, 
> HBASE-14261-branch-1_v3.patch, HBASE-14261-branch-1_v4.patch, 
> HBASE-14261.branch-1_v2.patch, HBASE-14261.patch
>
>
> One of the shortcomings of the existing ChaosMonkey framework is the lack of 
> fault injections for hbase dependencies like zookeeper, hdfs, etc. This patch 
> attempts to solve this problem partially by adding datanode and zk node fault 
> injections.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14344) Add timeouts to TestHttpServerLifecycle

2015-09-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729985#comment-14729985
 ] 

Hudson commented on HBASE-14344:


SUCCESS: Integrated in HBase-1.2-IT #125 (See 
[https://builds.apache.org/job/HBase-1.2-IT/125/])
HBASE-14344 Add timeouts to TestHttpServerLifecycle (matteo.bertozzi: rev 
df17a6949e45de78bd6dd297f9457be12a4d7edd)
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/http/TestHttpServerLifecycle.java


> Add timeouts to TestHttpServerLifecycle
> ---
>
> Key: HBASE-14344
> URL: https://issues.apache.org/jira/browse/HBASE-14344
> Project: HBase
>  Issue Type: Test
>  Components: test
>Affects Versions: 2.0.0, 1.2.0
>Reporter: Matteo Bertozzi
>Assignee: Matteo Bertozzi
>Priority: Minor
> Fix For: 2.0.0, 1.2.0, 1.0.3, 1.1.3
>
> Attachments: HBASE-14344-v0.patch
>
>
> I got TestHttpServerLifecycle hanging a couple of times on my run. 
> Simple patch to add a timeout to the tests and avoid hanging the build. 
> (I haven't looked at them yet to see what the source problem was.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14261) Enhance Chaos Monkey framework by adding zookeeper and datanode fault injections.

2015-09-03 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729984#comment-14729984
 ] 

Enis Soztutar commented on HBASE-14261:
---

We should at least put this into the 1.2 branch. Since this is a testability 
improvement, it can go into 1.x IMO. For 1.1 and 1.0, the only concerning 
change is the extra {{sudo -u}} in the command line for the SSH command. It 
might be fine, since most deployments might already be configuring it for their 
IT setup. 

> Enhance Chaos Monkey framework by adding zookeeper and datanode fault 
> injections.
> -
>
> Key: HBASE-14261
> URL: https://issues.apache.org/jira/browse/HBASE-14261
> Project: HBase
>  Issue Type: Improvement
>Reporter: Srikanth Srungarapu
>Assignee: Srikanth Srungarapu
> Fix For: 2.0.0, 1.3.0
>
> Attachments: HBASE-14261-branch-1.patch, 
> HBASE-14261-branch-1_v3.patch, HBASE-14261-branch-1_v4.patch, 
> HBASE-14261.branch-1_v2.patch, HBASE-14261.patch
>
>
> One of the shortcomings of the existing ChaosMonkey framework is the lack of 
> fault injections for hbase dependencies like zookeeper, hdfs, etc. This patch 
> attempts to solve this problem partially by adding datanode and zk node fault 
> injections.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14361) Investigate unused connection objects

2015-09-03 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730031#comment-14730031
 ] 

stack commented on HBASE-14361:
---

Excellent justification for client-side metrics.

> Investigate unused connection objects
> -
>
> Key: HBASE-14361
> URL: https://issues.apache.org/jira/browse/HBASE-14361
> Project: HBase
>  Issue Type: Task
>  Components: Client
>Reporter: Nick Dimiduk
>
> Over on HBASE-12911 I have a patch that registers Connection instances with 
> the metrics system. In both standalone server and ltt client applications, I 
> was surprised to see multiple connection objects showing up that are unused. 
> These are pretty heavy objects, including lots of client threads for the 
> batch pool. We should track these down and remove them -- if they're not some 
> kind of phantom artifacts of my WIP patch over there.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14272) Enforce major compaction on stores with KEEP_DELETED_CELLS=true

2015-09-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730083#comment-14730083
 ] 

Hadoop QA commented on HBASE-14272:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12754114/HBASE-14272-v2.patch
  against master branch at commit 5152ac0e208fd5f720734fb2abf3fae07b39c7e2.
  ATTACHMENT ID: 12754114

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:red}-1 findbugs{color}.  The patch appears to cause Findbugs 
(version 2.0.3) to fail.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:red}-1 site{color}.  The patch appears to cause mvn post-site goal 
to fail.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15409//testReport/
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15409//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15409//console

This message is automatically generated.

> Enforce major compaction on stores with KEEP_DELETED_CELLS=true
> ---
>
> Key: HBASE-14272
> URL: https://issues.apache.org/jira/browse/HBASE-14272
> Project: HBase
>  Issue Type: Bug
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-14272-v2.patch, HBASE-14272.patch
>
>
> Currently, if a store has one (major compacted) file, the only cases when major 
> compaction will be triggered for this file again are when locality is below the 
> threshold defined by *hbase.hstore.min.locality.to.skip.major.compact*, or when 
> TTL has expired some cells. If the file has locality greater than this threshold 
> it will never be major compacted until the Store's TTL kicks in. For a CF with 
> KEEP_DELETED_CELLS on, major compaction must always be enabled (even for a 
> single file), regardless of locality, once deleted cells have expired 
> (*hbase.hstore.time.to.purge.deletes*).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-6617) ReplicationSourceManager should be able to track multiple WAL paths

2015-09-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730084#comment-14730084
 ] 

Hadoop QA commented on HBASE-6617:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12754061/HBASE-6617_v9.patch
  against master branch at commit 5152ac0e208fd5f720734fb2abf3fae07b39c7e2.
  ATTACHMENT ID: 12754061

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15407//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15407//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15407//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15407//console

This message is automatically generated.

> ReplicationSourceManager should be able to track multiple WAL paths
> ---
>
> Key: HBASE-6617
> URL: https://issues.apache.org/jira/browse/HBASE-6617
> Project: HBase
>  Issue Type: Improvement
>  Components: Replication
>Reporter: Ted Yu
>Assignee: Yu Li
> Fix For: 2.0.0, 1.3.0
>
> Attachments: HBASE-6617.patch, HBASE-6617_v2.patch, 
> HBASE-6617_v3.patch, HBASE-6617_v4.patch, HBASE-6617_v7.patch, 
> HBASE-6617_v9.patch
>
>
> Currently ReplicationSourceManager uses logRolled() to receive notification 
> about a new HLog and remembers it in latestPath.
> When the region server has multiple WAL support, we need to keep track of 
> multiple Paths in ReplicationSourceManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-6721) RegionServer Group based Assignment

2015-09-03 Thread Francis Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730127#comment-14730127
 ] 

Francis Liu commented on HBASE-6721:


[~enis] I can do that too and have security make use of it. Should be just as 
much effort as embedding security directly, since I already have the RS group cp 
hooks in place as part of the current patch?

> RegionServer Group based Assignment
> ---
>
> Key: HBASE-6721
> URL: https://issues.apache.org/jira/browse/HBASE-6721
> Project: HBase
>  Issue Type: New Feature
>Reporter: Francis Liu
>Assignee: Francis Liu
>  Labels: hbase-6721
> Attachments: 6721-master-webUI.patch, HBASE-6721 
> GroupBasedLoadBalancer Sequence Diagram.xml, HBASE-6721-DesigDoc.pdf, 
> HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, 
> HBASE-6721_0.98_2.patch, HBASE-6721_10.patch, HBASE-6721_11.patch, 
> HBASE-6721_12.patch, HBASE-6721_8.patch, HBASE-6721_9.patch, 
> HBASE-6721_9.patch, HBASE-6721_94.patch, HBASE-6721_94.patch, 
> HBASE-6721_94_2.patch, HBASE-6721_94_3.patch, HBASE-6721_94_3.patch, 
> HBASE-6721_94_4.patch, HBASE-6721_94_5.patch, HBASE-6721_94_6.patch, 
> HBASE-6721_94_7.patch, HBASE-6721_98_1.patch, HBASE-6721_98_2.patch, 
> HBASE-6721_hbase-6721_addendum.patch, HBASE-6721_trunk.patch, 
> HBASE-6721_trunk.patch, HBASE-6721_trunk.patch, HBASE-6721_trunk1.patch, 
> HBASE-6721_trunk2.patch, balanceCluster Sequence Diagram.svg, 
> immediateAssignments Sequence Diagram.svg, randomAssignment Sequence 
> Diagram.svg, retainAssignment Sequence Diagram.svg, roundRobinAssignment 
> Sequence Diagram.svg
>
>
> In multi-tenant deployments of HBase, it is likely that a RegionServer will 
> be serving out regions from a number of different tables owned by various 
> client applications. Being able to group a subset of running RegionServers 
> and assign specific tables to it, provides a client application a level of 
> isolation and resource allocation.
> The proposal essentially is to have an AssignmentManager which is aware of 
> RegionServer groups and assigns tables to region servers based on groupings. 
> Load balancing will occur on a per group basis as well. 
> This is essentially a simplification of the approach taken in HBASE-4120. See 
> attached document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14272) Enforce major compaction on stores with KEEP_DELETED_CELLS=true

2015-09-03 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730275#comment-14730275
 ] 

Lars Hofhansl commented on HBASE-14272:
---

hbase.hstore.time.to.purge.deletes is an extra mechanism to delay removal of 
delete markers for a bit (mostly a hack to guard against bad timing of 
compactions at a replication slave, where edits can arrive out of order from the 
master). It has nothing to do with KEEP_DELETED_CELLS.

KEEP_DELETED_CELLS makes the effect worse in this scenario: not only would the 
delete markers be hanging around, but also the data they mark for deletion.
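For readers comparing the two knobs: KEEP_DELETED_CELLS is a per-column-family schema attribute, while the purge delay is a plain configuration property. A small sketch of setting the latter (the value is illustrative only, and it assumes the HBase client/common jars are on the classpath):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public final class DeleteRetentionKnobs {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Delay purging of delete markers during compaction by 5 minutes (ms).
    conf.setLong("hbase.hstore.time.to.purge.deletes", 5L * 60 * 1000);
    System.out.println(conf.getLong("hbase.hstore.time.to.purge.deletes", 0));
  }
}
{code}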


> Enforce major compaction on stores with KEEP_DELETED_CELLS=true
> ---
>
> Key: HBASE-14272
> URL: https://issues.apache.org/jira/browse/HBASE-14272
> Project: HBase
>  Issue Type: Bug
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-14272-v2.patch, HBASE-14272.patch
>
>
> Currently, if a store has one (major compacted) file, the only cases when major 
> compaction will be triggered for this file again are when locality is below the 
> threshold defined by *hbase.hstore.min.locality.to.skip.major.compact*, or when 
> TTL has expired some cells. If the file has locality greater than this threshold 
> it will never be major compacted until the Store's TTL kicks in. For a CF with 
> KEEP_DELETED_CELLS on, major compaction must always be enabled (even for a 
> single file), regardless of locality, once deleted cells have expired 
> (*hbase.hstore.time.to.purge.deletes*).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14268) Improve KeyLocker

2015-09-03 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730276#comment-14730276
 ] 

stack commented on HBASE-14268:
---

Nice numbers. Out of interest, do you see weak references being collected in 
non-full GC?

Why do this and not just iterate the passed-in Set?

Object[] keyArray = keys.toArray();

Why loop twice and not lock as you go: i.e. is this needed:

for (Lock lock : locks.values()) {
  lock.lock();
}

I came across this: 
https://svn.apache.org/repos/asf/santuario/xml-security-java/trunk/src/main/java/org/apache/xml/security/utils/WeakObjectPool.java
Might have some ideas you could make use of.

Yeah, good to call purge frequently when using references (at least that is what 
I've seen commonly done -- our LruBlockCache does similar).

Patch looks great.

There are other lock implementations in our code base. Would be sweet if we 
could unify.
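For anyone skimming the thread, here is a deliberately simplified sketch of the per-key lock pattern under discussion (this is not the HBASE-14268 patch: it omits the weak-reference pooling and purge logic the patch is actually about, and it never drops entries):

{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

public final class SimpleKeyLocker<K> {
  private final ConcurrentMap<K, ReentrantLock> locks =
      new ConcurrentHashMap<K, ReentrantLock>();

  /** Returns the lock associated with the key, creating it on first use. */
  public ReentrantLock getLock(K key) {
    ReentrantLock lock = locks.get(key);
    if (lock == null) {
      ReentrantLock candidate = new ReentrantLock();
      ReentrantLock existing = locks.putIfAbsent(key, candidate);
      lock = existing != null ? existing : candidate;
    }
    return lock;  // the caller locks/unlocks it itself
  }
}
{code}

A production version also needs a way to reclaim unused entries (weak references plus a periodic purge), which is exactly where the subtlety discussed above lives.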



> Improve KeyLocker
> -
>
> Key: HBASE-14268
> URL: https://issues.apache.org/jira/browse/HBASE-14268
> Project: HBase
>  Issue Type: Improvement
>  Components: util
>Reporter: Hiroshi Ikeda
>Assignee: Hiroshi Ikeda
>Priority: Minor
> Fix For: 2.0.0, 1.3.0
>
> Attachments: 14268-V5.patch, HBASE-14268-V2.patch, 
> HBASE-14268-V3.patch, HBASE-14268-V4.patch, HBASE-14268-V5.patch, 
> HBASE-14268-V5.patch, HBASE-14268-V6.patch, HBASE-14268-V7.patch, 
> HBASE-14268-V7.patch, HBASE-14268.patch, KeyLockerPerformance.java
>
>
> 1. In the implementation of {{KeyLocker}} it uses atomic variables inside a 
> synchronized block, which doesn't make sense. Moreover, the logic inside the 
> synchronized block is not trivial, so it hurts performance in a heavily 
> multi-threaded environment.
> 2. {{KeyLocker}} gives out an instance of {{ReentrantLock}} which is already 
> locked, but this doesn't follow the contract of {{ReentrantLock}} because you 
> are not allowed to freely invoke lock/unlock methods under that contract. 
> That introduces a potential risk: whenever you see a variable of type 
> {{ReentrantLock}}, you have to pay attention to where the returned instance 
> came from.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14361) Investigate unused connection objects

2015-09-03 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730280#comment-14730280
 ] 

Elliott Clark commented on HBASE-14361:
---

I've seen some leaking connections too. For example, the chaos monkey seems to 
create two connections, though I can't see how.

> Investigate unused connection objects
> -
>
> Key: HBASE-14361
> URL: https://issues.apache.org/jira/browse/HBASE-14361
> Project: HBase
>  Issue Type: Task
>  Components: Client
>Reporter: Nick Dimiduk
>
> Over on HBASE-12911 I have a patch that registers Connection instances with 
> the metrics system. In both standalone server and ltt client applications, I 
> was surprised to see multiple connection objects showing up that are unused. 
> These are pretty heavy objects, including lots of client threads for the 
> batch pool. We should track these down and remove them -- if they're not some 
> kind of phantom artifacts of my WIP patch over there.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-11368) Multi-column family BulkLoad fails if compactions go on too long

2015-09-03 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730283#comment-14730283
 ] 

stack commented on HBASE-11368:
---

[~ndimiduk] I like the idea of narrowing the lock scope, but I started to look and 
it's a bit of a rat's nest where locks are held (compactions checking on each row 
seems well dodgy...). Yeah, a review of the attempt at undoing scanner locks so 
only a region-level lock remains sounds like it would help.

> Multi-column family BulkLoad fails if compactions go on too long
> 
>
> Key: HBASE-11368
> URL: https://issues.apache.org/jira/browse/HBASE-11368
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: Qiang Tian
> Attachments: hbase-11368-0.98.5.patch, hbase11368-master.patch, 
> key_stacktrace_hbase10882.TXT, performance_improvement_verification_98.5.patch
>
>
> Compactions take a read lock.  If it is a multi-column-family region, before 
> bulk loading we want to take a write lock on the region.  If the compaction 
> takes too long, the bulk load fails.
> Various recipes include:
> + Making smaller regions (lame)
> + [~victorunique] suggests major compacting just before bulk loading over in 
> HBASE-10882 as a work around.
> Does the compaction need a read lock for that long?  Does the bulk load need 
> a full write lock when there are multiple column families?  Can we fail more gracefully 
> at least?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-14363) Print more details on the row behind an Empty REGIONINFO_QUALIFIER warning

2015-09-03 Thread Harsh J (JIRA)
Harsh J created HBASE-14363:
---

 Summary: Print more details on the row behind an Empty 
REGIONINFO_QUALIFIER warning
 Key: HBASE-14363
 URL: https://issues.apache.org/jira/browse/HBASE-14363
 Project: HBase
  Issue Type: Bug
  Components: hbck
Affects Versions: 1.0.0
Reporter: Harsh J
Priority: Trivial


Currently HBCK just prints a vague "Empty REGIONINFO_QUALIFIER found" warning, 
and does not print the row it found that on. While fixing this is easy thanks 
to HBCK, some more detail (say the row/region ID) would be good to print, to 
avoid people manually scanning meta to obtain the very same info.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-03 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730287#comment-14730287
 ] 

Elliott Clark commented on HBASE-14317:
---

+1 still stands. The extra code clean ups are nice.

> Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
> -
>
> Key: HBASE-14317
> URL: https://issues.apache.org/jira/browse/HBASE-14317
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.1.1
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.0.0, 1.2.0, 1.0.3, 1.1.3
>
> Attachments: 14317.test.txt, 14317v10.txt, 14317v11.txt, 
> 14317v12.txt, 14317v13.txt, 14317v14.txt, 14317v15.txt, 
> 14317v5.branch-1.2.txt, 14317v5.txt, 14317v9.txt, HBASE-14317-v1.patch, 
> HBASE-14317-v2.patch, HBASE-14317-v3.patch, HBASE-14317-v4.patch, 
> HBASE-14317.patch, [Java] RS stuck on WAL sync to a dead DN - 
> Pastebin.com.html, append-only-test.patch, raw.php, repro.txt, san_dump.txt, 
> subset.of.rs.log
>
>
> hbase-1.1.1 and hadoop-2.7.1
> We try to roll logs because can't append (See HDFS-8960) but we get stuck. 
> See attached thread dump and associated log. What is interesting is that 
> syncers are waiting to take syncs to run and at same time we want to flush so 
> we are waiting on a safe point but there seems to be nothing in our ring 
> buffer; did we go to roll log and not add safe point sync to clear out 
> ringbuffer?
> Needs a bit of study. Try to reproduce.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12988) [Replication]Parallel apply edits across regions

2015-09-03 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730288#comment-14730288
 ] 

Elliott Clark commented on HBASE-12988:
---

Should we get this in so that the 1.2 RC testing can get it?
I can commit if you are comfortable with the patch as it stands.

> [Replication]Parallel apply edits across regions
> 
>
> Key: HBASE-12988
> URL: https://issues.apache.org/jira/browse/HBASE-12988
> Project: HBase
>  Issue Type: Improvement
>  Components: Replication
>Reporter: hongyu bi
>Assignee: Lars Hofhansl
> Attachments: 12988-v2.txt, 12988-v3.txt, 12988-v4.txt, 12988-v5.txt, 
> 12988.txt, HBASE-12988-0.98.patch, ParallelReplication-v2.txt
>
>
> We can apply edits to the slave cluster in parallel at the table level to speed 
> up replication.
> Update: per the conversation below, it's better to apply edits at the row level 
> in parallel.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14363) Print more details on the row behind an Empty REGIONINFO_QUALIFIER warning

2015-09-03 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HBASE-14363:

Issue Type: Improvement  (was: Bug)

> Print more details on the row behind an Empty REGIONINFO_QUALIFIER warning
> --
>
> Key: HBASE-14363
> URL: https://issues.apache.org/jira/browse/HBASE-14363
> Project: HBase
>  Issue Type: Improvement
>  Components: hbck
>Affects Versions: 1.0.0
>Reporter: Harsh J
>Priority: Trivial
>
> Currently HBCK just prints a vague "Empty REGIONINFO_QUALIFIER found" 
> warning, and does not print the row it found that on. While fixing this is 
> easy thanks to HBCK, some more detail (say the row/region ID) would be good 
> to print, to avoid people manually scanning meta to obtain the very same info.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14261) Enhance Chaos Monkey framework by adding zookeeper and datanode fault injections.

2015-09-03 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730290#comment-14730290
 ] 

stack commented on HBASE-14261:
---

[~srikanth235] Any chance of a release note with an example of how to use this 
fancy new feature!

> Enhance Chaos Monkey framework by adding zookeeper and datanode fault 
> injections.
> -
>
> Key: HBASE-14261
> URL: https://issues.apache.org/jira/browse/HBASE-14261
> Project: HBase
>  Issue Type: Improvement
>Reporter: Srikanth Srungarapu
>Assignee: Srikanth Srungarapu
> Fix For: 2.0.0, 1.3.0
>
> Attachments: HBASE-14261-branch-1.patch, 
> HBASE-14261-branch-1_v3.patch, HBASE-14261-branch-1_v4.patch, 
> HBASE-14261.branch-1_v2.patch, HBASE-14261.patch
>
>
> One of the shortcomings of the existing ChaosMonkey framework is the lack of 
> fault injections for hbase dependencies like zookeeper, hdfs, etc. This patch 
> attempts to solve this problem partially by adding datanode and zk node fault 
> injections.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14359) HTable#close will hang forever if unchecked error/exception thrown in AsyncProcess#sendMultiAction

2015-09-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730292#comment-14730292
 ] 

Hudson commented on HBASE-14359:


FAILURE: Integrated in HBase-TRUNK #6777 (See 
[https://builds.apache.org/job/HBase-TRUNK/6777/])
HBASE-14359 HTable#close will hang forever if unchecked error/exception thrown 
in AsyncProcess#sendMultiAction (Victor Xu) (apurtell: rev 
2481b7f76fa7e4f2b120f8dc96004790b357e569)
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncProcess.java


> HTable#close will hang forever if unchecked error/exception thrown in 
> AsyncProcess#sendMultiAction
> --
>
> Key: HBASE-14359
> URL: https://issues.apache.org/jira/browse/HBASE-14359
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.14, 1.1.2
>Reporter: Yu Li
>Assignee: Victor Xu
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3
>
> Attachments: HBASE-14359-0.98-v1.patch, 
> HBASE-14359-branch-1-v1.patch, HBASE-14359-master-branch1-v1.patch, 
> HBASE-14359-master-v1.patch
>
>
> Currently in AsyncProcess#sendMultiAction, we only catch 
> RejectedExecutionException and let other errors/exceptions go, which means 
> decTaskCounter is never invoked. Meanwhile, the recommendation for using HTable 
> is to close the table in the finally clause, and HTable#close will call 
> flushCommits and wait until all tasks are done.
> The problem is that when an unchecked error/exception like OutOfMemoryError is 
> thrown, taskSent will never be equal to taskDone, so AsyncProcess#waitUntilDone 
> will never return. Especially, if autoflush is set so there is no data to flush 
> during table close, there would be no rpc call, so rpcTimeOut will not break the 
> call and the thread will wait there forever.
> In our production env, the unchecked error we observed is 
> "java.lang.OutOfMemoryError: unable to create new native thread", and we 
> observed the client thread hang for hours.
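As a schematic illustration of the failure mode described above (invented names, not the committed patch): whatever the submit path throws, the in-flight counter has to be decremented on every failure path, otherwise a waitUntilDone()-style loop never sees the count return to zero.

{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

public final class SubmitGuardSketch {
  private final ExecutorService pool = Executors.newFixedThreadPool(4);
  private final AtomicLong tasksInFlight = new AtomicLong();

  void submit(final Runnable work) {
    tasksInFlight.incrementAndGet();
    boolean submitted = false;
    try {
      pool.execute(new Runnable() {
        @Override public void run() {
          try {
            work.run();
          } finally {
            tasksInFlight.decrementAndGet();  // normal completion path
          }
        }
      });
      submitted = true;
    } finally {
      if (!submitted) {
        // Covers RejectedExecutionException and unchecked errors such as
        // OutOfMemoryError: without this, the counter never reaches zero.
        tasksInFlight.decrementAndGet();
      }
    }
  }
}
{code}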



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14359) HTable#close will hang forever if unchecked error/exception thrown in AsyncProcess#sendMultiAction

2015-09-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730293#comment-14730293
 ] 

Hudson commented on HBASE-14359:


FAILURE: Integrated in HBase-1.2 #150 (See 
[https://builds.apache.org/job/HBase-1.2/150/])
HBASE-14359 HTable#close will hang forever if unchecked error/exception thrown 
in AsyncProcess#sendMultiAction (Victor Xu) (apurtell: rev 
1e411c8ead80067bc4c57c4d357b05e161d29eed)
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncProcess.java


> HTable#close will hang forever if unchecked error/exception thrown in 
> AsyncProcess#sendMultiAction
> --
>
> Key: HBASE-14359
> URL: https://issues.apache.org/jira/browse/HBASE-14359
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.14, 1.1.2
>Reporter: Yu Li
>Assignee: Victor Xu
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3
>
> Attachments: HBASE-14359-0.98-v1.patch, 
> HBASE-14359-branch-1-v1.patch, HBASE-14359-master-branch1-v1.patch, 
> HBASE-14359-master-v1.patch
>
>
> Currently in AsyncProcess#sendMultiAction, we only catch 
> RejectedExecutionException and let other errors/exceptions go, which means 
> decTaskCounter is never invoked. Meanwhile, the recommendation for using HTable 
> is to close the table in the finally clause, and HTable#close will call 
> flushCommits and wait until all tasks are done.
> The problem is that when an unchecked error/exception like OutOfMemoryError is 
> thrown, taskSent will never be equal to taskDone, so AsyncProcess#waitUntilDone 
> will never return. Especially, if autoflush is set so there is no data to flush 
> during table close, there would be no rpc call, so rpcTimeOut will not break the 
> call and the thread will wait there forever.
> In our production env, the unchecked error we observed is 
> "java.lang.OutOfMemoryError: unable to create new native thread", and we 
> observed the client thread hang for hours.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-14364) hlog_roll and compact_rs broken in shell

2015-09-03 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-14364:
-

 Summary: hlog_roll and compact_rs broken in shell
 Key: HBASE-14364
 URL: https://issues.apache.org/jira/browse/HBASE-14364
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.14
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl


Just noticed that both hlog_roll and compact_rs are broken in shell (at least 
in 0.98).

hlog_roll is broken in 3 ways: (1) it calls admin.rollWALWriter, which no longer 
exists, (2) it tries to pass a ServerName, but the method takes a string, and (3) 
it uses an unqualified ServerName to get a server name, which leads to an 
uninitialized constant error.

compact_rs only has the last problem.
Patch upcoming.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730301#comment-14730301
 ] 

Hadoop QA commented on HBASE-14317:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12754134/14317v15.txt
  against master branch at commit 2481b7f76fa7e4f2b120f8dc96004790b357e569.
  ATTACHMENT ID: 12754134

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 18 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15410//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15410//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15410//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15410//console

This message is automatically generated.

> Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
> -
>
> Key: HBASE-14317
> URL: https://issues.apache.org/jira/browse/HBASE-14317
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.1.1
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.0.0, 1.2.0, 1.0.3, 1.1.3
>
> Attachments: 14317.test.txt, 14317v10.txt, 14317v11.txt, 
> 14317v12.txt, 14317v13.txt, 14317v14.txt, 14317v15.txt, 
> 14317v5.branch-1.2.txt, 14317v5.txt, 14317v9.txt, HBASE-14317-v1.patch, 
> HBASE-14317-v2.patch, HBASE-14317-v3.patch, HBASE-14317-v4.patch, 
> HBASE-14317.patch, [Java] RS stuck on WAL sync to a dead DN - 
> Pastebin.com.html, append-only-test.patch, raw.php, repro.txt, san_dump.txt, 
> subset.of.rs.log
>
>
> hbase-1.1.1 and hadoop-2.7.1
> We try to roll logs because can't append (See HDFS-8960) but we get stuck. 
> See attached thread dump and associated log. What is interesting is that 
> syncers are waiting to take syncs to run and at same time we want to flush so 
> we are waiting on a safe point but there seems to be nothing in our ring 
> buffer; did we go to roll log and not add safe point sync to clear out 
> ringbuffer?
> Needs a bit of study. Try to reproduce.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14359) HTable#close will hang forever if unchecked error/exception thrown in AsyncProcess#sendMultiAction

2015-09-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730310#comment-14730310
 ] 

Hudson commented on HBASE-14359:


SUCCESS: Integrated in HBase-1.1 #649 (See 
[https://builds.apache.org/job/HBase-1.1/649/])
HBASE-14359 HTable#close will hang forever if unchecked error/exception thrown 
in AsyncProcess#sendMultiAction (Victor Xu) (apurtell: rev 
beaecd11398f465c05d742b49e496133179dbb09)
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncProcess.java


> HTable#close will hang forever if unchecked error/exception thrown in 
> AsyncProcess#sendMultiAction
> --
>
> Key: HBASE-14359
> URL: https://issues.apache.org/jira/browse/HBASE-14359
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.14, 1.1.2
>Reporter: Yu Li
>Assignee: Victor Xu
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3
>
> Attachments: HBASE-14359-0.98-v1.patch, 
> HBASE-14359-branch-1-v1.patch, HBASE-14359-master-branch1-v1.patch, 
> HBASE-14359-master-v1.patch
>
>
> Currently in AsyncProcess#sendMultiAction, we only catch 
> RejectedExecutionException and let other errors/exceptions go, which means 
> decTaskCounter is never invoked. Meanwhile, the recommendation for using HTable 
> is to close the table in the finally clause, and HTable#close will call 
> flushCommits and wait until all tasks are done.
> The problem is that when an unchecked error/exception like OutOfMemoryError is 
> thrown, taskSent will never be equal to taskDone, so AsyncProcess#waitUntilDone 
> will never return. Especially, if autoflush is set so there is no data to flush 
> during table close, there would be no rpc call, so rpcTimeOut will not break the 
> call and the thread will wait there forever.
> In our production env, the unchecked error we observed is 
> "java.lang.OutOfMemoryError: unable to create new native thread", and we 
> observed the client thread hang for hours.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14364) hlog_roll and compact_rs broken in shell

2015-09-03 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-14364:
--
Attachment: 14364-0.98.txt

Simple fix.

No time right now to check the other branches.

> hlog_roll and compact_rs broken in shell
> 
>
> Key: HBASE-14364
> URL: https://issues.apache.org/jira/browse/HBASE-14364
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.14
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Attachments: 14364-0.98.txt
>
>
> Just noticed that both hlog_roll and compact_rs are broken in shell (at least 
> in 0.98).
> hlog_roll is broken in 3 ways: (1) it calls admin.rollWALWriter, which no longer 
> exists, (2) it tries to pass a ServerName, but the method takes a string, and 
> (3) it uses an unqualified ServerName to get a server name, which leads to an 
> uninitialized constant error.
> compact_rs only has the last problem.
> Patch upcoming.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-12298) Support BB usage in PrefixTree

2015-09-03 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-12298:
---
Status: Open  (was: Patch Available)

> Support BB usage in PrefixTree
> --
>
> Key: HBASE-12298
> URL: https://issues.apache.org/jira/browse/HBASE-12298
> Project: HBase
>  Issue Type: Sub-task
>  Components: regionserver, Scanners
>Reporter: Anoop Sam John
>Assignee: ramkrishna.s.vasudevan
> Attachments: HBASE-12298.patch, HBASE-12298_1.patch, 
> HBASE-12298_2.patch, HBASE-12298_3.patch, HBASE-12298_4.patch, 
> HBASE-12298_4.patch, HBASE-12298_4.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-12298) Support BB usage in PrefixTree

2015-09-03 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-12298:
---
Attachment: HBASE-12298_4.patch

Retry QA. Any more reviews here?  

> Support BB usage in PrefixTree
> --
>
> Key: HBASE-12298
> URL: https://issues.apache.org/jira/browse/HBASE-12298
> Project: HBase
>  Issue Type: Sub-task
>  Components: regionserver, Scanners
>Reporter: Anoop Sam John
>Assignee: ramkrishna.s.vasudevan
> Attachments: HBASE-12298.patch, HBASE-12298_1.patch, 
> HBASE-12298_2.patch, HBASE-12298_3.patch, HBASE-12298_4.patch, 
> HBASE-12298_4.patch, HBASE-12298_4.patch, HBASE-12298_4.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-12298) Support BB usage in PrefixTree

2015-09-03 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-12298:
---
Status: Patch Available  (was: Open)

> Support BB usage in PrefixTree
> --
>
> Key: HBASE-12298
> URL: https://issues.apache.org/jira/browse/HBASE-12298
> Project: HBase
>  Issue Type: Sub-task
>  Components: regionserver, Scanners
>Reporter: Anoop Sam John
>Assignee: ramkrishna.s.vasudevan
> Attachments: HBASE-12298.patch, HBASE-12298_1.patch, 
> HBASE-12298_2.patch, HBASE-12298_3.patch, HBASE-12298_4.patch, 
> HBASE-12298_4.patch, HBASE-12298_4.patch, HBASE-12298_4.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14359) HTable#close will hang forever if unchecked error/exception thrown in AsyncProcess#sendMultiAction

2015-09-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730320#comment-14730320
 ] 

Hudson commented on HBASE-14359:


SUCCESS: Integrated in HBase-1.2-IT #126 (See 
[https://builds.apache.org/job/HBase-1.2-IT/126/])
HBASE-14359 HTable#close will hang forever if unchecked error/exception thrown 
in AsyncProcess#sendMultiAction (Victor Xu) (apurtell: rev 
1e411c8ead80067bc4c57c4d357b05e161d29eed)
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncProcess.java


> HTable#close will hang forever if unchecked error/exception thrown in 
> AsyncProcess#sendMultiAction
> --
>
> Key: HBASE-14359
> URL: https://issues.apache.org/jira/browse/HBASE-14359
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.14, 1.1.2
>Reporter: Yu Li
>Assignee: Victor Xu
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3
>
> Attachments: HBASE-14359-0.98-v1.patch, 
> HBASE-14359-branch-1-v1.patch, HBASE-14359-master-branch1-v1.patch, 
> HBASE-14359-master-v1.patch
>
>
> Currently in AsyncProcess#sendMultiAction, we only catch 
> RejectedExecutionException and let other errors/exceptions go, which means 
> decTaskCounter is never invoked. Meanwhile, the recommendation for using HTable 
> is to close the table in the finally clause, and HTable#close will call 
> flushCommits and wait until all tasks are done.
> The problem is that when an unchecked error/exception like OutOfMemoryError is 
> thrown, taskSent will never be equal to taskDone, so AsyncProcess#waitUntilDone 
> will never return. Especially, if autoflush is set so there is no data to flush 
> during table close, there would be no rpc call, so rpcTimeOut will not break the 
> call and the thread will wait there forever.
> In our production env, the unchecked error we observed is 
> "java.lang.OutOfMemoryError: unable to create new native thread", and we 
> observed the client thread hang for hours.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14359) HTable#close will hang forever if unchecked error/exception thrown in AsyncProcess#sendMultiAction

2015-09-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730327#comment-14730327
 ] 

Hudson commented on HBASE-14359:


FAILURE: Integrated in HBase-1.3 #148 (See 
[https://builds.apache.org/job/HBase-1.3/148/])
HBASE-14359 HTable#close will hang forever if unchecked error/exception thrown 
in AsyncProcess#sendMultiAction (Victor Xu) (apurtell: rev 
513b37603dec42e54a7ea4add0254c71ba6c87b6)
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncProcess.java


> HTable#close will hang forever if unchecked error/exception thrown in 
> AsyncProcess#sendMultiAction
> --
>
> Key: HBASE-14359
> URL: https://issues.apache.org/jira/browse/HBASE-14359
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.14, 1.1.2
>Reporter: Yu Li
>Assignee: Victor Xu
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3
>
> Attachments: HBASE-14359-0.98-v1.patch, 
> HBASE-14359-branch-1-v1.patch, HBASE-14359-master-branch1-v1.patch, 
> HBASE-14359-master-v1.patch
>
>
> Currently, in AsyncProcess#sendMultiAction, we only catch 
> RejectedExecutionException and let other errors/exceptions go, which means 
> decTaskCounter is never invoked. Meanwhile, the recommended way to use HTable 
> is to close the table in the finally clause, and HTable#close calls 
> flushCommits and waits until all tasks are done.
> The problem is that when an unchecked error/exception like OutOfMemoryError 
> is thrown, taskSent will never equal taskDone, so AsyncProcess#waitUntilDone 
> will never return. In particular, if autoflush is set so there is no data to 
> flush during table close, there is no rpc call, rpcTimeOut will not break the 
> call, and the thread will wait there forever.
> In our production environment, the unchecked error we observed was 
> "java.lang.OutOfMemoryError: unable to create new native thread", and we 
> observed the client thread hang for hours.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14272) Enforce major compaction on stores with KEEP_DELETED_CELLS=true

2015-09-03 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730142#comment-14730142
 ] 

Ted Yu commented on HBASE-14272:


{code}
+  // Keep Store config for future use durin compactions
{code}
Typo: durin
{code}
+  if (blockLocalityIndex < comConf.getMinLocalityToForceCompact() || 
purgeDeletes) {
{code}
Moving purgeDeletes ahead of the locality check would make the check more 
efficient when purgeDeletes is true, since the || would short-circuit.
{code}
+  "); keep deleted cells="+keepDeletedCells+"; 
purgeDeletes="+purgeDeletes);
{code}
purgeDeletes would be false in the if branch. It seems there is no need to 
include it in the log.
{code}
+  private long getTimeToPurgeDeletesForStore(Configuration conf)
+  {
+return conf.getLong(HStore.TIME_TO_PURGE_DELETES_KEY, 0);
{code}
The above method can be folded into the caller.

With this change, the store file would be compacted after every 
timeToPurgeDeletes interval. Is that intended?
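
For illustration, a minimal sketch of the reordering suggested above; the names are placeholders, not the actual compaction-policy code from the patch:

{code}
// Placeholder sketch, not the patch: the cheap purgeDeletes flag goes first so
// that || short-circuits and the locality index never has to be computed when
// deletes already force a major compaction.
public final class MajorCompactionCheck {
  static boolean shouldForceMajorCompaction(boolean purgeDeletes,
                                            double blockLocalityIndex,
                                            double minLocalityToForceCompact) {
    return purgeDeletes || blockLocalityIndex < minLocalityToForceCompact;
  }
}
{code}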

> Enforce major compaction on stores with KEEP_DELETED_CELLS=true
> ---
>
> Key: HBASE-14272
> URL: https://issues.apache.org/jira/browse/HBASE-14272
> Project: HBase
>  Issue Type: Bug
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-14272-v2.patch, HBASE-14272.patch
>
>
> Currently, if a store has a single (major compacted) file, the only cases in 
> which major compaction will be triggered for this file again are when 
> locality is below the threshold defined by 
> *hbase.hstore.min.locality.to.skip.major.compact*, or when TTL has expired 
> some cells. If the file has locality greater than this threshold, it will 
> never be major compacted until the Store's TTL kicks in. For a CF with 
> KEEP_DELETED_CELLS on, major compaction must always be triggered (even for a 
> single file), regardless of locality, once deleted cells have expired 
> (*hbase.hstore.time.to.purge.deletes*).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HBASE-11368) Multi-column family BulkLoad fails if compactions go on too long

2015-09-03 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730140#comment-14730140
 ] 

Lars Hofhansl edited comment on HBASE-11368 at 9/4/15 1:23 AM:
---

See also the discussion in HBASE-13082 (somewhat related, but it talks about 
the locks in StoreScanner that we need in order to safely reset the scanner 
stack).


was (Author: lhofhansl):
See also discussion in HBASE-13082.

> Multi-column family BulkLoad fails if compactions go on too long
> 
>
> Key: HBASE-11368
> URL: https://issues.apache.org/jira/browse/HBASE-11368
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: Qiang Tian
> Attachments: hbase-11368-0.98.5.patch, hbase11368-master.patch, 
> key_stacktrace_hbase10882.TXT, performance_improvement_verification_98.5.patch
>
>
> Compactions take a read lock.  For a multi-column-family region, before bulk 
> loading, we want to take a write lock on the region.  If the compaction takes 
> too long, the bulk load fails.
> Various recipes include:
> + Making smaller regions (lame)
> + [~victorunique] suggests major compacting just before bulk loading over in 
> HBASE-10882 as a workaround.
> Does the compaction need a read lock for that long?  Does the bulk load need 
> a full write lock when there are multiple column families?  Can we fail more 
> gracefully at least?
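
The lock interaction is easy to reproduce in miniature. The following generic sketch (plain java.util.concurrent, not HRegion code) shows the shape of the failure: a reader that holds the lock too long makes a time-bounded writer give up.

{code}
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Generic illustration only: compaction plays the long-running reader, the
// multi-CF bulk load plays the writer that waits a bounded time and then fails.
public class ReadBlocksWriteExample {
  public static void main(String[] args) throws Exception {
    ReentrantReadWriteLock regionLock = new ReentrantReadWriteLock();

    Thread compaction = new Thread(() -> {
      regionLock.readLock().lock();               // "compaction" holds the read lock...
      try {
        Thread.sleep(5_000);                      // ...for a long time
      } catch (InterruptedException ignored) {
      } finally {
        regionLock.readLock().unlock();
      }
    });
    compaction.start();
    Thread.sleep(100);                            // let the "compaction" acquire first

    // The "bulk load" only waits one second for the write lock, then gives up.
    if (regionLock.writeLock().tryLock(1, TimeUnit.SECONDS)) {
      try {
        System.out.println("bulk load proceeds");
      } finally {
        regionLock.writeLock().unlock();
      }
    } else {
      System.out.println("bulk load fails: compaction still holds the read lock");
    }
    compaction.join();
  }
}
{code}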



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14268) Improve KeyLocker

2015-09-03 Thread Hiroshi Ikeda (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730145#comment-14730145
 ] 

Hiroshi Ikeda commented on HBASE-14268:
---

I measured it just now in my environment, which has 8 CPU cores, and the 
results are as follows. For each KeyLocker I ran the performance application 5 
times and collected the output number, which represents the consumed time 
(nanoseconds).

old KeyLocker
(1) 4168557944
(2) 4173845279
(3) 4276035366
(4) 4344219315
(5) 4393414763
average
= 4271214533.4 (nanos)
~ 4.27 (sec)

new KeyLocker
(1) 270832002
(2) 318058811
(3) 278171946
(4) 265603446
(5) 279867215
average
= 282506684 (nanos)
~ 0.28 (sec)

The difference comes purely from the overhead of context switches when 
getting/returning locks from/to the pool, so it may not be so meaningful for 
heavy tasks that cause context switches anyway.
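
For context, here is a generic sketch of a per-key lock pool built on ConcurrentHashMap with reference counting; it is illustrative only (it is not the HBASE-14268 implementation), but it shows the kind of acquire/release cycle the timings above exercise, without funnelling every key through a single synchronized block.

{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch only, not the HBASE-14268 patch: a per-key lock pool
// that relies on ConcurrentHashMap.compute() and reference counting instead
// of one synchronized block guarding the whole pool.
public final class SimpleKeyLocker<K> {

  private static final class RefLock extends ReentrantLock {
    int refs; // mutated only inside compute(), so updates are atomic per key
  }

  private final ConcurrentHashMap<K, RefLock> locks = new ConcurrentHashMap<>();

  /** Blocks until the calling thread holds the lock for the given key. */
  public ReentrantLock acquire(K key) {
    RefLock lock = locks.compute(key, (k, existing) -> {
      RefLock l = (existing == null) ? new RefLock() : existing;
      l.refs++;                 // count this acquirer so release() knows when to evict
      return l;
    });
    lock.lock();
    return lock;
  }

  /** Must be called exactly once per successful acquire(), with the same key. */
  public void release(K key, ReentrantLock lock) {
    lock.unlock();
    locks.compute(key, (k, existing) ->
        (existing == null || --existing.refs == 0) ? null : existing); // evict unused entries
  }
}
{code}

Callers would wrap the critical section in try/finally, calling release(key, lock) exactly once per acquire(key); the benchmark essentially measures many such acquire/release cycles across threads.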



> Improve KeyLocker
> -
>
> Key: HBASE-14268
> URL: https://issues.apache.org/jira/browse/HBASE-14268
> Project: HBase
>  Issue Type: Improvement
>  Components: util
>Reporter: Hiroshi Ikeda
>Assignee: Hiroshi Ikeda
>Priority: Minor
> Attachments: 14268-V5.patch, HBASE-14268-V2.patch, 
> HBASE-14268-V3.patch, HBASE-14268-V4.patch, HBASE-14268-V5.patch, 
> HBASE-14268-V5.patch, HBASE-14268-V6.patch, HBASE-14268-V7.patch, 
> HBASE-14268-V7.patch, HBASE-14268.patch, KeyLockerPerformance.java
>
>
> 1. The implementation of {{KeyLocker}} uses atomic variables inside a 
> synchronized block, which doesn't make sense. Moreover, the logic inside the 
> synchronized block is non-trivial, which hurts performance in heavily 
> multi-threaded environments.
> 2. {{KeyLocker}} hands out an instance of {{ReentrantLock}} which is already 
> locked, but this doesn't follow the contract of {{ReentrantLock}}, because 
> you are not allowed to freely invoke lock/unlock methods under that 
> contract. That introduces a potential risk: whenever you see a variable of 
> type {{ReentrantLock}}, you have to pay attention to where the instance came 
> from.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14268) Improve KeyLocker

2015-09-03 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-14268:
---
 Hadoop Flags: Reviewed
Fix Version/s: 1.3.0
   2.0.0

Nice performance gains.

Planning to integrate sometime tomorrow if there are no other review comments.

> Improve KeyLocker
> -
>
> Key: HBASE-14268
> URL: https://issues.apache.org/jira/browse/HBASE-14268
> Project: HBase
>  Issue Type: Improvement
>  Components: util
>Reporter: Hiroshi Ikeda
>Assignee: Hiroshi Ikeda
>Priority: Minor
> Fix For: 2.0.0, 1.3.0
>
> Attachments: 14268-V5.patch, HBASE-14268-V2.patch, 
> HBASE-14268-V3.patch, HBASE-14268-V4.patch, HBASE-14268-V5.patch, 
> HBASE-14268-V5.patch, HBASE-14268-V6.patch, HBASE-14268-V7.patch, 
> HBASE-14268-V7.patch, HBASE-14268.patch, KeyLockerPerformance.java
>
>
> 1. The implementation of {{KeyLocker}} uses atomic variables inside a 
> synchronized block, which doesn't make sense. Moreover, the logic inside the 
> synchronized block is non-trivial, which hurts performance in heavily 
> multi-threaded environments.
> 2. {{KeyLocker}} hands out an instance of {{ReentrantLock}} which is already 
> locked, but this doesn't follow the contract of {{ReentrantLock}}, because 
> you are not allowed to freely invoke lock/unlock methods under that 
> contract. That introduces a potential risk: whenever you see a variable of 
> type {{ReentrantLock}}, you have to pay attention to where the instance came 
> from.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14272) Enforce major compaction on stores with KEEP_DELETED_CELLS=true

2015-09-03 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730176#comment-14730176
 ] 

Vladimir Rodionov commented on HBASE-14272:
---

[~te...@apache.org]:
{quote}
With this change, the store file would be compacted after every 
timeToPurgeDeletes interval. Is that intended ?
{quote}

No, only when the major compaction interval has elapsed.

> Enforce major compaction on stores with KEEP_DELETED_CELLS=true
> ---
>
> Key: HBASE-14272
> URL: https://issues.apache.org/jira/browse/HBASE-14272
> Project: HBase
>  Issue Type: Bug
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-14272-v2.patch, HBASE-14272.patch
>
>
> Currently, if a store has a single (major compacted) file, the only cases in 
> which major compaction will be triggered for this file again are when 
> locality is below the threshold defined by 
> *hbase.hstore.min.locality.to.skip.major.compact*, or when TTL has expired 
> some cells. If the file has locality greater than this threshold, it will 
> never be major compacted until the Store's TTL kicks in. For a CF with 
> KEEP_DELETED_CELLS on, major compaction must always be triggered (even for a 
> single file), regardless of locality, once deleted cells have expired 
> (*hbase.hstore.time.to.purge.deletes*).
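
For readers setting this up, a minimal sketch of the two knobs the description refers to (table and family names are invented for the example; in practice the purge interval is a server-side hbase-site.xml setting, set on the client Configuration here only to document the key and its unit):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.KeepDeletedCells;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

// Illustrative setup only; names are made up for the example.
public class KeepDeletedCellsSetup {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Server-side setting (normally in hbase-site.xml): delete markers become
    // purgeable at major compaction only after one day (value is milliseconds).
    conf.setLong("hbase.hstore.time.to.purge.deletes", 24L * 60 * 60 * 1000);

    HColumnDescriptor cf = new HColumnDescriptor("cf");
    cf.setKeepDeletedCells(KeepDeletedCells.TRUE);  // retain deleted cells for the family

    HTableDescriptor table = new HTableDescriptor(TableName.valueOf("events"));
    table.addFamily(cf);

    try (Connection connection = ConnectionFactory.createConnection(conf);
         Admin admin = connection.getAdmin()) {
      admin.createTable(table);
    }
  }
}
{code}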



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-6721) RegionServer Group based Assignment

2015-09-03 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730209#comment-14730209
 ] 

Enis Soztutar commented on HBASE-6721:
--

Agreed, let's keep it simple for now unless needed.

> RegionServer Group based Assignment
> ---
>
> Key: HBASE-6721
> URL: https://issues.apache.org/jira/browse/HBASE-6721
> Project: HBase
>  Issue Type: New Feature
>Reporter: Francis Liu
>Assignee: Francis Liu
>  Labels: hbase-6721
> Attachments: 6721-master-webUI.patch, HBASE-6721 
> GroupBasedLoadBalancer Sequence Diagram.xml, HBASE-6721-DesigDoc.pdf, 
> HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, 
> HBASE-6721_0.98_2.patch, HBASE-6721_10.patch, HBASE-6721_11.patch, 
> HBASE-6721_12.patch, HBASE-6721_8.patch, HBASE-6721_9.patch, 
> HBASE-6721_9.patch, HBASE-6721_94.patch, HBASE-6721_94.patch, 
> HBASE-6721_94_2.patch, HBASE-6721_94_3.patch, HBASE-6721_94_3.patch, 
> HBASE-6721_94_4.patch, HBASE-6721_94_5.patch, HBASE-6721_94_6.patch, 
> HBASE-6721_94_7.patch, HBASE-6721_98_1.patch, HBASE-6721_98_2.patch, 
> HBASE-6721_hbase-6721_addendum.patch, HBASE-6721_trunk.patch, 
> HBASE-6721_trunk.patch, HBASE-6721_trunk.patch, HBASE-6721_trunk1.patch, 
> HBASE-6721_trunk2.patch, balanceCluster Sequence Diagram.svg, 
> immediateAssignments Sequence Diagram.svg, randomAssignment Sequence 
> Diagram.svg, retainAssignment Sequence Diagram.svg, roundRobinAssignment 
> Sequence Diagram.svg
>
>
> In multi-tenant deployments of HBase, it is likely that a RegionServer will 
> be serving out regions from a number of different tables owned by various 
> client applications. Being able to group a subset of running RegionServers 
> and assign specific tables to it provides a client application with a level 
> of isolation and resource allocation.
> The proposal essentially is to have an AssignmentManager which is aware of 
> RegionServer groups and assigns tables to region servers based on groupings. 
> Load balancing will occur on a per-group basis as well.
> This is essentially a simplification of the approach taken in HBASE-4120. See 
> the attached document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730227#comment-14730227
 ] 

Hadoop QA commented on HBASE-14317:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12754104/14317v14.txt
  against master branch at commit 5152ac0e208fd5f720734fb2abf3fae07b39c7e2.
  ATTACHMENT ID: 12754104

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 18 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:red}-1 checkstyle{color}.  The applied patch generated 
1839 checkstyle errors (more than the master's current 1838 errors).

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15408//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15408//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15408//artifact/patchprocess/checkstyle-aggregate.html

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15408//console

This message is automatically generated.

> Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
> -
>
> Key: HBASE-14317
> URL: https://issues.apache.org/jira/browse/HBASE-14317
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.1.1
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.0.0, 1.2.0, 1.0.3, 1.1.3
>
> Attachments: 14317.test.txt, 14317v10.txt, 14317v11.txt, 
> 14317v12.txt, 14317v13.txt, 14317v14.txt, 14317v5.branch-1.2.txt, 
> 14317v5.txt, 14317v9.txt, HBASE-14317-v1.patch, HBASE-14317-v2.patch, 
> HBASE-14317-v3.patch, HBASE-14317-v4.patch, HBASE-14317.patch, [Java] RS 
> stuck on WAL sync to a dead DN - Pastebin.com.html, append-only-test.patch, 
> raw.php, repro.txt, san_dump.txt, subset.of.rs.log
>
>
> hbase-1.1.1 and hadoop-2.7.1
> We try to roll logs because we can't append (see HDFS-8960), but we get 
> stuck. See the attached thread dump and associated log. What is interesting 
> is that syncers are waiting to take syncs to run, and at the same time we 
> want to flush, so we are waiting on a safe point, but there seems to be 
> nothing in our ring buffer; did we go to roll the log and not add a 
> safe-point sync to clear out the ring buffer?
> Needs a bit of study. Try to reproduce.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14317) Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL

2015-09-03 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-14317:
--
Attachment: 14317v15.txt

Fix checkstyles

> Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
> -
>
> Key: HBASE-14317
> URL: https://issues.apache.org/jira/browse/HBASE-14317
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.1.1
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.0.0, 1.2.0, 1.0.3, 1.1.3
>
> Attachments: 14317.test.txt, 14317v10.txt, 14317v11.txt, 
> 14317v12.txt, 14317v13.txt, 14317v14.txt, 14317v15.txt, 
> 14317v5.branch-1.2.txt, 14317v5.txt, 14317v9.txt, HBASE-14317-v1.patch, 
> HBASE-14317-v2.patch, HBASE-14317-v3.patch, HBASE-14317-v4.patch, 
> HBASE-14317.patch, [Java] RS stuck on WAL sync to a dead DN - 
> Pastebin.com.html, append-only-test.patch, raw.php, repro.txt, san_dump.txt, 
> subset.of.rs.log
>
>
> hbase-1.1.1 and hadoop-2.7.1
> We try to roll logs because we can't append (see HDFS-8960), but we get 
> stuck. See the attached thread dump and associated log. What is interesting 
> is that syncers are waiting to take syncs to run, and at the same time we 
> want to flush, so we are waiting on a safe point, but there seems to be 
> nothing in our ring buffer; did we go to roll the log and not add a 
> safe-point sync to clear out the ring buffer?
> Needs a bit of study. Try to reproduce.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14359) HTable#close will hang forever if unchecked error/exception thrown in AsyncProcess#sendMultiAction

2015-09-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730256#comment-14730256
 ] 

Hudson commented on HBASE-14359:


FAILURE: Integrated in HBase-1.0 #1039 (See 
[https://builds.apache.org/job/HBase-1.0/1039/])
HBASE-14359 HTable#close will hang forever if unchecked error/exception thrown 
in AsyncProcess#sendMultiAction (Victor Xu) (apurtell: rev 
0564bfc81a7b035eb49a3d016d094ce9ab2ebb90)
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncProcess.java


> HTable#close will hang forever if unchecked error/exception thrown in 
> AsyncProcess#sendMultiAction
> --
>
> Key: HBASE-14359
> URL: https://issues.apache.org/jira/browse/HBASE-14359
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.14, 1.1.2
>Reporter: Yu Li
>Assignee: Victor Xu
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3
>
> Attachments: HBASE-14359-0.98-v1.patch, 
> HBASE-14359-branch-1-v1.patch, HBASE-14359-master-branch1-v1.patch, 
> HBASE-14359-master-v1.patch
>
>
> Currently, in AsyncProcess#sendMultiAction, we only catch 
> RejectedExecutionException and let other errors/exceptions go, which means 
> decTaskCounter is never invoked. Meanwhile, the recommended way to use HTable 
> is to close the table in the finally clause, and HTable#close calls 
> flushCommits and waits until all tasks are done.
> The problem is that when an unchecked error/exception like OutOfMemoryError 
> is thrown, taskSent will never equal taskDone, so AsyncProcess#waitUntilDone 
> will never return. In particular, if autoflush is set so there is no data to 
> flush during table close, there is no rpc call, rpcTimeOut will not break the 
> call, and the thread will wait there forever.
> In our production environment, the unchecked error we observed was 
> "java.lang.OutOfMemoryError: unable to create new native thread", and we 
> observed the client thread hang for hours.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

