[jira] [Commented] (HBASE-14020) Unsafe based optimized write in ByteBufferOutputStream
[ https://issues.apache.org/jira/browse/HBASE-14020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14614713#comment-14614713 ] Hudson commented on HBASE-14020: SUCCESS: Integrated in HBase-TRUNK #6629 (See [https://builds.apache.org/job/HBase-TRUNK/6629/]) HBASE-14020 Unsafe based optimized write in ByteBufferOutputStream. (anoopsamjohn: rev 7d3456d8fd027e252b1da7578e943f146626135d) * hbase-common/src/main/java/org/apache/hadoop/hbase/util/UnsafeAccess.java * hbase-common/src/main/java/org/apache/hadoop/hbase/util/ByteBufferUtils.java * hbase-common/src/main/java/org/apache/hadoop/hbase/io/ByteBufferOutputStream.java Unsafe based optimized write in ByteBufferOutputStream -- Key: HBASE-14020 URL: https://issues.apache.org/jira/browse/HBASE-14020 Project: HBase Issue Type: Sub-task Components: Scanners Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0 Attachments: HBASE-14020.patch, HBASE-14020_v2.patch, HBASE-14020_v3.patch, benchmark.zip We use this class to build the cellblock at the RPC layer. The write operation currently does puts into a java ByteBuffer, which carries a lot of overhead. Instead we can do an Unsafe-based copy-to-buffer operation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
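The idea behind the patch can be illustrated with a minimal sketch. This is not the committed UnsafeAccess/ByteBufferUtils code; the class and method names are hypothetical. It copies a byte[] into a direct ByteBuffer with one bulk Unsafe.copyMemory call instead of going through ByteBuffer.put:

```java
import java.lang.reflect.Field;
import java.nio.Buffer;
import java.nio.ByteBuffer;
import sun.misc.Unsafe;

public class UnsafeCopySketch {
    private static final Unsafe UNSAFE;
    private static final long ADDRESS_OFFSET; // field offset of java.nio.Buffer.address

    static {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            UNSAFE = (Unsafe) f.get(null);
            ADDRESS_OFFSET = UNSAFE.objectFieldOffset(Buffer.class.getDeclaredField("address"));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Hypothetical helper: copy src[offset..offset+len) into the direct buffer 'dst'
    // at its current position with a single copyMemory call, then advance the position.
    public static void copyToDirect(byte[] src, int offset, int len, ByteBuffer dst) {
        long dstAddress = UNSAFE.getLong(dst, ADDRESS_OFFSET) + dst.position();
        UNSAFE.copyMemory(src, Unsafe.ARRAY_BYTE_BASE_OFFSET + offset, null, dstAddress, len);
        dst.position(dst.position() + len);
    }

    public static void main(String[] args) {
        ByteBuffer bb = ByteBuffer.allocateDirect(16);
        copyToDirect(new byte[] {1, 2, 3, 4}, 0, 4, bb);
        bb.flip();
        for (byte expected = 1; expected <= 4; expected++) {
            if (bb.get() != expected) throw new AssertionError("copy mismatch");
        }
    }
}
```

The bulk copy amortizes the bounds checking that a loop of ByteBuffer puts would repeat per call, which is what the attached benchmark presumably measures.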
[jira] [Commented] (HBASE-12298) Support BB usage in PrefixTree
[ https://issues.apache.org/jira/browse/HBASE-12298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14614782#comment-14614782 ] Hadoop QA commented on HBASE-12298: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12743678/HBASE-12298.patch against master branch at commit 7d3456d8fd027e252b1da7578e943f146626135d. ATTACHMENT ID: 12743678 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 12 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0) {color:red}-1 javac{color}. The applied patch generated 17 javac compiler warnings (more than the master's current 16 warnings). {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 checkstyle{color}. The applied patch generated 1901 checkstyle errors (more than the master's current 1898 errors). {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:red}-1 core tests{color}. 
The patch failed these unit tests: Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14671//testReport/ Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14671//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14671//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14671//console This message is automatically generated. Support BB usage in PrefixTree -- Key: HBASE-12298 URL: https://issues.apache.org/jira/browse/HBASE-12298 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: Anoop Sam John Assignee: ramkrishna.s.vasudevan Attachments: HBASE-12298.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12213) HFileBlock backed by Array of ByteBuffers
[ https://issues.apache.org/jira/browse/HBASE-12213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14614809#comment-14614809 ] Anoop Sam John commented on HBASE-12213: Please add the patch in RB. HFileBlock backed by Array of ByteBuffers - Key: HBASE-12213 URL: https://issues.apache.org/jira/browse/HBASE-12213 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: Anoop Sam John Assignee: ramkrishna.s.vasudevan Attachments: HBASE-12213_1.patch, HBASE-12213_2.patch, HBASE-12213_4.patch In the L2 cache (offheap) an HFile block might have been cached into multiple chunks of buffers. If HFileBlock needs a single BB, we end up recreating a bigger BB and copying. Instead we can make HFileBlock serve data from an array of BBs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
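The "array of BBs" idea can be sketched minimally as follows. This is an illustrative class, not the patch's actual implementation: it answers absolute reads over several ByteBuffer chunks without first copying them into one big buffer.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: a read-only view over an array of ByteBuffer chunks.
public class MultiBufferView {
    private final ByteBuffer[] items;
    private final int[] itemBeginPos; // itemBeginPos[i] = logical offset where items[i] starts

    public MultiBufferView(ByteBuffer[] items) {
        this.items = items;
        this.itemBeginPos = new int[items.length + 1];
        for (int i = 0; i < items.length; i++) {
            itemBeginPos[i + 1] = itemBeginPos[i] + items[i].remaining();
        }
    }

    public int capacity() {
        return itemBeginPos[items.length];
    }

    // Absolute get: locate the chunk holding 'index', then read inside it.
    public byte get(int index) {
        if (index < 0 || index >= capacity()) {
            throw new IndexOutOfBoundsException("index=" + index);
        }
        int i = 0;
        while (index >= itemBeginPos[i + 1]) {
            i++; // linear scan is fine for the handful of chunks a block spans
        }
        ByteBuffer item = items[i];
        return item.get(item.position() + (index - itemBeginPos[i]));
    }

    public static void main(String[] args) {
        MultiBufferView view = new MultiBufferView(new ByteBuffer[] {
            ByteBuffer.wrap(new byte[] {1, 2}),
            ByteBuffer.wrap(new byte[] {3, 4, 5})
        });
        if (view.capacity() != 5 || view.get(2) != 3) throw new AssertionError();
    }
}
```

The point of the design is that a cached block split across L2-cache chunks never has to be reassembled; only reads that straddle a chunk boundary need any special handling.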
[jira] [Updated] (HBASE-12213) HFileBlock backed by Array of ByteBuffers
[ https://issues.apache.org/jira/browse/HBASE-12213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-12213: --- Status: Patch Available (was: Open) HFileBlock backed by Array of ByteBuffers - Key: HBASE-12213 URL: https://issues.apache.org/jira/browse/HBASE-12213 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: Anoop Sam John Assignee: ramkrishna.s.vasudevan Attachments: HBASE-12213_1.patch, HBASE-12213_2.patch, HBASE-12213_4.patch In the L2 cache (offheap) an HFile block might have been cached into multiple chunks of buffers. If HFileBlock needs a single BB, we end up recreating a bigger BB and copying. Instead we can make HFileBlock serve data from an array of BBs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12213) HFileBlock backed by Array of ByteBuffers
[ https://issues.apache.org/jira/browse/HBASE-12213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-12213: --- Attachment: HBASE-12213_4.patch Updated patch correcting the javadoc and checkstyle comments. HFileBlock backed by Array of ByteBuffers - Key: HBASE-12213 URL: https://issues.apache.org/jira/browse/HBASE-12213 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: Anoop Sam John Assignee: ramkrishna.s.vasudevan Attachments: HBASE-12213_1.patch, HBASE-12213_2.patch, HBASE-12213_4.patch In the L2 cache (offheap) an HFile block might have been cached into multiple chunks of buffers. If HFileBlock needs a single BB, we end up recreating a bigger BB and copying. Instead we can make HFileBlock serve data from an array of BBs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12213) HFileBlock backed by Array of ByteBuffers
[ https://issues.apache.org/jira/browse/HBASE-12213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-12213: --- Status: Open (was: Patch Available) HFileBlock backed by Array of ByteBuffers - Key: HBASE-12213 URL: https://issues.apache.org/jira/browse/HBASE-12213 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: Anoop Sam John Assignee: ramkrishna.s.vasudevan Attachments: HBASE-12213_1.patch, HBASE-12213_2.patch In the L2 cache (offheap) an HFile block might have been cached into multiple chunks of buffers. If HFileBlock needs a single BB, we end up recreating a bigger BB and copying. Instead we can make HFileBlock serve data from an array of BBs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12298) Support BB usage in PrefixTree
[ https://issues.apache.org/jira/browse/HBASE-12298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-12298: --- Status: Open (was: Patch Available) Support BB usage in PrefixTree -- Key: HBASE-12298 URL: https://issues.apache.org/jira/browse/HBASE-12298 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: Anoop Sam John Assignee: ramkrishna.s.vasudevan Attachments: HBASE-12298.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12298) Support BB usage in PrefixTree
[ https://issues.apache.org/jira/browse/HBASE-12298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-12298: --- Attachment: HBASE-12298_1.patch Updated patch removing the unused imports. Support BB usage in PrefixTree -- Key: HBASE-12298 URL: https://issues.apache.org/jira/browse/HBASE-12298 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: Anoop Sam John Assignee: ramkrishna.s.vasudevan Attachments: HBASE-12298.patch, HBASE-12298_1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12298) Support BB usage in PrefixTree
[ https://issues.apache.org/jira/browse/HBASE-12298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-12298: --- Status: Patch Available (was: Open) Support BB usage in PrefixTree -- Key: HBASE-12298 URL: https://issues.apache.org/jira/browse/HBASE-12298 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: Anoop Sam John Assignee: ramkrishna.s.vasudevan Attachments: HBASE-12298.patch, HBASE-12298_1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-8642) [Snapshot] List and delete snapshot by table
[ https://issues.apache.org/jira/browse/HBASE-8642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Singhi updated HBASE-8642: - Attachment: HBASE-8642-0.98.patch Patch for 0.98 branch. [Snapshot] List and delete snapshot by table Key: HBASE-8642 URL: https://issues.apache.org/jira/browse/HBASE-8642 Project: HBase Issue Type: Improvement Components: snapshots Affects Versions: 0.98.0, 0.95.0, 0.95.1, 0.95.2 Reporter: Julian Zhou Assignee: Ashish Singhi Fix For: 2.0.0 Attachments: 8642-trunk-0.95-v0.patch, 8642-trunk-0.95-v1.patch, 8642-trunk-0.95-v2.patch, HBASE-8642-0.98.patch, HBASE-8642-v1.patch, HBASE-8642-v2.patch, HBASE-8642-v3.patch, HBASE-8642-v4.patch, HBASE-8642.patch Support listing and deleting snapshots by table name. User scenario: A user wants to delete all the snapshots taken in January for a table 't', where the snapshot names start with 'Jan'. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13387) Add ByteBufferedCell an extension to Cell
[ https://issues.apache.org/jira/browse/HBASE-13387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HBASE-13387: --- Fix Version/s: 2.0.0 Status: Patch Available (was: Open) Add ByteBufferedCell an extension to Cell - Key: HBASE-13387 URL: https://issues.apache.org/jira/browse/HBASE-13387 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0 Attachments: ByteBufferedCell.docx, HBASE-13387_v1.patch, WIP_HBASE-13387_V2.patch, WIP_ServerCell.patch, benchmark.zip This came up during the discussion on the parent Jira, and recently Stack added it as a comment on the E2E patch on the parent Jira. The idea is to add a new interface 'ByteBufferedCell' in which we can add new buffer based getter APIs and getters for the position of the components in the BB. We will keep this interface @InterfaceAudience.Private. When the Cell is backed by a DBB, we can create an object implementing this new interface. The comparators have to be aware of this new Cell extension and have to use the BB based APIs rather than getXXXArray(). Also provide util APIs in CellUtil to abstract the checks for the new Cell type (like matchingXXX APIs, getValueAsType APIs etc). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13387) Add ByteBufferedCell an extension to Cell
[ https://issues.apache.org/jira/browse/HBASE-13387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HBASE-13387: --- Attachment: HBASE-13387_v1.patch Added ByteBufferedCell, an extension to Cell at the server side. Added the instance based check and usage of the proper APIs in CellComparator and CellUtil. Refactored some other core code to make use of the CellUtil/CellComparator APIs. The instance based checks are limited to these 2 classes. Some other parts of the code still use the getXXXArray() APIs without any checks. Will correct them with follow-on tasks. The areas are mainly 1. Filters 2. CPs 3. Tag area, as we have a byte[] backed tag impl alone 4. DBE area Add ByteBufferedCell an extension to Cell - Key: HBASE-13387 URL: https://issues.apache.org/jira/browse/HBASE-13387 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0 Attachments: ByteBufferedCell.docx, HBASE-13387_v1.patch, WIP_HBASE-13387_V2.patch, WIP_ServerCell.patch, benchmark.zip This came up during the discussion on the parent Jira, and recently Stack added it as a comment on the E2E patch on the parent Jira. The idea is to add a new interface 'ByteBufferedCell' in which we can add new buffer based getter APIs and getters for the position of the components in the BB. We will keep this interface @InterfaceAudience.Private. When the Cell is backed by a DBB, we can create an object implementing this new interface. The comparators have to be aware of this new Cell extension and have to use the BB based APIs rather than getXXXArray(). Also provide util APIs in CellUtil to abstract the checks for the new Cell type (like matchingXXX APIs, getValueAsType APIs etc). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
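The interface shape described above can be sketched as follows. The names (ByteBufferedCell, getValuePosition, the instance check) follow the Jira description rather than the committed HBase API, and the Cell interface is trimmed to the value accessors for illustration:

```java
import java.nio.ByteBuffer;

// Trimmed, illustrative Cell: only the value accessors relevant to the sketch.
interface Cell {
    byte[] getValueArray();
    int getValueOffset();
    int getValueLength();
}

// Cells backed by a (possibly off-heap) ByteBuffer expose BB based getters plus
// the position of the component inside the buffer.
interface ByteBufferedCell extends Cell {
    ByteBuffer getValueByteBuffer();
    int getValuePosition();
}

public class ByteBufferedCellSketch {
    // Comparator-style access: take the BB path when the cell is buffer backed,
    // so off-heap (DBB backed) cells never materialize a byte[] copy.
    public static byte firstValueByte(Cell c) {
        if (c instanceof ByteBufferedCell) {
            ByteBufferedCell bbc = (ByteBufferedCell) c;
            return bbc.getValueByteBuffer().get(bbc.getValuePosition());
        }
        return c.getValueArray()[c.getValueOffset()];
    }

    public static void main(String[] args) {
        Cell arrayCell = new Cell() {
            public byte[] getValueArray() { return new byte[] {42}; }
            public int getValueOffset() { return 0; }
            public int getValueLength() { return 1; }
        };
        if (firstValueByte(arrayCell) != 42) throw new AssertionError();
    }
}
```

Keeping the instance checks inside CellComparator/CellUtil, as the comment describes, means the rest of the code base calls one utility method and stays agnostic to which backing the cell has.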
[jira] [Updated] (HBASE-8642) [Snapshot] List and delete snapshot by table
[ https://issues.apache.org/jira/browse/HBASE-8642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Singhi updated HBASE-8642: - Fix Version/s: 1.3.0 0.98.14 [Snapshot] List and delete snapshot by table Key: HBASE-8642 URL: https://issues.apache.org/jira/browse/HBASE-8642 Project: HBase Issue Type: Improvement Components: snapshots Affects Versions: 0.98.0, 0.95.0, 0.95.1, 0.95.2 Reporter: Julian Zhou Assignee: Ashish Singhi Fix For: 2.0.0, 0.98.14, 1.3.0 Attachments: 8642-trunk-0.95-v0.patch, 8642-trunk-0.95-v1.patch, 8642-trunk-0.95-v2.patch, HBASE-8642-0.98.patch, HBASE-8642-v1.patch, HBASE-8642-v2.patch, HBASE-8642-v3.patch, HBASE-8642-v4.patch, HBASE-8642.patch Support listing and deleting snapshots by table name. User scenario: A user wants to delete all the snapshots taken in January for a table 't', where the snapshot names start with 'Jan'. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13387) Add ByteBufferedCell an extension to Cell
[ https://issues.apache.org/jira/browse/HBASE-13387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14614937#comment-14614937 ] Hadoop QA commented on HBASE-13387: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12743694/HBASE-13387_v1.patch against master branch at commit 7d3456d8fd027e252b1da7578e943f146626135d. ATTACHMENT ID: 12743694 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 9 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages. {color:red}-1 checkstyle{color}. The applied patch generated 1899 checkstyle errors (more than the master's current 1898 errors). {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:red}-1 core tests{color}. 
The patch failed these unit tests: Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14672//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14672//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14672//artifact/patchprocess/checkstyle-aggregate.html Javadoc warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14672//artifact/patchprocess/patchJavadocWarnings.txt Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14672//console This message is automatically generated. Add ByteBufferedCell an extension to Cell - Key: HBASE-13387 URL: https://issues.apache.org/jira/browse/HBASE-13387 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0 Attachments: ByteBufferedCell.docx, HBASE-13387_v1.patch, WIP_HBASE-13387_V2.patch, WIP_ServerCell.patch, benchmark.zip This came in btw the discussion abt the parent Jira and recently Stack added as a comment on the E2E patch on the parent Jira. The idea is to add a new Interface 'ByteBufferedCell' in which we can add new buffer based getter APIs and getters for position in components in BB. We will keep this interface @InterfaceAudience.Private. When the Cell is backed by a DBB, we can create an Object implementing this new interface. The Comparators has to be aware abt this new Cell extension and has to use the BB based APIs rather than getXXXArray(). Also give util APIs in CellUtil to abstract the checks for new Cell type. (Like matchingXXX APIs, getValueAstype APIs etc) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12298) Support BB usage in PrefixTree
[ https://issues.apache.org/jira/browse/HBASE-12298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14614959#comment-14614959 ] Hadoop QA commented on HBASE-12298: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12743696/HBASE-12298_1.patch against master branch at commit 7d3456d8fd027e252b1da7578e943f146626135d. ATTACHMENT ID: 12743696 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 12 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0) {color:red}-1 javac{color}. The applied patch generated 17 javac compiler warnings (more than the master's current 16 warnings). {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14674//testReport/ Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14674//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14674//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14674//console This message is automatically generated. Support BB usage in PrefixTree -- Key: HBASE-12298 URL: https://issues.apache.org/jira/browse/HBASE-12298 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: Anoop Sam John Assignee: ramkrishna.s.vasudevan Attachments: HBASE-12298.patch, HBASE-12298_1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13337) Table regions are not assigning back, after restarting all regionservers at once.
[ https://issues.apache.org/jira/browse/HBASE-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14614971#comment-14614971 ] ramkrishna.s.vasudevan commented on HBASE-13337: +1 on it. Someone needs to take a look at this. Table regions are not assigning back, after restarting all regionservers at once. - Key: HBASE-13337 URL: https://issues.apache.org/jira/browse/HBASE-13337 Project: HBase Issue Type: Bug Components: Region Assignment Affects Versions: 2.0.0 Reporter: Y. SREENIVASULU REDDY Assignee: Samir Ahmic Priority: Blocker Fix For: 2.0.0 Attachments: HBASE-13337-v2.patch, HBASE-13337-v3.patch, HBASE-13337.patch Regions of the table are continuously in state=FAILED_CLOSE.
{noformat}
RegionState RIT time (ms)
8f62e819b356736053e06240f7f7c6fd t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), server=VM1,16040,1427362531818 113929
caf59209ae65ea80fca6bdc6996a7d68 t1,,1427362431330.caf59209ae65ea80fca6bdc6996a7d68. state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), server=VM2,16040,1427362533691 113929
db52a74988f71e5cf257bbabf31f26f3 t1,,1427362431330.db52a74988f71e5cf257bbabf31f26f3. state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), server=VM3,16040,1427362533691 113920
43f3a65b9f9ff283f598c5450feab1f8 t1,,1427362431330.43f3a65b9f9ff283f598c5450feab1f8. state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), server=VM1,16040,1427362531818 113920
{noformat}
*Steps to reproduce:*
1. Start an HBase cluster with more than one regionserver.
2. Create a table with precreated regions (let's say 15 regions).
3. Make sure the regions are well balanced.
4. Restart all the regionserver processes at once across the cluster, leaving the HMaster process running.
5. After restarting, the regionservers successfully connect to the HMaster.
*Bug:* No regions are assigned back to the regionservers.
*Master log shows the following:*
{noformat}
2015-03-26 15:05:36,201 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.RegionStates: Transition {8f62e819b356736053e06240f7f7c6fd state=OFFLINE, ts=1427362536106, server=VM2,16040,1427362242602} to {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, server=VM1,16040,1427362531818}
2015-03-26 15:05:36,202 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.RegionStateStore: Updating row t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. with state=PENDING_OPEN&sn=VM1,16040,1427362531818
2015-03-26 15:05:36,244 DEBUG [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Force region state offline {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, server=VM1,16040,1427362531818}
2015-03-26 15:05:36,244 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.RegionStates: Transition {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, server=VM1,16040,1427362531818} to {8f62e819b356736053e06240f7f7c6fd state=PENDING_CLOSE, ts=1427362536244, server=VM1,16040,1427362531818}
2015-03-26 15:05:36,244 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.RegionStateStore: Updating row t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. with state=PENDING_CLOSE
2015-03-26 15:05:36,248 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=1 of 10
2015-03-26 15:05:36,248 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=2 of 10
2015-03-26 15:05:36,249 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=3 of 10
2015-03-26 15:05:36,249 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=4 of 10
{noformat}
[jira] [Commented] (HBASE-12213) HFileBlock backed by Array of ByteBuffers
[ https://issues.apache.org/jira/browse/HBASE-12213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14614969#comment-14614969 ] Hadoop QA commented on HBASE-12213: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12743695/HBASE-12213_4.patch against master branch at commit 7d3456d8fd027e252b1da7578e943f146626135d. ATTACHMENT ID: 12743695 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 51 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0) {color:red}-1 javac{color}. The applied patch generated 20 javac compiler warnings (more than the master's current 16 warnings). {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14673//testReport/ Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14673//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14673//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14673//console This message is automatically generated. HFileBlock backed by Array of ByteBuffers - Key: HBASE-12213 URL: https://issues.apache.org/jira/browse/HBASE-12213 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: Anoop Sam John Assignee: ramkrishna.s.vasudevan Attachments: HBASE-12213_1.patch, HBASE-12213_2.patch, HBASE-12213_4.patch In the L2 cache (offheap) an HFile block might have been cached into multiple chunks of buffers. If HFileBlock needs a single BB, we end up recreating a bigger BB and copying. Instead we can make HFileBlock serve data from an array of BBs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12848) Utilize Flash storage for WAL
[ https://issues.apache.org/jira/browse/HBASE-12848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14614983#comment-14614983 ] Anoop Sam John commented on HBASE-12848: We will be doing the archiving of the WAL files later. Ideally there is no need to keep the archived WALs on flash storage; that would be a waste of the resource. The archive op is done by a file rename. Had some discussion with [~tedyu] and [~jingcheng...@intel.com]. HDFS won't move the file content across different block volumes on rename. Only the Mover tool can do this, and it has to be invoked explicitly. We have to solve this in some way. Later on, if we try to place some HFiles also (depending on some stats) on SSDs, this might become more critical. Utilize Flash storage for WAL - Key: HBASE-12848 URL: https://issues.apache.org/jira/browse/HBASE-12848 Project: HBase Issue Type: Sub-task Reporter: Ted Yu Assignee: Ted Yu Fix For: 2.0.0, 1.1.0 Attachments: 12848-v1.patch, 12848-v2.patch, 12848-v3.patch, 12848-v4.patch, 12848-v4.patch One way to improve the data ingestion rate is to make use of flash storage. HDFS does the heavy lifting - see HDFS-7228. We assume an environment where: 1. Some servers have a mix of flash, e.g. 2 flash drives and 4 traditional drives. 2. Some servers have all traditional storage. 3. RegionServers are deployed on both profiles within one HBase cluster. This JIRA allows the WAL to be managed on flash in a mixed-profile environment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-13792) Regionserver unable to report to master when master is restarted
[ https://issues.apache.org/jira/browse/HBASE-13792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John resolved HBASE-13792. Resolution: Duplicate Assignee: (was: Samir Ahmic) Closing as a duplicate of HBASE-13337. Regionserver unable to report to master when master is restarted Key: HBASE-13792 URL: https://issues.apache.org/jira/browse/HBASE-13792 Project: HBase Issue Type: Bug Components: IPC/RPC Affects Versions: 2.0.0 Environment: x86_64 GNU/Linux Reporter: Samir Ahmic Priority: Critical Fix For: 2.0.0 I was testing the master branch on a distributed cluster and noticed that when the master is restarted on a running cluster, the regionservers are unable to report back once the master is up again. Things went back to normal after I restarted the regionservers. The logs show that the regionservers correctly detect the master znode. After some digging I noticed that we changed the client implementation in RpcClientFactory to AsyncRpcClient, so I tried running the cluster with the previous RpcClientImpl and the issue was gone. So the issue is probably caused by AsyncRpcClient, which is unable to reconnect to the master once the original connection is gone.
I was able to fix the issue by creating a new rpcClient object inside HRegionServer#createRegionServerStatusStub() and using it for channel creation; here is the diff:
{code}
diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
index fa56966..27e658c 100644
--- a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
+++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
@@ -2219,8 +2219,11 @@ public class HRegionServer extends HasThread implements
         break;
       }
       try {
+        LOG.info("***Creating new client connection");
+        rpcClient = RpcClientFactory.createClient(conf, clusterId, new InetSocketAddress(
+            rpcServices.isa.getAddress(), 0));
         BlockingRpcChannel channel =
-            this.rpcClient.createBlockingRpcChannel(sn, userProvider.getCurrent(),
+            rpcClient.createBlockingRpcChannel(sn, userProvider.getCurrent(),
                 shortOperationTimeout);
         intf = RegionServerStatusService.newBlockingStub(channel);
         break;
{code}
If this is an acceptable way to fix this issue, I will create and attach a patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12213) HFileBlock backed by Array of ByteBuffers
[ https://issues.apache.org/jira/browse/HBASE-12213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-12213: --- Attachment: HBASE-12213_jmh.zip Adding some JMH files that imitate the blockSeek behaviour in terms of mark, reset and skipping to read till the end of the block. HFileBlock backed by Array of ByteBuffers - Key: HBASE-12213 URL: https://issues.apache.org/jira/browse/HBASE-12213 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: Anoop Sam John Assignee: ramkrishna.s.vasudevan Attachments: HBASE-12213_1.patch, HBASE-12213_2.patch, HBASE-12213_4.patch, HBASE-12213_jmh.zip In L2 cache (offheap) an HFile block might have been cached into multiple chunks of buffers. If HFileBlock need single BB, we will end up in recreation of bigger BB and copying. Instead we can make HFileBlock to serve data from an array of BBs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
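For context, the mark/reset/skip pattern that blockSeek-style reading relies on is plain java.nio.ByteBuffer API. A minimal sketch, using a made-up cell layout (int key length, int value length, then the bytes; not the real HFile encoding):

```java
import java.nio.ByteBuffer;

public class BlockSeekSketch {
    /**
     * Walks key/value pairs laid out back to back in a buffer, the way a
     * blockSeek skips cell after cell until the block is exhausted.
     * mark()/reset() let us scan forward and then restore the start position.
     */
    static int countCells(ByteBuffer buf) {
        int cells = 0;
        buf.mark();                       // remember where we started
        while (buf.remaining() > 0) {
            int keyLen = buf.getInt();
            int valLen = buf.getInt();
            buf.position(buf.position() + keyLen + valLen); // skip the cell body
            cells++;
        }
        buf.reset();                      // rewind to the mark for the caller
        return cells;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(64);
        for (int i = 0; i < 3; i++) {     // write 3 toy cells of 2+2 payload bytes
            buf.putInt(2).putInt(2).put(new byte[] {1, 2, 3, 4});
        }
        buf.flip();
        System.out.println(countCells(buf)); // 3
        System.out.println(buf.position());  // 0, reset() restored the mark
    }
}
```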
[jira] [Commented] (HBASE-13337) Table regions are not assigning back, after restarting all regionservers at once.
[ https://issues.apache.org/jira/browse/HBASE-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14614986#comment-14614986 ] Anoop Sam John commented on HBASE-13337: +1 Table regions are not assigning back, after restarting all regionservers at once. - Key: HBASE-13337 URL: https://issues.apache.org/jira/browse/HBASE-13337 Project: HBase Issue Type: Bug Components: Region Assignment Affects Versions: 2.0.0 Reporter: Y. SREENIVASULU REDDY Assignee: Samir Ahmic Priority: Blocker Fix For: 2.0.0 Attachments: HBASE-13337-v2.patch, HBASE-13337-v3.patch, HBASE-13337.patch Regions of the table are continuously in state=FAILED_CLOSE.
{noformat}
RegionState RIT time (ms)
8f62e819b356736053e06240f7f7c6fd t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), server=VM1,16040,1427362531818 113929
caf59209ae65ea80fca6bdc6996a7d68 t1,,1427362431330.caf59209ae65ea80fca6bdc6996a7d68. state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), server=VM2,16040,1427362533691 113929
db52a74988f71e5cf257bbabf31f26f3 t1,,1427362431330.db52a74988f71e5cf257bbabf31f26f3. state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), server=VM3,16040,1427362533691 113920
43f3a65b9f9ff283f598c5450feab1f8 t1,,1427362431330.43f3a65b9f9ff283f598c5450feab1f8. state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), server=VM1,16040,1427362531818 113920
{noformat}
*Steps to reproduce:*
1. Start an HBase cluster with more than one regionserver.
2. Create a table with precreated regions (let's say 15 regions).
3. Make sure the regions are well balanced.
4. Restart all the regionserver processes at once across the cluster, except the HMaster process.
5. After restarting, the regionservers successfully connect to the HMaster.
*Bug:* But no regions are assigned back to the regionservers.
*Master log shows as follows:*
{noformat}
2015-03-26 15:05:36,201 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.RegionStates: Transition {8f62e819b356736053e06240f7f7c6fd state=OFFLINE, ts=1427362536106, server=VM2,16040,1427362242602} to {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, server=VM1,16040,1427362531818}
2015-03-26 15:05:36,202 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.RegionStateStore: Updating row t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. with state=PENDING_OPEN sn=VM1,16040,1427362531818
2015-03-26 15:05:36,244 DEBUG [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Force region state offline {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, server=VM1,16040,1427362531818}
2015-03-26 15:05:36,244 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.RegionStates: Transition {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, server=VM1,16040,1427362531818} to {8f62e819b356736053e06240f7f7c6fd state=PENDING_CLOSE, ts=1427362536244, server=VM1,16040,1427362531818}
2015-03-26 15:05:36,244 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.RegionStateStore: Updating row t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. with state=PENDING_CLOSE
2015-03-26 15:05:36,248 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=1 of 10
2015-03-26 15:05:36,248 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=2 of 10
2015-03-26 15:05:36,249 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=3 of 10
2015-03-26 15:05:36,249 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=4 of 10
2015-03-26 15:05:36,249 INFO
{noformat}
[jira] [Commented] (HBASE-13849) Remove restore and clone snapshot from the WebUI
[ https://issues.apache.org/jira/browse/HBASE-13849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615538#comment-14615538 ] Hudson commented on HBASE-13849: FAILURE: Integrated in HBase-1.3 #37 (See [https://builds.apache.org/job/HBase-1.3/37/]) HBASE-13849 Remove restore and clone snapshot from the WebUI (busbey: rev d347c66c90dc727ac9fc059eba0db4064ee818b5) * hbase-server/src/main/resources/hbase-webapps/master/snapshot.jsp Remove restore and clone snapshot from the WebUI Key: HBASE-13849 URL: https://issues.apache.org/jira/browse/HBASE-13849 Project: HBase Issue Type: Bug Components: snapshots Affects Versions: 1.0.1, 1.1.0, 0.98.13, 1.2.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Fix For: 2.0.0, 0.98.14, 1.2.0 Attachments: HBASE-13849-v0.patch Remove the clone and restore snapshot buttons from the WebUI. The first reason is that the operation may take too long to have the user wait on the WebUI. The second reason is that an action from the WebUI does not play well with security, since it is going to be executed by the hbase user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13832) Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low
[ https://issues.apache.org/jira/browse/HBASE-13832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615561#comment-14615561 ] Hadoop QA commented on HBASE-13832: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12743753/HBASE-13832-v5.patch against master branch at commit 608c3aa15c34b9014f99e857b374645db58cbbe3. ATTACHMENT ID: 12743753 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 12 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:red}-1 core tests{color}. The patch failed these unit tests: org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS {color:red}-1 core zombie tests{color}. 
There are 5 zombie test(s): at org.apache.hadoop.hbase.wal.TestWALSplit.testThreadingSlowWriterSmallBuffer(TestWALSplit.java:902) at org.apache.hadoop.hbase.wal.TestWALSplit.testSplitDeletedRegion(TestWALSplit.java:722) Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14679//testReport/ Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14679//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14679//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14679//console This message is automatically generated. Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low --- Key: HBASE-13832 URL: https://issues.apache.org/jira/browse/HBASE-13832 Project: HBase Issue Type: Sub-task Components: master, proc-v2 Affects Versions: 2.0.0, 1.1.0, 1.2.0 Reporter: Stephen Yuan Jiang Assignee: Matteo Bertozzi Priority: Critical Fix For: 2.0.0, 1.1.2, 1.3.0, 1.2.1 Attachments: HBASE-13832-v0.patch, HBASE-13832-v1.patch, HBASE-13832-v2.patch, HBASE-13832-v4.patch, HBASE-13832-v5.patch, HDFSPipeline.java, hbase-13832-test-hang.patch, hbase-13832-v3.patch When the data node count is 3, we got a failure in WALProcedureStore#syncLoop() during master start. The failure prevents the master from getting started. {noformat} 2015-05-29 13:27:16,625 ERROR [WALProcedureStoreSyncThread] wal.WALProcedureStore: Sync slot failed, abort. java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. 
(Nodes: current=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK], DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-490ece56c772,DISK]], original=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK], DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-490ece56c772,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration. at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:951) {noformat} One proposal is to implement logic similar to FSHLog's: if an IOException is thrown during syncLoop in WALProcedureStore#start(), instead of aborting immediately, we could try to roll the log and see whether this resolves the issue; if the new log cannot be created, or rolling the log throws more exceptions, we then abort. -- This message was sent by Atlassian JIRA
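The proposal in the last paragraph can be sketched as a tiny decision routine. The Wal interface below is a stand-in, not the real WALProcedureStore API: sync the current log; on failure, roll to a fresh log and retry; abort only if the rolled log fails too.

```java
public class SyncRollSketch {
    /** Stand-in for something whose sync can hit a bad HDFS pipeline. */
    interface Wal {
        void sync() throws java.io.IOException;
    }

    /**
     * Proposed behaviour: on a sync failure, roll to a fresh log and retry
     * instead of aborting immediately; abort only if the roll did not help.
     */
    static boolean syncWithRoll(Wal current, Wal rolled) {
        try {
            current.sync();
            return true;
        } catch (java.io.IOException first) {
            try {
                rolled.sync();   // new log, new pipeline: may dodge the bad datanode
                return true;
            } catch (java.io.IOException second) {
                return false;    // rolling did not help either: abort
            }
        }
    }

    public static void main(String[] args) {
        Wal bad = () -> { throw new java.io.IOException("bad pipeline"); };
        Wal good = () -> { };
        System.out.println(syncWithRoll(bad, good));  // true: the roll saved us
        System.out.println(syncWithRoll(bad, bad));   // false: abort
    }
}
```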
[jira] [Commented] (HBASE-13646) HRegion#execService should not try to build incomplete messages
[ https://issues.apache.org/jira/browse/HBASE-13646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615595#comment-14615595 ] Hudson commented on HBASE-13646: FAILURE: Integrated in HBase-0.98 #1047 (See [https://builds.apache.org/job/HBase-0.98/1047/]) HBASE-13646 HRegion#execService should not try to build incomplete messages (busbey: rev 2353c1bcf7debcc90e1f6d47787404c42a9d53b0) * hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorEndpoint.java * hbase-server/src/test/protobuf/DummyRegionServerEndpoint.proto * hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/protobuf/generated/DummyRegionServerEndpointProtos.java * hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestCoprocessorEndpoint.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/ResponseConverter.java * hbase-server/pom.xml HRegion#execService should not try to build incomplete messages --- Key: HBASE-13646 URL: https://issues.apache.org/jira/browse/HBASE-13646 Project: HBase Issue Type: Bug Components: Coprocessors, regionserver Affects Versions: 2.0.0, 1.2.0, 1.1.1 Reporter: Andrey Stepachev Assignee: Andrey Stepachev Fix For: 2.0.0, 0.98.14, 1.2.0 Attachments: HBASE-13646-branch-1.patch, HBASE-13646.patch, HBASE-13646.v2.patch, HBASE-13646.v2.patch If some RPC service called on a region throws an exception, execService still tries to build a Message. In the case of complex messages with required fields, that complicates service code because the service needs to pass fake protobuf objects so that they are barely buildable. To mitigate that, I propose to check whether the controller has failed and return null from the call instead of failing with an exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
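The proposed pattern, sketched with toy stand-ins for the controller and the response message (none of these are the real HBase or protobuf types): check the controller before attempting to build, and return null so no half-built message with unset required fields is ever constructed.

```java
public class ExecServiceSketch {
    /** Minimal stand-in for an RPC controller that carries a failure flag. */
    static class Controller {
        String error;
        boolean failed() { return error != null; }
    }

    /** Stand-in for a message with a required field: building without it throws. */
    static class Response {
        final String payload;
        Response(String payload) {
            if (payload == null) throw new IllegalStateException("required field missing");
            this.payload = payload;
        }
    }

    /**
     * The pattern the fix proposes: if the service call set an error on the
     * controller, return null instead of forcing an incomplete message
     * through its builder.
     */
    static Response execService(Controller controller, String result) {
        if (controller.failed()) {
            return null;          // the error is already carried by the controller
        }
        return new Response(result);
    }

    public static void main(String[] args) {
        Controller ok = new Controller();
        System.out.println(execService(ok, "data").payload); // data
        Controller bad = new Controller();
        bad.error = "service threw";
        System.out.println(execService(bad, null));          // null, no build attempted
    }
}
```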
[jira] [Commented] (HBASE-14017) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion
[ https://issues.apache.org/jira/browse/HBASE-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615625#comment-14615625 ] Hadoop QA commented on HBASE-14017: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12743786/HBASE-14017.as-pushed-master.patch against master branch at commit 1713f1fcaf9d721a97bc564faaf070f2e6b0b1d1. ATTACHMENT ID: 12743786 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14681//console This message is automatically generated. Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion - Key: HBASE-14017 URL: https://issues.apache.org/jira/browse/HBASE-14017 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 2.0.0, 1.2.0, 1.1.1, 1.3.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.1.2 Attachments: HBASE-14017-v0.patch, HBASE-14017-v0.patch, HBASE-14017.as-pushed-master.patch, HBASE-14017.v1-branch1.1.patch, HBASE-14017.v1-branch1.1.patch [~syuanjiang] found a concurrency issue in the procedure queue delete where we don't have an exclusive lock before deleting the table
{noformat}
Thread 1: Create table is running - the queue is empty and wlock is false
Thread 2: markTableAsDeleted sees the queue empty and wlock=false
Thread 1: tryWrite() sets wlock=true; too late
Thread 2: delete the queue
Thread 1: never able to release the lock - NPE when trying to get the queue
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
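The fix's core idea can be sketched as a toy queue manager (hypothetical names, far simpler than the real MasterProcedureQueue): the "empty and unlocked" check and the queue removal happen under the same monitor as lock acquisition, so the interleaving above cannot occur.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class TableQueueSketch {
    static class Queue {
        final Deque<String> procs = new ArrayDeque<>();
        boolean wlock;
    }

    private final Map<String, Queue> queues = new HashMap<>();

    /** Lock acquisition and deletion are serialized on the same monitor,
     *  so "empty && !wlock" and the delete form one atomic step. */
    synchronized boolean tryExclusiveLock(String table) {
        Queue q = queues.computeIfAbsent(table, t -> new Queue());
        if (q.wlock) return false;
        q.wlock = true;
        return true;
    }

    synchronized void releaseExclusiveLock(String table) {
        Queue q = queues.get(table);
        if (q != null) q.wlock = false;   // the queue survived, so no NPE here
    }

    synchronized boolean markTableAsDeleted(String table) {
        Queue q = queues.get(table);
        if (q == null) return true;
        if (q.procs.isEmpty() && !q.wlock) {  // checked under the same lock as tryExclusiveLock
            queues.remove(table);
            return true;
        }
        return false;                          // still locked or still has procedures
    }

    public static void main(String[] args) {
        TableQueueSketch m = new TableQueueSketch();
        m.tryExclusiveLock("t1");
        System.out.println(m.markTableAsDeleted("t1")); // false: lock held, not deleted
        m.releaseExclusiveLock("t1");
        System.out.println(m.markTableAsDeleted("t1")); // true: safe to delete now
    }
}
```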
[jira] [Commented] (HBASE-14017) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion
[ https://issues.apache.org/jira/browse/HBASE-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615658#comment-14615658 ] Sean Busbey commented on HBASE-14017: - I've been pushing the changes [~mbertozzi] had ready to go in a local repo before the ASF git outage started. I just got to branch-1.1. I'll check to see if it's different. Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion - Key: HBASE-14017 URL: https://issues.apache.org/jira/browse/HBASE-14017 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 2.0.0, 1.2.0, 1.1.1, 1.3.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.1.2 Attachments: HBASE-14017-v0.patch, HBASE-14017-v0.patch, HBASE-14017.as-pushed-master.patch, HBASE-14017.v1-branch1.1.patch, HBASE-14017.v1-branch1.1.patch [~syuanjiang] found a concurrency issue in the procedure queue delete where we don't have an exclusive lock before deleting the table
{noformat}
Thread 1: Create table is running - the queue is empty and wlock is false
Thread 2: markTableAsDeleted sees the queue empty and wlock=false
Thread 1: tryWrite() sets wlock=true; too late
Thread 2: delete the queue
Thread 1: never able to release the lock - NPE when trying to get the queue
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14017) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion
[ https://issues.apache.org/jira/browse/HBASE-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615661#comment-14615661 ] Hudson commented on HBASE-14017: SUCCESS: Integrated in HBase-1.3-IT #22 (See [https://builds.apache.org/job/HBase-1.3-IT/22/]) HBASE-14017 Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion (busbey: rev 80b0a3e914c8f7b2600de93a27cc5d050d36ebf7) * hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/MasterProcedureQueue.java * hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestMasterProcedureQueue.java Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion - Key: HBASE-14017 URL: https://issues.apache.org/jira/browse/HBASE-14017 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 2.0.0, 1.2.0, 1.1.1, 1.3.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.1.2 Attachments: HBASE-14017-v0.patch, HBASE-14017-v0.patch, HBASE-14017.as-pushed-master.patch, HBASE-14017.v1-branch1.1.patch, HBASE-14017.v1-branch1.1.patch [~syuanjiang] found a concurrency issue in the procedure queue delete where we don't have an exclusive lock before deleting the table
{noformat}
Thread 1: Create table is running - the queue is empty and wlock is false
Thread 2: markTableAsDeleted sees the queue empty and wlock=false
Thread 1: tryWrite() sets wlock=true; too late
Thread 2: delete the queue
Thread 1: never able to release the lock - NPE when trying to get the queue
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13927) Allow hbase-daemon.sh to conditionally redirect the log or not
[ https://issues.apache.org/jira/browse/HBASE-13927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615660#comment-14615660 ] Hudson commented on HBASE-13927: SUCCESS: Integrated in HBase-1.3-IT #22 (See [https://builds.apache.org/job/HBase-1.3-IT/22/]) HBASE-13927 Allow hbase-daemon.sh to conditionally redirect the log or not (busbey: rev e28094fe4d1569a67131322e59090a0196889d05) * bin/hbase-daemon.sh Allow hbase-daemon.sh to conditionally redirect the log or not -- Key: HBASE-13927 URL: https://issues.apache.org/jira/browse/HBASE-13927 Project: HBase Issue Type: Improvement Components: shell Affects Versions: 2.0.0, 1.2.0 Reporter: Elliott Clark Assignee: Elliott Clark Labels: shell Fix For: 2.0.0, 1.2.0 Attachments: HBASE-13927.patch, HBASE-13927.patch Kind of like HBASE_NOEXEC allow hbase-daemon to skip redirecting to a log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14017) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion
[ https://issues.apache.org/jira/browse/HBASE-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey updated HBASE-14017: Resolution: Fixed Status: Resolved (was: Patch Available) Pushed to branch-1.1+. branch-1 and branch-1.2 were both equivalent to the as-pushed-to-master patch. branch-1.1 was equivalent to the posted v1-branch-1 patch. Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion - Key: HBASE-14017 URL: https://issues.apache.org/jira/browse/HBASE-14017 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 2.0.0, 1.2.0, 1.1.1, 1.3.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.1.2 Attachments: HBASE-14017-v0.patch, HBASE-14017-v0.patch, HBASE-14017.as-pushed-master.patch, HBASE-14017.v1-branch1.1.patch, HBASE-14017.v1-branch1.1.patch [~syuanjiang] found a concurrency issue in the procedure queue delete where we don't have an exclusive lock before deleting the table
{noformat}
Thread 1: Create table is running - the queue is empty and wlock is false
Thread 2: markTableAsDeleted sees the queue empty and wlock=false
Thread 1: tryWrite() sets wlock=true; too late
Thread 2: delete the queue
Thread 1: never able to release the lock - NPE when trying to get the queue
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14017) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion
[ https://issues.apache.org/jira/browse/HBASE-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-14017: --- Attachment: (was: HBASE-14017.v1-branch1.1.patch) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion - Key: HBASE-14017 URL: https://issues.apache.org/jira/browse/HBASE-14017 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 2.0.0, 1.2.0, 1.1.1, 1.3.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.1.2 Attachments: HBASE-14017-v0.patch, HBASE-14017-v0.patch, HBASE-14017.v1-branch1.1.patch, HBASE-14017.v1-branch1.1.patch [~syuanjiang] found a concurrency issue in the procedure queue delete where we don't have an exclusive lock before deleting the table
{noformat}
Thread 1: Create table is running - the queue is empty and wlock is false
Thread 2: markTableAsDeleted sees the queue empty and wlock=false
Thread 1: tryWrite() sets wlock=true; too late
Thread 2: delete the queue
Thread 1: never able to release the lock - NPE when trying to get the queue
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14017) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion
[ https://issues.apache.org/jira/browse/HBASE-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-14017: --- Attachment: HBASE-14017.v1-branch1.1.patch Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion - Key: HBASE-14017 URL: https://issues.apache.org/jira/browse/HBASE-14017 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 2.0.0, 1.2.0, 1.1.1, 1.3.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.1.2 Attachments: HBASE-14017-v0.patch, HBASE-14017-v0.patch, HBASE-14017.v1-branch1.1.patch, HBASE-14017.v1-branch1.1.patch [~syuanjiang] found a concurrency issue in the procedure queue delete where we don't have an exclusive lock before deleting the table
{noformat}
Thread 1: Create table is running - the queue is empty and wlock is false
Thread 2: markTableAsDeleted sees the queue empty and wlock=false
Thread 1: tryWrite() sets wlock=true; too late
Thread 2: delete the queue
Thread 1: never able to release the lock - NPE when trying to get the queue
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13927) Allow hbase-daemon.sh to conditionally redirect the log or not
[ https://issues.apache.org/jira/browse/HBASE-13927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615608#comment-14615608 ] Hudson commented on HBASE-13927: FAILURE: Integrated in HBase-TRUNK #6631 (See [https://builds.apache.org/job/HBase-TRUNK/6631/]) HBASE-13927 Allow hbase-daemon.sh to conditionally redirect the log or not (busbey: rev 608c3aa15c34b9014f99e857b374645db58cbbe3) * bin/hbase-daemon.sh Allow hbase-daemon.sh to conditionally redirect the log or not -- Key: HBASE-13927 URL: https://issues.apache.org/jira/browse/HBASE-13927 Project: HBase Issue Type: Improvement Components: shell Affects Versions: 2.0.0, 1.2.0 Reporter: Elliott Clark Assignee: Elliott Clark Labels: shell Fix For: 2.0.0, 1.2.0 Attachments: HBASE-13927.patch, HBASE-13927.patch Kind of like HBASE_NOEXEC allow hbase-daemon to skip redirecting to a log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13849) Remove restore and clone snapshot from the WebUI
[ https://issues.apache.org/jira/browse/HBASE-13849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615676#comment-14615676 ] Hudson commented on HBASE-13849: FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #1001 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/1001/]) HBASE-13849 Remove restore and clone snapshot from the WebUI (busbey: rev b30d0311aade4e5c8cff23cb4a3dadc742f6e237) * hbase-server/src/main/resources/hbase-webapps/master/snapshot.jsp Remove restore and clone snapshot from the WebUI Key: HBASE-13849 URL: https://issues.apache.org/jira/browse/HBASE-13849 Project: HBase Issue Type: Bug Components: snapshots Affects Versions: 1.0.1, 1.1.0, 0.98.13, 1.2.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Fix For: 2.0.0, 0.98.14, 1.2.0 Attachments: HBASE-13849-v0.patch Remove the clone and restore snapshot buttons from the WebUI. The first reason is that the operation may take too long to have the user wait on the WebUI. The second reason is that an action from the WebUI does not play well with security, since it is going to be executed by the hbase user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12213) HFileBlock backed by Array of ByteBuffers
[ https://issues.apache.org/jira/browse/HBASE-12213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615267#comment-14615267 ] ramkrishna.s.vasudevan commented on HBASE-12213: https://reviews.apache.org/r/36206/ HFileBlock backed by Array of ByteBuffers - Key: HBASE-12213 URL: https://issues.apache.org/jira/browse/HBASE-12213 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: Anoop Sam John Assignee: ramkrishna.s.vasudevan Attachments: HBASE-12213_1.patch, HBASE-12213_2.patch, HBASE-12213_4.patch, HBASE-12213_jmh.zip In L2 cache (offheap) an HFile block might have been cached into multiple chunks of buffers. If HFileBlock need single BB, we will end up in recreation of bigger BB and copying. Instead we can make HFileBlock to serve data from an array of BBs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HBASE-12213) HFileBlock backed by Array of ByteBuffers
[ https://issues.apache.org/jira/browse/HBASE-12213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615267#comment-14615267 ] ramkrishna.s.vasudevan edited comment on HBASE-12213 at 7/6/15 4:22 PM: https://reviews.apache.org/r/36206/ - RB link was (Author: ram_krish): https://reviews.apache.org/r/36206/ HFileBlock backed by Array of ByteBuffers - Key: HBASE-12213 URL: https://issues.apache.org/jira/browse/HBASE-12213 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: Anoop Sam John Assignee: ramkrishna.s.vasudevan Attachments: HBASE-12213_1.patch, HBASE-12213_2.patch, HBASE-12213_4.patch, HBASE-12213_jmh.zip In L2 cache (offheap) an HFile block might have been cached into multiple chunks of buffers. If HFileBlock need single BB, we will end up in recreation of bigger BB and copying. Instead we can make HFileBlock to serve data from an array of BBs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14012) Double Assignment and Dataloss when ServerCrashProcedure runs during Master failover
[ https://issues.apache.org/jira/browse/HBASE-14012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615353#comment-14615353 ] Hudson commented on HBASE-14012: SUCCESS: Integrated in HBase-1.3 #36 (See [https://builds.apache.org/job/HBase-1.3/36/]) HBASE-14012 Double Assignment and Dataloss when ServerCrashProcedure runs during Master failover (stack: rev 4e36815906e5fd175063b0700532e3dce88bcda6) * hbase-protocol/src/main/protobuf/MasterProcedure.proto * hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java * hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/MasterProcedureProtos.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java * hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/ProcedureExecutor.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStateStore.java Double Assignment and Dataloss when ServerCrashProcedure runs during Master failover Key: HBASE-14012 URL: https://issues.apache.org/jira/browse/HBASE-14012 Project: HBase Issue Type: Bug Components: master, Region Assignment Affects Versions: 2.0.0, 1.2.0 Reporter: stack Assignee: stack Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.3.0 Attachments: 14012.txt, 14012v2.txt (Rewrite to be more explicit about what the problem is) ITBLL. Master comes up (It is being killed every 1-5 minutes or so). It is joining a running cluster (all servers up except Master with most regions assigned out on cluster). ProcedureStore has two ServerCrashProcedures unfinished (RUNNABLE state) for two separate servers. One SCP is in the middle of the assign step when master crashes (SERVER_CRASH_ASSIGN). This SCP step has this comment on it: {code} // Assign may not be idempotent. SSH used to requeue the SSH if we got an IOE assigning // which is what we are mimicing here but it looks prone to double assignment if assign // fails midway. TODO: Test. 
{code}
This issue is 1.2+ only since it is ServerCrashProcedure (added in HBASE-13616, post hbase-1.1.x). Looking at ServerShutdownHandler, how we used to do crash processing before we moved over to the Pv2 framework, SSH may have (accidentally) avoided this issue since it does its processing in one big blob, starting over if killed mid-crash. In particular, post-crash, SSH scans hbase:meta to find the regions that were on the downed server. SCP scans meta in one step, saves off the regions it finds into the ProcedureStore, and then in the next step does the actual assign. In this case, we crashed post-meta-scan and during assign. Assign is a bulk assign. It mostly succeeded but got this:
{code}
809622 2015-06-09 20:05:28,576 INFO [ProcedureExecutorThread-9] master.GeneralBulkAssigner: Failed assigning 3 regions to server c2021.halxg.cloudera.com,16020,1433905510696, reassigning them
{code}
So, most regions actually made it to new locations except for a few stragglers. All of the successfully assigned regions are then reassigned on the other side of the master restart when we replay the SCP assign step. Let me put together the scan-meta and assign steps in SCP; this should do until we redo all of assign to run on Pv2. A few other things I noticed: in SCP, we only check for failover in the first step, not for every step, which means ServerCrashProcedure will run if on reload it is beyond the first step.
{code}
// Is master fully online? If not, yield. No processing of servers unless master is up
if (!services.getAssignmentManager().isFailoverCleanupDone()) {
  throwProcedureYieldException("Waiting on master failover to complete");
}
{code}
This means we are assigning while the Master is still coming up, a no-no (though it does not seem to have caused a problem here). Fix.
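The per-step guard suggested above can be sketched abstractly (hypothetical names; the real SCP is a Pv2 state machine, not a loop): re-evaluate the failover-cleanup guard before every step, so a procedure reloaded past step 0 still yields while the master is coming up.

```java
public class StepGuardSketch {
    /** Stand-in for AssignmentManager#isFailoverCleanupDone(). */
    interface Master {
        boolean failoverCleanupDone();
    }

    /**
     * Re-check the guard before EVERY step, not only the first, so a
     * procedure reloaded mid-way still yields until the master is online.
     */
    static int runSteps(Master master, int startStep, int totalSteps) {
        int executed = 0;
        for (int step = startStep; step < totalSteps; step++) {
            if (!master.failoverCleanupDone()) {
                return executed;      // yield: no assigning while master comes up
            }
            executed++;               // the real step logic would run here
        }
        return executed;
    }

    public static void main(String[] args) {
        // Reloaded at step 2 of 4 while failover cleanup is still pending:
        System.out.println(runSteps(() -> false, 2, 4)); // 0 steps run
        System.out.println(runSteps(() -> true, 2, 4));  // 2 steps run
    }
}
```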
Also, I see that over the 8 hours of this particular log, each time the master crashes and comes back up, we queue a ServerCrashProcedure for c2022 because an empty dir never gets cleaned up: {code} 39 2015-06-09 22:15:33,074 WARN [ProcedureExecutorThread-0] master.SplitLogManager: returning success without actually splitting and deleting all the log files in path hdfs://c2020.halxg.cloudera.com:8020/hbase/WALs/c2022.halxg.cloudera.com,16020,1433902151857-splitting {code} Fix this too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13849) Remove restore and clone snapshot from the WebUI
[ https://issues.apache.org/jira/browse/HBASE-13849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615395#comment-14615395 ] Hudson commented on HBASE-13849: SUCCESS: Integrated in HBase-1.3-IT #21 (See [https://builds.apache.org/job/HBase-1.3-IT/21/]) HBASE-13849 Remove restore and clone snapshot from the WebUI (busbey: rev d347c66c90dc727ac9fc059eba0db4064ee818b5) * hbase-server/src/main/resources/hbase-webapps/master/snapshot.jsp Remove restore and clone snapshot from the WebUI Key: HBASE-13849 URL: https://issues.apache.org/jira/browse/HBASE-13849 Project: HBase Issue Type: Bug Components: snapshots Affects Versions: 1.0.1, 1.1.0, 0.98.13, 1.2.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Fix For: 2.0.0, 0.98.14, 1.2.0 Attachments: HBASE-13849-v0.patch Remove the clone and restore snapshot buttons from the WebUI. The first reason is that the operation may take too long to have the user wait on the WebUI. The second reason is that an action from the WebUI does not play well with security, since it is going to be executed as the hbase user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14012) Double Assignment and Dataloss when ServerCrashProcedure runs during Master failover
[ https://issues.apache.org/jira/browse/HBASE-14012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615397#comment-14615397 ] Hudson commented on HBASE-14012: SUCCESS: Integrated in HBase-1.3-IT #21 (See [https://builds.apache.org/job/HBase-1.3-IT/21/]) HBASE-14012 Double Assignment and Dataloss when ServerCrashProcedure runs during Master failover (stack: rev 4e36815906e5fd175063b0700532e3dce88bcda6) * hbase-server/src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java * hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/ProcedureExecutor.java * hbase-protocol/src/main/protobuf/MasterProcedure.proto * hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/MasterProcedureProtos.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStateStore.java Double Assignment and Dataloss when ServerCrashProcedure runs during Master failover Key: HBASE-14012 URL: https://issues.apache.org/jira/browse/HBASE-14012 Project: HBase Issue Type: Bug Components: master, Region Assignment Affects Versions: 2.0.0, 1.2.0 Reporter: stack Assignee: stack Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.3.0 Attachments: 14012.txt, 14012v2.txt (Rewrite to be more explicit about what the problem is) ITBLL. Master comes up (It is being killed every 1-5 minutes or so). It is joining a running cluster (all servers up except Master with most regions assigned out on cluster). ProcedureStore has two ServerCrashProcedures unfinished (RUNNABLE state) for two separate servers. One SCP is in the middle of the assign step when master crashes (SERVER_CRASH_ASSIGN). This SCP step has this comment on it: {code} // Assign may not be idempotent. SSH used to requeue the SSH if we got an IOE assigning // which is what we are mimicing here but it looks prone to double assignment if assign // fails midway. TODO: Test. 
{code} This issue is 1.2+ only since it involves ServerCrashProcedure (added in HBASE-13616, post hbase-1.1.x). Looking at ServerShutdownHandler, how we used to do crash processing before we moved over to the Pv2 framework, SSH may have (accidentally) avoided this issue since it does its processing in one big blob, starting over if killed mid-crash. In particular, post-crash, SSH scans hbase:meta to find the regions that were on the downed server. SCP scans hbase:meta in one step, saves off the regions it finds into the ProcedureStore, and then, in the next step, does the actual assign. In this case, we crashed post-meta-scan and during assign. Assign is a bulk assign. It mostly succeeded but got this:
{code}
809622 2015-06-09 20:05:28,576 INFO [ProcedureExecutorThread-9] master.GeneralBulkAssigner: Failed assigning 3 regions to server c2021.halxg.cloudera.com,16020,1433905510696, reassigning them
{code}
So, most regions actually made it to new locations, except for a few stragglers. All of the successfully assigned regions are then reassigned on the other side of the master restart when we replay the SCP assign step. Let me put the scan-meta and assign steps of SCP together into one step; this should do until we redo all of assign to run on Pv2. A few other things I noticed: in SCP, we only check for failover cleanup in the first step, not in every step, which means ServerCrashProcedure will run if, on reload, it is beyond the first step:
{code}
// Is master fully online? If not, yield. No processing of servers unless master is up
if (!services.getAssignmentManager().isFailoverCleanupDone()) {
  throwProcedureYieldException("Waiting on master failover to complete");
}
{code}
This means we are assigning while the Master is still coming up, a no-no (though it does not seem to have caused a problem here). Fix.
Also, I see that over the 8 hours of this particular log, each time the master crashes and comes back up, we queue a ServerCrashProcedure for c2022 because an empty dir never gets cleaned up:
{code}
39 2015-06-09 22:15:33,074 WARN [ProcedureExecutorThread-0] master.SplitLogManager: returning success without actually splitting and deleting all the log files in path hdfs://c2020.halxg.cloudera.com:8020/hbase/WALs/c2022.halxg.cloudera.com,16020,1433902151857-splitting
{code}
Fix this too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
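The failover-cleanup fix described above can be sketched as follows. This is a hedged simplification, not the actual ServerCrashProcedure code: `failoverCleanupDone`, `executeStep`, and `yieldIfMasterNotUp` are illustrative stand-ins for the real HBase internals. The point is that the guard runs at the top of every step, so a procedure reloaded beyond its first step still yields until master failover cleanup completes.

```java
// Sketch only: moving the "is master fully online?" guard from the first
// step to every step of a multi-step crash procedure. Names here are
// hypothetical stand-ins, not the real HBase APIs.
public class ServerCrashGuardSketch {
    static class ProcedureYieldException extends Exception {
        ProcedureYieldException(String msg) { super(msg); }
    }

    // Stand-in for AssignmentManager#isFailoverCleanupDone().
    static boolean failoverCleanupDone = false;

    // Called at the top of EVERY step, including the assign step.
    static void yieldIfMasterNotUp() throws ProcedureYieldException {
        if (!failoverCleanupDone) {
            throw new ProcedureYieldException("Waiting on master failover to complete");
        }
    }

    static String executeStep(String step) throws ProcedureYieldException {
        yieldIfMasterNotUp(); // guard applies to all steps, not just the first
        return "ran " + step;
    }
}
```

With this shape, a ServerCrashProcedure reloaded mid-way (e.g. at SERVER_CRASH_ASSIGN) yields and is retried later instead of assigning regions while the Master is still coming up.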
[jira] [Commented] (HBASE-13646) HRegion#execService should not try to build incomplete messages
[ https://issues.apache.org/jira/browse/HBASE-13646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615396#comment-14615396 ] Hudson commented on HBASE-13646: SUCCESS: Integrated in HBase-1.3-IT #21 (See [https://builds.apache.org/job/HBase-1.3-IT/21/]) HBASE-13646 HRegion#execService should not try to build incomplete messages (busbey: rev 8d71d283b92501baf9ba681c8cb63e234e4c2ece) * hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestCoprocessorEndpoint.java * hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/ResponseConverter.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/protobuf/generated/DummyRegionServerEndpointProtos.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java * hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorEndpoint.java * hbase-server/src/test/protobuf/DummyRegionServerEndpoint.proto * hbase-server/pom.xml HRegion#execService should not try to build incomplete messages --- Key: HBASE-13646 URL: https://issues.apache.org/jira/browse/HBASE-13646 Project: HBase Issue Type: Bug Components: Coprocessors, regionserver Affects Versions: 2.0.0, 1.2.0, 1.1.1 Reporter: Andrey Stepachev Assignee: Andrey Stepachev Fix For: 2.0.0, 0.98.14, 1.2.0 Attachments: HBASE-13646-branch-1.patch, HBASE-13646.patch, HBASE-13646.v2.patch, HBASE-13646.v2.patch If an RPC service called on a region throws an exception, execService still tries to build the response Message. With complex messages that have required fields, this complicates service code, because the service needs to pass fake protobuf objects just so the messages are buildable at all. To mitigate that, I propose checking whether the controller was failed and returning null from the call instead of failing with an exception.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
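The proposal above can be sketched as follows. This is a hedged illustration with hypothetical types, not the real HBase or protobuf API: `RpcController`, `Service`, and `execService` here are simplified stand-ins (a `String` stands in for a protobuf `Message`). The idea is simply to check the controller before building the response and short-circuit with null on failure.

```java
// Sketch only: if the coprocessor call marked the controller failed,
// return null rather than building a response message whose required
// fields were never set. All names are illustrative stand-ins.
public class ExecServiceSketch {
    static class RpcController {
        private String errorText;
        void setFailed(String why) { this.errorText = why; }
        boolean failed() { return errorText != null; }
        String getFailedOn() { return errorText; }
    }

    interface Service {
        String call(RpcController controller); // String stands in for a protobuf Message
    }

    // Before the fix: always tried to build the message, even after a failure.
    // After: check the controller first and short-circuit with null.
    static String execService(Service service, RpcController controller) {
        String response = service.call(controller);
        if (controller.failed()) {
            return null; // caller surfaces controller.getFailedOn() as the error instead
        }
        return response;
    }
}
```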
[jira] [Commented] (HBASE-14017) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion
[ https://issues.apache.org/jira/browse/HBASE-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615417#comment-14615417 ] Sean Busbey commented on HBASE-14017: - current patches do not apply to master or branch-1. Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion - Key: HBASE-14017 URL: https://issues.apache.org/jira/browse/HBASE-14017 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 2.0.0, 1.2.0, 1.1.1, 1.3.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.1.2 Attachments: HBASE-14017-v0.patch, HBASE-14017-v0.patch, HBASE-14017.v1-branch1.1.patch, HBASE-14017.v1-branch1.1.patch [~syuanjiang] found a concurrency issue in the procedure queue deletion, where we do not take an exclusive lock before deleting the table queue:
{noformat}
Thread 1: Create table is running - the queue is empty and wlock is false
Thread 2: markTableAsDeleted sees the queue empty and wlock=false
Thread 1: tryWrite() sets wlock=true; too late
Thread 2: deletes the queue
Thread 1: never able to release the lock - NPE when trying to get the queue
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
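The interleaving reported above is a classic check-then-act race: "queue empty and wlock false" is observed in one step and the delete happens in another, so a concurrent `tryWrite()` can slip in between. A minimal sketch of a fix, with illustrative names rather than the real MasterProcedureQueue API, makes the check and the removal one atomic action by holding the same monitor that the lock acquisition uses:

```java
// Sketch only (hypothetical names, not the real MasterProcedureQueue API):
// markTableAsDeleted must observe the lock state and remove the queue
// atomically, otherwise a concurrent tryExclusiveLock() can acquire the
// lock on a queue that is about to be deleted.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TableQueueSketch {
    static class TableQueue {
        private boolean wlock = false;
        synchronized boolean tryExclusiveLock() {
            if (wlock) return false;
            wlock = true;
            return true;
        }
        synchronized void releaseExclusiveLock() { wlock = false; }
        synchronized boolean isLocked() { return wlock; }
    }

    final Map<String, TableQueue> queues = new ConcurrentHashMap<>();

    // Atomic check-and-delete: the monitor held here is the same one
    // tryExclusiveLock() synchronizes on, so the interleaving from the
    // bug report (lock acquired between check and delete) cannot happen.
    boolean markTableAsDeleted(String table) {
        TableQueue q = queues.get(table);
        if (q == null) return true; // already gone
        synchronized (q) {
            if (!q.isLocked()) {
                queues.remove(table);
                return true;
            }
        }
        return false; // someone holds the lock; caller retries later
    }
}
```

This is one way to close the window; the patch attached to the JIRA may well take a different approach (e.g. requiring the exclusive lock before deletion).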
[jira] [Updated] (HBASE-14017) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion
[ https://issues.apache.org/jira/browse/HBASE-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey updated HBASE-14017: Attachment: HBASE-14017.as-pushed-master.patch attaching patch as pushed to master for [~mbertozzi] Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion - Key: HBASE-14017 URL: https://issues.apache.org/jira/browse/HBASE-14017 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 2.0.0, 1.2.0, 1.1.1, 1.3.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.1.2 Attachments: HBASE-14017-v0.patch, HBASE-14017-v0.patch, HBASE-14017.as-pushed-master.patch, HBASE-14017.v1-branch1.1.patch, HBASE-14017.v1-branch1.1.patch [~syuanjiang] found a concurrency issue in the procedure queue deletion, where we do not take an exclusive lock before deleting the table queue:
{noformat}
Thread 1: Create table is running - the queue is empty and wlock is false
Thread 2: markTableAsDeleted sees the queue empty and wlock=false
Thread 1: tryWrite() sets wlock=true; too late
Thread 2: deletes the queue
Thread 1: never able to release the lock - NPE when trying to get the queue
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13849) Remove restore and clone snapshot from the WebUI
[ https://issues.apache.org/jira/browse/HBASE-13849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615594#comment-14615594 ] Hudson commented on HBASE-13849: FAILURE: Integrated in HBase-0.98 #1047 (See [https://builds.apache.org/job/HBase-0.98/1047/]) HBASE-13849 Remove restore and clone snapshot from the WebUI (busbey: rev b30d0311aade4e5c8cff23cb4a3dadc742f6e237) * hbase-server/src/main/resources/hbase-webapps/master/snapshot.jsp Remove restore and clone snapshot from the WebUI Key: HBASE-13849 URL: https://issues.apache.org/jira/browse/HBASE-13849 Project: HBase Issue Type: Bug Components: snapshots Affects Versions: 1.0.1, 1.1.0, 0.98.13, 1.2.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Fix For: 2.0.0, 0.98.14, 1.2.0 Attachments: HBASE-13849-v0.patch Remove the clone and restore snapshot buttons from the WebUI. The first reason is that the operation may take too long to have the user wait on the WebUI. The second reason is that an action from the WebUI does not play well with security, since it is going to be executed as the hbase user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13849) Remove restore and clone snapshot from the WebUI
[ https://issues.apache.org/jira/browse/HBASE-13849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615609#comment-14615609 ] Hudson commented on HBASE-13849: FAILURE: Integrated in HBase-TRUNK #6631 (See [https://builds.apache.org/job/HBase-TRUNK/6631/]) HBASE-13849 Remove restore and clone snapshot from the WebUI (busbey: rev d26978a2ff26a918ccbb52fe002294d567e5d8f9) * hbase-server/src/main/resources/hbase-webapps/master/snapshot.jsp Remove restore and clone snapshot from the WebUI Key: HBASE-13849 URL: https://issues.apache.org/jira/browse/HBASE-13849 Project: HBase Issue Type: Bug Components: snapshots Affects Versions: 1.0.1, 1.1.0, 0.98.13, 1.2.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Fix For: 2.0.0, 0.98.14, 1.2.0 Attachments: HBASE-13849-v0.patch Remove the clone and restore snapshot buttons from the WebUI. The first reason is that the operation may take too long to have the user wait on the WebUI. The second reason is that an action from the WebUI does not play well with security, since it is going to be executed as the hbase user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13927) Allow hbase-daemon.sh to conditionally redirect the log or not
[ https://issues.apache.org/jira/browse/HBASE-13927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615537#comment-14615537 ] Hudson commented on HBASE-13927: FAILURE: Integrated in HBase-1.3 #37 (See [https://builds.apache.org/job/HBase-1.3/37/]) HBASE-13927 Allow hbase-daemon.sh to conditionally redirect the log or not (busbey: rev e28094fe4d1569a67131322e59090a0196889d05) * bin/hbase-daemon.sh Allow hbase-daemon.sh to conditionally redirect the log or not -- Key: HBASE-13927 URL: https://issues.apache.org/jira/browse/HBASE-13927 Project: HBase Issue Type: Improvement Components: shell Affects Versions: 2.0.0, 1.2.0 Reporter: Elliott Clark Assignee: Elliott Clark Labels: shell Fix For: 2.0.0, 1.2.0 Attachments: HBASE-13927.patch, HBASE-13927.patch Kind of like HBASE_NOEXEC allow hbase-daemon to skip redirecting to a log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13646) HRegion#execService should not try to build incomplete messages
[ https://issues.apache.org/jira/browse/HBASE-13646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615539#comment-14615539 ] Hudson commented on HBASE-13646: FAILURE: Integrated in HBase-1.3 #37 (See [https://builds.apache.org/job/HBase-1.3/37/]) HBASE-13646 HRegion#execService should not try to build incomplete messages (busbey: rev 8d71d283b92501baf9ba681c8cb63e234e4c2ece) * hbase-server/src/test/protobuf/DummyRegionServerEndpoint.proto * hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestCoprocessorEndpoint.java * hbase-server/pom.xml * hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorEndpoint.java * hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/ResponseConverter.java * hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/protobuf/generated/DummyRegionServerEndpointProtos.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java HRegion#execService should not try to build incomplete messages --- Key: HBASE-13646 URL: https://issues.apache.org/jira/browse/HBASE-13646 Project: HBase Issue Type: Bug Components: Coprocessors, regionserver Affects Versions: 2.0.0, 1.2.0, 1.1.1 Reporter: Andrey Stepachev Assignee: Andrey Stepachev Fix For: 2.0.0, 0.98.14, 1.2.0 Attachments: HBASE-13646-branch-1.patch, HBASE-13646.patch, HBASE-13646.v2.patch, HBASE-13646.v2.patch If an RPC service called on a region throws an exception, execService still tries to build the response Message. With complex messages that have required fields, this complicates service code, because the service needs to pass fake protobuf objects just so the messages are buildable at all. To mitigate that, I propose checking whether the controller was failed and returning null from the call instead of failing with an exception.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13927) Allow hbase-daemon.sh to conditionally redirect the log or not
[ https://issues.apache.org/jira/browse/HBASE-13927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615583#comment-14615583 ] Hudson commented on HBASE-13927: FAILURE: Integrated in HBase-1.2 #53 (See [https://builds.apache.org/job/HBase-1.2/53/]) HBASE-13927 Allow hbase-daemon.sh to conditionally redirect the log or not (busbey: rev 3e2636b4b314072ace25aba809c1631332a4076e) * bin/hbase-daemon.sh Allow hbase-daemon.sh to conditionally redirect the log or not -- Key: HBASE-13927 URL: https://issues.apache.org/jira/browse/HBASE-13927 Project: HBase Issue Type: Improvement Components: shell Affects Versions: 2.0.0, 1.2.0 Reporter: Elliott Clark Assignee: Elliott Clark Labels: shell Fix For: 2.0.0, 1.2.0 Attachments: HBASE-13927.patch, HBASE-13927.patch Kind of like HBASE_NOEXEC allow hbase-daemon to skip redirecting to a log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13849) Remove restore and clone snapshot from the WebUI
[ https://issues.apache.org/jira/browse/HBASE-13849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615584#comment-14615584 ] Hudson commented on HBASE-13849: FAILURE: Integrated in HBase-1.2 #53 (See [https://builds.apache.org/job/HBase-1.2/53/]) HBASE-13849 Remove restore and clone snapshot from the WebUI (busbey: rev 2bc55875cb533af1a4bbe3bb02482b4e9d4bf4c8) * hbase-server/src/main/resources/hbase-webapps/master/snapshot.jsp Remove restore and clone snapshot from the WebUI Key: HBASE-13849 URL: https://issues.apache.org/jira/browse/HBASE-13849 Project: HBase Issue Type: Bug Components: snapshots Affects Versions: 1.0.1, 1.1.0, 0.98.13, 1.2.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Fix For: 2.0.0, 0.98.14, 1.2.0 Attachments: HBASE-13849-v0.patch Remove the clone and restore snapshot buttons from the WebUI. The first reason is that the operation may take too long to have the user wait on the WebUI. The second reason is that an action from the WebUI does not play well with security, since it is going to be executed as the hbase user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13927) Allow hbase-daemon.sh to conditionally redirect the log or not
[ https://issues.apache.org/jira/browse/HBASE-13927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615613#comment-14615613 ] Hudson commented on HBASE-13927: FAILURE: Integrated in HBase-1.2-IT #39 (See [https://builds.apache.org/job/HBase-1.2-IT/39/]) HBASE-13927 Allow hbase-daemon.sh to conditionally redirect the log or not (busbey: rev 3e2636b4b314072ace25aba809c1631332a4076e) * bin/hbase-daemon.sh Allow hbase-daemon.sh to conditionally redirect the log or not -- Key: HBASE-13927 URL: https://issues.apache.org/jira/browse/HBASE-13927 Project: HBase Issue Type: Improvement Components: shell Affects Versions: 2.0.0, 1.2.0 Reporter: Elliott Clark Assignee: Elliott Clark Labels: shell Fix For: 2.0.0, 1.2.0 Attachments: HBASE-13927.patch, HBASE-13927.patch Kind of like HBASE_NOEXEC allow hbase-daemon to skip redirecting to a log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14017) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion
[ https://issues.apache.org/jira/browse/HBASE-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615647#comment-14615647 ] Stephen Yuan Jiang commented on HBASE-14017: [~busbey] Sean, could you also push the same master patch to branch-1.2? The code in branch-1.1 is a little different; you need to use the branch-1.1 patch in this JIRA to push the change to branch-1.1. Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion - Key: HBASE-14017 URL: https://issues.apache.org/jira/browse/HBASE-14017 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 2.0.0, 1.2.0, 1.1.1, 1.3.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.1.2 Attachments: HBASE-14017-v0.patch, HBASE-14017-v0.patch, HBASE-14017.as-pushed-master.patch, HBASE-14017.v1-branch1.1.patch, HBASE-14017.v1-branch1.1.patch [~syuanjiang] found a concurrency issue in the procedure queue deletion, where we do not take an exclusive lock before deleting the table queue:
{noformat}
Thread 1: Create table is running - the queue is empty and wlock is false
Thread 2: markTableAsDeleted sees the queue empty and wlock=false
Thread 1: tryWrite() sets wlock=true; too late
Thread 2: deletes the queue
Thread 1: never able to release the lock - NPE when trying to get the queue
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14017) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion
[ https://issues.apache.org/jira/browse/HBASE-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-14017: --- Fix Version/s: 1.3.0 Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion - Key: HBASE-14017 URL: https://issues.apache.org/jira/browse/HBASE-14017 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 2.0.0, 1.2.0, 1.1.1, 1.3.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.1.2, 1.3.0 Attachments: HBASE-14017-v0.patch, HBASE-14017-v0.patch, HBASE-14017.as-pushed-master.patch, HBASE-14017.v1-branch1.1.patch, HBASE-14017.v1-branch1.1.patch [~syuanjiang] found a concurrency issue in the procedure queue deletion, where we do not take an exclusive lock before deleting the table queue:
{noformat}
Thread 1: Create table is running - the queue is empty and wlock is false
Thread 2: markTableAsDeleted sees the queue empty and wlock=false
Thread 1: tryWrite() sets wlock=true; too late
Thread 2: deletes the queue
Thread 1: never able to release the lock - NPE when trying to get the queue
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14017) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion
[ https://issues.apache.org/jira/browse/HBASE-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615705#comment-14615705 ] Stephen Yuan Jiang commented on HBASE-14017: Great! Thanks, Sean. Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion - Key: HBASE-14017 URL: https://issues.apache.org/jira/browse/HBASE-14017 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 2.0.0, 1.2.0, 1.1.1, 1.3.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.1.2, 1.3.0 Attachments: HBASE-14017-v0.patch, HBASE-14017-v0.patch, HBASE-14017.as-pushed-master.patch, HBASE-14017.v1-branch1.1.patch, HBASE-14017.v1-branch1.1.patch [~syuanjiang] found a concurrency issue in the procedure queue deletion, where we do not take an exclusive lock before deleting the table queue:
{noformat}
Thread 1: Create table is running - the queue is empty and wlock is false
Thread 2: markTableAsDeleted sees the queue empty and wlock=false
Thread 1: tryWrite() sets wlock=true; too late
Thread 2: deletes the queue
Thread 1: never able to release the lock - NPE when trying to get the queue
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13849) Remove restore and clone snapshot from the WebUI
[ https://issues.apache.org/jira/browse/HBASE-13849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey updated HBASE-13849: Resolution: Fixed Hadoop Flags: Incompatible change Release Note: The HBase master status web page no longer allows operators to clone snapshots or restore snapshots. Status: Resolved (was: Patch Available) Remove restore and clone snapshot from the WebUI Key: HBASE-13849 URL: https://issues.apache.org/jira/browse/HBASE-13849 Project: HBase Issue Type: Bug Components: snapshots Affects Versions: 1.0.1, 1.1.0, 0.98.13, 1.2.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Fix For: 2.0.0, 0.98.14, 1.2.0 Attachments: HBASE-13849-v0.patch Remove the clone and restore snapshot buttons from the WebUI. The first reason is that the operation may take too long to have the user wait on the WebUI. The second reason is that an action from the WebUI does not play well with security, since it is going to be executed as the hbase user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13646) HRegion#execService should not try to build incomplete messages
[ https://issues.apache.org/jira/browse/HBASE-13646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615332#comment-14615332 ] Hudson commented on HBASE-13646: SUCCESS: Integrated in HBase-1.2-IT #38 (See [https://builds.apache.org/job/HBase-1.2-IT/38/]) HBASE-13646 HRegion#execService should not try to build incomplete messages (busbey: rev 042f53b2f50b7c57fcf2eec62f8c67be57b0d850) * hbase-server/pom.xml * hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/ResponseConverter.java * hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/protobuf/generated/DummyRegionServerEndpointProtos.java * hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorEndpoint.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * hbase-server/src/test/protobuf/DummyRegionServerEndpoint.proto * hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestCoprocessorEndpoint.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java HRegion#execService should not try to build incomplete messages --- Key: HBASE-13646 URL: https://issues.apache.org/jira/browse/HBASE-13646 Project: HBase Issue Type: Bug Components: Coprocessors, regionserver Affects Versions: 2.0.0, 1.2.0, 1.1.1 Reporter: Andrey Stepachev Assignee: Andrey Stepachev Fix For: 2.0.0, 0.98.14, 1.2.0 Attachments: HBASE-13646-branch-1.patch, HBASE-13646.patch, HBASE-13646.v2.patch, HBASE-13646.v2.patch If an RPC service called on a region throws an exception, execService still tries to build the response Message. With complex messages that have required fields, this complicates service code, because the service needs to pass fake protobuf objects just so the messages are buildable at all. To mitigate that, I propose checking whether the controller was failed and returning null from the call instead of failing with an exception.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14017) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion
[ https://issues.apache.org/jira/browse/HBASE-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-14017: --- Status: Patch Available (was: Open) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion - Key: HBASE-14017 URL: https://issues.apache.org/jira/browse/HBASE-14017 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 1.1.1, 2.0.0, 1.2.0, 1.3.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.1.2 Attachments: HBASE-14017-v0.patch, HBASE-14017-v0.patch, HBASE-14017.v1-branch1.1.patch, HBASE-14017.v1-branch1.1.patch [~syuanjiang] found a concurrency issue in the procedure queue deletion, where we do not take an exclusive lock before deleting the table queue:
{noformat}
Thread 1: Create table is running - the queue is empty and wlock is false
Thread 2: markTableAsDeleted sees the queue empty and wlock=false
Thread 1: tryWrite() sets wlock=true; too late
Thread 2: deletes the queue
Thread 1: never able to release the lock - NPE when trying to get the queue
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14012) Double Assignment and Dataloss when ServerCrashProcedure runs during Master failover
[ https://issues.apache.org/jira/browse/HBASE-14012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615405#comment-14615405 ] Hudson commented on HBASE-14012: SUCCESS: Integrated in HBase-1.2 #52 (See [https://builds.apache.org/job/HBase-1.2/52/]) HBASE-14012 Double Assignment and Dataloss when ServerCrashProcedure runs during Master failover (stack: rev 8660a6004c7bc500536e43c0d35498cfc16c9867) * hbase-server/src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java * hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/MasterProcedureProtos.java * hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/ProcedureExecutor.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStateStore.java * hbase-protocol/src/main/protobuf/MasterProcedure.proto Double Assignment and Dataloss when ServerCrashProcedure runs during Master failover Key: HBASE-14012 URL: https://issues.apache.org/jira/browse/HBASE-14012 Project: HBase Issue Type: Bug Components: master, Region Assignment Affects Versions: 2.0.0, 1.2.0 Reporter: stack Assignee: stack Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.3.0 Attachments: 14012.txt, 14012v2.txt (Rewrite to be more explicit about what the problem is) ITBLL. Master comes up (It is being killed every 1-5 minutes or so). It is joining a running cluster (all servers up except Master with most regions assigned out on cluster). ProcedureStore has two ServerCrashProcedures unfinished (RUNNABLE state) for two separate servers. One SCP is in the middle of the assign step when master crashes (SERVER_CRASH_ASSIGN). This SCP step has this comment on it: {code} // Assign may not be idempotent. SSH used to requeue the SSH if we got an IOE assigning // which is what we are mimicing here but it looks prone to double assignment if assign // fails midway. TODO: Test. 
{code} This issue is 1.2+ only since it is ServerCrashProcedure (Added in HBASE-13616, post hbase-1.1.x). Looking at ServerShutdownHandler, how we used to do crash processing before we moved over to the Pv2 framework, SSH may have (accidentally) avoided this issue since it does its processing in a big blob starting over if killed mid-crash. In particular, post-crash, SSH scans hbase:meta to find servers that were on the downed server. SCP scanneds Meta in one step, saves off the regions it finds into the ProcedureStore, and then in the next step, does actual assign. In this case, we crashed post-meta scan and during assign. Assign is a bulk assign. It mostly succeeded but got this: {code} 809622 2015-06-09 20:05:28,576 INFO [ProcedureExecutorThread-9] master.GeneralBulkAssigner: Failed assigning 3 regions to server c2021.halxg.cloudera.com,16020,1433905510696, reassigning them {code} So, most regions actually made it to new locations except for a few stragglers. All of the successfully assigned regions then are reassigned on other side of master restart when we replay the SCP assign step. Let me put together the scan meta and assign steps in SCP; this should do until we redo all of assign to run on Pv2. A few other things I noticed: In SCP, we only check if failover in first step, not for every step, which means ServerCrashProcedure will run if on reload it is beyond the first step. {code} // Is master fully online? If not, yield. No processing of servers unless master is up if (!services.getAssignmentManager().isFailoverCleanupDone()) { throwProcedureYieldException(Waiting on master failover to complete); } {code} This means we are assigning while Master is still coming up, a no-no (though it does not seem to have caused problem here). Fix. 
Also, I see that over the 8 hours of this particular log, each time the master crashes and comes back up, we queue a ServerCrashProcedure for c2022 because an empty dir never gets cleaned up: {code} 39 2015-06-09 22:15:33,074 WARN [ProcedureExecutorThread-0] master.SplitLogManager: returning success without actually splitting and deleting all the log files in path hdfs://c2020.halxg.cloudera.com:8020/hbase/WALs/c2022.halxg.cloudera.com,16020,1433902151857-splitting {code} Fix this too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
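The per-step yield check suggested in the comment above can be sketched as follows. This is a hypothetical simplification, not the actual ServerCrashProcedure code: isFailoverCleanupDone() echoes the quoted snippet, while the real state-machine dispatch is reduced to an abstract runStep().

```java
// Hypothetical sketch: yield before EVERY step until master failover cleanup
// is done, not only before the first step.
abstract class ScpYieldSketch {
  abstract boolean isFailoverCleanupDone(); // stands in for the AssignmentManager check
  abstract void runStep(int state);         // stands in for the real SCP state handler

  /** @return true if the step ran, false if the procedure should yield. */
  boolean executeStep(int state) {
    if (!isFailoverCleanupDone()) {
      return false; // yield: no assigning while the master is still coming up
    }
    runStep(state);
    return true;
  }
}
```

With this shape, a procedure reloaded beyond its first step still yields until failover cleanup completes, instead of running steps against a master that is not fully online.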
[jira] [Commented] (HBASE-13646) HRegion#execService should not try to build incomplete messages
[ https://issues.apache.org/jira/browse/HBASE-13646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615404#comment-14615404 ] Hudson commented on HBASE-13646: SUCCESS: Integrated in HBase-1.2 #52 (See [https://builds.apache.org/job/HBase-1.2/52/]) HBASE-13646 HRegion#execService should not try to build incomplete messages (busbey: rev 042f53b2f50b7c57fcf2eec62f8c67be57b0d850) * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestCoprocessorEndpoint.java * hbase-server/src/test/protobuf/DummyRegionServerEndpoint.proto * hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/protobuf/generated/DummyRegionServerEndpointProtos.java * hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/ResponseConverter.java * hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorEndpoint.java * hbase-server/pom.xml HRegion#execService should not try to build incomplete messages --- Key: HBASE-13646 URL: https://issues.apache.org/jira/browse/HBASE-13646 Project: HBase Issue Type: Bug Components: Coprocessors, regionserver Affects Versions: 2.0.0, 1.2.0, 1.1.1 Reporter: Andrey Stepachev Assignee: Andrey Stepachev Fix For: 2.0.0, 0.98.14, 1.2.0 Attachments: HBASE-13646-branch-1.patch, HBASE-13646.patch, HBASE-13646.v2.patch, HBASE-13646.v2.patch If some RPC service called on a region throws an exception, execService still tries to build a Message. In the case of complex messages with required fields, this complicates service code, because the service needs to pass fake protobuf objects just so the response is buildable at all. To mitigate that, I propose to check whether the controller has failed and return null from the call instead of failing with an exception. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
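The mitigation described in the issue above can be sketched as follows. This is a hypothetical simplification: the real code deals with ServerRpcController and protobuf Message types, which are reduced here to a plain failure flag and Object.

```java
// Hypothetical sketch of the proposed fix direction: if the coprocessor call
// failed the controller, return null rather than force-building a response
// message that may have unset required fields.
class ExecServiceSketch {
  static final class Controller {
    private String error;
    void setFailed(String msg) { this.error = msg; }
    boolean failed() { return error != null; }
    String errorText() { return error; }
  }

  /** @return the built response, or null if the call failed the controller. */
  static Object buildResponseOrNull(Controller controller, Object response) {
    if (controller.failed()) {
      return null; // caller propagates the controller's failure instead
    }
    return response;
  }
}
```

The point of the design is that the error travels through the controller, so there is never a need to construct a barely-buildable fake message just to satisfy the return type.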
[jira] [Commented] (HBASE-7912) HBase Backup/Restore Based on HBase Snapshot
[ https://issues.apache.org/jira/browse/HBASE-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615385#comment-14615385 ] Vladimir Rodionov commented on HBASE-7912: -- So, the deduplication can be implemented either during read (the original approach mentioned in the doc), during restore (see above), or even during backup. Deduplication during backup will require support for WAL filtering during copy, but this is a feature which is on the roadmap. There are pros and cons for each of them. READ: very simple to implement, but we will have duplication of some data in HBase's file system after restore. RESTORE: not as simple, with no data duplication in the HBase cluster after restore, but there will be some data duplication in the backup location. BACKUP: not as simple either, but no duplication at all ... though it relies on support for WAL filtering during copy. HBase Backup/Restore Based on HBase Snapshot Key: HBASE-7912 URL: https://issues.apache.org/jira/browse/HBASE-7912 Project: HBase Issue Type: Sub-task Reporter: Richard Ding Assignee: Vladimir Rodionov Labels: backup Fix For: 2.0.0 Attachments: HBaseBackupRestore-Jira-7912-DesignDoc-v1.pdf, HBaseBackupRestore-Jira-7912-DesignDoc-v2.pdf, HBaseBackupRestore-Jira-7912-v4.pdf, HBaseBackupRestore-Jira-7912-v5.pdf, HBase_BackupRestore-Jira-7912-CLI-v1.pdf Finally, we completed the implementation of our backup/restore solution, and would like to share it with the community through this jira. We are leveraging the existing hbase snapshot feature, and provide a general solution to common users. Our full backup uses snapshots to capture metadata locally and exportsnapshot to move data to another cluster; the incremental backup uses offline-WALPlayer to back up HLogs; we also leverage globally distributed log roll and flush to improve performance; other add-on features include convert, merge, progress report, and CLI commands. 
So that a common user can back up hbase data without in-depth knowledge of hbase. Our solution also contains some usability features for enterprise users. The detailed design document and CLI commands will be attached in this jira. We plan to use 10~12 subtasks to share each of the following features, and document the detailed implementation in the subtasks: * *Full Backup* : provide local and remote backup/restore for a list of tables * *offline-WALPlayer* to convert HLogs to HFiles offline (for incremental backup) * *distributed* Logroll and distributed flush * Backup *Manifest* and history * *Incremental* backup: to build on top of full backup as daily/weekly backup * *Convert* incremental backup WAL files into HFiles * *Merge* several backup images into one (like merging weekly into monthly) * *add and remove* tables to and from a Backup image * *Cancel* a backup process * backup progress *status* * full backup based on an *existing snapshot* *-* *Below is the original description, kept here as the history of the design and discussion back in 2013* There have been attempts in the past to come up with a viable HBase backup/restore solution (e.g., HBASE-4618). Recently, there have been many advancements and new features in HBase, for example, FileLink, Snapshot, and Distributed Barrier Procedure. This is a proposal for a backup/restore solution that utilizes these new features to achieve better performance and consistency. A common practice of backup and restore in databases is to first take a full baseline backup, and then periodically take incremental backups that capture the changes since the full baseline backup. An HBase cluster can store massive amounts of data. Combining full backups with incremental backups has tremendous benefits for HBase as well. The following is a typical scenario for full and incremental backup. # The user takes a full backup of a table or a set of tables in HBase. 
# The user schedules periodical incremental backups to capture the changes from the full backup, or from the last incremental backup. # The user needs to restore table data to a past point in time. # The full backup is restored to the table(s) or to different table name(s). Then the incremental backups that are up to the desired point in time are applied on top of the full backup. We would support the following key features and capabilities. * Full backup uses HBase snapshot to capture HFiles. * Use HBase WALs to capture incremental changes, but use bulk load of HFiles for fast incremental restore. * Support single table or a set of tables, and column-family-level backup and restore. * Restore to different
[jira] [Commented] (HBASE-14012) Double Assignment and Dataloss when ServerCrashProcedure runs during Master failover
[ https://issues.apache.org/jira/browse/HBASE-14012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615333#comment-14615333 ] Hudson commented on HBASE-14012: SUCCESS: Integrated in HBase-1.2-IT #38 (See [https://builds.apache.org/job/HBase-1.2-IT/38/]) HBASE-14012 Double Assignment and Dataloss when ServerCrashProcedure runs during Master failover (stack: rev 8660a6004c7bc500536e43c0d35498cfc16c9867) * hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStateStore.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java * hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/MasterProcedureProtos.java * hbase-protocol/src/main/protobuf/MasterProcedure.proto * hbase-server/src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java * hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/ProcedureExecutor.java Double Assignment and Dataloss when ServerCrashProcedure runs during Master failover Key: HBASE-14012 URL: https://issues.apache.org/jira/browse/HBASE-14012 Project: HBase Issue Type: Bug Components: master, Region Assignment Affects Versions: 2.0.0, 1.2.0 Reporter: stack Assignee: stack Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.3.0 Attachments: 14012.txt, 14012v2.txt (Rewrite to be more explicit about what the problem is) ITBLL. Master comes up (It is being killed every 1-5 minutes or so). It is joining a running cluster (all servers up except Master with most regions assigned out on cluster). ProcedureStore has two ServerCrashProcedures unfinished (RUNNABLE state) for two separate servers. One SCP is in the middle of the assign step when master crashes (SERVER_CRASH_ASSIGN). This SCP step has this comment on it: {code} // Assign may not be idempotent. SSH used to requeue the SSH if we got an IOE assigning // which is what we are mimicing here but it looks prone to double assignment if assign // fails midway. TODO: Test. 
{code} This issue is 1.2+ only since it is ServerCrashProcedure (added in HBASE-13616, post hbase-1.1.x). Looking at ServerShutdownHandler, how we used to do crash processing before we moved over to the Pv2 framework, SSH may have (accidentally) avoided this issue since it does its processing in one big blob, starting over if killed mid-crash. In particular, post-crash, SSH scans hbase:meta to find the regions that were on the downed server. SCP scans meta in one step, saves off the regions it finds into the ProcedureStore, and then, in the next step, does the actual assign. In this case, we crashed post-meta-scan and during assign. Assign is a bulk assign. It mostly succeeded but got this: {code} 809622 2015-06-09 20:05:28,576 INFO [ProcedureExecutorThread-9] master.GeneralBulkAssigner: Failed assigning 3 regions to server c2021.halxg.cloudera.com,16020,1433905510696, reassigning them {code} So, most regions actually made it to new locations except for a few stragglers. All of the successfully assigned regions are then reassigned on the other side of the master restart when we replay the SCP assign step. Let me put together the scan-meta and assign steps in SCP; this should do until we redo all of assign to run on Pv2. A few other things I noticed: in SCP, we only check for failover in the first step, not for every step, which means ServerCrashProcedure will run if on reload it is beyond the first step. {code} // Is master fully online? If not, yield. No processing of servers unless master is up if (!services.getAssignmentManager().isFailoverCleanupDone()) { throwProcedureYieldException("Waiting on master failover to complete"); } {code} This means we are assigning while the Master is still coming up, a no-no (though it does not seem to have caused a problem here). Fix. 
Also, I see that over the 8 hours of this particular log, each time the master crashes and comes back up, we queue a ServerCrashProcedure for c2022 because an empty dir never gets cleaned up: {code} 39 2015-06-09 22:15:33,074 WARN [ProcedureExecutorThread-0] master.SplitLogManager: returning success without actually splitting and deleting all the log files in path hdfs://c2020.halxg.cloudera.com:8020/hbase/WALs/c2022.halxg.cloudera.com,16020,1433902151857-splitting {code} Fix this too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13849) Remove restore and clone snapshot from the WebUI
[ https://issues.apache.org/jira/browse/HBASE-13849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615331#comment-14615331 ] Hudson commented on HBASE-13849: SUCCESS: Integrated in HBase-1.2-IT #38 (See [https://builds.apache.org/job/HBase-1.2-IT/38/]) HBASE-13849 Remove restore and clone snapshot from the WebUI (busbey: rev 2bc55875cb533af1a4bbe3bb02482b4e9d4bf4c8) * hbase-server/src/main/resources/hbase-webapps/master/snapshot.jsp Remove restore and clone snapshot from the WebUI Key: HBASE-13849 URL: https://issues.apache.org/jira/browse/HBASE-13849 Project: HBase Issue Type: Bug Components: snapshots Affects Versions: 1.0.1, 1.1.0, 0.98.13, 1.2.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Fix For: 2.0.0, 0.98.14, 1.2.0 Attachments: HBASE-13849-v0.patch Remove the clone and restore snapshot buttons from the WebUI. The first reason is that the operation may take too long to have the user wait on the WebUI. The second reason is that an action from the WebUI does not play well with security, since it is going to be executed as the hbase user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-7912) HBase Backup/Restore Based on HBase Snapshot
[ https://issues.apache.org/jira/browse/HBASE-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615352#comment-14615352 ] Vladimir Rodionov commented on HBASE-7912: -- Sequence ID is per Region and the WAL is per RegionServer. We will need to store the maximum sequence id per region after a full backup. When we finish the snapshot, we can collect all maximum sequence ids from the store files and store them as a Map<Region, long>. During the first incremental restore, we will need to check the sequence id of every WAL Entry. If it is below or equal to the maximum seq id for this region, skip this entry. The logic remains unchanged for all other incremental restores after the first one. HBase Backup/Restore Based on HBase Snapshot Key: HBASE-7912 URL: https://issues.apache.org/jira/browse/HBASE-7912 Project: HBase Issue Type: Sub-task Reporter: Richard Ding Assignee: Vladimir Rodionov Labels: backup Fix For: 2.0.0 Attachments: HBaseBackupRestore-Jira-7912-DesignDoc-v1.pdf, HBaseBackupRestore-Jira-7912-DesignDoc-v2.pdf, HBaseBackupRestore-Jira-7912-v4.pdf, HBaseBackupRestore-Jira-7912-v5.pdf, HBase_BackupRestore-Jira-7912-CLI-v1.pdf Finally, we completed the implementation of our backup/restore solution, and would like to share it with the community through this jira. We are leveraging the existing hbase snapshot feature, and provide a general solution to common users. Our full backup uses snapshots to capture metadata locally and exportsnapshot to move data to another cluster; the incremental backup uses offline-WALPlayer to back up HLogs; we also leverage globally distributed log roll and flush to improve performance; other add-on features include convert, merge, progress report, and CLI commands. So that a common user can back up hbase data without in-depth knowledge of hbase. Our solution also contains some usability features for enterprise users. The detailed design document and CLI commands will be attached in this jira. 
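The sequence-id dedup rule described in the comment above can be sketched as follows. This is a hypothetical simplification: regions are keyed by String rather than the real Region type, and the method names are illustrative, not the backup feature's API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the dedup rule: after a full backup, remember the
// max sequence id per region; during the first incremental restore, skip any
// WAL entry at or below that region's recorded maximum.
class WalDedupFilterSketch {
  private final Map<String, Long> maxSeqIdPerRegion = new HashMap<>();

  void recordFullBackupMaxSeqId(String region, long maxSeqId) {
    maxSeqIdPerRegion.put(region, maxSeqId);
  }

  /**
   * @return true if the entry should be replayed, false if its data is
   *         already contained in the full backup's store files.
   */
  boolean shouldReplay(String region, long entrySeqId) {
    Long max = maxSeqIdPerRegion.get(region);
    return max == null || entrySeqId > max;
  }
}
```

As the comment notes, only the first incremental restore needs this filter; later incrementals start past the recorded maximums, so the logic remains unchanged for them.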
We plan to use 10~12 subtasks to share each of the following features, and document the detailed implementation in the subtasks: * *Full Backup* : provide local and remote backup/restore for a list of tables * *offline-WALPlayer* to convert HLogs to HFiles offline (for incremental backup) * *distributed* Logroll and distributed flush * Backup *Manifest* and history * *Incremental* backup: to build on top of full backup as daily/weekly backup * *Convert* incremental backup WAL files into HFiles * *Merge* several backup images into one (like merging weekly into monthly) * *add and remove* tables to and from a Backup image * *Cancel* a backup process * backup progress *status* * full backup based on an *existing snapshot* *-* *Below is the original description, kept here as the history of the design and discussion back in 2013* There have been attempts in the past to come up with a viable HBase backup/restore solution (e.g., HBASE-4618). Recently, there have been many advancements and new features in HBase, for example, FileLink, Snapshot, and Distributed Barrier Procedure. This is a proposal for a backup/restore solution that utilizes these new features to achieve better performance and consistency. A common practice of backup and restore in databases is to first take a full baseline backup, and then periodically take incremental backups that capture the changes since the full baseline backup. An HBase cluster can store massive amounts of data. Combining full backups with incremental backups has tremendous benefits for HBase as well. The following is a typical scenario for full and incremental backup. # The user takes a full backup of a table or a set of tables in HBase. # The user schedules periodical incremental backups to capture the changes from the full backup, or from the last incremental backup. # The user needs to restore table data to a past point in time. # The full backup is restored to the table(s) or to different table name(s). 
Then the incremental backups that are up to the desired point in time are applied on top of the full backup. We would support the following key features and capabilities. * Full backup uses HBase snapshot to capture HFiles. * Use HBase WALs to capture incremental changes, but use bulk load of HFiles for fast incremental restore. * Support single table or a set of tables, and column-family-level backup and restore. * Restore to different table names. * Support adding additional tables or CFs to the backup set without interrupting the incremental backup schedule. * Support rollup/combining of incremental backups into a longer period and
[jira] [Commented] (HBASE-14017) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion
[ https://issues.apache.org/jira/browse/HBASE-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615388#comment-14615388 ] Srikanth Srungarapu commented on HBASE-14017: - +1 lgtm. Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion - Key: HBASE-14017 URL: https://issues.apache.org/jira/browse/HBASE-14017 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 2.0.0, 1.2.0, 1.1.1, 1.3.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.1.2 Attachments: HBASE-14017-v0.patch, HBASE-14017-v0.patch, HBASE-14017.v1-branch1.1.patch, HBASE-14017.v1-branch1.1.patch [~syuanjiang] found a concurrency issue in the procedure queue delete where we don't take an exclusive lock before deleting the table {noformat} Thread 1: Create table is running - the queue is empty and wlock is false Thread 2: markTableAsDeleted sees the queue empty and wlock=false Thread 1: tryWrite() sets wlock=true; too late Thread 2: deletes the queue Thread 1: never able to release the lock - NPE when trying to get the queue {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
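The race in the trace above comes from checking "queue empty and unlocked" without actually holding the exclusive lock. A hypothetical sketch of the fix direction, with illustrative names rather than the real MasterProcedureQueue API:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: markTableAsDeleted-style deletion must take the
// exclusive (write) lock before inspecting the queue, so it cannot race with
// a create-table procedure that is about to acquire the same lock.
class TableQueueSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final Deque<Runnable> queue = new ArrayDeque<>();
  private boolean deleted = false;

  void add(Runnable proc) { queue.add(proc); }

  /** Delete only while holding the exclusive lock and only if empty. */
  boolean markAsDeletedIfEmpty() {
    if (!lock.writeLock().tryLock()) {
      return false; // another procedure holds the lock; do not delete
    }
    try {
      if (queue.isEmpty()) {
        deleted = true;
      }
      return deleted;
    } finally {
      lock.writeLock().unlock();
    }
  }

  boolean isDeleted() { return deleted; }
}
```

With the check and the deletion under one lock, Thread 2 in the trace either sees the lock held (and backs off) or deletes before Thread 1 can acquire it; the torn interleaving that caused the NPE cannot occur.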
[jira] [Updated] (HBASE-13832) Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low
[ https://issues.apache.org/jira/browse/HBASE-13832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-13832: Attachment: HBASE-13832-v5.patch Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low --- Key: HBASE-13832 URL: https://issues.apache.org/jira/browse/HBASE-13832 Project: HBase Issue Type: Sub-task Components: master, proc-v2 Affects Versions: 2.0.0, 1.1.0, 1.2.0 Reporter: Stephen Yuan Jiang Assignee: Matteo Bertozzi Priority: Critical Fix For: 2.0.0, 1.1.2, 1.3.0, 1.2.1 Attachments: HBASE-13832-v0.patch, HBASE-13832-v1.patch, HBASE-13832-v2.patch, HBASE-13832-v4.patch, HBASE-13832-v5.patch, HDFSPipeline.java, hbase-13832-test-hang.patch, hbase-13832-v3.patch When the data node count is 3, we got a failure in WALProcedureStore#syncLoop() during master start. The failure prevents the master from getting started. {noformat} 2015-05-29 13:27:16,625 ERROR [WALProcedureStoreSyncThread] wal.WALProcedureStore: Sync slot failed, abort. java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK], DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-490ece56c772,DISK]], original=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK], DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-490ece56c772,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration. 
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:951) {noformat} One proposal is to implement logic similar to FSHLog: if an IOException is thrown during syncLoop in WALProcedureStore#start(), instead of aborting immediately, we could try to roll the log and see whether this resolves the issue; if the new log cannot be created, or rolling the log throws more exceptions, we then abort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
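The roll-before-abort proposal above can be sketched as follows. This is a hypothetical simplification: syncSlots() and rollWriter() stand in for the real WALProcedureStore internals, and the retry bound is an assumption added for the sketch.

```java
import java.io.IOException;

// Hypothetical sketch: on a sync IOException, try rolling to a fresh log
// (as FSHLog does) before giving up and aborting the master.
abstract class ProcedureStoreSyncSketch {
  abstract void syncSlots() throws IOException; // stands in for the real sync
  abstract boolean rollWriter();                // true if a new log was created

  /** @return true if the slots were synced, possibly after a log roll. */
  boolean syncWithRollRetry(int maxRollRetries) {
    for (int attempt = 0; ; attempt++) {
      try {
        syncSlots();
        return true;
      } catch (IOException e) {
        if (attempt >= maxRollRetries || !rollWriter()) {
          return false; // cannot create a new log either: caller aborts
        }
        // rolled to a new log; retry the sync on the fresh pipeline
      }
    }
  }
}
```

The design mirrors the proposal: a transient bad-datanode pipeline is survived by rolling, and only a failure to roll (or repeated failures) escalates to abort.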
[jira] [Updated] (HBASE-13927) Allow hbase-daemon.sh to conditionally redirect the log or not
[ https://issues.apache.org/jira/browse/HBASE-13927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey updated HBASE-13927: Resolution: Fixed Status: Resolved (was: Patch Available) Allow hbase-daemon.sh to conditionally redirect the log or not -- Key: HBASE-13927 URL: https://issues.apache.org/jira/browse/HBASE-13927 Project: HBase Issue Type: Improvement Components: shell Affects Versions: 2.0.0, 1.2.0 Reporter: Elliott Clark Assignee: Elliott Clark Labels: shell Fix For: 2.0.0, 1.2.0 Attachments: HBASE-13927.patch, HBASE-13927.patch Kind of like HBASE_NOEXEC, allow hbase-daemon.sh to skip redirecting to a log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13927) Allow hbase-daemon.sh to conditionally redirect the log or not
[ https://issues.apache.org/jira/browse/HBASE-13927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey updated HBASE-13927: Issue Type: Improvement (was: Bug) Allow hbase-daemon.sh to conditionally redirect the log or not -- Key: HBASE-13927 URL: https://issues.apache.org/jira/browse/HBASE-13927 Project: HBase Issue Type: Improvement Components: shell Affects Versions: 2.0.0, 1.2.0 Reporter: Elliott Clark Assignee: Elliott Clark Labels: shell Fix For: 2.0.0, 1.2.0 Attachments: HBASE-13927.patch, HBASE-13927.patch Kind of like HBASE_NOEXEC, allow hbase-daemon.sh to skip redirecting to a log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14012) Double Assignment and Dataloss when ServerCrashProcedure runs during Master failover
[ https://issues.apache.org/jira/browse/HBASE-14012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615400#comment-14615400 ] Hudson commented on HBASE-14012: SUCCESS: Integrated in HBase-TRUNK #6630 (See [https://builds.apache.org/job/HBase-TRUNK/6630/]) HBASE-14012 Double Assignment and Dataloss when ServerCrashProcedure (stack: rev cff1a5f1f5cc8f5e1e99c6aecb39e2e69f86bf7e) * hbase-server/src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java * hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/ProcedureExecutor.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStateStore.java * hbase-protocol/src/main/protobuf/MasterProcedure.proto * hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/MasterProcedureProtos.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java Double Assignment and Dataloss when ServerCrashProcedure runs during Master failover Key: HBASE-14012 URL: https://issues.apache.org/jira/browse/HBASE-14012 Project: HBase Issue Type: Bug Components: master, Region Assignment Affects Versions: 2.0.0, 1.2.0 Reporter: stack Assignee: stack Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.3.0 Attachments: 14012.txt, 14012v2.txt (Rewrite to be more explicit about what the problem is) ITBLL. Master comes up (It is being killed every 1-5 minutes or so). It is joining a running cluster (all servers up except Master with most regions assigned out on cluster). ProcedureStore has two ServerCrashProcedures unfinished (RUNNABLE state) for two separate servers. One SCP is in the middle of the assign step when master crashes (SERVER_CRASH_ASSIGN). This SCP step has this comment on it: {code} // Assign may not be idempotent. SSH used to requeue the SSH if we got an IOE assigning // which is what we are mimicing here but it looks prone to double assignment if assign // fails midway. TODO: Test. 
{code} This issue is 1.2+ only since it is ServerCrashProcedure (added in HBASE-13616, post hbase-1.1.x). Looking at ServerShutdownHandler, how we used to do crash processing before we moved over to the Pv2 framework, SSH may have (accidentally) avoided this issue since it does its processing in one big blob, starting over if killed mid-crash. In particular, post-crash, SSH scans hbase:meta to find the regions that were on the downed server. SCP scans meta in one step, saves off the regions it finds into the ProcedureStore, and then, in the next step, does the actual assign. In this case, we crashed post-meta-scan and during assign. Assign is a bulk assign. It mostly succeeded but got this: {code} 809622 2015-06-09 20:05:28,576 INFO [ProcedureExecutorThread-9] master.GeneralBulkAssigner: Failed assigning 3 regions to server c2021.halxg.cloudera.com,16020,1433905510696, reassigning them {code} So, most regions actually made it to new locations except for a few stragglers. All of the successfully assigned regions are then reassigned on the other side of the master restart when we replay the SCP assign step. Let me put together the scan-meta and assign steps in SCP; this should do until we redo all of assign to run on Pv2. A few other things I noticed: in SCP, we only check for failover in the first step, not for every step, which means ServerCrashProcedure will run if on reload it is beyond the first step. {code} // Is master fully online? If not, yield. No processing of servers unless master is up if (!services.getAssignmentManager().isFailoverCleanupDone()) { throwProcedureYieldException("Waiting on master failover to complete"); } {code} This means we are assigning while the Master is still coming up, a no-no (though it does not seem to have caused a problem here). Fix. 
Also, I see that over the 8 hours of this particular log, each time the master crashes and comes back up, we queue a ServerCrashProcedure for c2022 because an empty dir never gets cleaned up: {code} 39 2015-06-09 22:15:33,074 WARN [ProcedureExecutorThread-0] master.SplitLogManager: returning success without actually splitting and deleting all the log files in path hdfs://c2020.halxg.cloudera.com:8020/hbase/WALs/c2022.halxg.cloudera.com,16020,1433902151857-splitting {code} Fix this too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615265#comment-14615265 ] Ted Yu commented on HBASE-13965: {code} 32 public void updateStochasticCost(String tableName, String costFunctionName, {code} Should the tableName parameter be of type TableName? For MetricsStochasticBalancerSourceImpl.java, it should be annotated with @InterfaceAudience.Private {code} 56 * The function that report stochastic load balancer costs to JMX {code} 'that report' -> 'that reports' {code} 58 public void updateStochasticCost(String tableName, String costFunctionName, 59 String costFunctionDesc, Double value) { {code} costFunctionDesc isn't used in the method. {code} 82 String attrName = tableName + ((tableName.length() <= 0) ? "" : TABLE_FUNCTION_SEP) + key; {code} When would tableName.length() be <= 0? There is a check at the beginning of updateStochasticCost(). {code} 124 private Double[] lastSubcosts; {code} nit: uppercase the 'c' of 'costs' {code} 470 this.lastSubcosts[i] = multiplier * cost; 471 total += multiplier * cost; {code} There is no need to perform the same multiplication twice. Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Attachments: HBASE-13965-v3.patch, HBASE-13965_v2.patch, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost-function based. The cost function weights are tunable, but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack sizes (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. 
Balancing LocalityCost, RegionReplicaRackCost and RegionCountSkewCost is difficult without a way to attribute each cost function’s contribution to the overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
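The double-multiplication nit from the review above (lines 470-471 of the patch) can be sketched as follows. This is a hypothetical simplification of the patch's aggregation loop, not the actual StochasticLoadBalancer code: names like lastSubCosts echo the review, and the array shapes are assumptions.

```java
// Hypothetical sketch: compute multiplier * cost once per cost function,
// record it for JMX reporting, and accumulate it into the total.
class CostTotalsSketch {
  static double totalCost(double[] multipliers, double[] costs, double[] lastSubCosts) {
    double total = 0;
    for (int i = 0; i < costs.length; i++) {
      double weighted = multipliers[i] * costs[i]; // single multiplication
      lastSubCosts[i] = weighted;                  // later exposed via JMX
      total += weighted;
    }
    return total;
  }
}
```

Recording the weighted sub-cost is exactly what makes per-function attribution possible: the JMX metric for each cost function and the overall plan cost come from the same computed values.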
[jira] [Commented] (HBASE-7912) HBase Backup/Restore Based on HBase Snapshot
[ https://issues.apache.org/jira/browse/HBASE-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615326#comment-14615326 ] Vladimir Rodionov commented on HBASE-7912: -- [~jerryhe] wrote: {quote} Regarding the section 'First incremental after full backup restore': yes, there could be data duplicated in two backups (the full and the incr). It is better to fix it during the backup. {quote} Doing this during backup requires support for backup/restore up to a sequence Id (mvcc number). Doing this in the read path is trivial (several lines of code in a StoreScanner). HBase Backup/Restore Based on HBase Snapshot Key: HBASE-7912 URL: https://issues.apache.org/jira/browse/HBASE-7912 Project: HBase Issue Type: Sub-task Reporter: Richard Ding Assignee: Vladimir Rodionov Labels: backup Fix For: 2.0.0 Attachments: HBaseBackupRestore-Jira-7912-DesignDoc-v1.pdf, HBaseBackupRestore-Jira-7912-DesignDoc-v2.pdf, HBaseBackupRestore-Jira-7912-v4.pdf, HBaseBackupRestore-Jira-7912-v5.pdf, HBase_BackupRestore-Jira-7912-CLI-v1.pdf Finally, we completed the implementation of our backup/restore solution, and would like to share it with the community through this jira. We are leveraging the existing hbase snapshot feature, and provide a general solution to common users. Our full backup uses snapshots to capture metadata locally and exportsnapshot to move data to another cluster; the incremental backup uses offline-WALPlayer to back up HLogs; we also leverage globally distributed log roll and flush to improve performance; other add-on features include convert, merge, progress report, and CLI commands. So that a common user can back up hbase data without in-depth knowledge of hbase. Our solution also contains some usability features for enterprise users. The detailed design document and CLI commands will be attached in this jira. 
We plan to use 10~12 subtasks to share each of the following features, and document the detailed implementation in the subtasks: * *Full Backup* : provide local and remote backup/restore for a list of tables * *offline-WALPlayer* to convert HLogs to HFiles offline (for incremental backup) * *distributed* Logroll and distributed flush * Backup *Manifest* and history * *Incremental* backup: to build on top of full backup as daily/weekly backup * *Convert* incremental backup WAL files into HFiles * *Merge* several backup images into one (like merging weekly into monthly) * *add and remove* tables to and from a Backup image * *Cancel* a backup process * backup progress *status* * full backup based on an *existing snapshot* *-* *Below is the original description, kept here as the history of the design and discussion back in 2013* There have been attempts in the past to come up with a viable HBase backup/restore solution (e.g., HBASE-4618). Recently, there have been many advancements and new features in HBase, for example, FileLink, Snapshot, and Distributed Barrier Procedure. This is a proposal for a backup/restore solution that utilizes these new features to achieve better performance and consistency. A common practice of backup and restore in databases is to first take a full baseline backup, and then periodically take incremental backups that capture the changes since the full baseline backup. An HBase cluster can store massive amounts of data. Combining full backups with incremental backups has tremendous benefits for HBase as well. The following is a typical scenario for full and incremental backup. # The user takes a full backup of a table or a set of tables in HBase. # The user schedules periodical incremental backups to capture the changes from the full backup, or from the last incremental backup. # The user needs to restore table data to a past point in time. # The full backup is restored to the table(s) or to different table name(s). 
Then the incremental backups that are up to the desired point in time are applied on top of the full backup. We would support the following key features and capabilities. * Full backup uses HBase snapshots to capture HFiles. * Use HBase WALs to capture incremental changes, but use bulk load of HFiles for fast incremental restore. * Support single table or a set of tables, and column family level backup and restore. * Restore to different table names. * Support adding additional tables or CFs to the backup set without interruption of the incremental backup schedule. * Support rollup/combining of incremental backups into longer-period, bigger incremental backups. * Unified command line interface for all the above. The solution will support HBase
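The read-path deduplication Vladimir mentions above (skipping, during restore of an incremental backup, any edit whose sequence Id the full backup already covers) can be pictured with a small standalone sketch. This is hypothetical illustration code, not the StoreScanner change itself; the method and class names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of read-path dedup: edits whose sequence id (mvcc number)
// falls at or below the full backup's boundary already exist in the full image,
// so a scanner replaying the incremental backup can skip them.
public class SeqIdFilterSketch {

    // Keep only edits strictly newer than the full backup's max sequence id.
    static List<Long> filterDuplicates(List<Long> incrementalSeqIds, long fullBackupMaxSeqId) {
        List<Long> result = new ArrayList<>();
        for (long seqId : incrementalSeqIds) {
            if (seqId > fullBackupMaxSeqId) {
                result.add(seqId);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Full backup captured everything up to seq id 100; the incremental
        // re-contains 90 and 100 plus genuinely new edits 101 and 250.
        List<Long> incr = List.of(90L, 100L, 101L, 250L);
        System.out.println(filterDuplicates(incr, 100L)); // prints [101, 250]
    }
}
```

The point of the comment thread is that this boundary check is a few lines in the read path, whereas enforcing it at backup time would require backup/restore up to an exact sequence Id.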
[jira] [Commented] (HBASE-12596) bulkload needs to follow locality
[ https://issues.apache.org/jira/browse/HBASE-12596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615452#comment-14615452 ] Hadoop QA commented on HBASE-12596: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12743625/HBASE-12596-master-v3.patch against master branch at commit cff1a5f1f5cc8f5e1e99c6aecb39e2e69f86bf7e. ATTACHMENT ID: 12743625 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 checkstyle{color}. The applied patch generated 1899 checkstyle errors (more than the master's current 1898 errors). {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:red}-1 core tests{color}. The patch failed these unit tests: org.apache.hadoop.hbase.rest.TestTableResource org.apache.hadoop.hbase.rest.TestTableScan org.apache.hadoop.hbase.rest.TestScannerResource org.apache.hadoop.hbase.rest.TestDeleteRow {color:red}-1 core zombie tests{color}. 
There are 5 zombie test(s): Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14677//testReport/ Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14677//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14677//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14677//console This message is automatically generated. bulkload needs to follow locality - Key: HBASE-12596 URL: https://issues.apache.org/jira/browse/HBASE-12596 Project: HBase Issue Type: Improvement Components: HFile, regionserver Affects Versions: 0.98.8 Environment: hadoop-2.3.0, hbase-0.98.8, jdk1.7 Reporter: Victor Xu Assignee: Victor Xu Fix For: 0.98.14 Attachments: HBASE-12596-0.98-v1.patch, HBASE-12596-0.98-v2.patch, HBASE-12596-0.98-v3.patch, HBASE-12596-master-v1.patch, HBASE-12596-master-v2.patch, HBASE-12596-master-v3.patch, HBASE-12596.patch Normally, we have 2 steps to perform a bulkload: 1. Use a job to write the HFiles to be loaded; 2. Move these HFiles to the right hdfs directory. However, locality could be lost during the first step. Why not just write the HFiles directly into the right place? We can do this easily because StoreFile.WriterBuilder has the withFavoredNodes method, and we just need to call it in HFileOutputFormat's getNewWriter(). This feature is enabled by default, and we can use 'hbase.bulkload.locality.sensitive.enabled=false' to disable it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
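The locality idea in HBASE-12596 boils down to one lookup: when the bulkload job opens a writer for a row range, find which region server hosts that range and pass it as a favored node, so HFile blocks land local to the server that will serve them. The sketch below is not the real HBase API (no StoreFile.WriterBuilder here); it only illustrates the region-location lookup with invented names.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Minimal sketch of the favored-node lookup a locality-sensitive bulkload
// performs before opening a writer. Region locations are modeled as a sorted
// map of region start key -> hosting region server.
public class FavoredNodeSketch {

    // Stand-in for the region location lookup HFileOutputFormat would do:
    // the region owning rowKey is the one with the greatest start key <= rowKey.
    static String favoredNodeFor(NavigableMap<String, String> regionLocations, String rowKey) {
        Map.Entry<String, String> region = regionLocations.floorEntry(rowKey);
        return region == null ? null : region.getValue();
    }

    public static void main(String[] args) {
        NavigableMap<String, String> locations = new TreeMap<>();
        locations.put("", "rs1.example.com");   // region [ "", "m" )
        locations.put("m", "rs2.example.com");  // region [ "m", +inf )
        System.out.println(favoredNodeFor(locations, "apple")); // rs1.example.com
        System.out.println(favoredNodeFor(locations, "zebra")); // rs2.example.com
    }
}
```

In the actual patch, the hostname found this way would be handed to the writer builder's withFavoredNodes call so HDFS places a block replica on that node.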
[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615518#comment-14615518 ] Lei Chen commented on HBASE-13965: -- Thanks for your review and great feedback. I will upload an updated patch. Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Attachments: HBASE-13965-v3.patch, HBASE-13965_v2.patch, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14017) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion
[ https://issues.apache.org/jira/browse/HBASE-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615437#comment-14615437 ] Stephen Yuan Jiang commented on HBASE-14017: I tried the branch-1 patch locally and had no problem applying it. Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion - Key: HBASE-14017 URL: https://issues.apache.org/jira/browse/HBASE-14017 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 2.0.0, 1.2.0, 1.1.1, 1.3.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.1.2 Attachments: HBASE-14017-v0.patch, HBASE-14017-v0.patch, HBASE-14017.v1-branch1.1.patch, HBASE-14017.v1-branch1.1.patch [~syuanjiang] found a concurrency issue in the procedure queue deletion where we don't have an exclusive lock before deleting the table {noformat} Thread 1: Create table is running - the queue is empty and wlock is false Thread 2: markTableAsDeleted sees the queue empty and wlock= false Thread 1: tryWrite() set wlock=true; too late Thread 2: delete the queue Thread 1: never able to release the lock - NPE when trying to get the queue {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
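The race in the {noformat} block above exists because the emptiness check in markTableAsDeleted and the lock acquisition in tryWrite are not decided under one monitor. A minimal standalone sketch (hypothetical code, not the actual MasterProcedureQueue) of one way to close it:

```java
// Sketch of the table-queue deletion race fix: both the exclusive-lock attempt
// and the delete decision run under the same monitor, so a deleter can never
// observe "queue empty, wlock false" in between another thread's check and
// its lock acquisition.
public class TableQueueSketch {
    private boolean wlock = false;   // exclusive write lock flag
    private int queueSize = 0;       // pending procedures for this table
    private boolean deleted = false; // queue removed from the master map

    synchronized boolean tryExclusiveLock() {
        if (wlock || deleted) return false;
        wlock = true;
        return true;
    }

    synchronized void releaseExclusiveLock() {
        wlock = false;
    }

    // Safe only because the emptiness+lock check and the delete are atomic.
    synchronized boolean markTableAsDeleted() {
        if (queueSize == 0 && !wlock) {
            deleted = true;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        TableQueueSketch q = new TableQueueSketch();
        System.out.println(q.tryExclusiveLock());   // true: creator holds the lock
        System.out.println(q.markTableAsDeleted()); // false: cannot delete while locked
        q.releaseExclusiveLock();
        System.out.println(q.markTableAsDeleted()); // true: now safe to delete
    }
}
```

With this shape, Thread 2 from the trace either sees wlock already true and backs off, or deletes before Thread 1 can lock, in which case Thread 1's tryExclusiveLock fails cleanly instead of NPE-ing on a vanished queue.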
[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615467#comment-14615467 ] Hadoop QA commented on HBASE-13965: --- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12743741/HBASE-13965-v3.patch against master branch at commit cff1a5f1f5cc8f5e1e99c6aecb39e2e69f86bf7e. ATTACHMENT ID: 12743741 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14678//testReport/ Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14678//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14678//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14678//console This message is automatically generated. Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Attachments: HBASE-13965-v3.patch, HBASE-13965_v2.patch, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13646) HRegion#execService should not try to build incomplete messages
[ https://issues.apache.org/jira/browse/HBASE-13646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615468#comment-14615468 ] Hudson commented on HBASE-13646: FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #1000 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/1000/]) HBASE-13646 HRegion#execService should not try to build incomplete messages (busbey: rev 2353c1bcf7debcc90e1f6d47787404c42a9d53b0) * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorEndpoint.java * hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestCoprocessorEndpoint.java * hbase-server/pom.xml * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/protobuf/generated/DummyRegionServerEndpointProtos.java * hbase-server/src/test/protobuf/DummyRegionServerEndpoint.proto * hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/ResponseConverter.java HRegion#execService should not try to build incomplete messages --- Key: HBASE-13646 URL: https://issues.apache.org/jira/browse/HBASE-13646 Project: HBase Issue Type: Bug Components: Coprocessors, regionserver Affects Versions: 2.0.0, 1.2.0, 1.1.1 Reporter: Andrey Stepachev Assignee: Andrey Stepachev Fix For: 2.0.0, 0.98.14, 1.2.0 Attachments: HBASE-13646-branch-1.patch, HBASE-13646.patch, HBASE-13646.v2.patch, HBASE-13646.v2.patch If an RPC service called on a region throws an exception, execService still tries to build a Message. In case of complex messages with required fields, this complicates service code because the service needs to pass fake protobuf objects so that they can be barely buildable. To mitigate that, I propose to check whether the controller failed and return null from the call instead of failing with an exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
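The fix direction in HBASE-13646 (short-circuit to null when the controller was failed, instead of forcing a half-built protobuf message) can be sketched without the real HBase/protobuf classes. Everything below is stand-in code with invented names; the real change touches HRegion#execService and the RpcController plumbing.

```java
// Hypothetical sketch: endpoint dispatch checks whether the service failed
// the controller, and returns null rather than building an incomplete message.
public class ExecServiceSketch {

    // Stand-in for com.google.protobuf.RpcController's failure flag.
    static class Controller {
        private String errorText;
        void setFailed(String reason) { this.errorText = reason; }
        boolean failed() { return errorText != null; }
        String errorText() { return errorText; }
    }

    interface Service {
        String call(Controller controller);
    }

    // Before the fix: the caller had to hand back a buildable response even on
    // error, forcing fake values into required fields. After: a failed
    // controller short-circuits to null and no fake message is needed.
    static String execService(Controller controller, Service service) {
        String result = service.call(controller);
        if (controller.failed()) {
            return null;
        }
        return result;
    }

    public static void main(String[] args) {
        Controller c = new Controller();
        String resp = execService(c, ctrl -> {
            ctrl.setFailed("table not found");
            return "partial-message"; // previously had to be "barely buildable"
        });
        System.out.println(resp);          // null
        System.out.println(c.errorText()); // table not found
    }
}
```

The caller then reports the error from the controller rather than from the (absent) response object.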
[jira] [Commented] (HBASE-13879) Add hbase.hstore.compactionThreshold to HConstants
[ https://issues.apache.org/jira/browse/HBASE-13879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615444#comment-14615444 ] Anoop Sam John commented on HBASE-13879: CompactionConfiguration.java seems the best place for this config constant, no? Why do we need the change? I would say it is better not to go with this jira Add hbase.hstore.compactionThreshold to HConstants -- Key: HBASE-13879 URL: https://issues.apache.org/jira/browse/HBASE-13879 Project: HBase Issue Type: Improvement Reporter: Gabor Liptak Priority: Minor Attachments: HBASE-13879.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei Chen updated HBASE-13965: - Attachment: HBASE-13965-v4.patch Updates: 1. report - reports 2. costFunctionDesc added to JMX 3. Unnecessary table name length check is removed. 4. lastSubcosts - lastSubCosts 5. total += this.lastSubCosts[i]; TODO: 1. Make hard-coded map size configurable? Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Attachments: HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965_v2.patch, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615805#comment-14615805 ] Ted Yu commented on HBASE-13965: {code} costs.put(costFunctionName, descAndValue); stochasticCosts.put(tableName, costs); {code} For any specific cost function, its description should be fixed. In the above model, the description for the same cost function is stored once per table. Can we reduce the number of times the description for the same function is stored? Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Attachments: HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965_v2.patch, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
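Ted's suggestion above amounts to normalizing the data model: hold one global costFunctionName-to-description map, and let the per-table map carry only the numeric cost. A minimal sketch (hypothetical class and method names, not the patch code):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of storing each cost function's fixed description exactly once,
// instead of once per table alongside every cost value.
public class BalancerMetricsSketch {
    // Stored once, regardless of how many tables report costs.
    private final Map<String, String> descriptions = new HashMap<>();
    // tableName -> (costFunctionName -> cost value only)
    private final Map<String, Map<String, Double>> stochasticCosts = new HashMap<>();

    void report(String table, String costFunction, String description, double cost) {
        descriptions.putIfAbsent(costFunction, description);
        stochasticCosts.computeIfAbsent(table, t -> new HashMap<>()).put(costFunction, cost);
    }

    int descriptionCount() { return descriptions.size(); }

    Double costOf(String table, String costFunction) {
        Map<String, Double> costs = stochasticCosts.get(table);
        return costs == null ? null : costs.get(costFunction);
    }

    public static void main(String[] args) {
        BalancerMetricsSketch m = new BalancerMetricsSketch();
        m.report("t1", "LocalityCost", "data locality of regions", 0.42);
        m.report("t2", "LocalityCost", "data locality of regions", 0.17);
        // Two tables reported, but the description is held only once.
        System.out.println(m.descriptionCount()); // 1
    }
}
```

The JMX bean would then expose descriptions from the single shared map and values from the per-table one.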
[jira] [Updated] (HBASE-13330) Region left unassigned due to AM SSH each thinking the assignment would be done by the other
[ https://issues.apache.org/jira/browse/HBASE-13330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-13330: -- Fix Version/s: (was: 1.0.2) 1.0.3 Region left unassigned due to AM SSH each thinking the assignment would be done by the other -- Key: HBASE-13330 URL: https://issues.apache.org/jira/browse/HBASE-13330 Project: HBase Issue Type: Bug Components: master, Region Assignment Affects Versions: 1.0.0 Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 2.0.0, 1.1.2, 1.3.0, 1.2.1, 1.0.3 Attachments: 13330-branch-1.txt Here is what I found during analysis of an issue. Raising this jira and a fix will follow. The TL;DR of this is that the AssignmentManager thinks the ServerShutdownHandler would assign the region and the ServerShutdownHandler thinks that the AssignmentManager would assign the region. The region (0d6cf37c18c54c6f4744750c6a7be837) ultimately never gets assigned. Below is an analysis from the logs that captures the flow of events. 1. The AssignmentManager had initially assigned this region to dnj1-bcpc-r3n8.example.com,60020,1425598187703 2. When the master restarted it did a scan of the meta to learn about the regions in the cluster. It found this region being assigned to dnj1-bcpc-r3n8.example.com,60020,1425598187703 from the meta record. 3. However, this server (dnj1-bcpc-r3n8.example.com,60020,1425598187703) was not alive anymore. So, the AssignmentManager queued up a ServerShutdownHandling task for this (that asynchronously executes): {noformat} 2015-03-06 14:09:31,355 DEBUG org.apache.hadoop.hbase.master.ServerManager: Added=dnj1-bcpc-r3n8.example.com,60020,1425598187703 to dead servers, submitted shutdown handler to be executed meta=false {noformat} 4. The AssignmentManager proceeded to read the RIT nodes from ZK. 
It found this region as well: {noformat} 2015-03-06 14:09:31,527 INFO org.apache.hadoop.hbase.master.AssignmentManager: Processing 0d6cf37c18c54c6f4744750c6a7be837 in state: RS_ZK_REGION_FAILED_OPEN {noformat} 5. The region was moved to CLOSED state: {noformat} 2015-03-06 14:09:31,527 WARN org.apache.hadoop.hbase.master.RegionStates: 0d6cf37c18c54c6f4744750c6a7be837 moved to CLOSED on dnj1-bcpc-r3n2.example.com,60020,1425603618259, expected dnj1-bcpc-r3n8.example.com,60020,1425598187703 {noformat} Note the reference to dnj1-bcpc-r3n2.example.com,60020,1425603618259. This means that the region was assigned to dnj1-bcpc-r3n2.example.com,60020,1425603618259 but that regionserver couldn't open the region for some reason, and it changed the state to RS_ZK_REGION_FAILED_OPEN in RIT znode on ZK. 6. After that the AssignmentManager tried to assign it again. However, the assignment didn't happen because the ServerShutdownHandling task queued earlier didn't yet execute: {noformat} 2015-03-06 14:09:31,527 INFO org.apache.hadoop.hbase.master.AssignmentManager: Skip assigning phMonthlyVersion,\x89\x80\x00\x00,1423149098980.0d6cf37c18c54c6f4744750c6a7be837., it's host dnj1-bcpc-r3n8.example.com,60020,1425598187703 is dead but not processed yet {noformat} 7. Eventually the ServerShutdownHandling task executed. {noformat} 2015-03-06 14:09:35,188 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs for dnj1-bcpc-r3n8.example.com,60020,1425598187703 before assignment. 2015-03-06 14:09:35,209 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Reassigning 19 region(s) that dnj1-bcpc-r3n8.example.com,60020,1425598187703 was carrying (and 0 regions(s) that were opening on this server) 2015-03-06 14:09:35,211 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Finished processing of shutdown of dnj1-bcpc-r3n8.example.com,60020,1425598187703 {noformat} 8. However, the ServerShutdownHandling task skipped the region in question. 
This was because this region was in RIT, and the ServerShutdownHandling task thinks that the AssignmentManager would assign it as part of handling the RIT nodes: {noformat} 2015-03-06 14:09:35,210 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Skip assigning region in transition on other server{0d6cf37c18c54c6f4744750c6a7be837 state=CLOSED, ts=1425668971527, server=dnj1-bcpc-r3n2.example.com,60020,1425603618259} {noformat} 9. At some point in the future, when the server dnj1-bcpc-r3n2.example.com,60020,1425603618259 dies, the ServerShutdownHandling for it gets queued up (from the log hbase-hbase-master-dnj1-bcpc-r3n1.log): {noformat} 2015-03-09 11:35:10,607 INFO org.apache.hadoop.hbase.zookeeper.RegionServerTracker: RegionServer ephemeral node
[jira] [Commented] (HBASE-14017) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion
[ https://issues.apache.org/jira/browse/HBASE-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615864#comment-14615864 ] Hudson commented on HBASE-14017: SUCCESS: Integrated in HBase-1.3 #38 (See [https://builds.apache.org/job/HBase-1.3/38/]) HBASE-14017 Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion (busbey: rev 80b0a3e914c8f7b2600de93a27cc5d050d36ebf7) * hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestMasterProcedureQueue.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/MasterProcedureQueue.java Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion - Key: HBASE-14017 URL: https://issues.apache.org/jira/browse/HBASE-14017 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 2.0.0, 1.2.0, 1.1.1, 1.3.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.1.2 Attachments: HBASE-14017-v0.patch, HBASE-14017-v0.patch, HBASE-14017.as-pushed-master.patch, HBASE-14017.v1-branch1.1.patch, HBASE-14017.v1-branch1.1.patch [~syuanjiang] found a concurrency issue in the procedure queue deletion where we don't have an exclusive lock before deleting the table {noformat} Thread 1: Create table is running - the queue is empty and wlock is false Thread 2: markTableAsDeleted sees the queue empty and wlock= false Thread 1: tryWrite() set wlock=true; too late Thread 2: delete the queue Thread 1: never able to release the lock - NPE when trying to get the queue {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14017) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion
[ https://issues.apache.org/jira/browse/HBASE-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615925#comment-14615925 ] Hudson commented on HBASE-14017: SUCCESS: Integrated in HBase-1.1 #574 (See [https://builds.apache.org/job/HBase-1.1/574/]) HBASE-14017 Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion (busbey: rev 38014398eda48891529258111b0f8c1491a0e9fa) * hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/MasterProcedureQueue.java * hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestMasterProcedureQueue.java Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion - Key: HBASE-14017 URL: https://issues.apache.org/jira/browse/HBASE-14017 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 2.0.0, 1.2.0, 1.1.1, 1.3.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.1.2 Attachments: HBASE-14017-v0.patch, HBASE-14017-v0.patch, HBASE-14017.as-pushed-master.patch, HBASE-14017.v1-branch1.1.patch, HBASE-14017.v1-branch1.1.patch [~syuanjiang] found a concurrency issue in the procedure queue deletion where we don't have an exclusive lock before deleting the table {noformat} Thread 1: Create table is running - the queue is empty and wlock is false Thread 2: markTableAsDeleted sees the queue empty and wlock= false Thread 1: tryWrite() set wlock=true; too late Thread 2: delete the queue Thread 1: never able to release the lock - NPE when trying to get the queue {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13981) Fix ImportTsv spelling and usage issues
[ https://issues.apache.org/jira/browse/HBASE-13981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615940#comment-14615940 ] Apekshit Sharma commented on HBASE-13981: - [~gliptak] not sure why ATTRIBUTE_SEPERATOR_CONF_KEY or "attributes.seperator" are there. Since ImportTsv is @InterfaceAudience.Public, we cannot simply delete it but can mark it @Deprecated. Patch looks good. Just two last things. 1. {code} - public final static String ATTRIBUTE_SEPERATOR_CONF_KEY = "attributes.seperator"; + public final static String ATTRIBUTE_SEPARATOR_CONF_KEY = "attributes.separator"; ... - final static String DEFAULT_ATTRIBUTES_SEPERATOR = "="; - final static String DEFAULT_MULTIPLE_ATTRIBUTES_SEPERATOR = ","; + final static String DEFAULT_ATTRIBUTES_SEPARATOR = "="; + final static String DEFAULT_MULTIPLE_ATTRIBUTES_SEPARATOR = ","; {code} Again, since ImportTsv is @InterfaceAudience.Public (read more about this [here|http://hbase.apache.org/book.html#hbase.client.api.surface]), we cannot simply change the name. The right thing to do here would be {code} + @Deprecated public final static String ATTRIBUTE_SEPERATOR_CONF_KEY = "attributes.seperator"; + public final static String ATTRIBUTE_SEPARATOR_CONF_KEY = "attributes.separator"; {code} and replacing all uses of ATTRIBUTE_SEPERATOR_CONF_KEY with ATTRIBUTE_SEPARATOR_CONF_KEY. Later in 2.0, the deprecated constant will be deleted. 2. Readability: there is no need for indentation here. {code} + "The column names of the TSV data must be specified using the option:\n" + + "-D" + COLUMNS_CONF_KEY + " option. This option takes the form of" + + " comma-separated column names, where each column name is either" + {code} Please align like this {code} + "The column names of the TSV data must be specified using the option:\n" + + "-D" + COLUMNS_CONF_KEY + " option. This option takes the form of" + + " comma-separated column names, where each column name is either" + {code} here and everywhere else. 
Fix ImportTsv spelling and usage issues --- Key: HBASE-13981 URL: https://issues.apache.org/jira/browse/HBASE-13981 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 1.1.0.1 Reporter: Lars George Assignee: Gabor Liptak Labels: beginner Fix For: 2.0.0, 1.3.0 Attachments: HBASE-13981.1.patch, HBASE-13981.2.patch, HBASE-13981.3.patch The {{ImportTsv}} tool has various spelling and formatting issues. Fix those. In code: {noformat} public final static String ATTRIBUTE_SEPERATOR_CONF_KEY = attributes.seperator; {noformat} It is separator. In usage text: {noformat} input data. Another special columnHBASE_TS_KEY designates that this column should be {noformat} Space missing. {noformat} Record with invalid timestamps (blank, non-numeric) will be treated as bad record. {noformat} Records ... as bad records - plural missing twice. {noformat} HBASE_ATTRIBUTES_KEY can be used to specify Operation Attributes per record. Should be specified as key=value where -1 is used as the seperator. Note that more than one OperationAttributes can be specified. {noformat} - Remove line wraps and indentation. - Fix separator. - Fix wrong separator being output, it is not -1 (wrong constant use in code) - General wording/style could be better (eg. last sentence now uses OperationAttributes without a space). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
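The compatibility pattern Apekshit asks for in the comment above can be shown as a tiny standalone class. This is a sketch following the discussion, not the ImportTsv source: the misspelled public constant survives with its old value as a deprecated alias beside the corrected one, so existing callers keep compiling until the deprecated name is removed in 2.0.

```java
// Sketch of the @Deprecated-alias pattern for renaming a public API constant.
public class ImportTsvConstantsSketch {

    /** @deprecated Misspelled key kept only for API compatibility; slated for removal in 2.0. */
    @Deprecated
    public static final String ATTRIBUTE_SEPERATOR_CONF_KEY = "attributes.seperator";

    /** Correctly spelled replacement; all internal uses switch to this. */
    public static final String ATTRIBUTE_SEPARATOR_CONF_KEY = "attributes.separator";

    public static void main(String[] args) {
        // New code reads the corrected key; old callers referencing the
        // misspelled constant still compile, with a deprecation warning.
        System.out.println(ATTRIBUTE_SEPARATOR_CONF_KEY); // attributes.separator
    }
}
```

Note that, as in the snippet quoted in the comment, the deprecated constant keeps its old value; whether the code should honor both configuration keys during the deprecation window is a separate decision for the patch.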
[jira] [Commented] (HBASE-14017) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion
[ https://issues.apache.org/jira/browse/HBASE-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615964#comment-14615964 ] Hudson commented on HBASE-14017: SUCCESS: Integrated in HBase-1.2 #54 (See [https://builds.apache.org/job/HBase-1.2/54/]) HBASE-14017 Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion (busbey: rev 8e65b9f86d63b61177170658f0e1a86ef7b2d51f) * hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/MasterProcedureQueue.java * hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestMasterProcedureQueue.java Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion - Key: HBASE-14017 URL: https://issues.apache.org/jira/browse/HBASE-14017 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 2.0.0, 1.2.0, 1.1.1, 1.3.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.1.2 Attachments: HBASE-14017-v0.patch, HBASE-14017-v0.patch, HBASE-14017.as-pushed-master.patch, HBASE-14017.v1-branch1.1.patch, HBASE-14017.v1-branch1.1.patch [~syuanjiang] found a concurrency issue in the procedure queue deletion where we don't have an exclusive lock before deleting the table {noformat} Thread 1: Create table is running - the queue is empty and wlock is false Thread 2: markTableAsDeleted sees the queue empty and wlock= false Thread 1: tryWrite() set wlock=true; too late Thread 2: delete the queue Thread 1: never able to release the lock - NPE when trying to get the queue {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615800#comment-14615800 ] Hadoop QA commented on HBASE-13965: --- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12743774/HBASE-13965-v4.patch against master branch at commit 608c3aa15c34b9014f99e857b374645db58cbbe3. ATTACHMENT ID: 12743774 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14680//testReport/ Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14680//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14680//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14680//console This message is automatically generated. Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Attachments: HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965_v2.patch, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13991) Hierarchical Layout for Humongous Tables
[ https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615813#comment-14615813 ] Ben Lau commented on HBASE-13991: - Hi guys, hope you had a happy 4th of July. We would like to do something akin to Lars’ last idea. That is, we will have code to support both the old layout and the new layout, but it will be on a per HBase cluster basis. You will be able to migrate a cluster entirely to the hierarchical layout or leave it on the old layout. This approach has the following pros: - If HBase users do not need/want the new layout, they will not have to do an offline upgrade in order to use new HBase code. The alternative is to make an online upgrade for the hierarchical layout, but this would require some very messy changes to the codebase and also be tricky to test fully. - HBase code will not have to ‘detect’ whether tables/paths/regions are hierarchical or not. The master or region server can simply look at the root table at startup and use that to determine if the cluster has migrated to the hierarchical layout. This single source of truth would make code less ugly since you don’t need to do in-context per-region/path checks in different parts of the codebase. What do you guys think about this approach? Hierarchical Layout for Humongous Tables Key: HBASE-13991 URL: https://issues.apache.org/jira/browse/HBASE-13991 Project: HBase Issue Type: Sub-task Reporter: Ben Lau Assignee: Ben Lau Attachments: HBASE-13991-master.patch, HumongousTableDoc.pdf Add support for humongous tables via a hierarchical layout for regions on filesystem. Credit for most of this code goes to Huaiyu Zhu. Latest version of the patch is available on the review board: https://reviews.apache.org/r/36029/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
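To make the hierarchical-layout idea concrete: one common scheme (an assumption for this sketch — the actual layout is specified in the attached design doc and review-board patch) is to bucket region directories under short prefixes of the encoded region name, turning a single flat table directory into a shallow tree.

```java
public class HierarchicalPathSketch {
    // Hypothetical scheme: bucket by two 2-character prefixes of the
    // encoded region name. The bucket width is an assumption, not the
    // layout from the patch.
    static String hierarchicalPath(String tableDir, String encodedRegionName) {
        String b1 = encodedRegionName.substring(0, 2);
        String b2 = encodedRegionName.substring(2, 4);
        return tableDir + "/" + b1 + "/" + b2 + "/" + encodedRegionName;
    }

    public static void main(String[] args) {
        // Flat layout would be /hbase/data/default/t1/abcd1234; hierarchical:
        System.out.println(hierarchicalPath("/hbase/data/default/t1", "abcd1234"));
    }
}
```

The per-cluster flag discussed above would decide, once at startup, whether path resolution uses the flat or the hierarchical mapping.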
[jira] [Commented] (HBASE-14028) DistributedLogReplay drops edits when ITBLL 125M
[ https://issues.apache.org/jira/browse/HBASE-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615868#comment-14615868 ] Vladimir Rodionov commented on HBASE-14028: --- This -recovery-from-failure-during-recovery-from-failure thing looks quite complicated to me. I am working on HBASE-7912 and one of the improvements which is on the list is WALPlayer into HFiles followed by a bulk load. Pounding HBase with millions of puts is not the right approach. DistributedLogReplay drops edits when ITBLL 125M Key: HBASE-14028 URL: https://issues.apache.org/jira/browse/HBASE-14028 Project: HBase Issue Type: Bug Components: Recovery Affects Versions: 1.2.0 Reporter: stack Testing DLR before 1.2.0RC gets cut, we are dropping edits. Issue seems to be around replay into a deployed region that is on a server that dies before all edits have finished replaying. Logging is sparse on sequenceid accounting so can't tell for sure how it is happening (and if our now accounting by Store is messing up DLR). Digging. I notice also that DLR does not refresh its cache of region location on error -- it just keeps trying till whole WAL fails 8 retries...about 30 seconds. We could do a bit of refactor and have the replay find region in new location if moved during DLR replay. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13890) Get/Scan from MemStore only (Client API)
[ https://issues.apache.org/jira/browse/HBASE-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-13890: -- Attachment: HBASE-13890-v1.patch First cut. Get/Scan from MemStore only (Client API) Key: HBASE-13890 URL: https://issues.apache.org/jira/browse/HBASE-13890 Project: HBase Issue Type: New Feature Components: API, Client, Scanners Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov Attachments: HBASE-13890-v1.patch This is short-circuit read for get/scan when recent data (version) of a cell can be found only in MemStore (with very high probability). Good examples are: Atomic counters and appends. This feature will allow to bypass completely store file scanners and improve performance and latency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13890) Get/Scan from MemStore only (Client API)
[ https://issues.apache.org/jira/browse/HBASE-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-13890: -- Status: Patch Available (was: Open) Get/Scan from MemStore only (Client API) Key: HBASE-13890 URL: https://issues.apache.org/jira/browse/HBASE-13890 Project: HBase Issue Type: New Feature Components: API, Client, Scanners Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov Attachments: HBASE-13890-v1.patch This is short-circuit read for get/scan when recent data (version) of a cell can be found only in MemStore (with very high probability). Good examples are: Atomic counters and appends. This feature will allow to bypass completely store file scanners and improve performance and latency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13890) Get/Scan from MemStore only (Client API)
[ https://issues.apache.org/jira/browse/HBASE-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615894#comment-14615894 ] Vladimir Rodionov commented on HBASE-13890: --- The new API was added to *OperationWithAttributes*:
{code}
/**
 * This method allows you to set in-memory-only operation mode.
 * Queries (Get and Scan, as well as Increment and Append) will
 * work only on data in RAM (MemStore).
 * If data is missing in RAM, an empty Result is returned for
 * Get, Increment, Append and Scan.
 */
public OperationWithAttributes setMemstoreOnly(boolean v) {
  setAttribute(MEMSTORE_ONLY_ATTRIBUTE, Bytes.toBytes(v));
  return this;
}

/**
 * Checks if we are in in-memory-only mode
 * @return true, if yes
 */
public boolean isMemstoreOnly() {
  byte[] attr = getAttribute(MEMSTORE_ONLY_ATTRIBUTE);
  return attr != null && Bytes.toBoolean(attr);
}
{code}
Get/Scan from MemStore only (Client API) Key: HBASE-13890 URL: https://issues.apache.org/jira/browse/HBASE-13890 Project: HBase Issue Type: New Feature Components: API, Client, Scanners Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov Attachments: HBASE-13890-v1.patch This is short-circuit read for get/scan when recent data (version) of a cell can be found only in MemStore (with very high probability). Good examples are: Atomic counters and appends. This feature will allow to bypass completely store file scanners and improve performance and latency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)

[jira] [Commented] (HBASE-14025) Update CHANGES.txt for 1.2
[ https://issues.apache.org/jira/browse/HBASE-14025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615908#comment-14615908 ] Enis Soztutar commented on HBASE-14025: --- i think the discrepancy is due to the fact that we mark all active branches that the fix went in, for example, today we are marking an issue with 1.2.0, 1.3.0 and 2.0.0 if it was committed to branches branch-1.2, branch-1 and master. By the time, 1.3.0 release comes up, the jiras with fixVersion = 1.3.0 will include 1.3 exclusive patches + patches in 1.2.0 and earlier. [~busbey] I have noticed that you have been unmarking some of the fixVersions in jira (for example HBASE-13895). Is this for clean up for CHANGES.txt for 1.3.0 or some general cleanup? I am asking to understand whether we need to refine the process. At the time of the HBASE-13895 commit, there was already branch-1, branch-1.1, branch-1.2 and master branches. Thus, the jira used to bear {{2.0.0, 1.2.0, 1.1.2, 1.3.0}}. Your process above (correct me if I am wrong) removes 1.3.0 from fixVersions since 1.2.0 is the first release that will have that patch? How do we differentiate the fact that the issue has actually been committed after the branch-1.2 is forked, and committed to both branch-1.2 and branch-1? Update CHANGES.txt for 1.2 -- Key: HBASE-14025 URL: https://issues.apache.org/jira/browse/HBASE-14025 Project: HBase Issue Type: Sub-task Components: documentation Affects Versions: 1.2.0 Reporter: Sean Busbey Assignee: Sean Busbey Fix For: 1.2.0 Since it's more effort than I expected, making a ticket to track actually updating CHANGES.txt so that new RMs have an idea what to expect. Maybe will make doc changes if there's enough here. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13352) Add hbase.import.version to Import usage.
[ https://issues.apache.org/jira/browse/HBASE-13352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615929#comment-14615929 ] Hudson commented on HBASE-13352: FAILURE: Integrated in HBase-TRUNK #6633 (See [https://builds.apache.org/job/HBase-TRUNK/6633/]) HBASE-13352 Add hbase.import.version to Import usage (Lars Hofhansl) (enis: rev c220635c7893c96db675cb2b80af6ade4a44e3d4) * hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/Import.java Add hbase.import.version to Import usage. - Key: HBASE-13352 URL: https://issues.apache.org/jira/browse/HBASE-13352 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 2.0.0, 0.98.14, 1.0.2, 1.2.0, 1.1.2, 1.3.0 Attachments: 13352-v2.txt, 13352.txt, hbase-13352_v3.patch We just tried to export some (small amount of) data out of an 0.94 cluster to 0.98 cluster. We used Export/Import for that. By default we found that the import M/R job correctly reports the number of records seen, but _silently_ does not import anything. After looking at the 0.98 it's obvious there's an hbase.import.version (-Dhbase.import.version=0.94) to make this work. Two issues: # -Dhbase.import.version=0.94 should be show with the the Import.usage # If not given it should not just silently not import anything In this issue I'll just a trivially add this option to the Import tool's usage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13988) Add exception handler for lease thread
[ https://issues.apache.org/jira/browse/HBASE-13988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615944#comment-14615944 ] stack commented on HBASE-13988: --- No other comments. Just a suggestion. I like the way the patch makes use of the existing exception handling mechanism. Add exception handler for lease thread -- Key: HBASE-13988 URL: https://issues.apache.org/jira/browse/HBASE-13988 Project: HBase Issue Type: Bug Affects Versions: 2.0.0 Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor Fix For: 2.0.0, 1.0.2, 1.1.2, 0.98.15 Attachments: HBASE-13988-v001.diff In a prod cluster, a region server exited for some important threads were not alive. After excluding other threads from the log, we doubted the lease thread was the root. So we need to add an exception handler to the lease thread to debug why it exited in future. {quote} 2015-06-29,12:46:09,222 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: One or more threads are no longer alive -- stop 2015-06-29,12:46:09,223 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 21600 ... 2015-06-29,12:46:09,330 INFO org.apache.hadoop.hbase.regionserver.LogRoller: LogRoller exiting. 2015-06-29,12:46:09,330 INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Thread-37 exiting 2015-06-29,12:46:09,330 INFO org.apache.hadoop.hbase.regionserver.HRegionServer$CompactionChecker: regionserver21600.compactionChecker exiting 2015-06-29,12:46:12,403 INFO org.apache.hadoop.hbase.regionserver.HRegionServer$PeriodicMemstoreFlusher: regionserver21600.periodicFlusher exiting {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
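The patch builds on Java's standard uncaught-exception mechanism. As a generic, standalone sketch (not the HBase code itself): installing an `UncaughtExceptionHandler` on the lease thread means that if it dies unexpectedly, the reason is captured instead of the thread vanishing silently, which is exactly the debugging gap described above.

```java
import java.util.concurrent.atomic.AtomicReference;

public class LeaseHandlerSketch {
    // Runs a thread that dies with an exception; the handler records why.
    static String dieAndReport() {
        AtomicReference<String> reason = new AtomicReference<>();
        Thread lease = new Thread(() -> {
            throw new IllegalStateException("lease loop failed");
        }, "lease-thread");
        lease.setUncaughtExceptionHandler((t, e) ->
            // A real server would log here and trigger an orderly stop.
            reason.set(t.getName() + ": " + e.getMessage()));
        lease.start();
        try {
            lease.join();
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
        return reason.get();
    }

    public static void main(String[] args) {
        System.out.println(dieAndReport());
    }
}
```

HBase's existing per-thread exception handling that the comment praises is this same mechanism routed into the server's abort/log path.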
[jira] [Created] (HBASE-14028) DistributedLogReplay drops edits when ITBLL 125M
stack created HBASE-14028: - Summary: DistributedLogReplay drops edits when ITBLL 125M Key: HBASE-14028 URL: https://issues.apache.org/jira/browse/HBASE-14028 Project: HBase Issue Type: Bug Components: Recovery Affects Versions: 1.2.0 Reporter: stack Testing DLR before 1.2.0RC gets cut, we are dropping edits. Issue seems to be around replay into a deployed region that is on a server that dies before all edits have finished replaying. Logging is sparse on sequenceid accounting so can't tell for sure how it is happening (and if our now accounting by Store is messing up DLR). Digging. I notice also that DLR does not refresh its cache of region location on error -- it just keeps trying till whole WAL fails 8 retries...about 30 seconds. We could do a bit of refactor and have the replay find region in new location if moved during DLR replay. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13387) Add ByteBufferedCell an extension to Cell
[ https://issues.apache.org/jira/browse/HBASE-13387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615843#comment-14615843 ] stack commented on HBASE-13387: --- Should ByteBufferedCell be in hbase-server module? Can it be in the RegionServer package? (It doesn't look like it given we look for it in comparators. I suppose also BBCell doesn't have to have anything to do with Server when in common module. Someone else might want to use it for some other purpose? Patch LGTM but how about tests of the new methods added especially when backed by a BBCell? Nice work. Add ByteBufferedCell an extension to Cell - Key: HBASE-13387 URL: https://issues.apache.org/jira/browse/HBASE-13387 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0 Attachments: ByteBufferedCell.docx, HBASE-13387_v1.patch, WIP_HBASE-13387_V2.patch, WIP_ServerCell.patch, benchmark.zip This came in btw the discussion abt the parent Jira and recently Stack added as a comment on the E2E patch on the parent Jira. The idea is to add a new Interface 'ByteBufferedCell' in which we can add new buffer based getter APIs and getters for position in components in BB. We will keep this interface @InterfaceAudience.Private. When the Cell is backed by a DBB, we can create an Object implementing this new interface. The Comparators has to be aware abt this new Cell extension and has to use the BB based APIs rather than getXXXArray(). Also give util APIs in CellUtil to abstract the checks for new Cell type. (Like matchingXXX APIs, getValueAstype APIs etc) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
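The motivation for buffer-based getters is avoiding the copy that `getXXXArray()` forces when a cell's bytes live in a (possibly off-heap) ByteBuffer. A standalone sketch of the access pattern (not the actual `ByteBufferedCell` interface — field and method names here are made up):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class BBCellSketch {
    final ByteBuffer buf;
    final int valuePos;
    final int valueLen;

    BBCellSketch(ByteBuffer buf, int valuePos, int valueLen) {
        this.buf = buf;
        this.valuePos = valuePos;
        this.valueLen = valueLen;
    }

    // Buffer-based access: absolute get, no byte[] copy, no position change.
    byte valueByteAt(int i) {
        return buf.get(valuePos + i);
    }

    // The array-style getter a comparator would otherwise need: allocates
    // and copies, which is what the BB-aware comparators avoid.
    byte[] getValueArray() {
        byte[] out = new byte[valueLen];
        for (int i = 0; i < valueLen; i++) {
            out[i] = valueByteAt(i);
        }
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer bb = ByteBuffer.wrap("rowkey:value".getBytes(StandardCharsets.UTF_8));
        BBCellSketch cell = new BBCellSketch(bb, 7, 5); // value bytes at offset 7
        System.out.println(new String(cell.getValueArray(), StandardCharsets.UTF_8));
    }
}
```

A comparator aware of the buffer-backed type can compare byte-by-byte via `valueByteAt`-style access and never materialize the array.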
[jira] [Resolved] (HBASE-13352) Add hbase.import.version to Import usage.
[ https://issues.apache.org/jira/browse/HBASE-13352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar resolved HBASE-13352. --- Resolution: Fixed Assignee: Lars Hofhansl Hadoop Flags: Reviewed Fix Version/s: (was: 1.2.1) 1.2.0 Committed this to 0.98+. Thanks Lars. Add hbase.import.version to Import usage. - Key: HBASE-13352 URL: https://issues.apache.org/jira/browse/HBASE-13352 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 2.0.0, 0.98.14, 1.0.2, 1.2.0, 1.1.2, 1.3.0 Attachments: 13352-v2.txt, 13352.txt, hbase-13352_v3.patch We just tried to export some (small amount of) data out of an 0.94 cluster to 0.98 cluster. We used Export/Import for that. By default we found that the import M/R job correctly reports the number of records seen, but _silently_ does not import anything. After looking at the 0.98 it's obvious there's an hbase.import.version (-Dhbase.import.version=0.94) to make this work. Two issues: # -Dhbase.import.version=0.94 should be show with the the Import.usage # If not given it should not just silently not import anything In this issue I'll just a trivially add this option to the Import tool's usage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13988) Add exception handler for lease thread
[ https://issues.apache.org/jira/browse/HBASE-13988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615855#comment-14615855 ] Enis Soztutar commented on HBASE-13988: --- [~saint@gmail.com] any further comments? Otherwise, I'll commit this. Add exception handler for lease thread -- Key: HBASE-13988 URL: https://issues.apache.org/jira/browse/HBASE-13988 Project: HBase Issue Type: Bug Affects Versions: 2.0.0 Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor Fix For: 2.0.0, 1.0.2, 1.1.2, 0.98.15 Attachments: HBASE-13988-v001.diff In a prod cluster, a region server exited for some important threads were not alive. After excluding other threads from the log, we doubted the lease thread was the root. So we need to add an exception handler to the lease thread to debug why it exited in future. {quote} 2015-06-29,12:46:09,222 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: One or more threads are no longer alive -- stop 2015-06-29,12:46:09,223 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 21600 ... 2015-06-29,12:46:09,330 INFO org.apache.hadoop.hbase.regionserver.LogRoller: LogRoller exiting. 2015-06-29,12:46:09,330 INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Thread-37 exiting 2015-06-29,12:46:09,330 INFO org.apache.hadoop.hbase.regionserver.HRegionServer$CompactionChecker: regionserver21600.compactionChecker exiting 2015-06-29,12:46:12,403 INFO org.apache.hadoop.hbase.regionserver.HRegionServer$PeriodicMemstoreFlusher: regionserver21600.periodicFlusher exiting {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13832) Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low
[ https://issues.apache.org/jira/browse/HBASE-13832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615871#comment-14615871 ] Enis Soztutar commented on HBASE-13832: --- bq. org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS This is the new test from the patch. Seems related. bq. before with the while (isRunning()) we were spinning after the signal, to make clear that there were no other run of the syncLoop(). in this case we may do another round of the loop and execute stuff which in theory is not what you expect after sending the abort signal. I guess we can rethrow the exception after here: {code} +} catch (Throwable t) { + syncException.compareAndSet(null, t); {code} Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low --- Key: HBASE-13832 URL: https://issues.apache.org/jira/browse/HBASE-13832 Project: HBase Issue Type: Sub-task Components: master, proc-v2 Affects Versions: 2.0.0, 1.1.0, 1.2.0 Reporter: Stephen Yuan Jiang Assignee: Matteo Bertozzi Priority: Critical Fix For: 2.0.0, 1.1.2, 1.3.0, 1.2.1 Attachments: HBASE-13832-v0.patch, HBASE-13832-v1.patch, HBASE-13832-v2.patch, HBASE-13832-v4.patch, HBASE-13832-v5.patch, HDFSPipeline.java, hbase-13832-test-hang.patch, hbase-13832-v3.patch when the data node 3, we got failure in WALProcedureStore#syncLoop() during master start. The failure prevents master to get started. {noformat} 2015-05-29 13:27:16,625 ERROR [WALProcedureStoreSyncThread] wal.WALProcedureStore: Sync slot failed, abort. java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. 
(Nodes: current=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK], DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-490ece56c772,DISK]], original=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK], DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983- 490ece56c772,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration. at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:951) {noformat} One proposal is to implement some similar logic as FSHLog: if IOException is thrown during syncLoop in WALProcedureStore#start(), instead of immediate abort, we could try to roll the log and see whether this resolve the issue; if the new log cannot be created or more exception from rolling the log, we then abort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
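The proposed recovery policy can be sketched generically (interface and method names here are hypothetical, not the `WALProcedureStore` API): on a sync `IOException`, roll to a fresh log file — which forces a new HDFS write pipeline — and retry once; abort only if the roll or the retried sync also fails.

```java
import java.io.IOException;

public class SyncRollSketch {
    interface Store {
        void sync() throws IOException;
        void rollWriter() throws IOException;
    }

    /** @return true if the sync ultimately succeeded; false means abort. */
    static boolean syncWithRoll(Store store) {
        try {
            store.sync();
            return true;
        } catch (IOException first) {
            try {
                store.rollWriter(); // new file => new datanode pipeline
                store.sync();       // retry once on the fresh log
                return true;
            } catch (IOException second) {
                return false;       // rolling did not help: abort the master
            }
        }
    }
}
```

This mirrors the FSHLog-style behavior the proposal refers to: a transient pipeline failure is survived by rolling, while a persistent one still aborts.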
[jira] [Commented] (HBASE-13890) Get/Scan from MemStore only (Client API)
[ https://issues.apache.org/jira/browse/HBASE-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615905#comment-14615905 ] Vladimir Rodionov commented on HBASE-13890: --- The new API contract is the following: * No guarantee on data (if it is not in MemStore, you get nothing) * Data can be partial * Always check the result of the operation; issue a regular operation if the result is empty or partial. Get/Scan from MemStore only (Client API) Key: HBASE-13890 URL: https://issues.apache.org/jira/browse/HBASE-13890 Project: HBase Issue Type: New Feature Components: API, Client, Scanners Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov Attachments: HBASE-13890-v1.patch This is short-circuit read for get/scan when recent data (version) of a cell can be found only in MemStore (with very high probability). Good examples are: Atomic counters and appends. This feature will allow to bypass completely store file scanners and improve performance and latency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
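Because the contract gives no completeness guarantee, callers are expected to fall back. A toy standalone sketch of that client-side pattern (the types here are made up; the real fast path goes through the attribute on the HBase Get/Scan):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class MemstoreOnlyFallback {
    final Map<String, String> memstore = new HashMap<>();
    final Map<String, String> storeFiles = new HashMap<>();

    // Fast path: consults only in-memory data, may come back empty.
    Optional<String> getMemstoreOnly(String row) {
        return Optional.ofNullable(memstore.get(row));
    }

    // Contract-following read: check the fast-path result, and issue the
    // regular (store-file-reading) operation when it is empty.
    String get(String row) {
        return getMemstoreOnly(row)
            .orElseGet(() -> storeFiles.get(row));
    }
}
```

For the JIRA's motivating workloads (atomic counters, appends) the fast path almost always hits, so the fallback is rare.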
[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615909#comment-14615909 ] Lei Chen commented on HBASE-13965: -- Good point. It can be more memory efficient if description is stored only once for each cost function. Patch will be updated. Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Attachments: HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965_v2.patch, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13352) Add hbase.import.version to Import usage.
[ https://issues.apache.org/jira/browse/HBASE-13352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615912#comment-14615912 ] Hudson commented on HBASE-13352: FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #1002 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/1002/]) HBASE-13352 Add hbase.import.version to Import usage (Lars Hofhansl) (enis: rev deba1d0a1075c62647eb63be09b87b20329fe44a) * hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/Import.java Add hbase.import.version to Import usage. - Key: HBASE-13352 URL: https://issues.apache.org/jira/browse/HBASE-13352 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 2.0.0, 0.98.14, 1.0.2, 1.2.0, 1.1.2, 1.3.0 Attachments: 13352-v2.txt, 13352.txt, hbase-13352_v3.patch We just tried to export some (small amount of) data out of an 0.94 cluster to 0.98 cluster. We used Export/Import for that. By default we found that the import M/R job correctly reports the number of records seen, but _silently_ does not import anything. After looking at the 0.98 it's obvious there's an hbase.import.version (-Dhbase.import.version=0.94) to make this work. Two issues: # -Dhbase.import.version=0.94 should be show with the the Import.usage # If not given it should not just silently not import anything In this issue I'll just a trivially add this option to the Import tool's usage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei Chen updated HBASE-13965: - Attachment: HBASE-13965-v5.patch Updates: 1. One copy of description is saved for each cost function, in a separate map TODO: 1. Make hard-coded map size configurable? Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Attachments: HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965_v2.patch, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14017) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion
[ https://issues.apache.org/jira/browse/HBASE-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615734#comment-14615734 ] Hudson commented on HBASE-14017: FAILURE: Integrated in HBase-TRUNK #6632 (See [https://builds.apache.org/job/HBase-TRUNK/6632/]) HBASE-14017 Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion (busbey: rev 1713f1fcaf9d721a97bc564faaf070f2e6b0b1d1) * hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestMasterProcedureQueue.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/MasterProcedureQueue.java Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion - Key: HBASE-14017 URL: https://issues.apache.org/jira/browse/HBASE-14017 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 2.0.0, 1.2.0, 1.1.1, 1.3.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.1.2 Attachments: HBASE-14017-v0.patch, HBASE-14017-v0.patch, HBASE-14017.as-pushed-master.patch, HBASE-14017.v1-branch1.1.patch, HBASE-14017.v1-branch1.1.patch [~syuanjiang] found a concurrecy issue in the procedure queue delete where we don't have an exclusive lock before deleting the table {noformat} Thread 1: Create table is running - the queue is empty and wlock is false Thread 2: markTableAsDeleted see the queue empty and wlock= false Thread 1: tryWrite() set wlock=true; too late Thread 2: delete the queue Thread 1: never able to release the lock - NPE when trying to get the queue {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14027) Clean up netty dependencies
Sean Busbey created HBASE-14027: --- Summary: Clean up netty dependencies Key: HBASE-14027 URL: https://issues.apache.org/jira/browse/HBASE-14027 Project: HBase Issue Type: Improvement Components: build Reporter: Sean Busbey Assignee: Sean Busbey Fix For: 2.0.0, 1.2.0 We have multiple copies of Netty (3?) getting shipped around. clean some up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)