[jira] [Updated] (HBASE-5712) Parallelize load of .regioninfo files in diagnostic/repair portion of hbck.

2012-04-30 Thread Jonathan Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-5712:
--

Issue Type: Improvement  (was: Sub-task)
Parent: (was: HBASE-5628)

 Parallelize load of .regioninfo files in diagnostic/repair portion of hbck.
 ---

 Key: HBASE-5712
 URL: https://issues.apache.org/jira/browse/HBASE-5712
 Project: HBase
  Issue Type: Improvement
  Components: hbck
Affects Versions: 0.90.7, 0.92.2, 0.94.0, 0.96.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Attachments: hbase-5712-90.patch, hbase-5712.patch


 On heavily loaded HDFS clusters, some datanodes may not respond quickly, and 
 a read backs off for 60s before attempting to read data from another datanode.  
 Portions of the information gathered from HDFS (.regioninfo files) are loaded 
 serially.  On HBase clusters with 100s, 1000s, or more regions, 
 encountering these 60s delays blocks progress and can be very painful.  
 There is already some parallelization of portions of the HDFS information 
 load operations, and the goal here is to move the reading of .regioninfo 
 files into the parallelized sections.
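
 The idea described above can be sketched roughly as follows. This is a minimal 
 illustration, not the committed patch: the class name, the readRegionInfo() 
 helper (a stand-in for the HDFS read), and the thread count are all assumptions 
 made for the example. Each read is submitted to a thread pool, so one slow 
 datanode delays only its own task rather than every read queued behind it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelRegionInfoLoad {
    // Hypothetical stand-in for reading one .regioninfo file from HDFS.
    static String readRegionInfo(String regionDir) {
        return "info-for-" + regionDir;
    }

    // Loads every region's info on a fixed-size pool and returns the results sorted by key.
    static SortedMap<String, String> loadAll(List<String> regionDirs, int threads) throws Exception {
        // A concurrent sorted map lets worker threads publish results without extra locking.
        SortedMap<String, String> results = new ConcurrentSkipListMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (String dir : regionDirs) {
                pending.add(pool.submit(() -> {
                    results.put(dir, readRegionInfo(dir));
                }));
            }
            // Wait for every read; get() rethrows any failure from the worker.
            for (Future<?> f : pending) {
                f.get();
            }
        } finally {
            pool.shutdown();
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        List<String> dirs = new ArrayList<>();
        dirs.add("region-b");
        dirs.add("region-a");
        System.out.println(loadAll(dirs, 4));
    }
}
```

 With N reads and a pool of T threads, a single 60s stall costs roughly one 
 thread's time instead of stalling the whole scan.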

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5712) Parallelize load of .regioninfo files in diagnostic/repair portion of hbck.

2012-04-30 Thread Jonathan Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-5712:
--

   Resolution: Fixed
Fix Version/s: 0.96.0
   0.94.0
   0.92.2
   0.90.7
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Committed to 0.90/0.92/0.94/0.96-trunk

 Parallelize load of .regioninfo files in diagnostic/repair portion of hbck.
 ---

 Key: HBASE-5712
 URL: https://issues.apache.org/jira/browse/HBASE-5712
 Project: HBase
  Issue Type: Improvement
  Components: hbck
Affects Versions: 0.90.7, 0.92.2, 0.94.0, 0.96.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: hbase-5712-90.patch, hbase-5712.patch







[jira] [Updated] (HBASE-5712) Parallelize load of .regioninfo files in diagnostic/repair portion of hbck.

2012-04-30 Thread Jonathan Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-5712:
--

Attachment: hbase-5712-90-v2.patch
hbase-5712-v2.patch

The v2 patches are the versions I committed.

 Parallelize load of .regioninfo files in diagnostic/repair portion of hbck.
 ---

 Key: HBASE-5712
 URL: https://issues.apache.org/jira/browse/HBASE-5712
 Project: HBase
  Issue Type: Improvement
  Components: hbck
Affects Versions: 0.90.7, 0.92.2, 0.94.0, 0.96.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: hbase-5712-90-v2.patch, hbase-5712-90.patch, 
 hbase-5712-v2.patch, hbase-5712.patch







[jira] [Commented] (HBASE-5712) Parallelize load of .regioninfo files in diagnostic/repair portion of hbck.

2012-04-30 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264691#comment-13264691
 ] 

jirapos...@reviews.apache.org commented on HBASE-5712:
--



bq.  On 2012-04-27 23:27:20, Michael Stack wrote:
bq.   src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 204
bq.   https://reviews.apache.org/r/4883/diff/1/?file=104442#file104442line204
bq.  
bq.   This'll work but why not ConcurrentSkipListMap?

Sure, changed.


bq.  On 2012-04-27 23:27:20, Michael Stack wrote:
bq.   src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 418
bq.   https://reviews.apache.org/r/4883/diff/1/?file=104442#file104442line418
bq.  
bq.   +1 on suggested change

done.


bq.  On 2012-04-27 23:27:20, Michael Stack wrote:
bq.   src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 2835
bq.   https://reviews.apache.org/r/4883/diff/1/?file=104442#file104442line2835
bq.  
bq.   Is this flag needed?  Why not just check whether the thread is alive?  I 
bq.   see we can return with an error.  What happens if the return on 2816 
bq.   happens?  Will the wait at #643 above be forever?

This is not a thread; it is a work item fed to an executor (thread pool) at 
line 637.  If the return on 2816 happens, the flag is still set, because it is 
set in a finally block, which always marks the work item as done. 

There are two other instances of this pattern that were already in this code 
before I got to it.  I'd have used Futures (and have filed a follow-on issue 
for that), but this works.
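
A minimal sketch of the done-flag pattern discussed above, with hypothetical 
names (WorkItem, awaitDone, runAll) that are not taken from HBaseFsck. It only 
illustrates why an early return cannot leave the waiter blocked: the flag is 
set in a finally block.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class WorkItemDoneFlag {
    /** Hypothetical work item; names are illustrative only. */
    static class WorkItem implements Runnable {
        private final boolean failEarly;
        private boolean done = false;

        WorkItem(boolean failEarly) {
            this.failEarly = failEarly;
        }

        synchronized boolean isDone() {
            return done;
        }

        /** Blocks until run() has finished, whether it succeeded or bailed out. */
        synchronized void awaitDone() throws InterruptedException {
            while (!done) {
                wait();
            }
        }

        @Override
        public synchronized void run() {
            try {
                if (failEarly) {
                    return; // early return on an error path
                }
                // ... the real work (for example, reading one file) would go here ...
            } finally {
                done = true;  // always reached, even after the early return
                notifyAll();  // wake any thread blocked in awaitDone()
            }
        }
    }

    /** Submits one item per flag to a pool and returns how many ended up marked done. */
    static int runAll(boolean... failFlags) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        List<WorkItem> items = new ArrayList<>();
        try {
            for (boolean fail : failFlags) {
                WorkItem item = new WorkItem(fail);
                items.add(item);
                pool.execute(item);
            }
            int doneCount = 0;
            for (WorkItem item : items) {
                item.awaitDone();
                if (item.isDone()) {
                    doneCount++;
                }
            }
            return doneCount;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("items marked done: " + runAll(false, true, false));
    }
}
```

With Futures, the executor tracks completion itself, so the flag and the 
wait/notify pair would not be needed.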


- jmhsieh


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4883/#review7337
---


On 2012-04-26 01:42:01, jmhsieh wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4883/
bq.  ---
bq.  
bq.  (Updated 2012-04-26 01:42:01)
bq.  
bq.  
bq.  Review request for hbase, Ted Yu and Jimmy Xiang.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  * Parallelized load of .regioninfo files
bq.  * changed TreeMap to SortedMap in method signatures
bq.  * renamed a test's name.
bq.  
bq.  
bq.  This addresses bug HBASE-5712.
bq.  https://issues.apache.org/jira/browse/HBASE-5712
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 66156c2 
bq.    src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java 6b64f10 
bq.  
bq.  Diff: https://reviews.apache.org/r/4883/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Ran patch 10x on trunk, passes.  Ran 1x on 0.92 and 0.94.
bq.  
bq.  There is a 0.90 version that is nearly identical except that it ignores 
bq.  the changes near HBaseFsck lines 671-680.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  jmhsieh
bq.  
bq.



 Parallelize load of .regioninfo files in diagnostic/repair portion of hbck.
 ---

 Key: HBASE-5712
 URL: https://issues.apache.org/jira/browse/HBASE-5712
 Project: HBase
  Issue Type: Improvement
  Components: hbck
Affects Versions: 0.90.7, 0.92.2, 0.94.0, 0.96.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: hbase-5712-90-v2.patch, hbase-5712-90.patch, 
 hbase-5712-v2.patch, hbase-5712.patch







[jira] [Created] (HBASE-5900) HRegion#FIXED_OVERHEAD is miscalculated in 94.

2012-04-30 Thread Jieshan Bean (JIRA)
Jieshan Bean created HBASE-5900:
---

 Summary: HRegion#FIXED_OVERHEAD is miscalculated in 94.
 Key: HBASE-5900
 URL: https://issues.apache.org/jira/browse/HBASE-5900
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.0
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Fix For: 0.94.1


This problem was triggered after applying the patch for HBASE-5611 and testing 
on a 32-bit machine.
Before this patch, TestHeapSize passed by pure coincidence in 0.94.
{noformat}
  public static final long FIXED_OVERHEAD = ClassSize.align(
  ClassSize.OBJECT +
  ClassSize.ARRAY +
  30  * ClassSize.REFERENCE + Bytes.SIZEOF_INT +
  (6 * Bytes.SIZEOF_LONG) +
  Bytes.SIZEOF_BOOLEAN);
{noformat}

Actually, there are 31 REFERENCEs and 5 LONGs in HRegion.
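
The "pure coincidence" can be checked with a little arithmetic. The sketch 
below assumes a reference takes 8 bytes on a 64-bit JVM and 4 bytes on a 
32-bit JVM (an assumption about the platforms involved; a long is always 8 
bytes). Only the reference and long terms are compared, since the other terms 
of FIXED_OVERHEAD are the same in both formulas. The class and method names 
are made up for the example.

```java
class FixedOverheadArithmetic {
    // Bytes taken by the reference and long fields only; the remaining
    // FIXED_OVERHEAD terms are identical in both formulas and cancel out.
    static int refsAndLongs(int refs, int longs, int referenceSize) {
        return refs * referenceSize + longs * 8;
    }

    public static void main(String[] args) {
        // Assumed reference sizes: 8 bytes (64-bit) and 4 bytes (32-bit).
        for (int referenceSize : new int[] {8, 4}) {
            int declared = refsAndLongs(30, 6, referenceSize); // what the constant says
            int actual = refsAndLongs(31, 5, referenceSize);   // what HRegion really has
            System.out.println("reference size " + referenceSize
                + ": declared=" + declared + ", actual=" + actual);
        }
    }
}
```

When a reference and a long are the same size, swapping one for the other 
leaves the sum unchanged, so the miscount is hidden. With 4-byte references 
the two sums differ by 4 bytes before alignment.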











[jira] [Commented] (HBASE-5900) HRegion#FIXED_OVERHEAD is miscalculated in 94.

2012-04-30 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264693#comment-13264693
 ] 

Jieshan Bean commented on HBASE-5900:
-

The patch will be uploaded today, after the full tests have run.

 HRegion#FIXED_OVERHEAD is miscalculated in 94.
 --

 Key: HBASE-5900
 URL: https://issues.apache.org/jira/browse/HBASE-5900
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.0
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Fix For: 0.94.1







[jira] [Created] (HBASE-5901) Use union type protobufs instead of class/byte pairs for multi requests

2012-04-30 Thread Todd Lipcon (JIRA)
Todd Lipcon created HBASE-5901:
--

 Summary: Use union type protobufs instead of class/byte pairs for 
multi requests
 Key: HBASE-5901
 URL: https://issues.apache.org/jira/browse/HBASE-5901
 Project: HBase
  Issue Type: Improvement
  Components: ipc, performance
Affects Versions: 0.96.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon


The current implementation of multi actions uses repeated NameBytesPairs for 
the contents of multi actions. Instead, we should introduce a union type 
protobuf for the valid actions. This makes the RPCs smaller since they don't 
need to carry class names, and makes deserialization faster since it can avoid 
some copying and reflection.
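
A rough illustration of the size argument, not the actual wire format: the 
sketch ignores protobuf field tags and length prefixes, and the ActionType 
enum, the one-byte tag, and the sample class name are assumptions made for the 
example. It only shows that carrying a class name per action costs bytes that 
a small union tag does not.

```java
import java.nio.charset.StandardCharsets;

class UnionVsNameBytes {
    /** Hypothetical tag for the kinds of action a multi request might carry. */
    enum ActionType { GET, PUT, DELETE }

    /** Bytes needed when each action travels as (class name, serialized bytes). */
    static int nameBytesPairSize(String className, byte[] payload) {
        return className.getBytes(StandardCharsets.UTF_8).length + payload.length;
    }

    /** Bytes needed when a small tag says which member of the union is set. */
    static int unionSize(ActionType type, byte[] payload) {
        return 1 + payload.length; // assumes the tag fits in one byte
    }

    public static void main(String[] args) {
        byte[] payload = new byte[64]; // stand-in for one serialized action
        String name = "org.apache.hadoop.hbase.client.Put"; // illustrative class name
        System.out.println("name/bytes pair: " + nameBytesPairSize(name, payload) + " bytes");
        System.out.println("union type:      " + unionSize(ActionType.PUT, payload) + " bytes");
    }
}
```

The saving repeats for every action in the request, and the receiver can 
switch on the tag instead of resolving a class by name.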





[jira] [Updated] (HBASE-5901) Use union type protobufs instead of class/byte pairs for multi requests

2012-04-30 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HBASE-5901:
---

Attachment: hbase-5901.txt

This patch dropped cumulative CPU usage by about 10% for a million-record 
insert.

 Use union type protobufs instead of class/byte pairs for multi requests
 ---

 Key: HBASE-5901
 URL: https://issues.apache.org/jira/browse/HBASE-5901
 Project: HBase
  Issue Type: Improvement
  Components: ipc, performance
Affects Versions: 0.96.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: hbase-5901.txt







[jira] [Updated] (HBASE-5901) Use union type protobufs instead of class/byte pairs for multi requests

2012-04-30 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HBASE-5901:
---

Status: Patch Available  (was: Open)

 Use union type protobufs instead of class/byte pairs for multi requests
 ---

 Key: HBASE-5901
 URL: https://issues.apache.org/jira/browse/HBASE-5901
 Project: HBase
  Issue Type: Improvement
  Components: ipc, performance
Affects Versions: 0.96.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: hbase-5901.txt







[jira] [Commented] (HBASE-5901) Use union type protobufs instead of class/byte pairs for multi requests

2012-04-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264710#comment-13264710
 ] 

Hadoop QA commented on HBASE-5901:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12525044/hbase-5901.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

+1 hadoop23.  The patch compiles against the hadoop 0.23.x profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 2 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.util.TestHBaseFsck

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1687//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1687//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1687//console

This message is automatically generated.

 Use union type protobufs instead of class/byte pairs for multi requests
 ---

 Key: HBASE-5901
 URL: https://issues.apache.org/jira/browse/HBASE-5901
 Project: HBase
  Issue Type: Improvement
  Components: ipc, performance
Affects Versions: 0.96.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: hbase-5901.txt







[jira] [Commented] (HBASE-5712) Parallelize load of .regioninfo files in diagnostic/repair portion of hbck.

2012-04-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264714#comment-13264714
 ] 

Hudson commented on HBASE-5712:
---

Integrated in HBase-TRUNK #2825 (See 
[https://builds.apache.org/job/HBase-TRUNK/2825/])
HBASE-5712 Parallelize load of .regioninfo files in diagnostic/repair 
portion of hbck (Revision 1332072)

 Result = SUCCESS
jmhsieh : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java


 Parallelize load of .regioninfo files in diagnostic/repair portion of hbck.
 ---

 Key: HBASE-5712
 URL: https://issues.apache.org/jira/browse/HBASE-5712
 Project: HBase
  Issue Type: Improvement
  Components: hbck
Affects Versions: 0.90.7, 0.92.2, 0.94.0, 0.96.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: hbase-5712-90-v2.patch, hbase-5712-90.patch, 
 hbase-5712-v2.patch, hbase-5712.patch







[jira] [Commented] (HBASE-5712) Parallelize load of .regioninfo files in diagnostic/repair portion of hbck.

2012-04-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264717#comment-13264717
 ] 

Hudson commented on HBASE-5712:
---

Integrated in HBase-0.94 #160 (See 
[https://builds.apache.org/job/HBase-0.94/160/])
HBASE-5712 Parallelize load of .regioninfo files in diagnostic/repair 
portion of hbck (Revision 1332071)

 Result = FAILURE
jmhsieh : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java


 Parallelize load of .regioninfo files in diagnostic/repair portion of hbck.
 ---

 Key: HBASE-5712
 URL: https://issues.apache.org/jira/browse/HBASE-5712
 Project: HBase
  Issue Type: Improvement
  Components: hbck
Affects Versions: 0.90.7, 0.92.2, 0.94.0, 0.96.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: hbase-5712-90-v2.patch, hbase-5712-90.patch, 
 hbase-5712-v2.patch, hbase-5712.patch







[jira] [Commented] (HBASE-5897) prePut coprocessor hook causing substantial CPU usage

2012-04-30 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264735#comment-13264735
 ] 

Anoop Sam John commented on HBASE-5897:
---

Even with the simple patch, the coprocessor (CP) hook can still be called more 
than once for a single Put:
{code}
-  for (int i = 0; i < batchOp.operations.length; i++) {
+  for (int i = batchOp.nextIndexToProcess; i < batchOp.operations.length; i++) {
{code}
Suppose a put(List) with 100 Puts takes 2 calls to doMiniBatchPut() to 
complete. The 1st call invokes the hook for all 100 Puts. Previously, the 2nd 
call invoked it another 100 times; with the patch, it invokes it only for the 
Puts that the 1st call did not handle. 

This does not happen with Todd's patch, which calls all the pre hooks just 
before the 1st call to doMiniBatchPut(). But that calls the pre hook well 
before the actual put operation. Is this correct? How can someone be sure of 
getting a pre hook call right before the actual put() for a given Put?
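
The hook counts in this scenario can be simulated with a small sketch. It 
assumes, purely for illustration, that each call to doMiniBatchPut() completes 
a fixed number of puts (batchSize); the class and method names are made up, 
and the counter stands in for the prePut hook call.

```java
class PrePutHookCount {
    // Counts hook invocations when `total` puts are completed `batchSize` at a time.
    // startFromNextIndex=false models the old loop (i = 0);
    // startFromNextIndex=true models the simple patch (i = nextIndexToProcess).
    static int hookCalls(int total, int batchSize, boolean startFromNextIndex) {
        int calls = 0;
        int nextIndexToProcess = 0;
        while (nextIndexToProcess < total) {
            int start = startFromNextIndex ? nextIndexToProcess : 0;
            for (int i = start; i < total; i++) {
                calls++; // stands in for one prePut hook call
            }
            // Assumption: each mini-batch completes exactly batchSize puts.
            nextIndexToProcess = Math.min(total, nextIndexToProcess + batchSize);
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println("old loop:     " + hookCalls(100, 50, false));
        System.out.println("simple patch: " + hookCalls(100, 50, true));
    }
}
```

For 100 puts completed in two halves, the old loop makes 200 hook calls and 
the patched loop 150, so puts left over from the first call still see the hook 
twice.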

 prePut coprocessor hook causing substantial CPU usage
 -

 Key: HBASE-5897
 URL: https://issues.apache.org/jira/browse/HBASE-5897
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Critical
 Fix For: 0.92.2, 0.94.0, 0.96.0

 Attachments: 5897-simple.txt, hbase-5897.txt


 I was running an insert workload against trunk under oprofile and saw that a 
 significant portion of CPU usage was going to calling the prePut 
 coprocessor hook inside doMiniBatchPut, even though I don't have any 
 coprocessors installed. I ran a million-row insert and collected CPU time 
 spent in the RS after commenting out the preput hook, and found CPU usage 
 reduced by 33%.





[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size

2012-04-30 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264740#comment-13264740
 ] 

Jieshan Bean commented on HBASE-5611:
-

The TestHeapSize failure on a 32-bit machine in 0.94 is caused by HBASE-5900. 

 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: 5611-94.addendum, HBASE-5611-92.patch, 
 HBASE-5611-94-minorchange.patch, HBASE-5611-trunk-v2-minorchange.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. The problem is completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and finds that none of them 
 have edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it is still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.





[jira] [Commented] (HBASE-5712) Parallelize load of .regioninfo files in diagnostic/repair portion of hbck.

2012-04-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264746#comment-13264746
 ] 

Hudson commented on HBASE-5712:
---

Integrated in HBase-0.92 #393 (See 
[https://builds.apache.org/job/HBase-0.92/393/])
HBASE-5712 Parallelize load of .regioninfo files in diagnostic/repair 
portion of hbck (Revision 1332070)

 Result = FAILURE
jmhsieh : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java


 Parallelize load of .regioninfo files in diagnostic/repair portion of hbck.
 ---

 Key: HBASE-5712
 URL: https://issues.apache.org/jira/browse/HBASE-5712
 Project: HBase
  Issue Type: Improvement
  Components: hbck
Affects Versions: 0.90.7, 0.92.2, 0.94.0, 0.96.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: hbase-5712-90-v2.patch, hbase-5712-90.patch, 
 hbase-5712-v2.patch, hbase-5712.patch







[jira] [Updated] (HBASE-5900) HRegion#FIXED_OVERHEAD is miscalculated in 94.

2012-04-30 Thread Jieshan Bean (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jieshan Bean updated HBASE-5900:


Attachment: HRegion-FIEED_OVERHEAD.patch

 HRegion#FIXED_OVERHEAD is miscalculated in 94.
 --

 Key: HBASE-5900
 URL: https://issues.apache.org/jira/browse/HBASE-5900
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.0
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Fix For: 0.94.1

 Attachments: HRegion-FIEED_OVERHEAD.patch







[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size

2012-04-30 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264859#comment-13264859
 ] 

Jieshan Bean commented on HBASE-5611:
-

The new version of the patch for 0.94 will be uploaded after HBASE-5900 gets fixed.

 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: 5611-94.addendum, HBASE-5611-92.patch, 
 HBASE-5611-94-minorchange.patch, HBASE-5611-trunk-v2-minorchange.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. It's completely invisible until the 
 {{MemStoreFlusher}} needs to force flush a region and finds that none of them 
 have edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed for entry null
 java.lang.IllegalStateException
   at com.google.common.base.Preconditions.checkState(Preconditions.java:129)
   at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
   at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
   at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it's still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.
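
The accounting problem described above can be sketched as follows. This is an illustration only, not the attached patch: the class and method names are hypothetical, and a single AtomicLong stands in for the region server's global MemStore size.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the region server's global MemStore accounting.
final class GlobalMemStoreAccountingSketch {
    private final AtomicLong globalMemStoreSize = new AtomicLong();

    /** Called for each batch of replayed edits; returns the new global size. */
    long addReplayedEdits(long bytes) {
        return globalMemStoreSize.addAndGet(bytes);
    }

    /** Called when a region fails to open: give back what its replay added. */
    long rollbackFailedOpen(long bytesAddedDuringReplay) {
        return globalMemStoreSize.addAndGet(-bytesAddedDuringReplay);
    }

    long globalSize() {
        return globalMemStoreSize.get();
    }

    public static void main(String[] args) {
        GlobalMemStoreAccountingSketch accounting = new GlobalMemStoreAccountingSketch();
        long replayed = 0;
        replayed += 1024;
        accounting.addReplayedEdits(1024);
        replayed += 2048;
        accounting.addReplayedEdits(2048);
        // The open fails, for example because the master moved the region away.
        accounting.rollbackFailedOpen(replayed);
        System.out.println("global size after rollback = " + accounting.globalSize());
    }
}
```

Without the rollback step, the global counter keeps the replayed bytes even though no region holds them, which matches the "over the low barrier although in fact I'm at 0" symptom.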





[jira] [Updated] (HBASE-5900) HRegion#FIXED_OVERHEAD is miscalculated in 94.

2012-04-30 Thread Jieshan Bean (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jieshan Bean updated HBASE-5900:


Status: Patch Available  (was: Open)

 HRegion#FIXED_OVERHEAD is miscalculated in 94.
 --

 Key: HBASE-5900
 URL: https://issues.apache.org/jira/browse/HBASE-5900
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.0
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Fix For: 0.94.1

 Attachments: HRegion-FIEED_OVERHEAD.patch


 After applying the patch for HBASE-5611 and testing on a 32-bit machine, this 
 problem was triggered.
 Before this patch, TestHeapSize passed by pure coincidence in 94.
 {noformat}
   public static final long FIXED_OVERHEAD = ClassSize.align(
   ClassSize.OBJECT +
   ClassSize.ARRAY +
   30  * ClassSize.REFERENCE + Bytes.SIZEOF_INT +
   (6 * Bytes.SIZEOF_LONG) +
   Bytes.SIZEOF_BOOLEAN);
 {noformat}
 Actually, there are 31 REFERENCEs and 5 LONGs in HRegion.





[jira] [Commented] (HBASE-5900) HRegion#FIXED_OVERHEAD is miscalculated in 94.

2012-04-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264861#comment-13264861
 ] 

Hadoop QA commented on HBASE-5900:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12525053/HRegion-FIEED_OVERHEAD.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1688//console

This message is automatically generated.





[jira] [Created] (HBASE-5902) Some scripts are not executable

2012-04-30 Thread nkeywal (JIRA)
nkeywal created HBASE-5902:
--

 Summary: Some scripts are not executable
 Key: HBASE-5902
 URL: https://issues.apache.org/jira/browse/HBASE-5902
 Project: HBase
  Issue Type: Bug
  Components: scripts
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Trivial


-rw-rw-r--  graceful_stop.sh
-rw-rw-r--  hbase-config.sh
-rw-rw-r--  local-master-backup.sh
-rw-rw-r--  local-regionservers.sh






[jira] [Updated] (HBASE-5902) Some scripts are not executable

2012-04-30 Thread nkeywal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5902:
---

Attachment: 5902.v1.patch

 Some scripts are not executable
 ---

 Key: HBASE-5902
 URL: https://issues.apache.org/jira/browse/HBASE-5902
 Project: HBase
  Issue Type: Bug
  Components: scripts
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Trivial
 Attachments: 5902.v1.patch


 -rw-rw-r--  graceful_stop.sh
 -rw-rw-r--  hbase-config.sh
 -rw-rw-r--  local-master-backup.sh
 -rw-rw-r--  local-regionservers.sh





[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-04-30 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264874#comment-13264874
 ] 

Jieshan Bean commented on HBASE-5875:
-

Take a look at the CatalogTracker#verifyRootRegionLocation method:
{noformat}
  public boolean verifyRootRegionLocation(final long timeout)
  throws InterruptedException, IOException {
    AdminProtocol connection = null;
    try {
      connection = waitForRootServerConnection(timeout);
    } catch (NotAllMetaRegionsOnlineException e) {
      // Pass
    } catch (ServerNotRunningYetException e) {
      // Pass -- remote server is not up so can't be carrying root
    } catch (UnknownHostException e) {
      // Pass -- server name doesn't resolve so it can't be assigned anything.
    }
    return (connection == null)? false:
      verifyRegionLocation(connection,
        this.rootRegionTracker.getRootRegionLocation(), ROOT_REGION_NAME);
  }
{noformat}
I'm thinking about an approach that handles this issue differently depending 
on the exception. 
E.g. if we get a ServerNotRunningYetException, we can call 
splitLogAndExpireIfOnline.
But if we get a NotServingRegionException, we should not do that.
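
The idea in this comment could look roughly like the sketch below. None of it is HBase code: the two exception classes are local stand-ins named after the HBase ones, and the Action enum, the method name, and the conservative default for other causes are assumptions made for illustration.

```java
import java.io.IOException;

// Sketch of an exception-driven decision; every name here is hypothetical.
final class RootVerificationSketch {
    // Local stand-ins for the HBase exceptions of the same names.
    static class ServerNotRunningYetException extends IOException {}
    static class NotServingRegionException extends IOException {}

    enum Action { SPLIT_LOG_AND_EXPIRE, KEEP_SERVER_ONLINE }

    /** Decide what to do with the root server based on why verification failed. */
    static Action onVerificationFailure(IOException cause) {
        if (cause instanceof NotServingRegionException) {
            // The server answered; it is just not serving the region (yet).
            return Action.KEEP_SERVER_ONLINE;
        }
        if (cause instanceof ServerNotRunningYetException) {
            return Action.SPLIT_LOG_AND_EXPIRE;
        }
        // Assumption: unknown causes do not expire the server in this sketch.
        return Action.KEEP_SERVER_ONLINE;
    }
}
```

Note that the method quoted above currently swallows these exceptions and returns a boolean, so the caller would need some way to learn the cause before a decision like this is possible.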



 Process RIT and Master restart may remove an online server considering it as 
 a dead server
 --

 Key: HBASE-5875
 URL: https://issues.apache.org/jira/browse/HBASE-5875
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.1


 If, on restart, the master finds ROOT/META in RIT state, it tries to assign 
 the ROOT region through ProcessRIT.
 The master triggers the assignment and then tries to verify the root region 
 location.
 Root region location verification is done by checking whether the RS has the 
 region in its online list.
 If the master-triggered assignment has not yet completed on the RS, the root 
 region location verification will fail.
 Because it failed, through 
 {code}
 splitLogAndExpireIfOnline(currentRootServer);
 {code}
 we split the log and also remove the server from the online server list. 
 Ideally there is nothing to do in split log here, as no region server was 
 restarted.
 So the master invalidates the region server even though it is online.
 In the special case where I have only one RS, my cluster becomes 
 non-operative.





[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-04-30 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264876#comment-13264876
 ] 

ramkrishna.s.vasudevan commented on HBASE-5875:
---

@Jieshan
As Ted also suggested, if we go by the exception we would need to add 
unnecessary retry logic and sleep time, and also modify the 
verifyRootRegionLocation API, which is used in many places.





[jira] [Commented] (HBASE-5874) The HBase do not configure the 'fs.default.name' attribute, the hbck tool and Merge tool throw IllegalArgumentException.

2012-04-30 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264882#comment-13264882
 ] 

Jieshan Bean commented on HBASE-5874:
-

+1 on this patch.
I think the patches for other branches are also needed.

 The HBase do not configure the 'fs.default.name' attribute, the hbck tool and 
 Merge tool throw IllegalArgumentException.
 

 Key: HBASE-5874
 URL: https://issues.apache.org/jira/browse/HBASE-5874
 Project: HBase
  Issue Type: Bug
  Components: hbck
Affects Versions: 0.90.6
Reporter: fulin wang
Assignee: fulin wang
 Attachments: HBASE-5874-0.90.patch, HBASE-5874-trunk.patch


 When HBase does not configure the 'fs.default.name' attribute, the hbck tool 
 and the Merge tool throw IllegalArgumentException.
 For the hbck tool and the Merge tool, we should set the 'fs.default.name' 
 attribute in the code.
 hbck exception:
 Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: hdfs://160.176.0.101:9000/hbase/.META./1028785192/.regioninfo, expected: file:///
   at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:412)
   at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:59)
   at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:382)
   at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:285)
   at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:128)
   at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:301)
   at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:489)
   at org.apache.hadoop.hbase.util.HBaseFsck.loadHdfsRegioninfo(HBaseFsck.java:565)
   at org.apache.hadoop.hbase.util.HBaseFsck.loadHdfsRegionInfos(HBaseFsck.java:596)
   at org.apache.hadoop.hbase.util.HBaseFsck.onlineConsistencyRepair(HBaseFsck.java:332)
   at org.apache.hadoop.hbase.util.HBaseFsck.onlineHbck(HBaseFsck.java:360)
   at org.apache.hadoop.hbase.util.HBaseFsck.main(HBaseFsck.java:2907)

 Merge exception:
 [2012-05-05 10:48:24,830] [ERROR] [main] [org.apache.hadoop.hbase.util.Merge 381] exiting due to error
 java.lang.IllegalArgumentException: Wrong FS: hdfs://160.176.0.101:9000/hbase/.META./1028785192/.regioninfo, expected: file:///
   at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:412)
   at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:59)
   at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:382)
   at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:285)
   at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:823)
   at org.apache.hadoop.hbase.regionserver.HRegion.checkRegioninfoOnFilesystem(HRegion.java:415)
   at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:340)
   at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2679)
   at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2665)
   at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2634)
   at org.apache.hadoop.hbase.util.MetaUtils.openMetaRegion(MetaUtils.java:276)
   at org.apache.hadoop.hbase.util.MetaUtils.scanMetaRegion(MetaUtils.java:261)
   at org.apache.hadoop.hbase.util.Merge.run(Merge.java:115)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
   at org.apache.hadoop.hbase.util.Merge.main(Merge.java:379)
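
Both traces show an hdfs:// path being checked against the local file system (expected: file:///), which is what happens when no default file system is configured. One possible shape of the fix is to derive 'fs.default.name' from the HBase root directory. The sketch below is an assumption about that approach, not the attached patch: a plain Map stands in for the Hadoop Configuration object, the class and method names are hypothetical, and the hbase.rootdir value is inferred from the paths in the traces.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Sketch only: a plain Map stands in for the Hadoop/HBase Configuration object.
final class DefaultFsSketch {
    /** Returns scheme://authority of the root dir, for example hdfs://host:9000. */
    static String defaultFsFor(String rootDir) {
        URI root = URI.create(rootDir);
        return root.getScheme() + "://" + root.getAuthority();
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Assumed value, inferred from the paths in the stack traces.
        conf.put("hbase.rootdir", "hdfs://160.176.0.101:9000/hbase");
        conf.put("fs.default.name", defaultFsFor(conf.get("hbase.rootdir")));
        System.out.println(conf.get("fs.default.name"));
    }
}
```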





[jira] [Updated] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-04-30 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5875:
--

Attachment: HBASE-5875.patch





[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-04-30 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264900#comment-13264900
 ] 

ramkrishna.s.vasudevan commented on HBASE-5875:
---

Patch for trunk.  TestCases passed.





[jira] [Updated] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-04-30 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5875:
--

Status: Patch Available  (was: Open)

@Chunhui
Can you take a look at this? It is related to HBASE-4880. Please share 
your thoughts.





[jira] [Updated] (HBASE-5883) Backup master is going down due to connection refused exception

2012-04-30 Thread Jieshan Bean (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jieshan Bean updated HBASE-5883:


Attachment: HBASE-5883-94.patch

Patch for 94. All tests passed. We are still testing it in a real cluster. 
Your comments before I post the results are welcome. 
Thank you.

 Backup master is going down due to connection refused exception
 ---

 Key: HBASE-5883
 URL: https://issues.apache.org/jira/browse/HBASE-5883
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.6, 0.92.1, 0.94.0
Reporter: Gopinathan A
Assignee: Jieshan Bean
 Attachments: HBASE-5883-94.patch


 The active master node's network was down for some time (this node hosts 
 Master, DN, ZK, and RS). The backup node got the notification and started to 
 become active. Immediately afterwards, the backup node aborted with the 
 exception below.
 {noformat}
 2012-04-09 10:42:24,270 INFO org.apache.hadoop.hbase.master.SplitLogManager: finished splitting (more than or equal to) 861248320 bytes in 4 log files in [hdfs://192.168.47.205:9000/hbase/.logs/HOST-192-168-47-202,60020,1333715537172-splitting] in 26374ms
 2012-04-09 10:42:24,316 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: []
 2012-04-09 10:42:24,333 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
 java.io.IOException: java.net.ConnectException: Connection refused
   at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:375)
   at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1045)
   at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:897)
   at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
   at $Proxy13.getProtocolVersion(Unknown Source)
   at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:183)
   at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:303)
   at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:280)
   at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:332)
   at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:236)
   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1276)
   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1233)
   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1220)
   at org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:569)
   at org.apache.hadoop.hbase.catalog.CatalogTracker.getRootServerConnection(CatalogTracker.java:369)
   at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRootServerConnection(CatalogTracker.java:353)
   at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRootRegionLocation(CatalogTracker.java:660)
   at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:616)
   at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:540)
   at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:363)
   at java.lang.Thread.run(Thread.java:662)
 Caused by: java.net.ConnectException: Connection refused
   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
   at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
   at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
   at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:488)
   at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupConnection(HBaseClient.java:328)
   at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:362)
   ... 20 more
 2012-04-09 10:42:24,336 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
 2012-04-09 10:42:24,336 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping service threads
 {noformat}





[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-04-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264933#comment-13264933
 ] 

Hadoop QA commented on HBASE-5875:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12525060/HBASE-5875.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 hadoop23.  The patch compiles against the hadoop 0.23.x profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 2 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.io.hfile.TestForceCacheImportantBlocks

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1689//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1689//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1689//console

This message is automatically generated.





[jira] [Commented] (HBASE-5900) HRegion#FIXED_OVERHEAD is miscalculated in 94.

2012-04-30 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264934#comment-13264934
 ] 

Zhihong Yu commented on HBASE-5900:
---

Please keep the original indentation so that it is easy to see the changes:
{code}
+  
+  public static final long HREGION_CLASS_SIZE = ClassSize.OBJECT
+  + ClassSize.ARRAY + 31 * ClassSize.REFERENCE + Bytes.SIZEOF_INT
+  + (5 * Bytes.SIZEOF_LONG) + Bytes.SIZEOF_BOOLEAN;

-  public static final long FIXED_OVERHEAD = ClassSize.align(
-  ClassSize.OBJECT +
-  ClassSize.ARRAY +
-  30 * ClassSize.REFERENCE + Bytes.SIZEOF_INT +
-  (6 * Bytes.SIZEOF_LONG) +
-  Bytes.SIZEOF_BOOLEAN);
{code}
I ran TestHeapSize with the patch and it passed.

Let's keep the patch in minimal form with the fix to FIXED_OVERHEAD only.





[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-04-30 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264938#comment-13264938
 ] 

ramkrishna.s.vasudevan commented on HBASE-5875:
---

Testcase failure seems unrelated to this fix.

 Process RIT and Master restart may remove an online server considering it as 
 a dead server
 --

 Key: HBASE-5875
 URL: https://issues.apache.org/jira/browse/HBASE-5875
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.1

 Attachments: HBASE-5875.patch


 If, on restart, the master finds ROOT/META in the RIT state, it tries to 
 assign the ROOT region through ProcessRIT.
 The master triggers the assignment and then tries to verify the root region 
 location.
 Root region location verification checks whether the RS has the region in 
 its online list.
 If the master-triggered assignment has not yet completed on the RS, the 
 root region location verification fails.
 Because it failed, 
 {code}
 splitLogAndExpireIfOnline(currentRootServer);
 {code}
 is called: we split the log and also remove the server from the online server 
 list. Ideally there is nothing to do in the log split here, as no region 
 server was restarted.
 So the master invalidates the region server even though the server is 
 online.
 In the special case where there is only one RS, the cluster becomes 
 non-operative.





[jira] [Commented] (HBASE-5840) Open Region FAILED_OPEN doesn't clear the TaskMonitor Status, keeps showing the old status

2012-04-30 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264939#comment-13264939
 ] 

ramkrishna.s.vasudevan commented on HBASE-5840:
---

@Lars
Do you want this in 0.94? If not, I will commit to trunk alone.

 Open Region FAILED_OPEN doesn't clear the TaskMonitor Status, keeps showing 
 the old status
 --

 Key: HBASE-5840
 URL: https://issues.apache.org/jira/browse/HBASE-5840
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.0
Reporter: Gopinathan A
Assignee: rajeshbabu
 Fix For: 0.96.0, 0.94.1

 Attachments: HBASE-5840.patch, HBASE-5840_trunk.patch, 
 HBASE-5840_v2.patch


 The TaskMonitor status is not cleared when a region goes to FAILED_OPEN, so 
 it keeps showing the old status.
 This misleads the user.





[jira] [Commented] (HBASE-5806) Handle split region related failures on master restart and RS restart

2012-04-30 Thread Chinna Rao Lalam (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264950#comment-13264950
 ] 

Chinna Rao Lalam commented on HBASE-5806:
-


For #1 above, the RegionServer crashed in 
SplitTransaction.createDaughters(Server, RegionServerServices) while removing 
the parent region from the online regions:
{code}
if (!testing) {
  services.removeFromOnlineRegions(this.parent.getRegionInfo().getEncodedName());
}
{code}

Wherever the regionserver crashes, the ephemeral node is deleted and the 
master gets the nodeDeleted() notification, where the region is cleared 
from RIT.

But the ServerShutdownHandler executed before the nodeDeleted() event for 
the region node.
You can see that from the logs below.

{noformat}
2012-04-06 14:35:08,841 DEBUG 
org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Removed 
test,,1333702991530.cdfa837563e75ac5f4dc128680cc8da8. from list of regions to 
assign because in RIT; region state: SPLITTING

2012-04-06 14:35:12,981 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Ephemeral node deleted, regionserver crashed?, clearing from RIT; 
rs=test,,1333702991530.cdfa837563e75ac5f4dc128680cc8da8. state=SPLITTING, 
ts=1333703059260, server=HOST-10-18-40-25,60020,1333695183392
{noformat}

In this situation the code below populated regionsInTransition with that region

{code}
  List<RegionState> regionsInTransition =
    this.services.getAssignmentManager().
      processServerShutdown(this.serverName);
{code}

and it satisfies !rit.isClosing() && !rit.isPendingClose(), so the region is 
deleted from hris

{code}
  for (RegionState rit : regionsInTransition) {
    if (!rit.isClosing() && !rit.isPendingClose()) {
      LOG.debug("Removed " + rit.getRegion().getRegionNameAsString() +
        " from list of regions to assign because in RIT; region state: " +
        rit.getState());
      if (hris != null) hris.remove(rit.getRegion());
    }
  }
{code}
The fix in SSH addresses #1.
#2 came about because of HBASE-5615. However, HBASE-5615 was reverted.
#3 occurs when the master restarts after splitting is done and before CJ has 
cleared the region from META. So while rebuilding the user regions we ensure 
that the offlined parent region is not taken into account again.

#2 and #3 are handled together in this patch, so that the fix solves both 
problems.
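
The filtering described for #1 can be sketched in isolation. This is a simplified illustration, not the ServerShutdownHandler code: the State enum, the Rit class, and the use of plain region-name strings are hypothetical stand-ins for RegionState and the hris collection.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RitFilterSketch {
    // Hypothetical stand-in for the region states mentioned in the comment.
    enum State { SPLITTING, CLOSING, PENDING_CLOSE }

    // Hypothetical, simplified stand-in for RegionState.
    static final class Rit {
        final String region;
        final State state;

        Rit(String region, State state) {
            this.region = region;
            this.state = state;
        }
    }

    // Mirrors the quoted loop: a region in transition that is neither CLOSING
    // nor PENDING_CLOSE is dropped from the list of regions to assign.
    static List<String> regionsToAssign(List<String> hris, List<Rit> regionsInTransition) {
        List<String> remaining = new ArrayList<String>(hris);
        for (Rit rit : regionsInTransition) {
            if (rit.state != State.CLOSING && rit.state != State.PENDING_CLOSE) {
                remaining.remove(rit.region);
            }
        }
        return remaining;
    }

    public static void main(String[] args) {
        List<String> hris = Arrays.asList("parent", "other");
        List<Rit> rits = Arrays.asList(new Rit("parent", State.SPLITTING));
        // The SPLITTING parent is removed, so only "other" is left to assign.
        System.out.println(regionsToAssign(hris, rits));
    }
}
```

In this sketch a parent region still in SPLITTING when the server dies is never handed back for assignment, which is the situation the log excerpt above shows.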

 Handle split region related failures on master restart and RS restart
 -

 Key: HBASE-5806
 URL: https://issues.apache.org/jira/browse/HBASE-5806
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: Chinna Rao Lalam
 Fix For: 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-5806.patch


 This issue is raised to solve problems that arise when a region split has 
 only partially happened and the region node in ZK, which is in 
 RS_ZK_REGION_SPLITTING or RS_ZK_REGION_SPLIT, has not yet been processed.
 This also tries to address HBASE-5615.





[jira] [Updated] (HBASE-5869) Move SplitLogManager splitlog taskstate and AssignmentManager RegionTransitionData znode datas to pb

2012-04-30 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5869:
-

Attachment: 5869v8.txt

Fixes for a few of the failing tests.

 Move SplitLogManager splitlog taskstate and AssignmentManager 
 RegionTransitionData znode datas to pb 
 -

 Key: HBASE-5869
 URL: https://issues.apache.org/jira/browse/HBASE-5869
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: stack
 Attachments: 5869v7.txt, 5869v8.txt, firstcut.txt, secondcut.txt, 
 v4.txt, v5.txt, v6.txt








[jira] [Created] (HBASE-5903) Detect the test classes without categories

2012-04-30 Thread nkeywal (JIRA)
nkeywal created HBASE-5903:
--

 Summary: Detect the test classes without categories
 Key: HBASE-5903
 URL: https://issues.apache.org/jira/browse/HBASE-5903
 Project: HBase
  Issue Type: Improvement
  Components: build, test
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor


The tests are executed by category. When a test does not have a category, it 
is run neither on prebuild nor on the central build.

This new test checks the test classes and lists the ones without a category. 
It fails if it finds one. As it's a small test, it will be executed on the 
developer machine and will fail immediately on the central builds.
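
A minimal sketch of such a check, using reflection, is shown here. It is not the attached patch: the Category annotation below is a hypothetical local stand-in (so the example compiles without JUnit), and the class names and the findBadClasses helper are illustrative assumptions.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.ArrayList;
import java.util.List;

public class CategoryCheckSketch {
    // Hypothetical stand-in for a test category annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @interface Category {
        String value();
    }

    @Category("small")
    static class TestWithCategory {}

    static class TestWithoutCategory {}

    // True when the class carries the category annotation.
    static boolean hasCategoryAnnotation(Class<?> c) {
        return c.isAnnotationPresent(Category.class);
    }

    // Collects the classes that have no category; a real test would fail
    // when this list is not empty.
    static List<Class<?>> findBadClasses(Class<?>... candidates) {
        List<Class<?>> badClasses = new ArrayList<Class<?>>();
        for (Class<?> c : candidates) {
            if (!hasCategoryAnnotation(c)) {
                badClasses.add(c);
            }
        }
        return badClasses;
    }

    public static void main(String[] args) {
        List<Class<?>> bad = findBadClasses(TestWithCategory.class, TestWithoutCategory.class);
        System.out.println("classes without category: " + bad.size());
    }
}
```

How the real test discovers the candidate classes is not described in this message, so the sketch simply takes them as arguments.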





[jira] [Updated] (HBASE-5903) Detect the test classes without categories

2012-04-30 Thread nkeywal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5903:
---

Attachment: 5903.v3.patch

 Detect the test classes without categories
 --

 Key: HBASE-5903
 URL: https://issues.apache.org/jira/browse/HBASE-5903
 Project: HBase
  Issue Type: Improvement
  Components: build, test
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Fix For: 0.96.0

 Attachments: 5903.v3.patch


 The tests are executed by category. When a test does not have a category, 
 it is run neither on prebuild nor on the central build.
 This new test checks the test classes and lists the ones without a category. 
 It fails if it finds one. As it's a small test, it will be executed on the 
 developer machine and will fail immediately on the central builds.





[jira] [Updated] (HBASE-5903) Detect the test classes without categories

2012-04-30 Thread nkeywal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5903:
---

Fix Version/s: 0.96.0
   Status: Patch Available  (was: Open)

 Detect the test classes without categories
 --

 Key: HBASE-5903
 URL: https://issues.apache.org/jira/browse/HBASE-5903
 Project: HBase
  Issue Type: Improvement
  Components: build, test
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Fix For: 0.96.0

 Attachments: 5903.v3.patch


 The tests are executed by category. When a test does not have a category, 
 it is run neither on prebuild nor on the central build.
 This new test checks the test classes and lists the ones without a category. 
 It fails if it finds one. As it's a small test, it will be executed on the 
 developer machine and will fail immediately on the central builds.





[jira] [Commented] (HBASE-5903) Detect the test classes without categories

2012-04-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264974#comment-13264974
 ] 

Hadoop QA commented on HBASE-5903:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12525071/5903.v3.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 5 new or modified tests.

+1 hadoop23.  The patch compiles against the hadoop 0.23.x profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 2 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.master.TestAssignmentManager

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1691//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1691//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1691//console

This message is automatically generated.

 Detect the test classes without categories
 --

 Key: HBASE-5903
 URL: https://issues.apache.org/jira/browse/HBASE-5903
 Project: HBase
  Issue Type: Improvement
  Components: build, test
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Fix For: 0.96.0

 Attachments: 5903.v3.patch


 The tests are executed by category. When a test does not have a category, 
 it is run neither on prebuild nor on the central build.
 This new test checks the test classes and lists the ones without a category. 
 It fails if it finds one. As it's a small test, it will be executed on the 
 developer machine and will fail immediately on the central builds.





[jira] [Commented] (HBASE-5869) Move SplitLogManager splitlog taskstate and AssignmentManager RegionTransitionData znode datas to pb

2012-04-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264975#comment-13264975
 ] 

Hadoop QA commented on HBASE-5869:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12525070/5869v8.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 47 new or modified tests.

+1 hadoop23.  The patch compiles against the hadoop 0.23.x profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.master.TestRollingRestart
  org.apache.hadoop.hbase.regionserver.TestHRegionOnCluster
  
org.apache.hadoop.hbase.regionserver.wal.TestLogRollingNoCluster
  org.apache.hadoop.hbase.client.TestScannerTimeout
  org.apache.hadoop.hbase.master.TestDistributedLogSplitting
  org.apache.hadoop.hbase.TestDrainingServer
  
org.apache.hadoop.hbase.regionserver.TestRSKilledWhenMasterInitializing
  org.apache.hadoop.hbase.TestFullLogReconstruction
  org.apache.hadoop.hbase.master.TestMasterFailover
  org.apache.hadoop.hbase.master.TestSplitLogManager
  org.apache.hadoop.hbase.TestZooKeeper

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1690//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1690//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1690//console

This message is automatically generated.

 Move SplitLogManager splitlog taskstate and AssignmentManager 
 RegionTransitionData znode datas to pb 
 -

 Key: HBASE-5869
 URL: https://issues.apache.org/jira/browse/HBASE-5869
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: stack
 Attachments: 5869v7.txt, 5869v8.txt, firstcut.txt, secondcut.txt, 
 v4.txt, v5.txt, v6.txt








[jira] [Commented] (HBASE-5903) Detect the test classes without categories

2012-04-30 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264978#comment-13264978
 ] 

nkeywal commented on HBASE-5903:


Considering the actual patch, we can just consider TestAssignmentManager as a 
little bit flaky ;-)

 Detect the test classes without categories
 --

 Key: HBASE-5903
 URL: https://issues.apache.org/jira/browse/HBASE-5903
 Project: HBase
  Issue Type: Improvement
  Components: build, test
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Fix For: 0.96.0

 Attachments: 5903.v3.patch


 The tests are executed by category. When a test does not have a category, 
 it is run neither on prebuild nor on the central build.
 This new test checks the test classes and lists the ones without a category. 
 It fails if it finds one. As it's a small test, it will be executed on the 
 developer machine and will fail immediately on the central builds.





[jira] [Commented] (HBASE-5883) Backup master is going down due to connection refused exception

2012-04-30 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264979#comment-13264979
 ] 

Zhihong Yu commented on HBASE-5883:
---

Why do we need the following code ?
{code}
+} else if (ioex.getMessage().toLowerCase()
+.contains("connection refused")) {
+  ce = new ConnectException(ioex.getMessage());
{code}
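
For context, the kind of check the quoted fragment performs can be sketched in isolation. The surrounding method is not shown in this comment, so the helper name, its return convention, and the null guard below are assumptions made for illustration only.

```java
import java.io.IOException;
import java.net.ConnectException;

public class ConnectionRefusedSketch {
    // Hypothetical helper: as in the quoted fragment, an IOException whose
    // message mentions "connection refused" is turned into a ConnectException.
    // Returns null when the message does not match.
    static ConnectException toConnectException(IOException ioex) {
        String message = ioex.getMessage();
        if (message != null && message.toLowerCase().contains("connection refused")) {
            return new ConnectException(message);
        }
        return null;
    }

    public static void main(String[] args) {
        // Message text taken from the stack trace in the issue description.
        IOException wrapped = new IOException("java.net.ConnectException: Connection refused");
        System.out.println(toConnectException(wrapped) != null);
        System.out.println(toConnectException(new IOException("timed out")) != null);
    }
}
```

The sketch matches on the message text only, which is what the quoted lines do; whether that is preferable to inspecting the exception's cause is the question raised above.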


 Backup master is going down due to connection refused exception
 ---

 Key: HBASE-5883
 URL: https://issues.apache.org/jira/browse/HBASE-5883
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.6, 0.92.1, 0.94.0
Reporter: Gopinathan A
Assignee: Jieshan Bean
 Attachments: HBASE-5883-94.patch


 The active master node's network was down for some time (this node contains 
 Master, DN, ZK, RS). The backup node got the notification and started to 
 become active. Immediately, the backup node aborted with the exception 
 below.
 {noformat}
 2012-04-09 10:42:24,270 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
 finished splitting (more than or equal to) 861248320 bytes in 4 log files in 
 [hdfs://192.168.47.205:9000/hbase/.logs/HOST-192-168-47-202,60020,1333715537172-splitting]
  in 26374ms
 2012-04-09 10:42:24,316 FATAL org.apache.hadoop.hbase.master.HMaster: Master 
 server abort: loaded coprocessors are: []
 2012-04-09 10:42:24,333 FATAL org.apache.hadoop.hbase.master.HMaster: 
 Unhandled exception. Starting shutdown.
 java.io.IOException: java.net.ConnectException: Connection refused
   at 
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:375)
   at 
 org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1045)
   at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:897)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
   at $Proxy13.getProtocolVersion(Unknown Source)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:183)
   at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:303)
   at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:280)
   at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:332)
   at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:236)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1276)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1233)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1220)
   at 
 org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:569)
   at 
 org.apache.hadoop.hbase.catalog.CatalogTracker.getRootServerConnection(CatalogTracker.java:369)
   at 
 org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRootServerConnection(CatalogTracker.java:353)
   at 
 org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRootRegionLocation(CatalogTracker.java:660)
   at 
 org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:616)
   at 
 org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:540)
   at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:363)
   at java.lang.Thread.run(Thread.java:662)
 Caused by: java.net.ConnectException: Connection refused
   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
   at 
 sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
   at 
 org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
   at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:488)
   at 
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupConnection(HBaseClient.java:328)
   at 
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:362)
   ... 20 more
 2012-04-09 10:42:24,336 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
 2012-04-09 10:42:24,336 DEBUG org.apache.hadoop.hbase.master.HMaster: 
 Stopping service threads
 {noformat}





[jira] [Commented] (HBASE-5903) Detect the test classes without categories

2012-04-30 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264988#comment-13264988
 ] 

Zhihong Yu commented on HBASE-5903:
---

Minor comments:
{code}
+/**
+ * Copyright 2012 The Apache Software Foundation
{code}
Year is not needed.
{code}
+List<Class<?>> badClasses = new java.util.ArrayList<Class<?>>();
{code}
ArrayList is imported already.
{code}
+  private boolean existCategoryAnnotation(Class<?> c) {
{code}
Should the above method be named 'hasCategoryAnnotation()' ?


 Detect the test classes without categories
 --

 Key: HBASE-5903
 URL: https://issues.apache.org/jira/browse/HBASE-5903
 Project: HBase
  Issue Type: Improvement
  Components: build, test
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Fix For: 0.96.0

 Attachments: 5903.v3.patch


 The tests are executed by category. When a test does not have a category, 
 it is run neither on prebuild nor on the central build.
 This new test checks the test classes and lists the ones without a category. 
 It fails if it finds one. As it's a small test, it will be executed on the 
 developer machine and will fail immediately on the central builds.





[jira] [Updated] (HBASE-5903) Detect the test classes without categories

2012-04-30 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5903:
-

Attachment: 5903v4.txt

What I applied (added class comment and removed copyright year line).

Committed to trunk.  Thanks for the patch Nicolas.

 Detect the test classes without categories
 --

 Key: HBASE-5903
 URL: https://issues.apache.org/jira/browse/HBASE-5903
 Project: HBase
  Issue Type: Improvement
  Components: build, test
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Fix For: 0.96.0

 Attachments: 5903.v3.patch, 5903v4.txt


 The tests are executed by category. When a test does not have a category, 
 it is run neither on prebuild nor on the central build.
 This new test checks the test classes and lists the ones without a category. 
 It fails if it finds one. As it's a small test, it will be executed on the 
 developer machine and will fail immediately on the central builds.





[jira] [Commented] (HBASE-5904) is_enabled from shell returns differently from pre- and post- HBASE-5155

2012-04-30 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264990#comment-13264990
 ] 

stack commented on HBASE-5904:
--

Then we should revert hbase-5155, would you agree David?

IIRC, there was a reason for the absence of a znode meaning ENABLED, but I 
don't remember it offhand.

 is_enabled from shell returns differently from pre- and post- HBASE-5155
 

 Key: HBASE-5904
 URL: https://issues.apache.org/jira/browse/HBASE-5904
 Project: HBase
  Issue Type: Bug
  Components: zookeeper
Affects Versions: 0.90.6
Reporter: David S. Wang

 If I launch an hbase shell that uses HBase and ZooKeeper without HBASE-5155, 
 against HBase servers with HBASE-5155, then is_enabled for a table always 
 returns false even if the table is considered enabled by the servers from the 
 logs.  If I then do the same thing but with an HBase shell and ZooKeeper with 
 HBASE-5155, then is_enabled returns as expected.
 If I launch an HBase shell that uses HBase and ZooKeeper without HBASE-5155, 
 against HBase servers also without HBASE-5155, then is_enabled works as you'd 
 expect.  But if I then do the same thing but with an HBase shell and 
 ZooKeeper with HBASE-5155, then is_enabled returns false even though the 
 table is considered enabled by the servers from the logs.
 Additionally, if I then try to enable the table from the 
 HBASE-5155-containing shell, it hangs because the ZooKeeper code waits for 
 the ZNode to be updated with ENABLED in the data field, but what actually 
 happens is that the ZNode gets deleted since the servers are running without 
 HBASE-5155.
 I think the culprit is that the indication of how a table is considered 
 enabled inside ZooKeeper has changed with HBASE-5155.  Before HBASE-5155, a 
 table was considered enabled if the ZNode for it did not exist.  After 
 HBASE-5155, a table is considered enabled if the ZNode for it exists and has 
 ENABLED in its data.  I think the current code is incompatible when running 
 clients and servers where one side has HBASE-5155 and the other side does not.
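
The mismatch described above can be sketched as two different readings of the same ZNode. This is an illustration only, not the HBase ZooKeeper code: the class and method names are hypothetical, a null string models a ZNode that does not exist, and other table states are ignored.

```java
public class EnabledCheckSketch {
    // A null argument models "the table's ZNode does not exist".

    // Reading described for code without HBASE-5155: enabled means no ZNode.
    static boolean isEnabledWithout5155(String znodeData) {
        return znodeData == null;
    }

    // Reading described for code with HBASE-5155: enabled means the ZNode
    // exists and holds ENABLED.
    static boolean isEnabledWith5155(String znodeData) {
        return "ENABLED".equals(znodeData);
    }

    public static void main(String[] args) {
        String writtenByNewServers = "ENABLED"; // servers with HBASE-5155
        String leftByOldServers = null;         // servers without it delete the ZNode

        System.out.println("old shell, new servers: " + isEnabledWithout5155(writtenByNewServers));
        System.out.println("new shell, new servers: " + isEnabledWith5155(writtenByNewServers));
        System.out.println("old shell, old servers: " + isEnabledWithout5155(leftByOldServers));
        System.out.println("new shell, old servers: " + isEnabledWith5155(leftByOldServers));
    }
}
```

Under these assumptions the two mixed combinations report false for an enabled table, and the two matched combinations report true, which is the behaviour reported above.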





[jira] [Commented] (HBASE-5903) Detect the test classes without categories

2012-04-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264991#comment-13264991
 ] 

Hadoop QA commented on HBASE-5903:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12525075/5903v4.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1692//console

This message is automatically generated.

 Detect the test classes without categories
 --

 Key: HBASE-5903
 URL: https://issues.apache.org/jira/browse/HBASE-5903
 Project: HBase
  Issue Type: Improvement
  Components: build, test
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Fix For: 0.96.0

 Attachments: 5903.v3.patch, 5903v4.txt


 The tests are executed by category. When a test does not have a category, 
 it is run neither on prebuild nor on the central build.
 This new test checks the test classes and lists the ones without a category. 
 It fails if it finds one. As it's a small test, it will be executed on the 
 developer machine and will fail immediately on the central builds.





[jira] [Commented] (HBASE-5904) is_enabled from shell returns differently from pre- and post- HBASE-5155

2012-04-30 Thread David S. Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264992#comment-13264992
 ] 

David S. Wang commented on HBASE-5904:
--

I think at least a partial revert of HBASE-5155 is warranted here.  I don't 
know if we want to back it out entirely as it seems to solve a race condition 
that would be good to not have.  Perhaps most of the patch can remain, but the 
part that handles how a table is represented as enabled in ZK can be reverted 
or worked around.  But Ram can comment further on how best to handle this.

 is_enabled from shell returns differently from pre- and post- HBASE-5155
 

 Key: HBASE-5904
 URL: https://issues.apache.org/jira/browse/HBASE-5904
 Project: HBase
  Issue Type: Bug
  Components: zookeeper
Affects Versions: 0.90.6
Reporter: David S. Wang

 If I launch an hbase shell that uses HBase and ZooKeeper without HBASE-5155, 
 against HBase servers with HBASE-5155, then is_enabled for a table always 
 returns false even if the table is considered enabled by the servers from the 
 logs.  If I then do the same thing but with an HBase shell and ZooKeeper with 
 HBASE-5155, then is_enabled returns as expected.
 If I launch an HBase shell that uses HBase and ZooKeeper without HBASE-5155, 
 against HBase servers also without HBASE-5155, then is_enabled works as you'd 
 expect.  But if I then do the same thing but with an HBase shell and 
 ZooKeeper with HBASE-5155, then is_enabled returns false even though the 
 table is considered enabled by the servers from the logs.
 Additionally, if I then try to enable the table from the 
 HBASE-5155-containing shell, it hangs because the ZooKeeper code waits for 
 the ZNode to be updated with ENABLED in the data field, but what actually 
 happens is that the ZNode gets deleted since the servers are running without 
 HBASE-5155.
 I think the culprit is that the indication of how a table is considered 
 enabled inside ZooKeeper has changed with HBASE-5155.  Before HBASE-5155, a 
 table was considered enabled if the ZNode for it did not exist.  After 
 HBASE-5155, a table is considered enabled if the ZNode for it exists and has 
 ENABLED in its data.  I think the current code is incompatible when running 
 clients and servers where one side has HBASE-5155 and the other side does not.





[jira] [Commented] (HBASE-5904) is_enabled from shell returns differently from pre- and post- HBASE-5155

2012-04-30 Thread David S. Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264994#comment-13264994
 ] 

David S. Wang commented on HBASE-5904:
--

Also, I'm not sure if it matters that 0.90.6 was already cut with this change.  
That means that there is already an incompatible release out there.  I do not 
know what the precedent is here or if there is one.

 is_enabled from shell returns differently from pre- and post- HBASE-5155
 

 Key: HBASE-5904
 URL: https://issues.apache.org/jira/browse/HBASE-5904
 Project: HBase
  Issue Type: Bug
  Components: zookeeper
Affects Versions: 0.90.6
Reporter: David S. Wang

 If I launch an hbase shell that uses HBase and ZooKeeper without HBASE-5155, 
 against HBase servers with HBASE-5155, then is_enabled for a table always 
 returns false even if the table is considered enabled by the servers from the 
 logs.  If I then do the same thing but with an HBase shell and ZooKeeper with 
 HBASE-5155, then is_enabled returns as expected.
 If I launch an HBase shell that uses HBase and ZooKeeper without HBASE-5155, 
 against HBase servers also without HBASE-5155, then is_enabled works as you'd 
 expect.  But if I then do the same thing but with an HBase shell and 
 ZooKeeper with HBASE-5155, then is_enabled returns false even though the 
 table is considered enabled by the servers from the logs.
 Additionally, if I then try to enable the table from the 
 HBASE-5155-containing shell, it hangs because the ZooKeeper code waits for 
 the ZNode to be updated with ENABLED in the data field, but what actually 
 happens is that the ZNode gets deleted since the servers are running without 
 HBASE-5155.
 I think the culprit is that the indication of how a table is considered 
 enabled inside ZooKeeper has changed with HBASE-5155.  Before HBASE-5155, a 
 table was considered enabled if the ZNode for it did not exist.  After 
 HBASE-5155, a table is considered enabled if the ZNode for it exists and has 
 ENABLED in its data.  I think the current code is incompatible when running 
 clients and servers where one side has HBASE-5155 and the other side does not.
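
 The two readings of the table znode described above can be sketched as 
 follows.  This is a minimal model, not HBase's actual ZooKeeper code: the 
 class and method names are hypothetical, and the znode is reduced to an 
 Optional that is empty when the znode does not exist.

```java
import java.util.Optional;

/** Hypothetical model of the table znode; not the real HBase classes. */
final class TableZnodeSketch {

    /** Pre-HBASE-5155 reading: the table is enabled when its znode is absent. */
    static boolean isEnabledPre5155(Optional<String> znodeData) {
        return !znodeData.isPresent();
    }

    /** Post-HBASE-5155 reading: the znode must exist and hold ENABLED. */
    static boolean isEnabledPost5155(Optional<String> znodeData) {
        return znodeData.isPresent() && "ENABLED".equals(znodeData.get());
    }

    public static void main(String[] args) {
        // What each kind of server leaves behind after enabling a table.
        Optional<String> leftByOldServer = Optional.empty();        // znode deleted
        Optional<String> leftByNewServer = Optional.of("ENABLED");  // znode updated

        // Mixed versions disagree with the server in both directions.
        System.out.println(isEnabledPost5155(leftByOldServer)); // prints false
        System.out.println(isEnabledPre5155(leftByNewServer));  // prints false
    }
}
```

 With matching versions on both sides, each reading returns true for an 
 enabled table, which is consistent with the behaviour reported above.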





[jira] [Commented] (HBASE-5904) is_enabled from shell returns differently from pre- and post- HBASE-5155

2012-04-30 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264999#comment-13264999
 ] 

ramkrishna.s.vasudevan commented on HBASE-5904:
---

@Stack
Yes, David discussed this with me too.  But I was not sure how to go about it 
either.  Thanks, David, for bringing this up.  






[jira] [Commented] (HBASE-5889) Remove HRegionInterface

2012-04-30 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265000#comment-13265000
 ] 

stack commented on HBASE-5889:
--

bq. I'm just not sure this is actually the best bang for the buck, and might 
make layering less clean.

Because the HRegion APIs would all take pbs rather than the Get/Put/Delete, 
etc.?  And doing this conversion would be a bunch of work that would be better 
spent doing other stuff?

Serverside, going from pb into Get/Delete/Put just to get the data into and out 
of regions seems gratuitous and crud we should purge.

Your profiling, though, would seem to make this a minor issue, one I would 
previously have thought critical to address.

 Remove HRegionInterface
 ---

 Key: HBASE-5889
 URL: https://issues.apache.org/jira/browse/HBASE-5889
 Project: HBase
  Issue Type: Improvement
  Components: client, ipc, regionserver
Affects Versions: 0.96.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.96.0


 As a step toward moving internals to PB, and so avoiding the conversion for 
 performance reasons, we should remove the HRegionInterface. 
 The region server would then support only ClientProtocol and AdminProtocol.  
 Later on, HRegion can work with PB messages directly.





[jira] [Commented] (HBASE-5904) is_enabled from shell returns differently from pre- and post- HBASE-5155

2012-04-30 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265005#comment-13265005
 ] 

stack commented on HBASE-5904:
--

If 0.90.6 has this breakage, then the damage is done.  We should mark 
hbase-5155 an incompatible change and put in a fat release note w/ how it 
changes behavior (Steal some of David's notes above).  You up for doing this 
Ram?

I'm surprised that the change in semantics, where a missing znode no longer 
means enabled, has not caused other issues.

Good digging David.






[jira] [Commented] (HBASE-5864) Error while reading from hfile in 0.94

2012-04-30 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265009#comment-13265009
 ] 

ramkrishna.s.vasudevan commented on HBASE-5864:
---

Let me also add 0.96 to the fix versions.  I was just about to prepare a 
patch for trunk.  Thanks, Lars, for taking care of it.


 Error while reading from hfile in 0.94
 --

 Key: HBASE-5864
 URL: https://issues.apache.org/jira/browse/HBASE-5864
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.0
Reporter: Gopinathan A
Assignee: ramkrishna.s.vasudevan
Priority: Blocker
 Fix For: 0.94.0, 0.96.0

 Attachments: HBASE-5864_1.patch, HBASE-5864_2.patch, 
 HBASE-5864_3.patch, HBASE-5864_test.patch


 Got the following stacktrace during region split.
 {noformat}
 2012-04-24 16:05:42,168 WARN org.apache.hadoop.hbase.regionserver.Store: Failed getting store size for value
 java.io.IOException: Requested block is out of range: 2906737606134037404, lastDataBlockOffset: 84764558
   at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:278)
   at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.midkey(HFileBlockIndex.java:285)
   at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.midkey(HFileReaderV2.java:402)
   at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.midkey(StoreFile.java:1638)
   at org.apache.hadoop.hbase.regionserver.Store.getSplitPoint(Store.java:1943)
   at org.apache.hadoop.hbase.regionserver.RegionSplitPolicy.getSplitPoint(RegionSplitPolicy.java:77)
   at org.apache.hadoop.hbase.regionserver.HRegion.checkSplit(HRegion.java:4921)
   at org.apache.hadoop.hbase.regionserver.HRegionServer.splitRegion(HRegionServer.java:2901)
 {noformat}





[jira] [Updated] (HBASE-5864) Error while reading from hfile in 0.94

2012-04-30 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5864:
--

Fix Version/s: 0.96.0






[jira] [Updated] (HBASE-5901) Use union type protobufs instead of class/byte pairs for multi requests

2012-04-30 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5901:
-

Priority: Critical  (was: Major)

+1

Nice.

 Use union type protobufs instead of class/byte pairs for multi requests
 ---

 Key: HBASE-5901
 URL: https://issues.apache.org/jira/browse/HBASE-5901
 Project: HBase
  Issue Type: Improvement
  Components: ipc, performance
Affects Versions: 0.96.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Critical
 Attachments: hbase-5901.txt


 The current implementation of multi actions uses repeated NameBytesPairs 
 for the contents of multi actions. Instead, we should introduce a union type 
 protobuf for the valid actions. This makes the RPCs smaller since they don't 
 need to carry class names, and makes deserialization faster since it can 
 avoid some copying and reflection.
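
 The size argument above can be illustrated with a toy encoder.  This is only 
 a sketch under loud assumptions: it does not use protobuf, it ignores field 
 tags and length prefixes, and the class, enum and method names are 
 hypothetical rather than taken from hbase-5901.txt.

```java
import java.nio.charset.StandardCharsets;

/** Toy comparison of a name/bytes pair with a tagged union; not real protobuf. */
final class MultiActionEncodingSketch {

    /** Hypothetical set of valid actions in a multi request. */
    enum ActionType { GET, PUT, DELETE }

    /** Name/bytes style: every action carries its class name before the payload. */
    static byte[] encodeAsNameBytesPair(String className, byte[] payload) {
        byte[] name = className.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[name.length + payload.length];
        System.arraycopy(name, 0, out, 0, name.length);
        System.arraycopy(payload, 0, out, name.length, payload.length);
        return out;
    }

    /** Union style: a single tag byte says which action the payload holds. */
    static byte[] encodeAsUnion(ActionType type, byte[] payload) {
        byte[] out = new byte[1 + payload.length];
        out[0] = (byte) type.ordinal();
        System.arraycopy(payload, 0, out, 1, payload.length);
        return out;
    }
}
```

 On the receiving side, the tag can be dispatched with a switch over 
 ActionType instead of resolving a class name reflectively, which is the 
 deserialization saving the description refers to.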





[jira] [Commented] (HBASE-5904) is_enabled from shell returns differently from pre- and post- HBASE-5155

2012-04-30 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265022#comment-13265022
 ] 

ramkrishna.s.vasudevan commented on HBASE-5904:
---

Added release note to HBASE-5155.  
@David/@Stack 
Please take a look at it.






[jira] [Reopened] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted

2012-04-30 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reopened HBASE-5155:
---


Will close this once the release note is reviewed.

 ServerShutDownHandler And Disable/Delete should not happen parallely leading 
 to recreation of regions that were deleted
 ---

 Key: HBASE-5155
 URL: https://issues.apache.org/jira/browse/HBASE-5155
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.4
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Blocker
 Fix For: 0.90.6

 Attachments: HBASE-5155_1.patch, HBASE-5155_2.patch, 
 HBASE-5155_3.patch, HBASE-5155_latest.patch, hbase-5155_6.patch


 ServerShutdownHandler and the disable/delete table handlers race.  This is not an 
 issue due to TM.
 - A regionserver goes down.  In our cluster the regionserver holds a lot of 
 regions.
 - A region R1 has two daughters, D1 and D2.
 - The ServerShutdownHandler gets called, scans META, and gets all the 
 user regions.
 - In parallel, a table is disabled.  (No problem in this step.)
 - Delete table is done.
 - The table and its regions are deleted, including R1, D1 and D2.  (So META 
 is cleaned.)
 - Now ServerShutdownHandler starts to process the dead regions:
 {code}
 if (hri.isOffline() && hri.isSplit()) {
   LOG.debug("Offlined and split region " + hri.getRegionNameAsString() +
     "; checking daughter presence");
   fixupDaughters(result, assignmentManager, catalogTracker);
 {code}
 As part of fixupDaughters, because the daughters D1 and D2 are missing for R1, 
 {code}
 if (isDaughterMissing(catalogTracker, daughter)) {
   LOG.info("Fixup; missing daughter " + daughter.getRegionNameAsString());
   MetaEditor.addDaughter(catalogTracker, daughter, null);
   // TODO: Log WARN if the regiondir does not exist in the fs.  If its not
   // there then something wonky about the split -- things will keep going
   // but could be missing references to parent region.
   // And assign it.
   assignmentManager.assign(daughter, true);
 {code}
 we call assign on the daughters.  
 After this, we continue with the code below.
 {code}
 if (processDeadRegion(e.getKey(), e.getValue(),
     this.services.getAssignmentManager(),
     this.server.getCatalogTracker())) {
   this.services.getAssignmentManager().assign(e.getKey(), true);
 {code}
 When the SSH scanned META, it saw R1, D1 and D2.
 So as part of the above code, D1 and D2, which were already assigned by fixupDaughters, 
 are assigned again by 
 {code}
 this.services.getAssignmentManager().assign(e.getKey(), true);
 {code}
 This leads to a ZooKeeper bad-version error, which kills the master.
 The important part here is that the regions that were deleted are recreated, which 
 I think is more critical.
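
 The hazard in the steps above is that the handler acts on a scan of META 
 taken before the delete.  One way to picture a guard is the sketch below.  It 
 is an illustration only, not the HBASE-5155 patch: regions are plain 
 strings, and the class, method and parameter names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical guard against acting on a stale META scan. */
final class StaleScanSketch {

    /**
     * Filters the regions seen in an earlier scan down to those that are
     * still present in META now and have not been assigned already.
     */
    static List<String> regionsToAssign(List<String> scannedRegions,
                                        Set<String> regionsStillInMeta,
                                        Set<String> alreadyAssigned) {
        List<String> result = new ArrayList<>();
        for (String region : scannedRegions) {
            if (!regionsStillInMeta.contains(region)) {
                continue; // deleted after the scan; assigning would recreate it
            }
            if (alreadyAssigned.contains(region)) {
                continue; // e.g. a daughter assigned during the fixup step
            }
            result.add(region);
        }
        return result;
    }
}
```

 In the scenario above, the scan holds R1, D1 and D2 while META holds none of 
 them after the delete, so the filter returns an empty list and nothing is 
 recreated or assigned twice.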





[jira] [Updated] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted

2012-04-30 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5155:
--

Release Note: 
This issue is an incompatible change.
If an HBase client with the changes for HBASE-5155 is used against a server 
(master) without the changes for HBASE-5155, then is_enabled (from the HBase 
Shell) or isTableEnabled() (from HBaseAdmin) will return false even though the 
table is already enabled as far as the master is concerned.

If the HBase client does not have the changes for HBASE-5155 and the server 
does, then the client will hang when we try to enable a table.

The reason is as follows.
Prior to HBASE-5155, once the table is enabled, the znode created for the 
table in ZooKeeper is deleted.
After HBASE-5155, once the table is enabled, the znode created for the table 
in ZooKeeper is not deleted; instead, the same node is updated with the status 
ENABLED.

The client also expects the znode in ZooKeeper to be in the ENABLED state if 
the table has been enabled successfully.
The above changes make the client behaviour incompatible if the client does 
not have this fix whereas the server has this fix.
If neither the client nor the server has this fix, then the behaviour is as 
expected.

I have added a release note on this issue.  Please review.
Sorry about the problem introduced.







[jira] [Commented] (HBASE-5903) Detect the test classes without categories

2012-04-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265023#comment-13265023
 ] 

Hudson commented on HBASE-5903:
---

Integrated in HBase-TRUNK #2826 (See 
[https://builds.apache.org/job/HBase-TRUNK/2826/])
HBASE-5903 Detect the test classes without categories (Revision 1332260)

 Result = SUCCESS
stack : 
Files : 
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestCheckTestClasses.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestHColumnDescriptor.java


 Detect the test classes without categories
 --

 Key: HBASE-5903
 URL: https://issues.apache.org/jira/browse/HBASE-5903
 Project: HBase
  Issue Type: Improvement
  Components: build, test
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Fix For: 0.96.0

 Attachments: 5903.v3.patch, 5903v4.txt


 The tests are executed by category.  When a test does not have a category, 
 it is run on neither the prebuild nor the central build.
 This new test checks the test classes and lists the ones without a category.  It 
 fails if it finds one.  As it's a small test, it will be executed on the 
 developer machine and will fail immediately on the central builds.
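
 The kind of check described above can be sketched with plain reflection.  
 This is not the code of TestCheckTestClasses: the TestCategory annotation 
 below is a hypothetical stand-in for the real category annotation, and the 
 list of classes is passed in rather than discovered on the classpath.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of detecting test classes that lack a category. */
final class CategoryCheckSketch {

    /** Stand-in for a test-category annotation; must be visible at runtime. */
    @Retention(RetentionPolicy.RUNTIME)
    @interface TestCategory {
        String value();
    }

    @TestCategory("small")
    static class TestWithCategory {}

    static class TestWithoutCategory {}

    /** Returns the classes that carry no TestCategory annotation. */
    static List<Class<?>> findUncategorized(List<Class<?>> testClasses) {
        List<Class<?>> missing = new ArrayList<>();
        for (Class<?> testClass : testClasses) {
            if (!testClass.isAnnotationPresent(TestCategory.class)) {
                missing.add(testClass);
            }
        }
        return missing;
    }
}
```

 A test built on this would fail whenever findUncategorized returns a 
 non-empty list, which matches the "fails if it finds one" behaviour above.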





[jira] [Commented] (HBASE-5904) is_enabled from shell returns differently from pre- and post- HBASE-5155

2012-04-30 Thread David S. Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265024#comment-13265024
 ] 

David S. Wang commented on HBASE-5904:
--

Should we back out HBASE-5155 entirely for now?  I looked at it and just 
backing out the part that changes the znode behavior implies that we should 
also remove isTablePresent(), which seems to affect more of the patch's 
functionality and then it gets messy.

Is there any later change that depends on HBASE-5155?






[jira] [Commented] (HBASE-5897) prePut coprocessor hook causing substantial CPU usage

2012-04-30 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265028#comment-13265028
 ] 

Lars Hofhansl commented on HBASE-5897:
--

Right. That's what I was trying to say when I attached my patch.




 prePut coprocessor hook causing substantial CPU usage
 -

 Key: HBASE-5897
 URL: https://issues.apache.org/jira/browse/HBASE-5897
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Critical
 Fix For: 0.92.2, 0.94.0, 0.96.0

 Attachments: 5897-simple.txt, hbase-5897.txt


 I was running an insert workload against trunk under oprofile and saw that a 
 significant portion of CPU usage was going to calling the prePut 
 coprocessor hook inside doMiniBatchPut, even though I don't have any 
 coprocessors installed. I ran a million-row insert and collected CPU time 
 spent in the RS after commenting out the preput hook, and found CPU usage 
 reduced by 33%.
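
 A common way to keep an unused hook cheap is to return before doing any 
 per-call setup when nothing is registered.  The sketch below illustrates 
 that idea only.  It is not taken from either attached patch, and the class, 
 interface and field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical host that skips hook setup when no observers are registered. */
final class CoprocessorHostSketch {

    /** Stand-in for an observer with a prePut hook. */
    interface PutObserver {
        void prePut(String row);
    }

    private final List<PutObserver> observers = new ArrayList<>();

    /** Counts how often the (simulated) per-call setup was paid. */
    int contextsCreated = 0;

    void register(PutObserver observer) {
        observers.add(observer);
    }

    void prePut(String row) {
        if (observers.isEmpty()) {
            return; // fast path: nothing installed, so no setup cost
        }
        contextsCreated++; // stands in for building a per-call context
        for (PutObserver observer : observers) {
            observer.prePut(row);
        }
    }
}
```

 With no observers registered, repeated prePut calls leave contextsCreated 
 at zero, so the insert path pays only for the emptiness check.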





[jira] [Commented] (HBASE-5869) Move SplitLogManager splitlog taskstate and AssignmentManager RegionTransitionData znode datas to pb

2012-04-30 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265032#comment-13265032
 ] 

jirapos...@reviews.apache.org commented on HBASE-5869:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4926/#review7379
---


Looks good to me.

- Jimmy


On 2012-04-28 23:42:52, Michael Stack wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4926/
bq.  ---
bq.  
bq.  (Updated 2012-04-28 23:42:52)
bq.  
bq.  
bq.  Review request for hbase and Jimmy Xiang.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Convert two zk users to pb: distributed log splitting and regions in 
transition.
bq.  
bq.  Refactored distributed log splitting so we only serialize/deserialize in 
one location.
bq.  Fewer changes were needed to do the same for regions in transition.
bq.  
bq.  Moves serialization/deserialization out of the ZKAssign, ZKSplit and into
bq.  the classes themselves so can encapsulate how serialization is done into 
one place
bq.  (try to make the ZK* classes just deal in bytes -- about 90% done).
bq.  
bq.  Moved classes used by various packages up to top level to minimize imports
bq.  that are across package (zookeeper into protobuf and/or into regionserver 
and/or
bq.  master packages, etc).
bq.  
bq.  A src/main/java/org/apache/hadoop/hbase/DeserializationException.java
bq.    New generic deserialization exception.
bq.  A src/main/java/org/apache/hadoop/hbase/zookeeper/EmptyWatcher.java
bq.  D src/main/java/org/apache/hadoop/hbase/EmptyWatcher.java
bq.    Moved under zookeeper package.
bq.  A src/main/java/org/apache/hadoop/hbase/HBaseException.java
bq.    New base hbase exception as suggested by hbase-5796.  New DeserializationException
bq.    inherits from this.
bq.  A src/main/java/org/apache/hadoop/hbase/RegionTransition.java
bq.    State of a region in transition.  Top-level because used by a
bq.    few top-level packages.  Encapsulates pb serialization/deserialization.
bq.  M src/main/java/org/apache/hadoop/hbase/ServerName.java
bq.    Add method to deserialize a ServerName, etc.  Encapsulates pb'ing.
bq.  M src/main/java/org/apache/hadoop/hbase/SplitLogCounters.java
bq.    Counters used by distributed log splitting.
bq.  A SplitLogTask
bq.    Class that encapsulates log splitting state.  Also encapsulates pb'ing.
bq.  M src/main/java/org/apache/hadoop/hbase/executor/EventHandler.java
bq.    Implement code for state.  Added functions to go from code to state and vice
bq.    versa.  Used when serializing.
bq.  M src/main/java/org/apache/hadoop/hbase/executor/ExecutorService.java
bq.    Remove unused imports.
bq.  D src/main/java/org/apache/hadoop/hbase/executor/RegionTransitionData.java
bq.    Removed.  Replaced by RegionTransition moved to package top-level.
bq.  M src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java
bq.  M src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
bq.    Use new DeserializationException. Move to using new RegionTransition
bq.    from RegionTransitionData class.  Pass deserialized class rather than
bq.    byte array.  Remove duplicated code.
bq.  M src/main/java/org/apache/hadoop/hbase/master/HMaster.java
bq.    Use new ServerName parse method rather than ZKUtil one.
bq.  M src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java
bq.  M src/main/java/org/apache/hadoop/hbase/regionserver/SplitLogWorker.java
bq.    Redo to use new SplitLogTask and SplitLogCounter classes.
bq.  M src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
bq.    expectPBMagicPrefix added
bq.  M src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
bq.    Use new RegionTransition in place of RegionTransitionData.
bq.  M src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java
bq.    Define moved from ZKSplitLog to SplitLogManager.
bq.  M src/main/java/org/apache/hadoop/hbase/zookeeper/MasterAddressTracker.java
bq.  M src/main/java/org/apache/hadoop/hbase/zookeeper/RootRegionTracker.java
bq.    Changed method name from getZNodeData to toByteArray to match how we've
bq.    named it elsewhere. Use new DeserializationException
bq.  M src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java
bq.    Use new RegionTransition class
bq.  M src/main/java/org/apache/hadoop/hbase/zookeeper/ZKSplitLog.java
bq.    Moved stuff that was in here up into SplitLogManager where better
bq.    belongs.  Also moved serialization/deserialization up into the
bq.    class itself: SplitLogTask.  Moved counters out to SplitLogCounter class.
bq.  M 

[jira] [Commented] (HBASE-5869) Move SplitLogManager splitlog taskstate and AssignmentManager RegionTransitionData znode datas to pb

2012-04-30 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265031#comment-13265031
 ] 

jirapos...@reviews.apache.org commented on HBASE-5869:
--



bq.  On 2012-04-28 22:14:23, Jimmy Xiang wrote:
bq.   src/main/protobuf/ZooKeeper.proto, line 82
bq.   https://reviews.apache.org/r/4926/diff/1/?file=105372#file105372line82
bq.  
bq.   A task is a path, this is more like a task state, isn't it?
bq.  
bq.  Michael Stack wrote:
bq.  I can change this np.
bq.  
bq.  Currently I have the pb class named same as the class that wraps it.
bq.  Should I change this?  Add a pb prefix or something?  Problem w/ that is that
bq.  none of the other pb classes have the pb prefix.  They are in the generated
bq.  package which is probably sufficient to distinguish them?  My hope is to make it
bq.  so the pbs do not leak outside of the class that serializes to them; e.g. this
bq.  SplitLogTask class.
bq.  
bq.  Jimmy Xiang wrote:
bq.  I got your point. I prefer to have the pb class named the same as the
bq.  wrapper class, if there is one.  Should we create a separate task state wrapper
bq.  class if needed?
bq.  
bq.  Michael Stack wrote:
bq.  I just tried changing the name of this class from SplitLogTask to
bq.  SplitLogTaskState and it doesn't seem right since you can do a 'getState' call on
bq.  this class -- the class has State AND the origin of the task.  I'm going to
bq.  leave the name as is.
bq.  
bq.  Ok on keeping names the same.  It should be fine if we can keep the pb
bq.  stuff bottled up under the pb package or internal only to the class that uses
bq.  the pb (except where pb comes out on server..)
bq.  
bq.  Thanks  Jimmy

Ok, that's fine with me.


bq.  On 2012-04-28 22:14:23, Jimmy Xiang wrote:
bq.   src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java, line 182
bq.   https://reviews.apache.org/r/4926/diff/1/?file=105357#file105357line182
bq.  
bq.   Should we abort? Under what scenario can the parsing fail, other
bq.   than a conflicting data format?
bq.  
bq.  Michael Stack wrote:
bq.  I thought I was just redoing what was there previously.  We could abort
bq.  but maybe next time through the deserialization works because it's been updated
bq.  by another?  Or, we spew this error all over the logs and drive someone crazy?
bq.  Will look at it again.
bq.  
bq.  Michael Stack wrote:
bq.  Yeah, I'll leave this as is after looking at it.  Hopefully will be
bq.  good on next go around.

Ok


- Jimmy


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4926/#review7360
---



[jira] [Commented] (HBASE-5904) is_enabled from shell returns differently from pre- and post- HBASE-5155

2012-04-30 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265047#comment-13265047
 ] 

stack commented on HBASE-5904:
--

I suppose it would make sense backing it out.  We could roll a 0.90.7?

 is_enabled from shell returns differently from pre- and post- HBASE-5155
 

 Key: HBASE-5904
 URL: https://issues.apache.org/jira/browse/HBASE-5904
 Project: HBase
  Issue Type: Bug
  Components: zookeeper
Affects Versions: 0.90.6
Reporter: David S. Wang

 If I launch an hbase shell that uses HBase and ZooKeeper without HBASE-5155, 
 against HBase servers with HBASE-5155, then is_enabled for a table always 
 returns false even if the table is considered enabled by the servers from the 
 logs.  If I then do the same thing but with an HBase shell and ZooKeeper with 
 HBASE-5155, then is_enabled returns as expected.
 If I launch an HBase shell that uses HBase and ZooKeeper without HBASE-5155, 
 against HBase servers also without HBASE-5155, then is_enabled works as you'd 
 expect.  But if I then do the same thing but with an HBase shell and 
 ZooKeeper with HBASE-5155, then is_enabled returns false even though the 
 table is considered enabled by the servers from the logs.
 Additionally, if I then try to enable the table from the 
 HBASE-5155-containing shell, it hangs because the ZooKeeper code waits for 
 the ZNode to be updated with ENABLED in the data field, but what actually 
 happens is that the ZNode gets deleted since the servers are running without 
 HBASE-5155.
 I think the culprit is that the indication of how a table is considered 
 enabled inside ZooKeeper has changed with HBASE-5155.  Before HBASE-5155, a 
 table was considered enabled if the ZNode for it did not exist.  After 
 HBASE-5155, a table is considered enabled if the ZNode for it exists and has 
 ENABLED in its data.  I think the current code is incompatible when running 
 clients and servers where one side has HBASE-5155 and the other side does not.
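
The mismatch described above can be reproduced with a toy model. This is only a sketch: the class `TableStateCompat` and its in-memory map are hypothetical stand-ins for the table ZNode and are not HBase or ZooKeeper code. It assumes, as the description says, that the pre-HBASE-5155 convention is "enabled means the ZNode is absent" and the post-HBASE-5155 convention is "enabled means the ZNode exists and holds ENABLED".

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the two conventions described in this issue; not HBase code. */
final class TableStateCompat {
    /** Hypothetical stand-in for the table ZNodes: an absent key means no ZNode. */
    private static final Map<String, String> ZNODES = new HashMap<>();

    /** Pre-HBASE-5155 reader: a table is enabled if its ZNode does not exist. */
    static boolean isEnabledOld(String table) {
        return !ZNODES.containsKey(table);
    }

    /** Post-HBASE-5155 reader: a table is enabled if its ZNode exists and holds ENABLED. */
    static boolean isEnabledNew(String table) {
        return "ENABLED".equals(ZNODES.get(table));
    }

    /** Pre-HBASE-5155 writer: enabling a table deletes its ZNode. */
    static void enableOld(String table) {
        ZNODES.remove(table);
    }

    /** Post-HBASE-5155 writer: enabling a table writes ENABLED into its ZNode. */
    static void enableNew(String table) {
        ZNODES.put(table, "ENABLED");
    }

    public static void main(String[] args) {
        enableNew("t1"); // servers with HBASE-5155
        System.out.println("new servers, old shell: " + isEnabledOld("t1"));
        System.out.println("new servers, new shell: " + isEnabledNew("t1"));
        enableOld("t1"); // servers without HBASE-5155
        System.out.println("old servers, old shell: " + isEnabledOld("t1"));
        System.out.println("old servers, new shell: " + isEnabledNew("t1"));
    }
}
```

In this model the two mixed pairings are the ones that report the table as not enabled, which matches the behaviour reported above.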





[jira] [Created] (HBASE-5905) Protobuf interface for Admin: split between the internal and the external/customer interface

2012-04-30 Thread nkeywal (JIRA)
nkeywal created HBASE-5905:
--

 Summary: Protobuf interface for Admin: split between the internal 
and the external/customer interface
 Key: HBASE-5905
 URL: https://issues.apache.org/jira/browse/HBASE-5905
 Project: HBase
  Issue Type: Improvement
  Components: client, master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal


After a short discussion with Stack, I create a jira.
--
I'm a little bit confused by the protobuf interface for closeRegion.

We have two types of closeRegion today:
1) the external ones; available in client.HBaseAdmin. They take the server and 
the region identifier as a parameter and nothing else.
2) The internal ones, called for example by the master. They have more 
parameters (like versionOfClosingNode or transitionInZK).

When I look at protobuf.ProtobufUtil, I see:

  public static void closeRegion(final AdminProtocol admin,
      final byte[] regionName, final boolean transitionInZK) throws IOException {
    CloseRegionRequest closeRegionRequest =
      RequestConverter.buildCloseRegionRequest(regionName, transitionInZK);
    try {
      admin.closeRegion(null, closeRegionRequest);
    } catch (ServiceException se) {
      throw getRemoteException(se);
    }
  }


In other words, it seems that we merged the two interfaces into a single one. 
Is that the intent?
I checked, the internal fields in closeRegionRequest are all optional (that's 
good). Still, it means that the end user could use them, or at least would need 
to distinguish between fields that are optional for functional reasons and 
fields that are optional because they should not be used.
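
To make the concern concrete, here is a minimal sketch of one request type serving both callers. Everything in it is hypothetical: `CloseRegionRequestSketch`, `forClient`, `forMaster` and `usesInternalFields` are illustration names, not the generated pb class or any existing HBase API. It only assumes what is stated above, namely that the internal parameters (versionOfClosingNode, transitionInZK) are optional fields of the same request.

```java
import java.util.Optional;

/** Hypothetical model of one merged request type; not the generated pb class. */
final class CloseRegionRequestSketch {
    final String regionName;
    final Optional<Integer> versionOfClosingNode; // meant for internal callers only
    final Optional<Boolean> transitionInZK;       // meant for internal callers only

    private CloseRegionRequestSketch(String regionName,
                                     Optional<Integer> versionOfClosingNode,
                                     Optional<Boolean> transitionInZK) {
        this.regionName = regionName;
        this.versionOfClosingNode = versionOfClosingNode;
        this.transitionInZK = transitionInZK;
    }

    /** External (client-facing) form: region only, internal fields left unset. */
    static CloseRegionRequestSketch forClient(String regionName) {
        return new CloseRegionRequestSketch(regionName, Optional.empty(), Optional.empty());
    }

    /** Internal (master-facing) form: also carries the ZK-related fields. */
    static CloseRegionRequestSketch forMaster(String regionName,
                                              int versionOfClosingNode,
                                              boolean transitionInZK) {
        return new CloseRegionRequestSketch(regionName,
            Optional.of(versionOfClosingNode), Optional.of(transitionInZK));
    }

    /** True when the request sets any field intended for internal callers. */
    boolean usesInternalFields() {
        return versionOfClosingNode.isPresent() || transitionInZK.isPresent();
    }
}
```

Separate factory methods keep the "do not use" fields out of the client-facing entry point even though the wire message is shared. A server could also call something like `usesInternalFields()` to reject such requests from external callers.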






[jira] [Commented] (HBASE-5886) Add new metric for possible data loss due to puts without WAL

2012-04-30 Thread Nicolas Spiegelberg (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265055#comment-13265055
 ] 

Nicolas Spiegelberg commented on HBASE-5886:


I'm confused about why this metric is useful.  This metric is never accurate 
and determining data loss because querying it is async from the Put path.  If 
you are looking for a restart point, you should have another thread call 
HTable.flush() and checkpoint or add an API to query for the latest timestamp 
in a CF's storefile. 

 Add new metric for possible data loss due to puts without WAL 
 --

 Key: HBASE-5886
 URL: https://issues.apache.org/jira/browse/HBASE-5886
 Project: HBase
  Issue Type: New Feature
  Components: metrics, regionserver
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Minor
  Labels: metrics
 Attachments: HBASE-5886-v0.patch, HBASE-5886-v1.patch, 
 HBASE-5886-v2.patch


 Add a metric to keep track of puts without WAL and possible data loss size.





[jira] [Issue Comment Edited] (HBASE-5886) Add new metric for possible data loss due to puts without WAL

2012-04-30 Thread Nicolas Spiegelberg (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265055#comment-13265055
 ] 

Nicolas Spiegelberg edited comment on HBASE-5886 at 4/30/12 5:55 PM:
-

I'm confused about why this metric is useful.  This metric is never accurate at 
determining data loss because querying it is async from the Put path.  If you 
are looking for a restart point, you should have another thread call 
HTable.flush() and checkpoint or add an API to query for the latest timestamp 
in a CF's storefile. 

Edit: s/and/at

  was (Author: nspiegelberg):
I'm confused about why this metric is useful.  This metric is never 
accurate and determining data loss because querying it is async from the Put 
path.  If you are looking for a restart point, you should have another thread 
call HTable.flush() and checkpoint or add an API to query for the latest 
timestamp in a CF's storefile. 
  




[jira] [Commented] (HBASE-5895) Slow query log in trunk is too verbose

2012-04-30 Thread Nicolas Spiegelberg (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265060#comment-13265060
 ] 

Nicolas Spiegelberg commented on HBASE-5895:


It should at least be optional to enable verbose logging.  Another thought was 
rate limiting the number of times a region could log a slow query over a given 
time (to rate limit logging disk IO/sec)
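
As a rough illustration of the rate-limiting idea, a fixed-window limiter per region might look like the sketch below. The class name `SlowQueryLogLimiter` and its parameters are hypothetical and not part of HBase. The caller passes in the current time so the logic is easy to test.

```java
/** Hypothetical per-region limiter: at most maxLogs slow-query log lines per window. */
final class SlowQueryLogLimiter {
    private final int maxLogs;
    private final long windowMillis;
    private long windowStart;   // start of the current window, in milliseconds
    private int loggedInWindow; // slow-query lines already logged in this window

    SlowQueryLogLimiter(int maxLogs, long windowMillis) {
        this.maxLogs = maxLogs;
        this.windowMillis = windowMillis;
    }

    /** Returns true if a slow query observed at nowMillis may be logged. */
    synchronized boolean tryAcquire(long nowMillis) {
        if (nowMillis - windowStart >= windowMillis) {
            // The previous window has expired: start a new one.
            windowStart = nowMillis;
            loggedInWindow = 0;
        }
        if (loggedInWindow < maxLogs) {
            loggedInWindow++;
            return true;
        }
        return false; // over budget: skip the log line for this slow query
    }
}
```

A caller would check `tryAcquire(System.currentTimeMillis())` before writing the slow-query line. That bounds the logging I/O per region whatever the number of slow queries is.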

 Slow query log in trunk is too verbose
 --

 Key: HBASE-5895
 URL: https://issues.apache.org/jira/browse/HBASE-5895
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.0
Reporter: Todd Lipcon
Priority: Critical

 Running a YCSB workload against trunk, the slow query log ends up logging the 
 entire contents of mutate RPCs (in PB-encoded binary). This then makes the 
 logging back up, which makes more slow queries, which makes the whole thing 
 spin out of control. We should only summarize the RPC, rather than printing 
 the whole contents.





[jira] [Commented] (HBASE-3691) Add compressor support for 'snappy', google's compressor

2012-04-30 Thread Chris Waterson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265059#comment-13265059
 ] 

Chris Waterson commented on HBASE-3691:
---

What is the likelihood that this could be back-ported to the 0.90.x branch?

 Add compressor support for 'snappy', google's compressor
 

 Key: HBASE-3691
 URL: https://issues.apache.org/jira/browse/HBASE-3691
 Project: HBase
  Issue Type: Task
Reporter: stack
Priority: Critical
 Fix For: 0.92.0

 Attachments: hbase-snappy-3691-trunk-002.patch, 
 hbase-snappy-3691-trunk-003.patch, hbase-snappy-3691-trunk-004.patch, 
 hbase-snappy-3691-trunk.patch


 http://code.google.com/p/snappy/ is apache licensed.
 bq. Snappy is a compression/decompression library. It does not aim for 
 maximum compression, or compatibility with any other compression library; 
 instead, it aims for very high speeds and reasonable compression. For 
 instance, compared to the fastest mode of zlib, Snappy is an order of 
 magnitude faster for most inputs, but the resulting compressed files are 
 anywhere from 20% to 100% bigger. On a single core of a Core i7 processor in 
 64-bit mode, Snappy compresses at about 250 MB/sec or more and decompresses 
 at about 500 MB/sec or more.
 bq. Snappy is widely used inside Google, in everything from BigTable and 
 MapReduce to our internal RPC systems. (Snappy has previously been referred 
 to as Zippy in some presentations and the likes.)
 Let's get it in.





[jira] [Commented] (HBASE-5897) prePut coprocessor hook causing substantial CPU usage

2012-04-30 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265063#comment-13265063
 ] 

stack commented on HBASE-5897:
--

+1 on the more radical patch.

 prePut coprocessor hook causing substantial CPU usage
 -

 Key: HBASE-5897
 URL: https://issues.apache.org/jira/browse/HBASE-5897
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Critical
 Fix For: 0.92.2, 0.94.0, 0.96.0

 Attachments: 5897-simple.txt, hbase-5897.txt


 I was running an insert workload against trunk under oprofile and saw that a 
 significant portion of CPU usage was going to calling the prePut 
 coprocessor hook inside doMiniBatchPut, even though I don't have any 
 coprocessors installed. I ran a million-row insert and collected CPU time 
 spent in the RS after commenting out the preput hook, and found CPU usage 
 reduced by 33%.
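
The issue text does not say what either attached patch changes, so the following is only a generic sketch of one way to make a hook cheap when nothing is registered: return before doing any per-call setup. `HookHost` and `PutObserver` are hypothetical names, not the HBase coprocessor API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical hook host that skips all per-call work when no observer is registered. */
final class HookHost {
    /** Hypothetical observer interface, standing in for a coprocessor hook. */
    interface PutObserver {
        void prePut(String row);
    }

    private final List<PutObserver> observers = new ArrayList<>();
    int contextsBuilt; // counts how often the per-call setup work actually ran

    void register(PutObserver observer) {
        observers.add(observer);
    }

    /** Invoked once per put on the hot path. */
    void prePut(String row) {
        if (observers.isEmpty()) {
            return; // fast path: nothing installed, so do no setup at all
        }
        contextsBuilt++; // stands in for building a per-call context object
        for (PutObserver observer : observers) {
            observer.prePut(row);
        }
    }
}
```

With no observers registered, a million calls to `prePut` leave `contextsBuilt` at zero, so the hook costs little more than one emptiness check per put.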





[jira] [Created] (HBASE-5906) TestChangingEncoding failing sporadically in 0.94 build

2012-04-30 Thread stack (JIRA)
stack created HBASE-5906:


 Summary: TestChangingEncoding failing sporadically in 0.94 build
 Key: HBASE-5906
 URL: https://issues.apache.org/jira/browse/HBASE-5906
 Project: HBase
  Issue Type: Bug
Reporter: stack
 Attachments: 5906.txt

The test passes locally for me and Elliott but takes a long time to run.  
Timeout is only two minutes for the test though.





[jira] [Updated] (HBASE-5906) TestChangingEncoding failing sporadically in 0.94 build

2012-04-30 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5906:
-

Attachment: 5906.txt

Patch I'm going to try.  Doubles timeout from two minutes to four.

 TestChangingEncoding failing sporadically in 0.94 build
 ---

 Key: HBASE-5906
 URL: https://issues.apache.org/jira/browse/HBASE-5906
 Project: HBase
  Issue Type: Bug
Reporter: stack
 Attachments: 5906.txt


 The test passes locally for me and Elliott but takes a long time to run.  
 Timeout is only two minutes for the test though.





[jira] [Commented] (HBASE-5906) TestChangingEncoding failing sporadically in 0.94 build

2012-04-30 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265066#comment-13265066
 ] 

stack commented on HBASE-5906:
--

Applied to 0.94 and trunk.  Let's see if it fails subsequently.





[jira] [Commented] (HBASE-5785) Adding unit tests for protbuf utils introduced for HRegionInterface pb conversion

2012-04-30 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265069#comment-13265069
 ] 

jirapos...@reviews.apache.org commented on HBASE-5785:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4936/
---

Review request for hbase and Michael Stack.


Summary
---

I added some tests for the conversion methods.  The helper utilities are 
tested implicitly in other tests.  We can add more later on if needed.


This addresses bug HBASE-5785.
https://issues.apache.org/jira/browse/HBASE-5785


Diffs
-

  src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java 994cb76 
  src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java 9b594aa 
  src/test/java/org/apache/hadoop/hbase/protobuf/TestProtobufUtil.java 
PRE-CREATION 

Diff: https://reviews.apache.org/r/4936/diff


Testing
---

The new tests are green.


Thanks,

Jimmy



 Adding unit tests for protbuf utils introduced for HRegionInterface pb 
 conversion
 -

 Key: HBASE-5785
 URL: https://issues.apache.org/jira/browse/HBASE-5785
 Project: HBase
  Issue Type: Sub-task
  Components: ipc, master, migration, regionserver
Affects Versions: 0.96.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Critical
  Labels: noob
 Fix For: 0.96.0

 Attachments: hbase-5785.patch


 We need to add some unit tests for the protobuf utilities to catch issues 
 earlier.





[jira] [Commented] (HBASE-5885) Invalid HFile block magic on Local file System

2012-04-30 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265068#comment-13265068
 ] 

stack commented on HBASE-5885:
--

I don't think the TestChangingEncoding failure is related to this change.  This 
change enables verification of checksums in the local filesystem, and 
TestChangingEncoding doesn't even use the local filesystem.  I opened HBASE-5906 
to look into the TestChangingEncoding failures.

 Invalid HFile block magic on Local file System
 --

 Key: HBASE-5885
 URL: https://issues.apache.org/jira/browse/HBASE-5885
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0, 0.96.0
Reporter: Elliott Clark
Assignee: Elliott Clark
Priority: Blocker
 Fix For: 0.94.0, 0.96.0

 Attachments: 5885-trunk-v2.txt, HBASE-5885-94-0.patch, 
 HBASE-5885-94-1.patch, HBASE-5885-trunk-0.patch, HBASE-5885-trunk-1.patch


 ERROR: java.lang.RuntimeException: 
 org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after 
 attempts=7, exceptions:
 Thu Apr 26 11:19:18 PDT 2012, 
 org.apache.hadoop.hbase.client.ScannerCallable@190a621a, java.io.IOException: 
 java.io.IOException: Could not iterate StoreFileScanner[HFileScanner for 
 reader 
 reader=file:/tmp/hbase-eclark/hbase/TestTable/e2d1c846363c75262cbfd85ea278b342/info/bae2681d63734066957b58fe791a0268,
  compression=none, cacheConf=CacheConfig:enabled [cacheDataOnRead=true] 
 [cacheDataOnWrite=false] [cacheIndexesOnWrite=false] 
 [cacheBloomsOnWrite=false] [cacheEvictOnClose=false] [cacheCompressed=false], 
 firstKey=01/info:data/1335463981520/Put, 
 lastKey=0002588100/info:data/1335463902296/Put, avgKeyLen=30, 
 avgValueLen=1000, entries=1215085, length=1264354417, 
 cur=000248/info:data/1335463994457/Put/vlen=1000/ts=0]
   at 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:135)
   at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:95)
   at 
 org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:368)
   at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:127)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3323)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3279)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3296)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2393)
   at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1376)
 Caused by: java.io.IOException: Invalid HFile block magic: 
 \xEC\xD5\x9D\xB4\xC2bfo
   at org.apache.hadoop.hbase.io.hfile.BlockType.parse(BlockType.java:153)
   at org.apache.hadoop.hbase.io.hfile.BlockType.read(BlockType.java:164)
   at 
 org.apache.hadoop.hbase.io.hfile.HFileBlock.init(HFileBlock.java:254)
   at 
 org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1779)
   at 
 org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1637)
   at 
 org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:327)
   at 
 org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.readNextDataBlock(HFileReaderV2.java:555)
   at 
 org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.next(HFileReaderV2.java:651)
   at 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:130)
   ... 12 more
 Thu Apr 26 11:19:19 PDT 2012, 
 org.apache.hadoop.hbase.client.ScannerCallable@190a621a, java.io.IOException: 
 java.io.IOException: java.lang.IllegalArgumentException
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:1132)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:1121)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2420)
   at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
   at 
 

[jira] [Commented] (HBASE-3691) Add compressor support for 'snappy', google's compressor

2012-04-30 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265070#comment-13265070
 ] 

stack commented on HBASE-3691:
--

@Chris Have you tried the patch on 0.90?  Does it work for you?





[jira] [Commented] (HBASE-5886) Add new metric for possible data loss due to puts without WAL

2012-04-30 Thread Matteo Bertozzi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265071#comment-13265071
 ] 

Matteo Bertozzi commented on HBASE-5886:


@Nicolas the metric is not meant to be precise but just to give a hint about 
possible data loss.


 Add new metric for possible data loss due to puts without WAL 
 --

 Key: HBASE-5886
 URL: https://issues.apache.org/jira/browse/HBASE-5886
 Project: HBase
  Issue Type: New Feature
  Components: metrics, regionserver
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Minor
  Labels: metrics
 Attachments: HBASE-5886-v0.patch, HBASE-5886-v1.patch, 
 HBASE-5886-v2.patch


 Add a metric to keep track of puts without WAL and the possible data loss size.
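
 A minimal sketch of what such a metric could look like, assuming one 
 instance per region and that a successful flush makes the unlogged edits 
 durable. The class and method names are hypothetical and are not taken from 
 the attached patches.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical per-region counters for puts that skipped the WAL. */
final class NoWalPutMetrics {
  // Number of puts that were applied without a WAL entry.
  private final AtomicLong putsWithoutWal = new AtomicLong();
  // Approximate size of in-memory data that would be lost on a crash.
  private final AtomicLong bytesAtRisk = new AtomicLong();

  /** Record one put that skipped the WAL, with its approximate size in bytes. */
  void recordPutWithoutWal(long approxSizeBytes) {
    putsWithoutWal.incrementAndGet();
    bytesAtRisk.addAndGet(approxSizeBytes);
  }

  /** Assumption: after a flush the edits are on disk, so nothing is at risk. */
  void onFlush() {
    bytesAtRisk.set(0);
  }

  long putsWithoutWal() {
    return putsWithoutWal.get();
  }

  long bytesAtRisk() {
    return bytesAtRisk.get();
  }
}
```

 As noted in the comment above, the numbers are only a hint: the put count 
 keeps growing, while the at-risk size drops back to zero on flush.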





[jira] [Commented] (HBASE-5905) Protobuf interface for Admin: split between the internal and the external/customer interface

2012-04-30 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265072#comment-13265072
 ] 

stack commented on HBASE-5905:
--

Sorry N, should have read closer (was running out the door):

bq. In other words, it seems that we merged the two interfaces into a single 
one. Is that the intent?

Yes

bq. I checked, the internal fields in closeRegionRequest are all optional 
(that's good). Still, it means that the end user could use them or at least 
would need to distinguish between the optional for functional reasons and the 
optional - do not use.

Agree.

I'd say this is a minor issue, though, given that pb classes do not come out 
through our public admin API, just the API on servers.

 Protobuf interface for Admin: split between the internal and the 
 external/customer interface
 

 Key: HBASE-5905
 URL: https://issues.apache.org/jira/browse/HBASE-5905
 Project: HBase
  Issue Type: Improvement
  Components: client, master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal

 After a short discussion with Stack, I created a JIRA.
 --
 I'm a little bit confused by the protobuf interface for closeRegion.
 We have two types of closeRegion today:
 1) The external ones, available in client.HBaseAdmin. They take the server 
 and the region identifier as parameters and nothing else.
 2) The internal ones, called for example by the master. They have more 
 parameters (like versionOfClosingNode or transitionInZK).
 When I look at protobuf.ProtobufUtil, I see:
   public static void closeRegion(final AdminProtocol admin,
       final byte[] regionName, final boolean transitionInZK) throws IOException {
     CloseRegionRequest closeRegionRequest =
       RequestConverter.buildCloseRegionRequest(regionName, transitionInZK);
     try {
       admin.closeRegion(null, closeRegionRequest);
     } catch (ServiceException se) {
       throw getRemoteException(se);
     }
   }
 In other words, it seems that we merged the two interfaces into a single one. 
 Is that the intent?
 I checked: the internal fields in closeRegionRequest are all optional (that's 
 good). Still, it means that the end user could use them, or at least would 
 need to distinguish between fields that are optional for functional reasons 
 and fields that are optional but not meant to be used.
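
 One way to picture the split being asked about is a narrow external 
 interface and a wider internal one that build the same request type. The 
 sketch below is only an illustration: the request class stands in for the 
 generated CloseRegionRequest, and all type names are hypothetical.

```java
/** Hypothetical stand-in for the generated CloseRegionRequest message. */
final class CloseRegionRequestSketch {
  final byte[] regionName;
  final Integer versionOfClosingNode; // internal-only, null when unset
  final Boolean transitionInZK;       // internal-only, null when unset

  CloseRegionRequestSketch(byte[] regionName, Integer versionOfClosingNode,
      Boolean transitionInZK) {
    this.regionName = regionName;
    this.versionOfClosingNode = versionOfClosingNode;
    this.transitionInZK = transitionInZK;
  }
}

/** External view: callers can only supply the region identifier. */
interface ExternalAdminRequests {
  CloseRegionRequestSketch closeRegion(byte[] regionName);
}

/** Internal view (for example the master): exposes the extra parameters. */
interface InternalAdminRequests extends ExternalAdminRequests {
  CloseRegionRequestSketch closeRegion(byte[] regionName,
      int versionOfClosingNode, boolean transitionInZK);
}

/** One builder implements both views; clients are handed the narrow type. */
final class AdminRequestBuilder implements InternalAdminRequests {
  @Override
  public CloseRegionRequestSketch closeRegion(byte[] regionName) {
    // Internal fields stay unset for external callers.
    return new CloseRegionRequestSketch(regionName, null, null);
  }

  @Override
  public CloseRegionRequestSketch closeRegion(byte[] regionName,
      int versionOfClosingNode, boolean transitionInZK) {
    return new CloseRegionRequestSketch(regionName, versionOfClosingNode,
        transitionInZK);
  }
}
```

 With this shape the wire message is still shared, but code holding an 
 ExternalAdminRequests reference cannot set the internal fields by accident.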





[jira] [Commented] (HBASE-5699) Run with > 1 WAL in HRegionServer

2012-04-30 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265076#comment-13265076
 ] 

Zhihong Yu commented on HBASE-5699:
---

Playing with a prototype of this feature using YCSB (half insert, half update) 
on a 5-node cluster where usertable has 13 regions on each region server.
Without this feature:
{code}
 10 sec: 99965 operations; 9996.5 current ops/sec; [UPDATE 
AverageLatency(us)=258.68] [INSERT AverageLatency(us)=610.28]
 20 sec: 99965 operations; 0 current ops/sec;
 25 sec: 0 operations; 4.3 current ops/sec; [UPDATE 
AverageLatency(us)=2594303.62] [INSERT AverageLatency(us)=1240495.41]
[OVERALL], RunTime(ms), 25844.0
[OVERALL], Throughput(ops/sec), 3868.9831295465096
[UPDATE], Operations, 49935
[UPDATE], AverageLatency(us), 674.2635626314209
{code}
With this feature:
{code}
 10 sec: 99952 operations; 9994.2 current ops/sec; [UPDATE 
AverageLatency(us)=178.7] [INSERT AverageLatency(us)=584.76]
 20 sec: 0 operations; 3.8 current ops/sec; [UPDATE 
AverageLatency(us)=10.88] [INSERT AverageLatency(us)=679174.27]
 20 sec: 0 operations; 0 current ops/sec;
[OVERALL], RunTime(ms), 20867.0
[OVERALL], Throughput(ops/sec), 4791.776489193463
[UPDATE], Operations, 49992
[UPDATE], AverageLatency(us), 178.6439030244839
{code}
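
For a rough comparison of the two runs above, the relative change in the 
[OVERALL] and [UPDATE] figures can be computed as below. The helper class is 
only illustrative. The inputs are copied from the output above, and a single 
run per configuration is not enough to draw firm conclusions.

```java
/** Illustrative helper for comparing the two YCSB runs quoted above. */
final class YcsbComparison {
  /** Relative change from baseline to candidate as a fraction (0.10 means +10%). */
  static double relativeChange(double baseline, double candidate) {
    return (candidate - baseline) / baseline;
  }

  public static void main(String[] args) {
    // Values taken from the [OVERALL] and [UPDATE] lines of the two runs.
    double throughput = relativeChange(3868.9831295465096, 4791.776489193463);
    double runTime = relativeChange(25844.0, 20867.0);
    double updateLatency = relativeChange(674.2635626314209, 178.6439030244839);
    System.out.printf("throughput:     %+.1f%%%n", 100 * throughput);
    System.out.printf("run time:       %+.1f%%%n", 100 * runTime);
    System.out.printf("update latency: %+.1f%%%n", 100 * updateLatency);
  }
}
```

On these numbers, throughput is roughly 24% higher and the run time about 19% 
shorter with the prototype.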

 Run with > 1 WAL in HRegionServer
 -

 Key: HBASE-5699
 URL: https://issues.apache.org/jira/browse/HBASE-5699
 Project: HBase
  Issue Type: Improvement
Reporter: binlijin
Assignee: Li Pi







[jira] [Commented] (HBASE-5904) is_enabled from shell returns differently from pre- and post- HBASE-5155

2012-04-30 Thread David S. Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265079#comment-13265079
 ] 

David S. Wang commented on HBASE-5904:
--

I have a patch to back it out and will post it once I have tested it more.  The 
patch seems to make things compatible again but I want to make sure it doesn't 
break anything else.  Look for it in a day or two.

 is_enabled from shell returns differently from pre- and post- HBASE-5155
 

 Key: HBASE-5904
 URL: https://issues.apache.org/jira/browse/HBASE-5904
 Project: HBase
  Issue Type: Bug
  Components: zookeeper
Affects Versions: 0.90.6
Reporter: David S. Wang

 If I launch an HBase shell that uses HBase and ZooKeeper without HBASE-5155, 
 against HBase servers with HBASE-5155, then is_enabled for a table always 
 returns false even if the table is considered enabled by the servers from the 
 logs.  If I then do the same thing but with an HBase shell and ZooKeeper with 
 HBASE-5155, then is_enabled returns as expected.
 If I launch an HBase shell that uses HBase and ZooKeeper without HBASE-5155, 
 against HBase servers also without HBASE-5155, then is_enabled works as you'd 
 expect.  But if I then do the same thing but with an HBase shell and 
 ZooKeeper with HBASE-5155, then is_enabled returns false even though the 
 table is considered enabled by the servers from the logs.
 Additionally, if I then try to enable the table from the 
 HBASE-5155-containing shell, it hangs because the ZooKeeper code waits for 
 the ZNode to be updated with ENABLED in the data field, but what actually 
 happens is that the ZNode gets deleted since the servers are running without 
 HBASE-5155.
 I think the culprit is that the indication of how a table is considered 
 enabled inside ZooKeeper has changed with HBASE-5155.  Before HBASE-5155, a 
 table was considered enabled if the ZNode for it did not exist.  After 
 HBASE-5155, a table is considered enabled if the ZNode for it exists and has 
 ENABLED in its data.  I think the current code is incompatible when running 
 clients and servers where one side has HBASE-5155 and the other side does not.
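
 The incompatibility described above can be illustrated with a small sketch. 
 It models the table's znode data as a byte array, with null standing for "no 
 znode". The class and method names are made up for illustration, and the real 
 ZooKeeper data format may differ.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative model of the two "is this table enabled?" conventions. */
final class TableStateCompat {
  /** Before HBASE-5155 (as described above): enabled when the znode is absent. */
  static boolean isEnabledPre5155(byte[] znodeData) {
    return znodeData == null;
  }

  /** After HBASE-5155 (as described above): znode exists and holds ENABLED. */
  static boolean isEnabledPost5155(byte[] znodeData) {
    return znodeData != null
        && "ENABLED".equals(new String(znodeData, StandardCharsets.UTF_8));
  }

  public static void main(String[] args) {
    byte[] enabled = "ENABLED".getBytes(StandardCharsets.UTF_8);
    // Old server deleted the znode; a new client reads "not enabled".
    System.out.println(isEnabledPost5155(null));
    // New server wrote ENABLED; an old client reads "not enabled".
    System.out.println(isEnabledPre5155(enabled));
  }
}
```

 Both mixed combinations report the table as not enabled, which matches the 
 behaviour reported in this issue.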





[jira] [Commented] (HBASE-5905) Protobuf interface for Admin: split between the internal and the external/customer interface

2012-04-30 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265102#comment-13265102
 ] 

Jimmy Xiang commented on HBASE-5905:


Is there a way to mark a parameter as private/internal in pb?  Otherwise, we 
may end up with some private protocol for internal usage.

 Protobuf interface for Admin: split between the internal and the 
 external/customer interface
 

 Key: HBASE-5905
 URL: https://issues.apache.org/jira/browse/HBASE-5905
 Project: HBase
  Issue Type: Improvement
  Components: client, master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal






[jira] [Commented] (HBASE-5699) Run with > 1 WAL in HRegionServer

2012-04-30 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265112#comment-13265112
 ] 

Elliott Clark commented on HBASE-5699:
--

Intuitively it seems like the number of WALs that are used should be related 
to the number of spindles available to HBase.  So maybe this should be either a 
configurable number or something that is derived from the number of mount 
points HDFS is hosted on?

 Run with > 1 WAL in HRegionServer
 -

 Key: HBASE-5699
 URL: https://issues.apache.org/jira/browse/HBASE-5699
 Project: HBase
  Issue Type: Improvement
Reporter: binlijin
Assignee: Li Pi







[jira] [Commented] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted

2012-04-30 Thread David S. Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265114#comment-13265114
 ] 

David S. Wang commented on HBASE-5155:
--

Ram,

 If the HBase client does not have the changes for HBASE-5155 and the server 
 has the  changes for HBASE-5155, then if we try to Enable a table then the 
 client will hang. 

Actually, I noticed that the hang happens in the opposite case: when the client 
has the changes for HBASE-5155, and the server does not.

Otherwise the release note looks OK to me.

 ServerShutDownHandler And Disable/Delete should not happen parallely leading 
 to recreation of regions that were deleted
 ---

 Key: HBASE-5155
 URL: https://issues.apache.org/jira/browse/HBASE-5155
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.4
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Blocker
 Fix For: 0.90.6

 Attachments: HBASE-5155_1.patch, HBASE-5155_2.patch, 
 HBASE-5155_3.patch, HBASE-5155_latest.patch, hbase-5155_6.patch


 The ServerShutdownHandler and the disable/delete table handlers race.  This 
 is not an issue due to TM.
 - A regionserver goes down.  In our cluster the regionserver holds a lot of 
 regions.
 - A region R1 has two daughters D1 and D2.
 - The ServerShutdownHandler gets called, scans META and gets all the 
 user regions.
 - In parallel, a table is disabled. (No problem in this step.)
 - Delete table is done.
 - The table and its regions are deleted, including R1, D1 and D2. (So META 
 is cleaned.)
 - Now the ServerShutdownHandler starts processDeadRegion:
 {code}
 if (hri.isOffline() && hri.isSplit()) {
   LOG.debug("Offlined and split region " + hri.getRegionNameAsString() +
     "; checking daughter presence");
   fixupDaughters(result, assignmentManager, catalogTracker);
 {code}
 As part of fixupDaughters, since the daughters D1 and D2 are missing for R1, 
 {code}
 if (isDaughterMissing(catalogTracker, daughter)) {
   LOG.info("Fixup; missing daughter " + daughter.getRegionNameAsString());
   MetaEditor.addDaughter(catalogTracker, daughter, null);
   // TODO: Log WARN if the regiondir does not exist in the fs.  If its not
   // there then something wonky about the split -- things will keep going
   // but could be missing references to parent region.
   // And assign it.
   assignmentManager.assign(daughter, true);
 {code}
 we call assign on the daughters.  
 After this we continue with the code below.
 {code}
 if (processDeadRegion(e.getKey(), e.getValue(),
     this.services.getAssignmentManager(),
     this.server.getCatalogTracker())) {
   this.services.getAssignmentManager().assign(e.getKey(), true);
 {code}
 When the SSH scanned META it had R1, D1 and D2.
 So as part of the above code, D1 and D2, which were already assigned by 
 fixupDaughters, are assigned again by 
 {code}
 this.services.getAssignmentManager().assign(e.getKey(), true);
 {code}
 This leads to a ZooKeeper issue due to a bad version and kills the master.
 The important part here is that the regions that were deleted are recreated, 
 which I think is more critical.
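
 The hazard in the steps above is acting on a stale snapshot of META. The 
 sketch below is a simplified, single-threaded model of one possible guard: 
 re-check that a region still exists, and has not already been assigned, 
 before assigning it. It is not the fix that was committed for this issue, 
 all names are hypothetical, and a real implementation would still have to 
 close the window between the check and the assignment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Simplified model of re-validating a stale META snapshot before assigning. */
final class StaleSnapshotGuard {
  // Stand-in for META: names of the regions that currently exist.
  private final Set<String> meta = ConcurrentHashMap.newKeySet();
  // Regions this handler has assigned so far, in order.
  private final List<String> assigned = new ArrayList<>();

  void addRegion(String name) {
    meta.add(name);
  }

  void deleteRegion(String name) {
    meta.remove(name);
  }

  /** Copy of META at scan time; it can go stale right after this returns. */
  List<String> snapshot() {
    return new ArrayList<>(meta);
  }

  /** Assign only snapshot entries that still exist and are not yet assigned. */
  void processSnapshot(List<String> snapshot) {
    for (String region : snapshot) {
      if (meta.contains(region) && !assigned.contains(region)) {
        assigned.add(region);
      }
    }
  }

  List<String> assigned() {
    return assigned;
  }
}
```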





[jira] [Updated] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size

2012-04-30 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5611:
--

Attachment: 5611-94-v2.txt

Patch for 0.94 branch which fixes FIXED_OVERHEAD.

 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: 5611-94-v2.txt, 5611-94.addendum, HBASE-5611-92.patch, 
 HBASE-5611-94-minorchange.patch, HBASE-5611-trunk-v2-minorchange.patch


 This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think 
 it's still possible to hit it if a region fails to open for more obscure 
 reasons like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. The problem is completely invisible until 
 the {{MemStoreFlusher}} needs to force-flush a region and finds that none of 
 them have edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
 for entry null
 java.lang.IllegalStateException
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:129)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it is still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.
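
 A minimal sketch of the accounting idea, assuming the region server tracks 
 how much each opening region has added to the global size so that it can be 
 subtracted again on failure. The names are hypothetical and this is not the 
 code from the attached patches.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative global MemStore size counter with rollback on failed open. */
final class GlobalMemStoreAccounting {
  private final AtomicLong globalSize = new AtomicLong();

  long get() {
    return globalSize.get();
  }

  /**
   * Replays edits of the given sizes into the global counter. If the open
   * fails afterwards, everything this region added is subtracted again.
   *
   * @return true if the region opened, false if it failed and was rolled back
   */
  boolean openRegion(long[] replayedEditSizes, boolean failAfterReplay) {
    long addedByThisRegion = 0;
    try {
      for (long size : replayedEditSizes) {
        globalSize.addAndGet(size);
        addedByThisRegion += size;
      }
      if (failAfterReplay) {
        // Stand-in for any failure during open, e.g. the region being moved away.
        throw new IllegalStateException("region open failed");
      }
      return true;
    } catch (RuntimeException e) {
      // The edits are dropped with the region, so stop counting them.
      globalSize.addAndGet(-addedByThisRegion);
      return false;
    }
  }
}
```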





[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size

2012-04-30 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265127#comment-13265127
 ] 

Zhihong Yu commented on HBASE-5611:
---

Integrated 5611-94-v2.txt to 0.94 branch.

 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: 5611-94-v2.txt, 5611-94.addendum, HBASE-5611-92.patch, 
 HBASE-5611-94-minorchange.patch, HBASE-5611-trunk-v2-minorchange.patch







[jira] [Commented] (HBASE-5699) Run with > 1 WAL in HRegionServer

2012-04-30 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265128#comment-13265128
 ] 

Zhihong Yu commented on HBASE-5699:
---

Currently I use the following knob for the maximum number of WALs on an 
individual region server:
{code}
+int totalInstances = conf.getInt("hbase.regionserver.hlog.total", DEFAULT_MAX_HLOG_INSTANCES);
{code}
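
To show how such a knob behaves, here is a self-contained sketch that uses 
java.util.Properties as a stand-in for the Hadoop Configuration object. The 
default value of 1 and the clamp to at least one WAL are assumptions made for 
the example. Only the key name comes from the line above.

```java
import java.util.Properties;

/** Illustrative reader for the WAL-count knob; not the prototype's code. */
final class WalCountConfig {
  // Key name taken from the patch excerpt above.
  static final String KEY = "hbase.regionserver.hlog.total";
  // Assumed default; the prototype's actual default is not shown in this thread.
  static final int DEFAULT_MAX_HLOG_INSTANCES = 1;

  /** Returns the configured number of WALs, never less than one. */
  static int totalInstances(Properties conf) {
    String raw = conf.getProperty(KEY);
    int n = (raw == null) ? DEFAULT_MAX_HLOG_INSTANCES : Integer.parseInt(raw.trim());
    return Math.max(1, n);
  }
}
```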

 Run with > 1 WAL in HRegionServer
 -

 Key: HBASE-5699
 URL: https://issues.apache.org/jira/browse/HBASE-5699
 Project: HBase
  Issue Type: Improvement
Reporter: binlijin
Assignee: Li Pi







[jira] [Commented] (HBASE-5906) TestChangingEncoding failing sporadically in 0.94 build

2012-04-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265135#comment-13265135
 ] 

Hudson commented on HBASE-5906:
---

Integrated in HBase-TRUNK #2827 (See 
[https://builds.apache.org/job/HBase-TRUNK/2827/])
HBASE-5906 TestChangingEncoding failing sporadically in 0.94 build 
(Revision 1332320)

 Result = SUCCESS
stack : 
Files : 
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/encoding/TestChangingEncoding.java


 TestChangingEncoding failing sporadically in 0.94 build
 ---

 Key: HBASE-5906
 URL: https://issues.apache.org/jira/browse/HBASE-5906
 Project: HBase
  Issue Type: Bug
Reporter: stack
 Attachments: 5906.txt


 The test passes locally for me and Elliott but takes a long time to run.  
 Timeout is only two minutes for the test though.





[jira] [Commented] (HBASE-5860) splitlogmanager should not unnecessarily resubmit tasks when zk unavailable

2012-04-30 Thread Nicolas Spiegelberg (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265162#comment-13265162
 ] 

Nicolas Spiegelberg commented on HBASE-5860:


Also, it looks like there is a race condition in 
CreateAsyncCallback.processResult.  The code is roughly:
{code}
tot_mgr_node_create_result.incrementAndGet();
if (rc != KeeperException.Code.NODEEXISTS.intValue()) {
  if (retry_count > 0) {
    tot_mgr_node_create_retry.incrementAndGet();
    createNode(path, retry_count - 1);
  }
}
{code}
So, we should change this to:
{code}
try {
  if (rc != KeeperException.Code.NODEEXISTS.intValue()) {
    if (retry_count > 0) {
      tot_mgr_node_create_retry.incrementAndGet();
      createNode(path, retry_count - 1);
    }
  }
} finally {
  tot_mgr_node_create_result.incrementAndGet();
}
{code}
so we don't mark the znode as responding until we decide if it's a failure and 
we need to reenqueue.  Maybe the repercussions of creating an extra RESCAN node 
aren't worth finding and fixing all these subtle race conditions?

 splitlogmanager should not unnecessarily resubmit tasks when zk unavailable
 ---

 Key: HBASE-5860
 URL: https://issues.apache.org/jira/browse/HBASE-5860
 Project: HBase
  Issue Type: Improvement
Reporter: Prakash Khemani
Assignee: Prakash Khemani
 Attachments: 
 0001-HBASE-5860-splitlogmanager-should-not-unnecessarily-.patch


 (Doesn't really impact the run time or correctness of log splitting)
 Say the master has lost its connection to zk. splitlogmanager's timeoutmanager 
 will realize that all the tasks that were submitted are still unassigned. It 
 will resubmit those tasks (i.e. create dummy znodes).
 splitlogmanager should realize that the tasks are unassigned but their znodes 
 have not been created.
 2012-04-20 13:11:20,516 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
 dead splitlog worker msgstore295.snc4.facebook.com,60020,1334948757026
 2012-04-20 13:11:20,517 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 Scheduling batch of logs to split
 2012-04-20 13:11:20,517 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
 started splitting logs in 
 [hdfs://msgstore215.snc4.facebook.com:9000/MSGSTORE215-SNC4-HBASE/.logs/msgstore295.snc4.facebook.com,60020,1334948757026-splitting]
 2012-04-20 13:11:20,565 INFO org.apache.zookeeper.ClientCnxn: Opening socket 
 connection to server msgstore235.snc4.facebook.com/10.30.222.186:2181
 2012-04-20 13:11:20,566 INFO org.apache.zookeeper.ClientCnxn: Socket 
 connection established to msgstore235.snc4.facebook.com/10.30.222.186:2181, 
 initiating session
 2012-04-20 13:11:20,575 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
 total tasks = 4 unassigned = 4
 2012-04-20 13:11:20,576 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 resubmitting unassigned task(s) after timeout
 2012-04-20 13:11:21,577 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 resubmitting unassigned task(s) after timeout
 2012-04-20 13:11:21,683 INFO org.apache.zookeeper.ClientCnxn: Unable to read 
 additional data from server sessionid 0x36ccb0f8010002, likely server has 
 closed socket, closing socket connection and attempting reconnect
 2012-04-20 13:11:21,683 INFO org.apache.zookeeper.ClientCnxn: Unable to read 
 additional data from server sessionid 0x136ccb0f489, likely server has 
 closed socket, closing socket connection and attempting reconnect
 2012-04-20 13:11:21,786 WARN 
 org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback: create rc 
 =CONNECTIONLOSS for 
 /hbase/splitlog/hdfs%3A%2F%2Fmsgstore215.snc4.facebook.com%3A9000%2FMSGSTORE215-SNC4-HBASE%2F.logs%2Fmsgstore295.snc4.facebook.com%2C60020%2C1334948757026-splitting%2F10.30.251.186%253A60020.1334951586677
  retry=3
 2012-04-20 13:11:21,786 WARN 
 org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback: create rc 
 =CONNECTIONLOSS for 
 /hbase/splitlog/hdfs%3A%2F%2Fmsgstore215.snc4.facebook.com%3A9000%2FMSGSTORE215-SNC4-HBASE%2F.logs%2Fmsgstore295.snc4.facebook.com%2C60020%2C1334948757026-splitting%2F10.30.251.186%253A60020.1334951920332
  retry=3





[jira] [Commented] (HBASE-2214) Do HBASE-1996 -- setting size to return in scan rather than count of rows -- properly

2012-04-30 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265164#comment-13265164
 ] 

jirapos...@reviews.apache.org commented on HBASE-2214:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4726/#review7383
---


Where are we checking the size of the result made so far?  I don't see it in 
the below.  I'd expect it inside in the RegionScanner.  Any chance of a test?  
Otherwise, patch looks great.
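
For reference, a size check of the kind being asked about could look roughly 
like the sketch below. It is an assumption-laden illustration, not the code 
under review: rows are modelled as byte arrays, the helper name is made up, 
and a non-positive maxResultSize is taken to mean "no size limit".

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative batching that stops on row count or accumulated size. */
final class SizeLimitedBatch {
  /**
   * Collects rows starting at index start until either caching rows have
   * been gathered or the accumulated size reaches maxResultSize. At least
   * one row is returned whenever any rows remain.
   */
  static List<byte[]> nextBatch(List<byte[]> rows, int start, int caching,
      long maxResultSize) {
    List<byte[]> batch = new ArrayList<>();
    long size = 0;
    for (int i = start; i < rows.size() && batch.size() < caching; i++) {
      byte[] row = rows.get(i);
      batch.add(row);
      size += row.length;
      if (maxResultSize > 0 && size >= maxResultSize) {
        break; // size limit reached; return what we have so far
      }
    }
    return batch;
  }
}
```

With caching set to 100 and maxResultSize set to 1, this returns one row per 
call, which is the behaviour the test in the review request describes.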


/src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java
https://reviews.apache.org/r/4726/#comment16293

Is this going to be annoying?  On a high-traffic server, won't this get 
logged once per request?  Perhaps thousands of times a second?



/src/main/java/org/apache/hadoop/hbase/client/Scan.java
https://reviews.apache.org/r/4726/#comment16294

Is this needed?  Is this set on Scan creation?  When would it change after 
Scan construction?  Or are we using the builder pattern here, and it's set 
after construction but before use?



/src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java
https://reviews.apache.org/r/4726/#comment16295

Oh, I see how it's used now.  Ignore the above comment.


- Michael


On 2012-04-26 08:18:40, ferdy wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4726/
bq.  ---
bq.  
bq.  (Updated 2012-04-26 08:18:40)
bq.  
bq.  
bq.  Review request for hbase and Ted Yu.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  HBASE-2214 per scan max buffersize.
bq.  
bq.  
bq.  This addresses bug HBASE-2214.
bq.  https://issues.apache.org/jira/browse/HBASE-2214
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.  /src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java 1330680 
bq.  /src/main/java/org/apache/hadoop/hbase/client/Scan.java 1330680 
bq.  /src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java 1330680 
bq.  /src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java 1330680 
bq.  /src/main/java/org/apache/hadoop/hbase/protobuf/generated/AdminProtos.java 1330680 
bq.  /src/main/java/org/apache/hadoop/hbase/protobuf/generated/ClientProtos.java 1330680 
bq.  /src/main/java/org/apache/hadoop/hbase/protobuf/generated/HBaseProtos.java 1330680 
bq.  /src/main/java/org/apache/hadoop/hbase/protobuf/generated/RPCProtos.java 1330680 
bq.  /src/main/java/org/apache/hadoop/hbase/protobuf/generated/ZooKeeperProtos.java 1330680 
bq.  /src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1330680 
bq.  /src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 1330680 
bq.  /src/main/java/org/apache/hadoop/hbase/regionserver/RegionScanner.java 1330680 
bq.  /src/main/java/org/apache/hadoop/hbase/regionserver/RegionServer.java 1330680 
bq.  /src/main/protobuf/Client.proto 1330680 
bq.  /src/test/java/org/apache/hadoop/hbase/coprocessor/TestCoprocessorInterface.java 1330680 
bq.  Diff: https://reviews.apache.org/r/4726/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  It works when running this test:
bq.  
bq.  
bq.  new HBaseTestingUtility(conf).startMiniCluster();
bq.  
bq.  HBaseAdmin admin = new HBaseAdmin(conf);
bq.  if (!admin.tableExists("test")) {
bq.    HTableDescriptor tableDesc = new HTableDescriptor("test");
bq.    tableDesc.addFamily(new HColumnDescriptor("fam"));
bq.    admin.createTable(tableDesc);
bq.  }
bq.  
bq.  
bq.  HTable table = new HTable(conf, "test");
bq.  Put put; 
bq.  
bq.  put = new Put(Bytes.toBytes("row1"));
bq.  put.add(Bytes.toBytes("fam"), Bytes.toBytes("qual1"), Bytes.toBytes("val1"));
bq.  table.put(put);
bq.  
bq.  put = new Put(Bytes.toBytes("row2"));
bq.  put.add(Bytes.toBytes("fam"), Bytes.toBytes("qual2"), Bytes.toBytes("val2"));
bq.  table.put(put);
bq.  
bq.  put = new Put(Bytes.toBytes("row3"));
bq.  put.add(Bytes.toBytes("fam"), Bytes.toBytes("qual3"), Bytes.toBytes("val3"));
bq.  table.put(put);
bq.  
bq.  table.flushCommits();
bq.  {
bq.    System.out.println("returns all rows at once because of the caching");
bq.    Scan scan = new Scan();
bq.    scan.setCaching(100);
bq.    ResultScanner scanner = table.getScanner(scan);
bq.    scanner.next(100);
bq.  }
bq.  {
bq.    System.out.println("returns one row at a time because of the maxResultSize");
bq.    Scan scan = new Scan();
bq.    scan.setCaching(100);
bq.    scan.setMaxResultSize(1);
bq.    ResultScanner scanner = table.getScanner(scan);
bq.    scanner.next(100);
bq.  }
bq.  
bq.  
bq.  See 

[jira] [Commented] (HBASE-5699) Run with 1 WAL in HRegionServer

2012-04-30 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265167#comment-13265167
 ] 

Jean-Daniel Cryans commented on HBASE-5699:
---

bq. Intuitively it seems like the number of WAL's that are used should be 
related to the number of spindles available to hbase.

I disagree. Since most deployments run with rep=3, each WAL is already using 
three spindles, not one. That multiplying effect could generate a lot of disk 
seeks, because the WALs compete with each other (plus flushing, compacting, etc.).

 Run with > 1 WAL in HRegionServer
 -

 Key: HBASE-5699
 URL: https://issues.apache.org/jira/browse/HBASE-5699
 Project: HBase
  Issue Type: Improvement
Reporter: binlijin
Assignee: Li Pi



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size

2012-04-30 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-5611.
--

Resolution: Fixed

 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: 5611-94-v2.txt, 5611-94.addendum, HBASE-5611-92.patch, 
 HBASE-5611-94-minorchange.patch, HBASE-5611-trunk-v2-minorchange.patch


 This bug is rather easy to hit if the {{TimeoutMonitor}} is on; otherwise I 
 think it's still possible to hit it if a region fails to open for more 
 obscure reasons, like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. The problem is completely invisible until 
 the {{MemStoreFlusher}} needs to force-flush a region and none of them has 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed for entry null
 java.lang.IllegalStateException
     at com.google.common.base.Preconditions.checkState(Preconditions.java:129)
     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
     at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it is still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.
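
 The fix described above comes down to remembering how many bytes a region 
 added to the global counter while replaying recovered edits, and subtracting 
 them again if the open fails. A minimal Java sketch of that bookkeeping 
 follows. It is not taken from the attached patches: the class and method 
 names (GlobalMemStoreAccounting, addReplayedEdits, openSucceeded, 
 rollbackReplayedEdits) are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: global MemStore size plus per-region replay contributions. */
final class GlobalMemStoreAccounting {
  private final AtomicLong globalSize = new AtomicLong();
  // Bytes added by recovered-edit replay, keyed by region name, kept until the open finishes.
  private final ConcurrentMap<String, AtomicLong> replayed = new ConcurrentHashMap<>();

  long getGlobalSize() {
    return globalSize.get();
  }

  /** Called while replaying recovered edits into a region that is still opening. */
  void addReplayedEdits(String regionName, long bytes) {
    replayed.computeIfAbsent(regionName, k -> new AtomicLong()).addAndGet(bytes);
    globalSize.addAndGet(bytes);
  }

  /** Open succeeded: the edits are ordinary MemStore contents now, so stop tracking them. */
  void openSucceeded(String regionName) {
    replayed.remove(regionName);
  }

  /** Open failed: the edits are dropped with the region, so remove them from the global size. */
  long rollbackReplayedEdits(String regionName) {
    AtomicLong bytes = replayed.remove(regionName);
    if (bytes == null) {
      return 0L;
    }
    long n = bytes.get();
    globalSize.addAndGet(-n);
    return n;
  }
}
```

 With this shape, a failed open calls rollbackReplayedEdits once, and the 
 flusher no longer sees memory pressure that no region can relieve.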





[jira] [Updated] (HBASE-5385) Delete table/column should delete stored permissions on -acl- table

2012-04-30 Thread Matteo Bertozzi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-5385:
---

Attachment: HBASE-5385-v1.patch

Perform a Scan with QualifierFilter to remove a column from the _acl_ table.

 Delete table/column should delete stored permissions on -acl- table  
 -

 Key: HBASE-5385
 URL: https://issues.apache.org/jira/browse/HBASE-5385
 Project: HBase
  Issue Type: Sub-task
  Components: security
Affects Versions: 0.94.0
Reporter: Enis Soztutar
Assignee: Matteo Bertozzi
 Attachments: HBASE-5385-v0.patch, HBASE-5385-v1.patch


 Deleting a table or a column does not cascade to the stored permissions in 
 the -acl- table. We should also remove those permissions; otherwise this can 
 be a security leak, where freshly created tables inherit permissions from 
 previous same-named tables. We might also want to ensure, upon table 
 creation, that no entries are already stored in the -acl- table. 
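
 The cascade asked for here can be pictured with a small in-memory model. 
 The Java sketch below is only an illustration under stated assumptions: 
 AclStore and Grant are hypothetical stand-ins for the -acl- table and its 
 rows, not HBase classes, and the real layout of the -acl- table may differ.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for the -acl- table; not the HBase API. */
final class AclStore {
  /** One stored grant: a user, an optional column family (null = whole table), and actions. */
  static final class Grant {
    final String user;
    final String family;
    final String actions;

    Grant(String user, String family, String actions) {
      this.user = user;
      this.family = family;
      this.actions = actions;
    }
  }

  private final Map<String, List<Grant>> grantsByTable = new HashMap<>();

  void grant(String table, String user, String family, String actions) {
    grantsByTable.computeIfAbsent(table, k -> new ArrayList<>())
        .add(new Grant(user, family, actions));
  }

  int countGrants(String table) {
    List<Grant> grants = grantsByTable.get(table);
    return grants == null ? 0 : grants.size();
  }

  /** Cascade for "delete table": drop every grant stored under that table name. */
  void removeTable(String table) {
    grantsByTable.remove(table);
  }

  /** Cascade for "delete column": drop only the grants scoped to that family. */
  void removeFamily(String table, String family) {
    List<Grant> grants = grantsByTable.get(table);
    if (grants != null) {
      grants.removeIf(g -> family.equals(g.family));
    }
  }
}
```

 Without removeTable, a table re-created under the same name would start 
 life with the old table's grants, which is the leak described above.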





[jira] [Updated] (HBASE-5385) Delete table/column should delete stored permissions on -acl- table

2012-04-30 Thread Matteo Bertozzi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-5385:
---

Status: Patch Available  (was: Open)

 Delete table/column should delete stored permissions on -acl- table  
 -

 Key: HBASE-5385
 URL: https://issues.apache.org/jira/browse/HBASE-5385
 Project: HBase
  Issue Type: Sub-task
  Components: security
Affects Versions: 0.94.0
Reporter: Enis Soztutar
Assignee: Matteo Bertozzi
 Attachments: HBASE-5385-v0.patch, HBASE-5385-v1.patch


 Deleting a table or a column does not cascade to the stored permissions in 
 the -acl- table. We should also remove those permissions; otherwise this can 
 be a security leak, where freshly created tables inherit permissions from 
 previous same-named tables. We might also want to ensure, upon table 
 creation, that no entries are already stored in the -acl- table. 





[jira] [Commented] (HBASE-5897) prePut coprocessor hook causing substantial CPU usage

2012-04-30 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265197#comment-13265197
 ] 

Lars Hofhansl commented on HBASE-5897:
--

Looked over Todd's patch. The only difference is that previously the prePut's 
edits ended up in the WALEdit before the family edits; now the order is 
reversed. Not sure whether that even makes a difference. +1 otherwise.

 prePut coprocessor hook causing substantial CPU usage
 -

 Key: HBASE-5897
 URL: https://issues.apache.org/jira/browse/HBASE-5897
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Critical
 Fix For: 0.92.2, 0.94.0, 0.96.0

 Attachments: 5897-simple.txt, hbase-5897.txt


 I was running an insert workload against trunk under oprofile and saw that a 
 significant portion of CPU usage was going to calling the prePut 
 coprocessor hook inside doMiniBatchPut, even though I don't have any 
 coprocessors installed. I ran a million-row insert and collected CPU time 
 spent in the RS after commenting out the prePut hook, and found CPU usage 
 reduced by 33%.
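
 The issue text does not say how the patch removes the overhead, so the 
 following Java sketch is not Todd's change. It only illustrates one common 
 way to make a hook cheap when nothing is registered: decide once per batch 
 whether any observer exists, and skip the per-put hook work otherwise. 
 HookHost, PutObserver and MiniBatch are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical host that holds the registered observers. */
final class HookHost {
  interface PutObserver {
    void prePut(String row);
  }

  private final List<PutObserver> observers = new ArrayList<>();

  void register(PutObserver observer) {
    observers.add(observer);
  }

  boolean hasObservers() {
    return !observers.isEmpty();
  }

  void prePut(String row) {
    for (PutObserver observer : observers) {
      observer.prePut(row);
    }
  }
}

/** Hypothetical batch writer; returns how many times the hook was invoked. */
final class MiniBatch {
  static int apply(HookHost host, List<String> rows) {
    // Decided once per batch, so an empty host costs nothing per row.
    boolean runHooks = host != null && host.hasObservers();
    int hookCalls = 0;
    for (String row : rows) {
      if (runHooks) {
        host.prePut(row);
        hookCalls++;
      }
      // The put itself would be applied here.
    }
    return hookCalls;
  }
}
```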





[jira] [Commented] (HBASE-5385) Delete table/column should delete stored permissions on -acl- table

2012-04-30 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265212#comment-13265212
 ] 

Enis Soztutar commented on HBASE-5385:
--

Looks good. Can we add:
1. Audit logging AccessController.AUDITLOG
2. On preCreateTable and preAddColumn, ensure that the acl table is empty for 
the table / column. We might still have residual acl entries if something goes 
wrong. If so, we should refuse to create the table by throwing some kind of 
access control exception. 

Andrew, any comments? 
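
The second point can be sketched as a guard that runs before creation. This 
is only an illustration in plain Java: ResidualAclCheck and 
ResidualAclException are hypothetical names, and the set of table names 
stands in for a lookup against the acl table.

```java
import java.util.Set;

/** Hypothetical guard; the exception type and method name are illustrative only. */
final class ResidualAclCheck {
  static final class ResidualAclException extends RuntimeException {
    ResidualAclException(String message) {
      super(message);
    }
  }

  /** Refuses creation if grants are still stored under the new table's name. */
  static void preCreateTable(Set<String> tablesWithStoredGrants, String newTable) {
    if (tablesWithStoredGrants.contains(newTable)) {
      throw new ResidualAclException("residual ACL entries exist for table " + newTable);
    }
  }
}
```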

 Delete table/column should delete stored permissions on -acl- table  
 -

 Key: HBASE-5385
 URL: https://issues.apache.org/jira/browse/HBASE-5385
 Project: HBase
  Issue Type: Sub-task
  Components: security
Affects Versions: 0.94.0
Reporter: Enis Soztutar
Assignee: Matteo Bertozzi
 Attachments: HBASE-5385-v0.patch, HBASE-5385-v1.patch


 Deleting a table or a column does not cascade to the stored permissions in 
 the -acl- table. We should also remove those permissions; otherwise this can 
 be a security leak, where freshly created tables inherit permissions from 
 previous same-named tables. We might also want to ensure, upon table 
 creation, that no entries are already stored in the -acl- table. 





[jira] [Updated] (HBASE-5869) Move SplitLogManager splitlog taskstate and AssignmentManager RegionTransitionData znode datas to pb

2012-04-30 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5869:
-

Status: Patch Available  (was: Open)

 Move SplitLogManager splitlog taskstate and AssignmentManager 
 RegionTransitionData znode datas to pb 
 -

 Key: HBASE-5869
 URL: https://issues.apache.org/jira/browse/HBASE-5869
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: stack
 Attachments: 5869v7.txt, 5869v8.txt, 5869v9.txt, firstcut.txt, 
 secondcut.txt, v4.txt, v5.txt, v6.txt








[jira] [Updated] (HBASE-5869) Move SplitLogManager splitlog taskstate and AssignmentManager RegionTransitionData znode datas to pb

2012-04-30 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5869:
-

Attachment: 5869v9.txt

In AssignmentManager, isCarryingRegion was returning early when the data was 
null, when it should have carried on to the lookup of the region location in 
the AM's memory. This seems to fix some of the failing tests.

 Move SplitLogManager splitlog taskstate and AssignmentManager 
 RegionTransitionData znode datas to pb 
 -

 Key: HBASE-5869
 URL: https://issues.apache.org/jira/browse/HBASE-5869
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: stack
 Attachments: 5869v7.txt, 5869v8.txt, 5869v9.txt, firstcut.txt, 
 secondcut.txt, v4.txt, v5.txt, v6.txt








[jira] [Updated] (HBASE-5869) Move SplitLogManager splitlog taskstate and AssignmentManager RegionTransitionData znode datas to pb

2012-04-30 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5869:
-

Status: Open  (was: Patch Available)

 Move SplitLogManager splitlog taskstate and AssignmentManager 
 RegionTransitionData znode datas to pb 
 -

 Key: HBASE-5869
 URL: https://issues.apache.org/jira/browse/HBASE-5869
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: stack
 Attachments: 5869v7.txt, 5869v8.txt, 5869v9.txt, firstcut.txt, 
 secondcut.txt, v4.txt, v5.txt, v6.txt








[jira] [Commented] (HBASE-5906) TestChangingEncoding failing sporadically in 0.94 build

2012-04-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265384#comment-13265384
 ] 

Hudson commented on HBASE-5906:
---

Integrated in HBase-0.94 #161 (See 
[https://builds.apache.org/job/HBase-0.94/161/])
HBASE-5906 TestChangingEncoding failing sporadically in 0.94 build 
(Revision 1332319)

 Result = FAILURE
stack : 
Files : 
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/io/encoding/TestChangingEncoding.java


 TestChangingEncoding failing sporadically in 0.94 build
 ---

 Key: HBASE-5906
 URL: https://issues.apache.org/jira/browse/HBASE-5906
 Project: HBase
  Issue Type: Bug
Reporter: stack
 Attachments: 5906.txt


 The test passes locally for me and Elliott but takes a long time to run.  
 Timeout is only two minutes for the test though.





[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size

2012-04-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265383#comment-13265383
 ] 

Hudson commented on HBASE-5611:
---

Integrated in HBase-0.94 #161 (See 
[https://builds.apache.org/job/HBase-0.94/161/])
HBASE-5611 Replayed edits from regions that failed to open during recovery 
aren't removed from the global MemStore size - v2 (Jieshan) (Revision 1332344)

 Result = FAILURE
tedyu : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerAccounting.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java


 Replayed edits from regions that failed to open during recovery aren't 
 removed from the global MemStore size
 

 Key: HBASE-5611
 URL: https://issues.apache.org/jira/browse/HBASE-5611
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: 5611-94-v2.txt, 5611-94.addendum, HBASE-5611-92.patch, 
 HBASE-5611-94-minorchange.patch, HBASE-5611-trunk-v2-minorchange.patch


 This bug is rather easy to hit if the {{TimeoutMonitor}} is on; otherwise I 
 think it's still possible to hit it if a region fails to open for more 
 obscure reasons, like HDFS errors.
 Consider a region that just went through distributed splitting and that's now 
 being opened by a new RS. The first thing it does is to read the recovery 
 files and put the edits in the {{MemStores}}. If this process takes a long 
 time, the master will move that region away. At that point the edits are 
 still accounted for in the global {{MemStore}} size but they are dropped when 
 the {{HRegion}} gets cleaned up. The problem is completely invisible until 
 the {{MemStoreFlusher}} needs to force-flush a region and none of them has 
 edits:
 {noformat}
 2012-03-21 00:33:39,303 DEBUG org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up because memory above low water=5.9g
 2012-03-21 00:33:39,303 ERROR org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed for entry null
 java.lang.IllegalStateException
     at com.google.common.base.Preconditions.checkState(Preconditions.java:129)
     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
     at java.lang.Thread.run(Thread.java:662)
 {noformat}
 The {{null}} here is a region. In my case I had so many edits in the 
 {{MemStore}} during recovery that I'm over the low barrier although in fact 
 I'm at 0. It happened yesterday and it is still printing this out.
 To fix this we need to be able to decrease the global {{MemStore}} size when 
 the region can't open.





[jira] [Updated] (HBASE-5890) SplitLog Rescan BusyWaits upon Zk.CONNECTIONLOSS

2012-04-30 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5890:
-

Fix Version/s: (was: 0.94.0)
   0.94.1

Moving out for now.

 SplitLog Rescan BusyWaits upon Zk.CONNECTIONLOSS
 

 Key: HBASE-5890
 URL: https://issues.apache.org/jira/browse/HBASE-5890
 Project: HBase
  Issue Type: Bug
Reporter: Nicolas Spiegelberg
Priority: Minor
 Fix For: 0.96.0, 0.89-fb, 0.94.1

 Attachments: HBASE-5890.patch


 We ran into a production issue yesterday where the SplitLogManager tried to 
 create a Rescan node in ZK.  The createAsync() generated a 
 KeeperException.CONNECTIONLOSS that was immediately sent to processResult(), 
 createRescan node with --retry_count was called, and this created a CPU 
 busy-wait that also clogged up the logs.  We should handle this better.
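
 One conventional way to avoid this kind of busy-wait is to sleep between 
 retries, with a delay that grows up to a cap. The Java sketch below shows 
 that idea only. It is not the attached patch and does not use the ZooKeeper 
 API: RetryWithBackoff, delayMs and retry are hypothetical names, and the 
 BooleanSupplier stands in for the operation that hit CONNECTIONLOSS.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper: retries an operation with capped exponential backoff. */
final class RetryWithBackoff {
  /** Delay before the given retry (0-based): baseMs doubled per retry, capped at maxMs. */
  static long delayMs(int retry, long baseMs, long maxMs) {
    long delay = baseMs;
    for (int i = 0; i < retry && delay < maxMs; i++) {
      delay *= 2;
    }
    return Math.min(delay, maxMs);
  }

  /** Runs op until it reports success or the retries are used up; returns whether it succeeded. */
  static boolean retry(BooleanSupplier op, int maxRetries, long baseMs, long maxMs) {
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      if (op.getAsBoolean()) {
        return true;
      }
      if (attempt < maxRetries) {
        try {
          // Sleeping here is what keeps a failing operation from spinning the CPU.
          Thread.sleep(delayMs(attempt, baseMs, maxMs));
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return false;
        }
      }
    }
    return false;
  }
}
```

 Logging once per retry rather than once per callback also keeps the logs 
 readable while the connection is down.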





[jira] [Updated] (HBASE-5888) Clover profile in build

2012-04-30 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-5888:
-

Attachment: HBASE-5358_v2.patch

Updated the patch to ignore generated packages (thrift.generated, 
protobuf.generated), since they are skewing coverage results. 

I uploaded a sample report for 0.92 here:
http://people.apache.org/~enis/hbase-clover/

 Clover profile in build
 ---

 Key: HBASE-5888
 URL: https://issues.apache.org/jira/browse/HBASE-5888
 Project: HBase
  Issue Type: Improvement
  Components: build, test
Affects Versions: 0.92.2, 0.96.0, 0.94.1
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Attachments: HBASE-5358_v2.patch, hbase-clover_v1.patch


 Clover is disabled right now. I would like to add a profile that enables 
 clover reports. We can also backport this to 0.92, and 0.94, since we are 
 also interested in test coverage for those branches. 




