[jira] [Commented] (HBASE-5078) DistributedLogSplitter failing to split file because it has edits for lots of regions

2011-12-20 Thread Hadoop QA (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-5078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173681#comment-13173681 ]

Hadoop QA commented on HBASE-5078:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12508136/5078.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 javadoc.  The javadoc tool appears to have generated -152 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 75 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.coprocessor.TestCoprocessorEndpoint

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/556//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/556//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/556//console

This message is automatically generated.

 DistributedLogSplitter failing to split file because it has edits for lots of 
 regions
 -

 Key: HBASE-5078
 URL: https://issues.apache.org/jira/browse/HBASE-5078
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.92.0

 Attachments: 5078-v2.txt, 5078-v3.txt, 5078.txt


 Testing 0.92.0RC, I ran into an interesting issue where a log file had edits 
 for many regions, and just opening the file per region was taking so long that 
 we were never updating our progress, so the split of the log just kept 
 failing; in this case, the first 40 edits in a file required our opening 35 
 files -- opening 35 files took longer than the hard-coded 25 seconds it's 
 supposed to take to acquire the task.
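
 The timing above can be sketched outside HBase (all names here are 
 hypothetical, not the actual DistributedLogSplitter API): if the first 
 progress report only happens after every output file has been opened, 35 slow 
 opens blow past the 25-second acquisition window, whereas reporting once per 
 edit bounds the unreported gap to a single open.

```java
// Hypothetical sketch of the race described above. A worker keeps a split task
// only if it reports progress within `timeoutMs`. If progress is reported only
// after all region output files are opened, 35 slow opens can exceed the
// timeout; reporting per edit bounds the gap to roughly one open.
public class SplitHeartbeatSketch {

    // Millis until the first progress report, given `opensBeforeFirstReport`
    // file opens each taking `openMs`.
    static long firstReportDelayMs(int opensBeforeFirstReport, long openMs) {
        return opensBeforeFirstReport * openMs;
    }

    static boolean taskKept(int opensBeforeFirstReport, long openMs, long timeoutMs) {
        return firstReportDelayMs(opensBeforeFirstReport, openMs) <= timeoutMs;
    }

    public static void main(String[] args) {
        // Numbers from the report: 35 opens, ~1s each, 25s timeout.
        System.out.println(taskKept(35, 1000, 25_000)); // reporting after all opens: too late
        System.out.println(taskKept(1, 1000, 25_000));  // reporting per edit: at most one open
    }
}
```

 Moving the progress report inside the per-edit loop, rather than per file 
 open, is presumably the direction a fix would take.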
 First, here is master's view:
 {code}
 2011-12-20 17:54:09,184 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  ver = 0
 ...
 2011-12-20 17:54:09,233 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
 task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  acquired by sv4r27s44,7003,1324365396664
 ...
 2011-12-20 17:54:35,475 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403573033
  ver = 3
 {code}
 Master then gives it elsewhere.
 Over on the regionserver we see:
 {code}
 2011-12-20 17:54:09,233 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker 
 sv4r27s44,7003,1324365396664 acquired task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 2011-12-20 17:54:10,714 DEBUG 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: 
 Path=hdfs://sv4r11s38:7000/hbase/splitlog/sv4r27s44,7003,1324365396664_hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679/TestTable/6b6bfc2716dff952435ab26f018648b2/recovered.edits/0278862.temp, syncFs=true, hflush=false
 
 {code}
  and so on till:
 {code}
 2011-12-20 17:54:36,876 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  preempted from sv4r27s44,7003,1324365396664, current task state and 
 owner=owned sv4r28s44,7003,1324365396678
 
 2011-12-20 17:54:37,112 WARN 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: Failed to heartbeat the 
 task/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 {code}
 When 

[jira] [Commented] (HBASE-5077) SplitLogWorker fails to let go of a task, kills the RS

2011-12-20 Thread Hadoop QA (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173685#comment-13173685 ]

Hadoop QA commented on HBASE-5077:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12508163/HBASE-5077.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 javadoc.  The javadoc tool appears to have generated -152 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 76 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.io.TestHeapSize

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/558//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/558//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/558//console

This message is automatically generated.

 SplitLogWorker fails to let go of a task, kills the RS
 --

 Key: HBASE-5077
 URL: https://issues.apache.org/jira/browse/HBASE-5077
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Critical
 Fix For: 0.92.1

 Attachments: HBASE-5077.patch


 I hope I didn't break the spacetime continuum; I got this while testing 0.92.0:
 {quote}
 2011-12-20 03:06:19,838 FATAL 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: logic error - end task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
  done failed because task doesn't exist
 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
 NoNode for 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
 at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1228)
 at 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:372)
 at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:654)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.endTask(SplitLogWorker.java:372)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:280)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:197)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:165)
 at java.lang.Thread.run(Thread.java:662)
 {quote}
 I'll post more logs in a moment. What I can see is that the master shuffled 
 that task around a bit and one of the region servers died on this stack trace 
 while the others were able to interrupt themselves.
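
 The stack trace above comes from setData on a task znode that no longer 
 exists. A minimal sketch of a tolerant behaviour (hypothetical names, plain 
 Java standing in for ZooKeeper) treats NoNode on task completion as "task 
 already taken away" instead of a fatal logic error:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not the actual SplitLogWorker code): when finishing a
// task, a missing znode means the master already preempted or deleted it, so
// the worker should log and move on rather than kill the region server.
public class EndTaskSketch {
    static class NoNodeException extends RuntimeException {}

    // Stand-in for a ZooKeeper namespace: path -> data.
    static final Map<String, String> znodes = new HashMap<>();

    static void setData(String path, String data) {
        if (!znodes.containsKey(path)) throw new NoNodeException();
        znodes.put(path, data);
    }

    /** Returns true if we marked the task done, false if it was already gone. */
    static boolean endTask(String path) {
        try {
            setData(path, "DONE");
            return true;
        } catch (NoNodeException e) {
            // Task znode vanished: it was preempted/cleaned up elsewhere.
            return false;  // tolerate instead of treating this as fatal
        }
    }

    public static void main(String[] args) {
        znodes.put("/hbase/splitlog/task-a", "OWNED");
        System.out.println(endTask("/hbase/splitlog/task-a")); // true
        System.out.println(endTask("/hbase/splitlog/task-b")); // false, but not fatal
    }
}
```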

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4916) LoadTest MR Job

2011-12-20 Thread Phabricator (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173687#comment-13173687 ]

Phabricator commented on HBASE-4916:


mbautin has commented on the revision HBASE-4916 [jira] LoadTest MR Job.
Added CCs: jdcryans, stack, tedyu, todd

  @stack, @tedyu, @todd, @jdcryans: this is a versatile map-reduce-based load 
tester that our intern Christopher Gist wrote. While there might be some 
overlap with PerformanceEvaluation and LoadTestTool, this load tester was 
developed and debugged independently and has already been useful to us in 
real-world tasks. Could you please take a quick look and let us know what you 
think is the best strategy for integrating it into trunk?

REVISION DETAIL
  https://reviews.facebook.net/D741


 LoadTest MR Job
 ---

 Key: HBASE-4916
 URL: https://issues.apache.org/jira/browse/HBASE-4916
 Project: HBase
  Issue Type: Sub-task
  Components: client, regionserver
Reporter: Nicolas Spiegelberg
Assignee: Christopher Gist
 Fix For: 0.94.0

 Attachments: HBASE-4916.D741.1.patch


 Add a script to start a streaming map-reduce job where each map tasks runs an 
 instance of the load tester for a partition of the key-space. Ensure that the 
 load tester takes a parameter indicating the start key for write operations.





[jira] [Updated] (HBASE-5077) SplitLogWorker fails to let go of a task, kills the RS

2011-12-20 Thread Jean-Daniel Cryans (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-5077:
--

Status: Open  (was: Patch Available)





[jira] [Updated] (HBASE-5077) SplitLogWorker fails to let go of a task, kills the RS

2011-12-20 Thread Jean-Daniel Cryans (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-5077:
--

Attachment: HBASE-5077-v2.patch

Forgot about the finally block; changing it again to add the correct return, 
and to print if process_failed is false.
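
Generically, the shape described (decide in the finally block, based on a 
process_failed flag, what to report, and return the right value) looks like 
this sketch; the names are illustrative, not the actual HBASE-5077 patch:

```java
// Illustrative try/finally shape for the fix described above (hypothetical
// names): only a clean pass clears the flag, and the finally block decides
// whether to report, regardless of how the method exits.
public class FinallyReturnSketch {
    static boolean grabAndProcess(boolean processingThrows) {
        boolean processFailed = true;
        try {
            if (processingThrows) throw new RuntimeException("split failed");
            processFailed = false;            // reached only on a clean pass
            return true;
        } catch (RuntimeException e) {
            return false;                     // the "correct return" on failure
        } finally {
            if (processFailed) {
                System.out.println("task did not complete cleanly");
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(grabAndProcess(false)); // true, no warning
        System.out.println(grabAndProcess(true));  // prints warning, then false
    }
}
```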





[jira] [Commented] (HBASE-5078) DistributedLogSplitter failing to split file because it has edits for lots of regions

2011-12-20 Thread Hadoop QA (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-5078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173689#comment-13173689 ]

Hadoop QA commented on HBASE-5078:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12508165/5078-v3.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 javadoc.  The javadoc tool appears to have generated -152 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 75 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.io.TestHeapSize

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/559//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/559//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/559//console

This message is automatically generated.


[jira] [Updated] (HBASE-5077) SplitLogWorker fails to let go of a task, kills the RS

2011-12-20 Thread Jean-Daniel Cryans (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-5077:
--

Fix Version/s: (was: 0.92.1)
   0.92.0
   Status: Patch Available  (was: Open)





[jira] [Commented] (HBASE-5021) Enforce upper bound on timestamp

2011-12-20 Thread Hudson (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-5021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173691#comment-13173691 ]

Hudson commented on HBASE-5021:
---

Integrated in HBase-TRUNK #2564 (See 
[https://builds.apache.org/job/HBase-TRUNK/2564/])
[jira] [HBase-5021] Enforce upper bound on timestamp

Summary:
We have been getting hit with performance problems on the ODS
side due to invalid timestamps being inserted.  ODS is
working on adding proper checks to the app server, but production
performance could be severely impacted, with significant recovery time, if
something slips past.  Therefore, we should also allow the option to
check the upper bound in HBase.

This is the first draft.  Probably should allow per-CF customization.

Test Plan:  - mvn test -Dtest=TestHRegion#testPutWithTsTooNew

Reviewers: Kannan, Liyin, JIRA

CC: stack, nspiegelberg, tedyu, Kannan, mbautin

Differential Revision: 849


 Enforce upper bound on timestamp
 

 Key: HBASE-5021
 URL: https://issues.apache.org/jira/browse/HBASE-5021
 Project: HBase
  Issue Type: Improvement
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Critical
 Fix For: 0.94.0

 Attachments: D849.1.patch, D849.2.patch, D849.3.patch, 
 HBASE-5021-trunk.patch


 We have been getting hit with performance problems on our time-series 
 database due to invalid timestamps being inserted.  We are 
 working on adding proper checks to the app server, but production performance 
 could be severely impacted, with significant recovery time, if something slips 
 past.  Since timestamps are considered a fundamental part of the HBase schema 
 and multiple optimizations use timestamp information, we should allow the 
 option to sanity-check the upper bound on the server side in HBase.
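
 A server-side upper-bound check of the kind proposed can be sketched 
 minimally (hypothetical names, not the actual D849 patch): accept an edit 
 only if its timestamp is no further ahead of the server clock than a 
 configured slop.

```java
// Minimal sketch of a timestamp upper-bound sanity check (hypothetical names,
// not the actual HBASE-5021 patch): reject writes whose timestamp lies further
// in the future than a configured slop beyond the server clock.
public class TimestampBoundSketch {
    static boolean withinUpperBound(long editTs, long serverNowMs, long maxClockSkewMs) {
        return editTs <= serverNowMs + maxClockSkewMs;
    }

    public static void main(String[] args) {
        long now = 1_324_400_000_000L;  // some server clock reading, millis
        System.out.println(withinUpperBound(now + 1_000, now, 60_000));       // small skew: accept
        System.out.println(withinUpperBound(now + 86_400_000L, now, 60_000)); // a day ahead: reject
    }
}
```

 Per-column-family customization, as the summary suggests, would amount to 
 looking up maxClockSkewMs per CF instead of using one global value.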





[jira] [Commented] (HBASE-5035) Runtime exceptions during meta scan

2011-12-20 Thread Shrijeet Paliwal (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173697#comment-13173697 ]

Shrijeet Paliwal commented on HBASE-5035:
-

Ted, you had mentioned the following in the email thread: 

Null check for regionInfo should be added 

I could not work out why regionInfo could possibly be null. The call 
'Writables.getHRegionInfo(value);' does not seem to ever return null. Could you 
please explain your reasoning. 

Meanwhile I am still reading the code and trying to find the place where the 
NPE might occur.
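
Independently of whether Writables.getHRegionInfo can return null, the 
defensive shape under discussion looks like this (hypothetical names, not the 
actual client code): fail fast with a descriptive message so the error 
identifies the bad meta cell instead of surfacing as a later, opaque NPE.

```java
// Hypothetical defensive pattern for the kind of null check discussed above
// (illustrative names, not the actual Writables.getHRegionInfo): surface a
// descriptive error instead of letting a later NPE hide the failing line.
public class NullCheckSketch {
    static String parseRegionName(byte[] value) {
        if (value == null || value.length == 0) {
            throw new IllegalStateException("empty region info cell in meta");
        }
        return new String(value, java.nio.charset.StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(parseRegionName("TestTable,,1324403487679".getBytes()));
    }
}
```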

 Runtime exceptions during meta scan
 ---

 Key: HBASE-5035
 URL: https://issues.apache.org/jira/browse/HBASE-5035
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.3
Reporter: Shrijeet Paliwal

 Version: 0.90.3 + patches back ported 
 The other day our client started spitting these two runtime exceptions. Not 
 all clients connected to the cluster were affected, only 4 of them. While 
 3 of them were throwing NPE, one of them was throwing 
 ArrayIndexOutOfBoundsException. The errors are: 
 1. http://pastie.org/2987926
 2. http://pastie.org/2987927
 Clients did not recover from this and I had to restart them. 
 The motive of this jira is to identify and add null checks at the appropriate 
 places. Also, with the given stack traces I cannot tell which line caused the 
 NPE or the AIOBE, hence an additional motive is to make the traces more 
 helpful. 





[jira] [Reopened] (HBASE-5021) Enforce upper bound on timestamp

2011-12-20 Thread Zhihong Yu (Reopened) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu reopened HBASE-5021:
---


TestHeapSize fails on Jenkins





[jira] [Commented] (HBASE-5021) Enforce upper bound on timestamp

2011-12-20 Thread Phabricator (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-5021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173706#comment-13173706 ]

Phabricator commented on HBASE-5021:


mbautin has commented on the revision [jira] [HBase-5021] Enforce upper bound 
on timestamp.

  This adds a new field to HRegion but does not update heap size 
(FIXED_OVERHEAD). I think the unit test still passes because of alignment, but 
I noticed this problem with the data block encoding patch applied. Let's see if 
TestHeapSize fails on Jenkins.
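
  The alignment effect described can be shown with a toy calculation (not 
HBase's actual ClassSize/FIXED_OVERHEAD code): because heap sizes are rounded 
up to 8-byte multiples, adding one small field may leave the aligned total 
unchanged, which is why a heap-size test can keep passing after the constant 
goes stale.

```java
// Toy illustration (not HBase's ClassSize accounting) of why a stale
// FIXED_OVERHEAD can survive: sizes are rounded up to 8-byte alignment, so
// adding one 4-byte field may not change the aligned total.
public class AlignSketch {
    static long align8(long bytes) {
        return (bytes + 7) & ~7L;   // round up to a multiple of 8
    }

    public static void main(String[] args) {
        long before = align8(16 + 4);      // object header + one int field
        long after  = align8(16 + 4 + 4);  // add another 4-byte field
        System.out.println(before == after); // the test cannot see the new field
    }
}
```

  A later patch that shifts the layout past the next 8-byte boundary then 
exposes the mismatch, which matches the "fails on Jenkins with the encoding 
patch applied" observation.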

REVISION DETAIL
  https://reviews.facebook.net/D849

COMMIT
  https://reviews.facebook.net/rHBASE1221532






[jira] [Commented] (HBASE-5078) DistributedLogSplitter failing to split file because it has edits for lots of regions

2011-12-20 Thread Zhihong Yu (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-5078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173703#comment-13173703 ]

Zhihong Yu commented on HBASE-5078:
---

+1 on patch v3.

Minor comment:
openedNewFile sounds like a boolean. Would numNewlyOpenedFiles be a better 
name?

 DistributedLogSplitter failing to split file because it has edits for lots of 
 regions
 -

 Key: HBASE-5078
 URL: https://issues.apache.org/jira/browse/HBASE-5078
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.92.0

 Attachments: 5078-v2.txt, 5078-v3.txt, 5078.txt


 Testing 0.92.0RC, ran into interesting issue where a log file had edits for 
 many regions and just opening the file per region was taking so long, we were 
 never updating our progress and so the split of the log just kept failing; in 
 this case, the first 40 edits in a file required our opening 35 files -- 
 opening 35 files took longer than the hard-coded 25 seconds its supposed to 
 take acquiring the task.
 First, here is master's view:
 {code}
 2011-12-20 17:54:09,184 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  ver = 0
 ...
 2011-12-20 17:54:09,233 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
 task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  acquired by sv4r27s44,7003,1324365396664
 ...
 2011-12-20 17:54:35,475 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403573033
  ver = 3
 {code}
 The master then gives the task elsewhere.
 Over on the regionserver we see:
 {code}
 2011-12-20 17:54:09,233 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker 
 sv4r27s44,7003,1324365396664 acquired task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 2011-12-20 17:54:10,714 DEBUG 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: 
 Path=hdfs://sv4r11s38:7000/hbase/splitlog/sv4r27s44,7003,1324365396664_hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679/TestTable/6b6bfc2716dff952435ab26f018648b2/recovered.ed
 its/0278862.temp, syncFs=true, hflush=false
 
 {code}
  and so on till:
 {code}
 2011-12-20 17:54:36,876 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  preempted from sv4r27s44,7003,1324365396664, current task state and 
 owner=owned sv4r28s44,7003,1324365396678
 
 2011-12-20 17:54:37,112 WARN 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: Failed to heartbeat the 
 task/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 {code}
 When the above happened, we'd only processed 40 edits.  As written, we only 
 heartbeat every 1024 edits.
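Since the failure mode above is a progress report tied to an edit count rather than wall-clock time, one natural fix is to heartbeat on elapsed time. A minimal sketch of that idea (the class name and the 5-second period are invented for illustration, not HBase's actual fix):

```java
// Hypothetical sketch: report progress based on elapsed time rather than
// edit count, so slow file opens still produce heartbeats.
public class HeartbeatSketch {
    static final long REPORT_PERIOD_MS = 5000; // assumed value, not from HBase
    private long lastReport = 0;

    // Returns true when a heartbeat should be sent now.
    boolean shouldReport(long nowMs) {
        if (nowMs - lastReport >= REPORT_PERIOD_MS) {
            lastReport = nowMs;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        HeartbeatSketch h = new HeartbeatSketch();
        // A call at t=5000 fires; a call 1s later does not; 5s later fires again.
        if (!h.shouldReport(5000)) throw new AssertionError();
        if (h.shouldReport(6000)) throw new AssertionError();
        if (!h.shouldReport(10000)) throw new AssertionError();
        System.out.println("ok");
    }
}
```

With this shape, even a worker stuck opening many recovered-edits files would keep renewing its claim on the task.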

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5077) SplitLogWorker fails to let go of a task, kills the RS

2011-12-20 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173705#comment-13173705
 ] 

Hadoop QA commented on HBASE-5077:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12508167/HBASE-5077-v2.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 javadoc.  The javadoc tool appears to have generated -152 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 76 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.io.TestHeapSize

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/560//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/560//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/560//console

This message is automatically generated.

 SplitLogWorker fails to let go of a task, kills the RS
 --

 Key: HBASE-5077
 URL: https://issues.apache.org/jira/browse/HBASE-5077
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Critical
 Fix For: 0.92.0

 Attachments: HBASE-5077-v2.patch, HBASE-5077.patch


 I hope I didn't break the spacetime continuum; I got this while testing 0.92.0:
 {quote}
 2011-12-20 03:06:19,838 FATAL 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: logic error - end task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
  done failed because task doesn't exist
 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
 NoNode for 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
 at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1228)
 at 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:372)
 at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:654)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.endTask(SplitLogWorker.java:372)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:280)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:197)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:165)
 at java.lang.Thread.run(Thread.java:662)
 {quote}
 I'll post more logs in a moment. What I can see is that the master shuffled 
 that task around a bit and one of the region servers died on this stack trace 
 while the others were able to interrupt themselves.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5021) Enforce upper bound on timestamp

2011-12-20 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173709#comment-13173709
 ] 

Zhihong Yu commented on HBASE-5021:
---

TestHeapSize of HBase-TRUNK #2564 on Jenkins failed.

 Enforce upper bound on timestamp
 

 Key: HBASE-5021
 URL: https://issues.apache.org/jira/browse/HBASE-5021
 Project: HBase
  Issue Type: Improvement
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Critical
 Fix For: 0.94.0

 Attachments: D849.1.patch, D849.2.patch, D849.3.patch, 
 HBASE-5021-trunk.patch


 We have been getting hit with performance problems on our time-series 
 database due to invalid timestamps being inserted.  We are 
 working on adding proper checks to the app server, but production performance 
 could be severely impacted, with significant recovery time, if something slips 
 past.  Since timestamps are considered a fundamental part of the HBase schema 
 and multiple optimizations use timestamp information, we should allow the 
 option to sanity-check the upper bound on the server side in HBase.
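The proposed server-side sanity check could look roughly like the following sketch (the class, method, and 60-second skew bound are hypothetical assumptions, not from HBase):

```java
// Hypothetical sketch of the server-side check proposed above: reject
// writes whose timestamp is too far beyond the server's current time.
public class TimestampGuardSketch {
    static final long MAX_CLOCK_SKEW_MS = 60_000; // assumed bound, configurable in practice

    // Accept a write only if its timestamp is within the allowed skew of "now".
    static boolean acceptable(long ts, long nowMs) {
        return ts <= nowMs + MAX_CLOCK_SKEW_MS;
    }

    public static void main(String[] args) {
        long now = 1_700_000_000_000L;
        System.out.println(acceptable(now, now));               // in bounds
        System.out.println(acceptable(now + 86_400_000L, now)); // a full day ahead: rejected
    }
}
```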

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5035) Runtime exceptions during meta scan

2011-12-20 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173716#comment-13173716
 ] 

Zhihong Yu commented on HBASE-5035:
---

I checked the code again and think regionInfo is unlikely to be null.
Remaining possibilities are regionInfo.getTableDesc() and somewhere in the HSA 
ctor:
{code}
  public HServerAddress(String hostAndPort) {
{code}

Making the trace more helpful should be the first action.

 Runtime exceptions during meta scan
 ---

 Key: HBASE-5035
 URL: https://issues.apache.org/jira/browse/HBASE-5035
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.3
Reporter: Shrijeet Paliwal

 Version: 0.90.3 + patches back ported 
 The other day our client started spitting these two runtime exceptions. Not 
 all clients connected to the cluster were under impact. Only 4 of them. While 
 3 of them were throwing NPE, one of them was throwing 
 ArrayIndexOutOfBoundsException. The errors are : 
 1. http://pastie.org/2987926
 2. http://pastie.org/2987927
 Clients did not recover from this and I had to restart them. 
 The motive of this jira is to identify and put null checks at appropriate places. 
 Also, with the given stack trace I cannot tell which line caused the NPE or 
 AIOBE, hence an additional motive is to make the trace more helpful. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5009) Failure of creating split dir if it already exists prevents splits from happening further

2011-12-20 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173730#comment-13173730
 ] 

ramkrishna.s.vasudevan commented on HBASE-5009:
---

@Ted
{code}
  boolean stillRunning = !threadPool.awaitTermination(
  this.fileSplitTimeout, TimeUnit.MILLISECONDS);
{code}
We already do that.  We wait for 30 secs.  But the problem is that the threads 
that were spawned are still alive, and later they issue the create command to the 
NN.  Thus, after we roll back and delete the splitdir, it is created again by 
these threads.  
Correct me if I am wrong, Ted.
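One way to stop the lingering workers from recreating the directory is to force-cancel them once the timed wait expires. A hedged sketch with a plain ExecutorService (not the actual HBase split thread pool):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: after the timed wait, force-cancel stragglers with
// shutdownNow() so they cannot issue creates after the rollback.
public class PoolShutdownSketch {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        // Stand-in for a split worker that would outlive the 30s wait.
        pool.submit(() -> {
            try { Thread.sleep(60_000); } catch (InterruptedException e) { /* cancelled */ }
        });
        pool.shutdown();
        boolean stillRunning = !pool.awaitTermination(100, TimeUnit.MILLISECONDS);
        if (stillRunning) {
            pool.shutdownNow(); // interrupts the lingering worker
        }
        boolean finished = pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println(finished ? "terminated" : "still running");
    }
}
```

The key point is that `awaitTermination` alone only waits; `shutdownNow` is what interrupts tasks that are still alive.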

 Failure of creating split dir if it already exists prevents splits from 
 happening further
 -

 Key: HBASE-5009
 URL: https://issues.apache.org/jira/browse/HBASE-5009
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Attachments: HBASE-5009.patch, HBASE-5009_Branch90.patch


 The scenario is
 - The split of a region takes a long time
 - The deletion of the splitDir fails due to HDFS problems.
 - Subsequent splits also fail after that.
 {code}
 private static void createSplitDir(final FileSystem fs, final Path splitdir)
   throws IOException {
 if (fs.exists(splitdir)) throw new IOException("Splitdir already exits? " 
 + splitdir);
 if (!fs.mkdirs(splitdir)) throw new IOException("Failed create of " + 
 splitdir);
   }
 {code}
 Correct me if I am wrong. If it is an issue, can we change the behaviour of 
 throwing an exception? 
 Please suggest.
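If the behaviour were changed as suggested, the method could treat a leftover split dir as stale state to clean up rather than a fatal condition. A sketch using java.nio.file in place of Hadoop's FileSystem (names and semantics are illustrative only, not a proposed patch):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical alternative: delete a leftover (empty) split dir and
// recreate it, instead of throwing and blocking all later splits.
public class SplitDirSketch {
    static void createSplitDir(Path splitdir) throws IOException {
        if (Files.exists(splitdir)) {
            // Leftover from a failed split: remove it instead of failing.
            Files.delete(splitdir);
        }
        Files.createDirectories(splitdir);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("split-test").resolve("splitdir");
        createSplitDir(dir); // fresh create succeeds
        createSplitDir(dir); // leftover dir is replaced, not fatal
        System.out.println(Files.isDirectory(dir));
    }
}
```

Note the threads-still-writing problem discussed in the comments would need handling too; recursive deletion of a non-empty dir is deliberately not attempted here.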

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5073) Registered listeners not getting removed leading to memory leak in HBaseAdmin

2011-12-20 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173734#comment-13173734
 ] 

ramkrishna.s.vasudevan commented on HBASE-5073:
---

Tests are passing.

 Registered listeners not getting removed leading to memory leak in HBaseAdmin
 -

 Key: HBASE-5073
 URL: https://issues.apache.org/jira/browse/HBASE-5073
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.90.5

 Attachments: HBASE-5073.patch


 HBaseAdmin APIs like tableExists(), flush, split, and closeRegion use the 
 catalog tracker.  Every time, a root node tracker and a meta node tracker are 
 started and a listener is registered.  But after the operations are performed, 
 the listeners are not removed. Hence, if the admin APIs are used repeatedly, 
 this may lead to a memory leak.  
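The usual pattern for fixing such a leak is to scope each listener registration to the operation and deregister it in a finally block. A minimal stand-in sketch (the listener list is a placeholder for the watcher's registered listeners, not HBase code):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the fix pattern: register a listener for the
// duration of one admin call and always deregister it in finally.
public class ListenerLeakSketch {
    static final List<Object> LISTENERS = new ArrayList<>();

    static void adminOp() {
        Object listener = new Object();
        LISTENERS.add(listener);
        try {
            // ... perform the catalog-tracker-backed operation ...
        } finally {
            LISTENERS.remove(listener); // without this, each call leaks one entry
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) adminOp();
        System.out.println(LISTENERS.size()); // stays 0 instead of growing to 1000
    }
}
```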

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time

2011-12-20 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173746#comment-13173746
 ] 

Phabricator commented on HBASE-5033:


Liyin has commented on the revision [jira][HBASE-5033][[89-fb]]Opening/Closing 
store in parallel to reduce region open/close time.

  Just discussed with Kannan offline; he found out that it wouldn't achieve the 
maximum parallelization across regions from different tables, considering 
each table may have a different number of stores.

  One proposal is to just bound the total number of threads for this 
parallelization process for each region.
  And the following formula holds:
  the max number of threads for opening stores * the max number of threads for 
opening store files = the total number of threads we have configured.

  So if one region has fewer stores, this region will have more threads 
to open/close the store files in parallel, and vice versa.
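The proposed budget split can be sketched as a small helper that divides a fixed total thread count between store-level and store-file-level parallelism (the function name and the budget of 8 are illustrative assumptions):

```java
// Hypothetical sketch of the thread-budget formula above: with a fixed
// total, regions with fewer stores get more file-opener threads per store.
public class ThreadBudgetSketch {
    static int storeFileThreads(int totalThreads, int numStores) {
        // Never use more store-level threads than there are stores.
        int storeThreads = Math.min(numStores, totalThreads);
        return Math.max(1, totalThreads / storeThreads);
    }

    public static void main(String[] args) {
        // With a budget of 8 threads: a 2-store region gets 4 file-opener
        // threads per store; an 8-store region gets 1 per store.
        System.out.println(storeFileThreads(8, 2));
        System.out.println(storeFileThreads(8, 8));
    }
}
```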


REVISION DETAIL
  https://reviews.facebook.net/D933


 Opening/Closing store in parallel to reduce region open/close time
 --

 Key: HBASE-5033
 URL: https://issues.apache.org/jira/browse/HBASE-5033
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: D933.1.patch, D933.2.patch, D933.3.patch


 Region servers open/close each store, and each store file for every 
 store, in sequential fashion, which can make region open/close 
 inefficient. 
 So this diff opens/closes each store in parallel in order to reduce 
 region open/close time. It also helps reduce cluster restart time.
 1) Opening each store in parallel
 2) Loading each store file for every store in parallel
 3) Closing each store in parallel
 4) Closing each store file for every store in parallel.
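Step 1 above can be sketched with a plain ExecutorService: submit one open task per store and collect the futures, so any per-store failure still surfaces (the string results stand in for real Store objects; this is not the patch's code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: open all stores of a region concurrently and wait
// for every result before declaring the region open.
public class ParallelOpenSketch {
    public static void main(String[] args) throws Exception {
        List<String> stores = List.of("cf1", "cf2", "cf3");
        ExecutorService pool = Executors.newFixedThreadPool(stores.size());
        List<Future<String>> opened = new ArrayList<>();
        for (String store : stores) {
            opened.add(pool.submit(() -> "opened-" + store)); // stand-in for Store construction
        }
        for (Future<String> f : opened) {
            System.out.println(f.get()); // get() rethrows any per-store failure
        }
        pool.shutdown();
    }
}
```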

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4218) Delta Encoding of KeyValues (aka prefix compression)

2011-12-20 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173764#comment-13173764
 ] 

Hadoop QA commented on HBASE-4218:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12508181/D447.9.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 65 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/561//console

This message is automatically generated.

 Delta Encoding of KeyValues  (aka prefix compression)
 -

 Key: HBASE-4218
 URL: https://issues.apache.org/jira/browse/HBASE-4218
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.94.0
Reporter: Jacek Migdal
Assignee: Mikhail Bautin
  Labels: compression
 Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 
 D447.1.patch, D447.2.patch, D447.3.patch, D447.4.patch, D447.5.patch, 
 D447.6.patch, D447.7.patch, D447.8.patch, D447.9.patch, 
 Delta_encoding_with_memstore_TS.patch, open-source.diff


 A compression for keys. Keys are sorted in HFile and they are usually very 
 similar. Because of that, it is possible to design better compression than 
 general-purpose algorithms.
 It is an additional step designed to be used in memory. It aims to save 
 memory in cache as well as speeding up seeks within HFileBlocks. It should 
 improve performance a lot if key lengths are larger than value lengths. For 
 example, it makes a lot of sense to use it when the value is a counter.
 Initial tests on real data (key length ~ 90 bytes, value length = 8 bytes) 
 show that I could achieve a decent level of compression:
  key compression ratio: 92%
  total compression ratio: 85%
  LZO on the same data: 85%
  LZO after delta encoding: 91%
 while having much better performance (20-80% faster decompression than 
 LZO). Moreover, it should allow far more efficient seeking, which should 
 improve performance a bit.
 It seems that simple compression algorithms are good enough. Most of the 
 savings are due to prefix compression, int128 encoding, timestamp diffs and 
 bitfields to avoid duplication. That way, comparisons of compressed data can 
 be much faster than a byte comparator (thanks to prefix compression and 
 bitfields).
 In order to implement it in HBase two important changes in design will be 
 needed:
 -solidify interface to HFileBlock / HFileReader Scanner to provide seeking 
 and iterating; access to uncompressed buffer in HFileBlock will have bad 
 performance
 -extend comparators to support comparison assuming that N first bytes are 
 equal (or some fields are equal)
 Link to a discussion about something similar:
 http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windowssubj=Re+prefix+compression
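The core prefix-compression idea can be illustrated in a few lines: for each sorted key, store only the length of the prefix shared with the previous key plus the differing suffix (the textual `len|suffix` encoding here is purely illustrative; the real patch uses compact binary formats):

```java
// Hypothetical illustration of prefix compression over sorted keys:
// each key is reduced to (sharedPrefixLen, suffix) relative to its
// predecessor.
public class PrefixEncodeSketch {
    static String encode(String prev, String key) {
        int p = 0;
        int max = Math.min(prev.length(), key.length());
        while (p < max && prev.charAt(p) == key.charAt(p)) p++;
        return p + "|" + key.substring(p);
    }

    public static void main(String[] args) {
        // Adjacent sorted row keys share long prefixes, so little is stored.
        System.out.println(encode("", "row0001"));        // 0|row0001
        System.out.println(encode("row0001", "row0002")); // 6|2
        System.out.println(encode("row0002", "row0100")); // 4|100
    }
}
```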

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5035) Runtime exceptions during meta scan

2011-12-20 Thread Shrijeet Paliwal (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173780#comment-13173780
 ] 

Shrijeet Paliwal commented on HBASE-5035:
-

Hmm, you might be right. 

{noformat}
final String serverAddress = Bytes.toString(value);

// instantiate the location
HRegionLocation loc = new HRegionLocation(regionInfo,
new HServerAddress(serverAddress));
{noformat}

The Bytes.toString call, in theory, may return either an empty string or 
null.
In the case where it returns null (see below), it tries to log an error, which 
I didn't see in my log file. 
So I am still not 100% sure this is our guy. 
{noformat}
 try {
  return new String(b, off, len, HConstants.UTF8_ENCODING);
} catch (UnsupportedEncodingException e) {
  LOG.error("UTF-8 not supported?", e);
  return null;
}
{noformat}

Nonetheless, it will be good to check the serverAddress variable for 
emptiness as well as nullness, since the HServerAddress constructor may throw a 
runtime error otherwise. The interesting point is that it can throw both 
ArrayIndexOutOfBoundsException and NPE, and I saw both cases.

{noformat}
/**
   * @param hostAndPort Hostname and port formatted as <code>&lt;hostname&gt; ':' &lt;port&gt;</code>
   */
  public HServerAddress(String hostAndPort) {
int colonIndex = hostAndPort.lastIndexOf(':');
{noformat}


I will open a subtask to make the trace more helpful. 
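The suggested guard could be sketched as follows (the method name and messages are invented; the point is to fail with a descriptive exception before the HServerAddress constructor can throw an NPE or AIOBE):

```java
// Hypothetical sketch of the null/emptiness check proposed above.
public class AddressGuardSketch {
    static String checkServerAddress(String serverAddress) {
        if (serverAddress == null || serverAddress.isEmpty()) {
            throw new IllegalStateException("No server address in META row");
        }
        if (serverAddress.lastIndexOf(':') < 0) {
            throw new IllegalStateException("Malformed server address: " + serverAddress);
        }
        return serverAddress;
    }

    public static void main(String[] args) {
        System.out.println(checkServerAddress("host1:60020"));
        try {
            checkServerAddress("");
        } catch (IllegalStateException e) {
            System.out.println("rejected");
        }
    }
}
```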

 Runtime exceptions during meta scan
 ---

 Key: HBASE-5035
 URL: https://issues.apache.org/jira/browse/HBASE-5035
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.3
Reporter: Shrijeet Paliwal

 Version: 0.90.3 + patches back ported 
 The other day our client started spitting these two runtime exceptions. Not 
 all clients connected to the cluster were under impact. Only 4 of them. While 
 3 of them were throwing NPE, one of them was throwing 
 ArrayIndexOutOfBoundsException. The errors are : 
 1. http://pastie.org/2987926
 2. http://pastie.org/2987927
 Clients did not recover from this and I had to restart them. 
 The motive of this jira is to identify and put null checks at appropriate places. 
 Also, with the given stack trace I cannot tell which line caused the NPE or 
 AIOBE, hence an additional motive is to make the trace more helpful. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5080) Make HConnectionManager's prefetchRegionCache error trace more helpful

2011-12-20 Thread Shrijeet Paliwal (Created) (JIRA)
Make HConnectionManager's prefetchRegionCache error trace more helpful
--

 Key: HBASE-5080
 URL: https://issues.apache.org/jira/browse/HBASE-5080
 Project: HBase
  Issue Type: Sub-task
  Components: client
Affects Versions: 0.90.3
Reporter: Shrijeet Paliwal
Priority: Minor


We catch RuntimeException in HConnectionManager's prefetchRegionCache 
method. The trace is not very helpful since it loses the information about the 
line where the RuntimeException actually happened. 
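The kind of change this sub-task asks for could look like the following sketch: catch the RuntimeException and rethrow it wrapped with the row being processed, preserving the original trace as the cause (all names here are illustrative, not the actual patch):

```java
// Hypothetical sketch: wrap a caught RuntimeException with context about
// the META row being parsed, keeping the original exception as the cause.
public class TraceContextSketch {
    static void processRow(byte[] row) {
        try {
            throw new NullPointerException(); // stand-in for the parsing failure
        } catch (RuntimeException e) {
            throw new RuntimeException("Failed parsing META row: " + new String(row), e);
        }
    }

    public static void main(String[] args) {
        try {
            processRow("tbl,,123".getBytes());
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
            // The cause keeps the exact line of the original failure.
            System.out.println(e.getCause().getClass().getSimpleName());
        }
    }
}
```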



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4120) isolation and allocation

2011-12-20 Thread Liu Jia (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liu Jia updated HBASE-4120:
---

Attachment: TablePriority_v17.patch

Reduce the time used by TestPriorityJobQueue.java.

 isolation and allocation
 

 Key: HBASE-4120
 URL: https://issues.apache.org/jira/browse/HBASE-4120
 Project: HBase
  Issue Type: New Feature
  Components: master, regionserver
Affects Versions: 0.90.2, 0.90.3, 0.90.4, 0.92.0
Reporter: Liu Jia
Assignee: Liu Jia
 Fix For: 0.94.0

 Attachments: Design_document_for_HBase_isolation_and_allocation.pdf, 
 Design_document_for_HBase_isolation_and_allocation_Revised.pdf, 
 HBase_isolation_and_allocation_user_guide.pdf, 
 Performance_of_Table_priority.pdf, 
 Simple_YCSB_Tests_For_TablePriority_Trunk_and_0.90.4.pdf, System 
 Structure.jpg, TablePriority.patch, TablePriority_v12.patch, 
 TablePriority_v12.patch, TablePriority_v15_with_coprocessor.patch, 
 TablePriority_v16_with_coprocessor.patch, TablePriority_v17.patch, 
 TablePriority_v17.patch, TablePriority_v8.patch, TablePriority_v8.patch, 
 TablePriority_v8_for_trunk.patch, TablePrioriy_v9.patch


 The HBase isolation and allocation tool is designed to help users manage 
 cluster resources among different applications and tables.
 When we have a large-scale HBase cluster with many applications running on 
 it, there will be lots of problems. In Taobao there is a cluster for many 
 departments to test their applications' performance; these applications are 
 based on HBase. With one cluster of 12 servers, only one 
 application can run exclusively on the cluster, and many other applications 
 must wait until the previous test is finished.
 After we added the allocation management function to the cluster, applications can 
 share the cluster and run concurrently. Also, if the test engineer wants to 
 make sure there is no interference, he/she can move other tables out of 
 this group.
 In groups we use table priority to allocate resources: when the system is busy, we 
 can make sure high-priority tables are not affected by lower-priority tables.
 Different groups can have different region server configurations: some groups 
 optimized for reading can have a large block cache size, and others optimized 
 for writing can have a large memstore size. 
 Tables and region servers can be moved easily between groups; after changing 
 the configuration, a group can be restarted alone instead of restarting the 
 whole cluster.
 git entry: https://github.com/ICT-Ope/HBase_allocation .
 We hope our work is helpful.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4120) isolation and allocation

2011-12-20 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173785#comment-13173785
 ] 

jirapos...@reviews.apache.org commented on HBASE-4120:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1421/
---

(Updated 2011-12-21 02:32:29.569594)


Review request for hbase.


Changes
---

Reduce the time used by TestPriorityJobQueue.java 


Summary
---

Patch used for table priority alone. In this patch, not only can tables have 
different priorities, but the different actions like get, scan, put and 
delete can also have priorities.


This addresses bug HBase-4120.
https://issues.apache.org/jira/browse/HBase-4120


Diffs (updated)
-

  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java
 1220359 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
 1220359 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/PriorityFunction.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/PriorityHBaseServer.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/PriorityJobQueue.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/QosRegionObserver.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterCoprocessorExceptionWithAbort.java
 1220359 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterCoprocessorExceptionWithRemove.java
 1220359 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/ipc/GroupTestUtil.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/ipc/TestActionPriority.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/ipc/TestPriorityJobQueue.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/ipc/TestTablePriority.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/ipc/TestTablePriorityHundredRegion.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/ipc/TestTablePriorityLargeRow.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/1421/diff


Testing
---

Tested with the test cases in TestCase_For_TablePriority_trunk_v1.patch. 
Please apply the patch for HBASE-4181 first; in some circumstances this bug will 
affect the performance of the client.


Thanks,

Jia



 isolation and allocation
 

 Key: HBASE-4120
 URL: https://issues.apache.org/jira/browse/HBASE-4120
 Project: HBase
  Issue Type: New Feature
  Components: master, regionserver
Affects Versions: 0.90.2, 0.90.3, 0.90.4, 0.92.0
Reporter: Liu Jia
Assignee: Liu Jia
 Fix For: 0.94.0

 Attachments: Design_document_for_HBase_isolation_and_allocation.pdf, 
 Design_document_for_HBase_isolation_and_allocation_Revised.pdf, 
 HBase_isolation_and_allocation_user_guide.pdf, 
 Performance_of_Table_priority.pdf, 
 Simple_YCSB_Tests_For_TablePriority_Trunk_and_0.90.4.pdf, System 
 Structure.jpg, TablePriority.patch, TablePriority_v12.patch, 
 TablePriority_v12.patch, TablePriority_v15_with_coprocessor.patch, 
 TablePriority_v16_with_coprocessor.patch, TablePriority_v17.patch, 
 TablePriority_v17.patch, TablePriority_v8.patch, TablePriority_v8.patch, 
 TablePriority_v8_for_trunk.patch, TablePrioriy_v9.patch


 The HBase isolation and allocation tool is designed to help users manage 
 cluster resources among different applications and tables.
 When we have a large-scale HBase cluster with many applications running on 
 it, there will be lots of problems. In Taobao there is a cluster for many 
 departments to test their applications' performance; these applications are 
 based on HBase. With one cluster of 12 servers, only one 
 application can run exclusively on the cluster, and many other applications 
 must wait until the previous test is finished.
 After we added the allocation management function to the cluster, applications can 
 share the cluster and run concurrently. Also, if the test engineer wants to 
 make sure there is no interference, he/she can move other tables out of 
 this group.
 In groups we use table priority to allocate resources: when the system is busy, we 
 can make sure 

[jira] [Updated] (HBASE-4218) Delta Encoding of KeyValues (aka prefix compression)

2011-12-20 Thread Phabricator (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HBASE-4218:
---

Attachment: D447.10.patch

mbautin updated the revision [jira] [HBASE-4218] HFile data block encoding 
(delta encoding).
Reviewers: JIRA, tedyu, stack, nspiegelberg, Kannan

  Fixing fully-qualified class name in admin.rb. All unit tests passed, except 
TestReplication.queueFailover, which is known to be flaky.

REVISION DETAIL
  https://reviews.facebook.net/D447

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
  src/main/java/org/apache/hadoop/hbase/KeyValue.java
  src/main/java/org/apache/hadoop/hbase/io/HalfStoreFileReader.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/BitsetKeyDeltaEncoder.java
  
src/main/java/org/apache/hadoop/hbase/io/encoding/BufferedDataBlockEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/CompressionState.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/CopyKeyDataBlockEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/DataBlockEncoder.java
  
src/main/java/org/apache/hadoop/hbase/io/encoding/DataBlockEncodingAlgorithms.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/DiffKeyDeltaEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/EncodedDataBlock.java
  
src/main/java/org/apache/hadoop/hbase/io/encoding/EncoderBufferTooSmallException.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/FastDiffDeltaEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/PrefixKeyDeltaEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileWriter.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoderImpl.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/LruBlockCache.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/NoOpDataBlockEncoder.java
  src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
  src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java
  src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
  
src/main/java/org/apache/hadoop/hbase/regionserver/metrics/SchemaConfigured.java
  src/main/java/org/apache/hadoop/hbase/util/ByteBufferUtils.java
  src/main/ruby/hbase/admin.rb
  src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java
  src/test/java/org/apache/hadoop/hbase/HFilePerformanceEvaluation.java
  src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
  src/test/java/org/apache/hadoop/hbase/io/TestHalfStoreFileReader.java
  src/test/java/org/apache/hadoop/hbase/io/TestHeapSize.java
  src/test/java/org/apache/hadoop/hbase/io/encoding/RedundantKVGenerator.java
  
src/test/java/org/apache/hadoop/hbase/io/encoding/TestBufferedDataBlockEncoder.java
  src/test/java/org/apache/hadoop/hbase/io/encoding/TestDataBlockEncoders.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/CacheTestUtils.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileDataBlockEncoder.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileWriterV2.java
  src/test/java/org/apache/hadoop/hbase/mapreduce/TestImportExport.java
  src/test/java/org/apache/hadoop/hbase/regionserver/CreateRandomStoreFile.java
  src/test/java/org/apache/hadoop/hbase/regionserver/DataBlockEncodingTool.java
  
src/test/java/org/apache/hadoop/hbase/regionserver/EncodedSeekPerformanceTest.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactSelection.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java
  
src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java
  

[jira] [Commented] (HBASE-4120) isolation and allocation

2011-12-20 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173789#comment-13173789
 ] 

Hadoop QA commented on HBASE-4120:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12508188/TablePriority_v17.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 24 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -137 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 86 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.io.TestHeapSize

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/562//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/562//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/562//console

This message is automatically generated.

 isolation and allocation
 

 Key: HBASE-4120
 URL: https://issues.apache.org/jira/browse/HBASE-4120
 Project: HBase
  Issue Type: New Feature
  Components: master, regionserver
Affects Versions: 0.90.2, 0.90.3, 0.90.4, 0.92.0
Reporter: Liu Jia
Assignee: Liu Jia
 Fix For: 0.94.0

 Attachments: Design_document_for_HBase_isolation_and_allocation.pdf, 
 Design_document_for_HBase_isolation_and_allocation_Revised.pdf, 
 HBase_isolation_and_allocation_user_guide.pdf, 
 Performance_of_Table_priority.pdf, 
 Simple_YCSB_Tests_For_TablePriority_Trunk_and_0.90.4.pdf, System 
 Structure.jpg, TablePriority.patch, TablePriority_v12.patch, 
 TablePriority_v12.patch, TablePriority_v15_with_coprocessor.patch, 
 TablePriority_v16_with_coprocessor.patch, TablePriority_v17.patch, 
 TablePriority_v17.patch, TablePriority_v8.patch, TablePriority_v8.patch, 
 TablePriority_v8_for_trunk.patch, TablePrioriy_v9.patch


 The HBase isolation and allocation tool is designed to help users manage 
 cluster resources among different applications and tables.
 When we have a large-scale HBase cluster with many applications running on 
 it, there will be lots of problems. In Taobao there is a cluster for many 
 departments to test the performance of their applications, which are 
 based on HBase. On one cluster with 12 servers, only one 
 application can run exclusively at a time, and many other applications 
 must wait until the previous test has finished.
 After we added the allocation management function to the cluster, applications can 
 share the cluster and run concurrently. Also, if a test engineer wants to 
 make sure there is no interference, he/she can move other tables out of 
 the group.
 Within groups we use table priority to allocate resources when the system is busy, so we 
 can make sure high-priority tables are not affected by lower-priority tables.
 Different groups can have different region server configurations: groups 
 optimized for reading can have a large block cache size, and others optimized 
 for writing can have a large memstore size. 
 Tables and region servers can be moved easily between groups; after changing 
 the configuration, a group can be restarted alone instead of restarting the 
 whole cluster.
 git entry : https://github.com/ICT-Ope/HBase_allocation .
 We hope our work is helpful.
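
One simple way priority-based sharing like the above could work (an invented sketch for illustration only, not the mechanism in the attached patches): when the system is busy, give each table a capacity share proportional to its priority.

```python
def allocate(capacity, tables):
    """Split capacity among tables proportionally to priority.

    Illustrative model only: `tables` is a list of (name, priority) pairs;
    higher priority means a larger share of the contended resource.
    """
    total = sum(priority for _name, priority in tables)
    return {name: capacity * priority / total for name, priority in tables}

# A priority-3 table gets three times the share of a priority-1 table.
shares = allocate(100, [("orders", 3), ("logs", 1)])
assert shares["orders"] == 75.0
assert shares["logs"] == 25.0
```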

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4218) Delta Encoding of KeyValues (aka prefix compression)

2011-12-20 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173792#comment-13173792
 ] 

Hadoop QA commented on HBASE-4218:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12508190/D447.10.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 65 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/563//console

This message is automatically generated.

 Delta Encoding of KeyValues  (aka prefix compression)
 -

 Key: HBASE-4218
 URL: https://issues.apache.org/jira/browse/HBASE-4218
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.94.0
Reporter: Jacek Migdal
Assignee: Mikhail Bautin
  Labels: compression
 Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 
 D447.1.patch, D447.10.patch, D447.2.patch, D447.3.patch, D447.4.patch, 
 D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, D447.9.patch, 
 Delta_encoding_with_memstore_TS.patch, open-source.diff


 A compression scheme for keys. Keys are sorted in an HFile and are usually very 
 similar. Because of that, it is possible to design better compression than 
 general-purpose algorithms.
 It is an additional step designed to be used in memory. It aims to save 
 memory in the cache as well as to speed up seeks within HFileBlocks. It should 
 improve performance a lot if key lengths are larger than value lengths. For 
 example, it makes a lot of sense to use it when the value is a counter.
 Initial tests on real data (key length = ~90 bytes, value length = 8 bytes) 
 show that I could achieve a decent level of compression:
  key compression ratio: 92%
  total compression ratio: 85%
  LZO on the same data: 85%
  LZO after delta encoding: 91%
 while having much better performance (20-80% faster decompression than 
 LZO). Moreover, it should allow far more efficient seeking, which should 
 improve performance a bit.
 It seems that simple compression algorithms are good enough. Most of the 
 savings are due to prefix compression, int128 encoding, timestamp diffs and 
 bitfields to avoid duplication. That way, comparisons of compressed data can 
 be much faster than a byte comparator (thanks to prefix compression and 
 bitfields).
 In order to implement it in HBase, two important design changes will be 
 needed:
 - solidify the interface to the HFileBlock / HFileReader scanner to provide seeking 
 and iterating; access to the uncompressed buffer in HFileBlock will have bad 
 performance
 - extend comparators to support comparison assuming that the first N bytes are 
 equal (or some fields are equal)
 Link to a discussion about something similar:
 http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windowssubj=Re+prefix+compression
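
As a rough illustration of the prefix-compression idea described above (a simplified sketch, not the actual encoders in this patch such as PrefixKeyDeltaEncoder), each sorted key can be stored as a (shared-prefix length, suffix) pair against its predecessor:

```python
def prefix_encode(keys):
    """Encode sorted byte-string keys as (common-prefix length, suffix) pairs."""
    out, prev = [], b""
    for k in keys:
        common = 0
        while common < min(len(prev), len(k)) and prev[common] == k[common]:
            common += 1
        out.append((common, k[common:]))
        prev = k
    return out

def prefix_decode(encoded):
    """Rebuild the original keys from (common-prefix length, suffix) pairs."""
    keys, prev = [], b""
    for common, suffix in encoded:
        k = prev[:common] + suffix
        keys.append(k)
        prev = k
    return keys

# Similar, sorted keys (as in an HFile block) share long prefixes.
keys = [b"row0001/cf:counter", b"row0001/cf:other", b"row0002/cf:counter"]
enc = prefix_encode(keys)
assert prefix_decode(enc) == keys
# Shared prefixes are stored once, so the encoded suffixes are smaller in total:
assert sum(len(s) for _, s in enc) < sum(len(k) for k in keys)
```

Comparisons can also short-circuit: if two encoded keys share a stored prefix length, the comparator only needs to look at the suffixes, which is part of the motivation for extending the comparators as described above.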

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2011-12-20 Thread Jimmy Xiang (Created) (JIRA)
Distributed log splitting deleteNode races against splitLog retry 
---

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang


Recently, during 0.92 RC testing, we found that distributed log splitting hangs 
forever.  Please see the attached screenshot.
I looked into it and here is what I think happened:

1. One region server died; the ServerShutdownHandler found this out and started the 
distributed log splitting;
2. All three tasks failed, so the three tasks were deleted, asynchronously;
3. ServerShutdownHandler retried the log splitting;
4. During the retry, it created these three tasks again and put them in a 
hashmap (tasks);
5. The asynchronous deletion from step 2 finally happened for one task; in the 
callback, it removed that task from the hashmap;
6. One of the newly submitted tasks' ZooKeeper watchers found that the task was 
unassigned and not in the hashmap, so it created a new orphan task.
7. All three tasks failed, but the task created in step 6 is an orphan, so the 
batch.err counter was one short,
and the log splitting hangs forever, waiting for the last task to finish, 
which is never going to happen.

So I think the problem is step 2.  The fix is to make the deletion synchronous, instead of 
asynchronous, so that the retry has
a clean start.

Async deleteNode will mess up the split log retry.  In an extreme situation, if 
the async deleteNode doesn't happen
soon enough, some node created during the retry could be deleted.

deleteNode should be sync.
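
The race can be sketched in a few lines (a deliberately simplified model for illustration; the task path and the `submit` helper are invented, this is not HBase code). A stale asynchronous delete callback fires after the retry has re-created the task, and removes the fresh entry:

```python
# Simplified model of the deleteNode-vs-retry race.
tasks = {}

def submit(path):
    tasks[path] = "unassigned"

def async_delete_callback(path):
    # Fires at some arbitrary later time; blindly removes the entry.
    tasks.pop(path, None)

# 1-2. First attempt: a task is submitted, fails, and is deleted asynchronously.
submit("/hbase/splitlog/task-1")
pending_callbacks = [lambda: async_delete_callback("/hbase/splitlog/task-1")]

# 3-4. The retry re-creates the task before the delete callback has run.
submit("/hbase/splitlog/task-1")

# 5. The stale callback now fires and removes the *retried* task from the map.
for cb in pending_callbacks:
    cb()
assert "/hbase/splitlog/task-1" not in tasks  # bookkeeping is now inconsistent

# With a synchronous delete, the deletion completes before resubmission,
# so the retry starts from a clean map:
tasks.clear()
submit("/hbase/splitlog/task-1")
assert tasks["/hbase/splitlog/task-1"] == "unassigned"
```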

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5021) Enforce upper bound on timestamp

2011-12-20 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5021:
-

Attachment: 5021-addendum.txt

Small addendum to fix the broken TestHeapSize

 Enforce upper bound on timestamp
 

 Key: HBASE-5021
 URL: https://issues.apache.org/jira/browse/HBASE-5021
 Project: HBase
  Issue Type: Improvement
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Critical
 Fix For: 0.94.0

 Attachments: 5021-addendum.txt, D849.1.patch, D849.2.patch, 
 D849.3.patch, HBASE-5021-trunk.patch


 We have been getting hit with performance problems on our time-series 
 database due to invalid timestamps being inserted.  We are 
 working on adding proper checks to the app server, but production performance 
 could be severely impacted, with significant recovery time, if something slips 
 past.  Since timestamps are considered a fundamental part of the HBase schema 
 and multiple optimizations use timestamp information, we should allow the 
 option to sanity-check the upper bound on the server side in HBase.
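
One way such a server-side upper-bound check could look (a hypothetical sketch; `MAX_CLOCK_SKEW_MS` and `check_timestamp` are invented names, not the patch's actual API): reject any write whose timestamp is further in the future than the server clock plus an allowed skew.

```python
import time

MAX_CLOCK_SKEW_MS = 60_000  # hypothetical configurable slop (1 minute)

def check_timestamp(ts_ms, now_ms=None):
    """Reject writes whose timestamp is too far in the future."""
    now_ms = int(time.time() * 1000) if now_ms is None else now_ms
    if ts_ms > now_ms + MAX_CLOCK_SKEW_MS:
        raise ValueError("timestamp %d exceeds server time plus allowed skew" % ts_ms)
    return ts_ms

now = 1_324_500_000_000
assert check_timestamp(now, now_ms=now) == now         # current time: accepted
try:
    check_timestamp(now + 3_600_000, now_ms=now)       # one hour ahead: rejected
    raise AssertionError("expected rejection")
except ValueError:
    pass
```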

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5021) Enforce upper bound on timestamp

2011-12-20 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173828#comment-13173828
 ] 

stack commented on HBASE-5021:
--

Committed to TRUNK

 Enforce upper bound on timestamp
 

 Key: HBASE-5021
 URL: https://issues.apache.org/jira/browse/HBASE-5021
 Project: HBase
  Issue Type: Improvement
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Critical
 Fix For: 0.94.0

 Attachments: 5021-addendum.txt, D849.1.patch, D849.2.patch, 
 D849.3.patch, HBASE-5021-trunk.patch


 We have been getting hit with performance problems on our time-series 
 database due to invalid timestamps being inserted.  We are 
 working on adding proper checks to the app server, but production performance 
 could be severely impacted, with significant recovery time, if something slips 
 past.  Since timestamps are considered a fundamental part of the HBase schema 
 and multiple optimizations use timestamp information, we should allow the 
 option to sanity-check the upper bound on the server side in HBase.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2011-12-20 Thread Jimmy Xiang (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-5081:
---

Attachment: distributed-log-splitting-screenshot.png

 Distributed log splitting deleteNode races against splitLog retry 
 ---

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: distributed-log-splitting-screenshot.png


 Recently, during 0.92 RC testing, we found that distributed log splitting hangs 
 forever.  Please see the attached screenshot.
 I looked into it and here is what I think happened:
 1. One region server died; the ServerShutdownHandler found this out and started the 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. ServerShutdownHandler retried the log splitting;
 4. During the retry, it created these three tasks again and put them in a 
 hashmap (tasks);
 5. The asynchronous deletion from step 2 finally happened for one task; in 
 the callback, it removed that task from the hashmap;
 6. One of the newly submitted tasks' ZooKeeper watchers found that the task was 
 unassigned and not in the hashmap, so it created a new orphan task.
 7. All three tasks failed, but the task created in step 6 is an orphan, so 
 the batch.err counter was one short,
 and the log splitting hangs forever, waiting for the last task to 
 finish, which is never going to happen.
 So I think the problem is step 2.  The fix is to make the deletion synchronous, instead 
 of asynchronous, so that the retry has
 a clean start.
 Async deleteNode will mess up the split log retry.  In an extreme situation, 
 if the async deleteNode doesn't happen
 soon enough, some node created during the retry could be deleted.
 deleteNode should be sync.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5077) SplitLogWorker fails to let go of a task, kills the RS

2011-12-20 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173832#comment-13173832
 ] 

stack commented on HBASE-5077:
--

+1 on patch for branch and trunk.

 SplitLogWorker fails to let go of a task, kills the RS
 --

 Key: HBASE-5077
 URL: https://issues.apache.org/jira/browse/HBASE-5077
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Critical
 Fix For: 0.92.0

 Attachments: HBASE-5077-v2.patch, HBASE-5077.patch


 I hope I didn't break the spacetime continuum, I got this while testing 0.92.0:
 {quote}
 2011-12-20 03:06:19,838 FATAL 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: logic error - end task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
  done failed because task doesn't exist
 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
 NoNode for 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
 at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1228)
 at 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:372)
 at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:654)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.endTask(SplitLogWorker.java:372)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:280)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:197)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:165)
 at java.lang.Thread.run(Thread.java:662)
 {quote}
 I'll post more logs in a moment. What I can see is that the master shuffled 
 that task around a bit and one of the region servers died on this stack trace 
 while the others were able to interrupt themselves.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5078) DistributedLogSplitter failing to split file because it has edits for lots of regions

2011-12-20 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173836#comment-13173836
 ] 

Hadoop QA commented on HBASE-5078:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12508195/5078-v4.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/564//console

This message is automatically generated.

 DistributedLogSplitter failing to split file because it has edits for lots of 
 regions
 -

 Key: HBASE-5078
 URL: https://issues.apache.org/jira/browse/HBASE-5078
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.92.0

 Attachments: 5078-v2.txt, 5078-v3.txt, 5078-v4.txt, 5078.txt


 Testing a 0.92.0 RC, ran into an interesting issue where a log file had edits for 
 many regions, and just opening the file per region was taking so long that we 
 never updated our progress, so the split of the log just kept failing; in 
 this case, the first 40 edits in a file required our opening 35 files -- 
 opening 35 files took longer than the hard-coded 25 seconds it's supposed to 
 take to acquire the task.
 First, here is master's view:
 {code}
 2011-12-20 17:54:09,184 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  ver = 0
 ...
 2011-12-20 17:54:09,233 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
 task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  acquired by sv4r27s44,7003,1324365396664
 ...
 2011-12-20 17:54:35,475 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403573033
  ver = 3
 {code}
 Master then gives it elsewhere.
 Over on the regionserver we see:
 {code}
 2011-12-20 17:54:09,233 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker 
 sv4r27s44,7003,1324365396664 acquired task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 2011-12-20 17:54:10,714 DEBUG 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: 
 Path=hdfs://sv4r11s38:7000/hbase/splitlog/sv4r27s44,7003,1324365396664_hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679/TestTable/6b6bfc2716dff952435ab26f018648b2/recovered.ed
 its/0278862.temp, syncFs=true, hflush=false
 
 {code}
  and so on till:
 {code}
 2011-12-20 17:54:36,876 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  preempted from sv4r27s44,7003,1324365396664, current task state and 
 owner=owned sv4r28s44,7003,1324365396678
 
 2011-12-20 17:54:37,112 WARN 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: Failed to heartbeat the 
 task/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 {code}
 When the above happened, we'd only processed 40 edits.  As written, we only 
 heartbeat every 1024 edits.
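
The failure mode, a heartbeat only every 1024 edits while the manager times out after 25 seconds, can be sketched numerically (an illustrative model using the numbers from this report; `split_would_time_out` is an invented helper, not SplitLogWorker code):

```python
EDITS_PER_HEARTBEAT = 1024   # count-based progress reporting, as in the report
TASK_TIMEOUT_SECONDS = 25    # hard-coded acquisition timeout from the report

def split_would_time_out(num_edits, seconds_per_edit):
    """True if no heartbeat would be sent before the manager's timeout.

    Models progress reporting that happens only every EDITS_PER_HEARTBEAT
    edits: if the edits processed so far take longer than the timeout, the
    task is preempted before the first heartbeat.
    """
    edits_before_heartbeat = min(num_edits, EDITS_PER_HEARTBEAT)
    return edits_before_heartbeat * seconds_per_edit > TASK_TIMEOUT_SECONDS

# 40 slow edits (each needing a file open, here assumed ~0.7 s) never reach
# edit #1024, so no heartbeat fires within 25 s and the task is preempted:
assert split_would_time_out(num_edits=40, seconds_per_edit=0.7)

# With fast edits the 1024-edit threshold is harmless:
assert not split_would_time_out(num_edits=5000, seconds_per_edit=0.001)
```

Making the heartbeat time-based (or much more frequent) rather than purely count-based avoids this, since progress is then reported regardless of how slow individual edits are.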

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5078) DistributedLogSplitter failing to split file because it has edits for lots of regions

2011-12-20 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5078:
-

Attachment: 5078-v4.txt

Changed variable name as per Ted's suggestion.  This is what I'm committing.

 DistributedLogSplitter failing to split file because it has edits for lots of 
 regions
 -

 Key: HBASE-5078
 URL: https://issues.apache.org/jira/browse/HBASE-5078
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.92.0

 Attachments: 5078-v2.txt, 5078-v3.txt, 5078-v4.txt, 5078.txt


 Testing a 0.92.0 RC, ran into an interesting issue where a log file had edits for 
 many regions, and just opening the file per region was taking so long that we 
 never updated our progress, so the split of the log just kept failing; in 
 this case, the first 40 edits in a file required our opening 35 files -- 
 opening 35 files took longer than the hard-coded 25 seconds it's supposed to 
 take to acquire the task.
 First, here is master's view:
 {code}
 2011-12-20 17:54:09,184 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  ver = 0
 ...
 2011-12-20 17:54:09,233 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
 task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  acquired by sv4r27s44,7003,1324365396664
 ...
 2011-12-20 17:54:35,475 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403573033
  ver = 3
 {code}
 Master then gives it elsewhere.
 Over on the regionserver we see:
 {code}
 2011-12-20 17:54:09,233 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker 
 sv4r27s44,7003,1324365396664 acquired task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 2011-12-20 17:54:10,714 DEBUG 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: 
 Path=hdfs://sv4r11s38:7000/hbase/splitlog/sv4r27s44,7003,1324365396664_hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679/TestTable/6b6bfc2716dff952435ab26f018648b2/recovered.ed
 its/0278862.temp, syncFs=true, hflush=false
 
 {code}
  and so on till:
 {code}
 2011-12-20 17:54:36,876 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  preempted from sv4r27s44,7003,1324365396664, current task state and 
 owner=owned sv4r28s44,7003,1324365396678
 
 2011-12-20 17:54:37,112 WARN 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: Failed to heartbeat the 
 task/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 {code}
 When the above happened, we'd only processed 40 edits.  As written, we only 
 heartbeat every 1024 edits.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5078) DistributedLogSplitter failing to split file because it has edits for lots of regions

2011-12-20 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5078:
-

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed branch and trunk.  Thanks for the review, lads.

 DistributedLogSplitter failing to split file because it has edits for lots of 
 regions
 -

 Key: HBASE-5078
 URL: https://issues.apache.org/jira/browse/HBASE-5078
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.92.0

 Attachments: 5078-v2.txt, 5078-v3.txt, 5078-v4.txt, 5078.txt


 Testing a 0.92.0 RC, ran into an interesting issue where a log file had edits for 
 many regions, and just opening the file per region was taking so long that we 
 never updated our progress, so the split of the log just kept failing; in 
 this case, the first 40 edits in a file required our opening 35 files -- 
 opening 35 files took longer than the hard-coded 25 seconds it's supposed to 
 take to acquire the task.
 First, here is master's view:
 {code}
 2011-12-20 17:54:09,184 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  ver = 0
 ...
 2011-12-20 17:54:09,233 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
 task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  acquired by sv4r27s44,7003,1324365396664
 ...
 2011-12-20 17:54:35,475 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403573033
  ver = 3
 {code}
 Master then gives it elsewhere.
 Over on the regionserver we see:
 {code}
 2011-12-20 17:54:09,233 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker 
 sv4r27s44,7003,1324365396664 acquired task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 2011-12-20 17:54:10,714 DEBUG 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: 
 Path=hdfs://sv4r11s38:7000/hbase/splitlog/sv4r27s44,7003,1324365396664_hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679/TestTable/6b6bfc2716dff952435ab26f018648b2/recovered.ed
 its/0278862.temp, syncFs=true, hflush=false
 
 {code}
  and so on till:
 {code}
 2011-12-20 17:54:36,876 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  preempted from sv4r27s44,7003,1324365396664, current task state and 
 owner=owned sv4r28s44,7003,1324365396678
 
 2011-12-20 17:54:37,112 WARN 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: Failed to heartbeat the 
 task/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 {code}
 When the above happened, we'd only processed 40 edits.  As written, we only 
 heartbeat every 1024 edits.
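The bug above boils down to throttling: progress was reported only every 1024 edits, so 35 slow file opens could starve the heartbeat for longer than the 25-second acquisition timeout. Below is a minimal sketch of a throttle that reports on elapsed time as well as edit count; all names are hypothetical, not the actual HLogSplitter code:

```java
import java.util.concurrent.TimeUnit;

/**
 * Sketch: decide when to heartbeat during a log split. Reports every
 * N edits OR every few seconds, whichever comes first, so slow file
 * opens cannot starve the heartbeat. Hypothetical names only.
 */
public class ProgressThrottle {
  private final int editInterval;        // e.g. 1024 edits
  private final long timeIntervalNanos;  // e.g. 3 seconds
  private long lastReportNanos;
  private int editsSinceReport;

  public ProgressThrottle(int editInterval, long timeIntervalMillis) {
    this.editInterval = editInterval;
    this.timeIntervalNanos = TimeUnit.MILLISECONDS.toNanos(timeIntervalMillis);
    this.lastReportNanos = System.nanoTime();
  }

  /** Called once per edit; returns true when a heartbeat is due. */
  public boolean onEdit() {
    editsSinceReport++;
    long now = System.nanoTime();
    boolean due = editsSinceReport >= editInterval
        || now - lastReportNanos >= timeIntervalNanos;
    if (due) {
      editsSinceReport = 0;
      lastReportNanos = now;
    }
    return due;
  }
}
```

With a time interval of a few seconds, even a split stuck opening its 35th output file would heartbeat well inside the 25-second window.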

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5077) SplitLogWorker fails to let go of a task, kills the RS

2011-12-20 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5077:
-

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed branch and trunk.

 SplitLogWorker fails to let go of a task, kills the RS
 --

 Key: HBASE-5077
 URL: https://issues.apache.org/jira/browse/HBASE-5077
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Critical
 Fix For: 0.92.0

 Attachments: HBASE-5077-v2.patch, HBASE-5077.patch


 I hope I didn't break the spacetime continuum; I got this while testing 0.92.0:
 {quote}
 2011-12-20 03:06:19,838 FATAL 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: logic error - end task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
  done failed because task doesn't exist
 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
 NoNode for 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
 at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1228)
 at 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:372)
 at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:654)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.endTask(SplitLogWorker.java:372)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:280)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:197)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:165)
 at java.lang.Thread.run(Thread.java:662)
 {quote}
 I'll post more logs in a moment. What I can see is that the master shuffled 
 that task around a bit and one of the region servers died on this stack trace 
 while the others were able to interrupt themselves.





[jira] [Updated] (HBASE-5077) SplitLogWorker fails to let go of a task, kills the RS

2011-12-20 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5077:
-

Attachment: HBASE-5077-v4.txt

This patch actually compiles.

 SplitLogWorker fails to let go of a task, kills the RS
 --

 Key: HBASE-5077
 URL: https://issues.apache.org/jira/browse/HBASE-5077
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Critical
 Fix For: 0.92.0

 Attachments: HBASE-5077-v2.patch, HBASE-5077-v4.txt, HBASE-5077.patch







[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2011-12-20 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173850#comment-13173850
 ] 

stack commented on HBASE-5081:
--

Or we need a means of uniquely identifying the failed split in the hashmap so 
when the callback runs, it only removes the pertinent tasks if present; i.e. 
the filename alone is not enough?

Otherwise, sounds good Jimmy.  Good find.

 Distributed log splitting deleteNode races against splitLog retry 
 ---

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: distributed-log-splitting-screenshot.png


 Recently, during 0.92 rc testing, we found distributed log splitting hangs 
 there forever.  Please see attached screen shot.
 I looked into it and here is what happened, I think:
 1. One rs died; the servershutdownhandler found it out and started the 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. Servershutdownhandler retried the log splitting;
 4. During the retry, it created these three tasks again and put them in a 
 hashmap (tasks);
 5. The asynchronous deletion in step 2 finally happened for one task; in 
 the callback, it removed that task from the hashmap;
 6. One of the newly submitted tasks' zookeeper watchers found that the task 
 was unassigned and was not in the hashmap, so it created a new orphan task;
 7. All three tasks failed, but the task created in step 6 is an orphan, so 
 the batch.err counter was one short, and the log splitting hangs, waiting 
 forever for the last task to finish.
 So I think the problem is step 2.  The fix is to make the deletion sync, 
 instead of async, so that the retry has a clean start.
 Async deleteNode interferes with split log retries.  In an extreme 
 situation, if the async deleteNode doesn't happen soon enough, a node 
 created during the retry could be deleted.
 deleteNode should be sync.
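The seven steps above can be reduced to a toy model: if the delete is asynchronous, its callback can fire after the retry has already re-created the task, removing the new entry and orphaning it; a synchronous delete finishes before the retry starts. A sketch with hypothetical names, not the actual SplitLogManager code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Toy model of the deleteNode race; hypothetical names only. */
public class DeleteRace {
  final Map<String, String> tasks = new ConcurrentHashMap<>();

  /** Buggy order: the delete is deferred, the retry re-creates the
   *  task, then the stale callback removes the new entry. */
  public boolean asyncDeleteThenRetry(String path) {
    Runnable deleteCallback = () -> tasks.remove(path); // queued, not yet run
    tasks.put(path, "retried");                         // retry re-creates task
    deleteCallback.run();                               // stale callback fires late
    return tasks.containsKey(path);                     // false: task orphaned
  }

  /** Fixed order: the delete completes before the retry starts. */
  public boolean syncDeleteThenRetry(String path) {
    tasks.remove(path);             // synchronous delete finishes first
    tasks.put(path, "retried");     // retry starts from a clean slate
    return tasks.containsKey(path); // true: task still tracked
  }
}
```

The model shows why making deleteNode synchronous gives the retry a clean start: no stale callback can outlive it.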





[jira] [Commented] (HBASE-4720) Implement atomic update operations (checkAndPut, checkAndDelete) for REST client/server

2011-12-20 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173857#comment-13173857
 ] 

stack commented on HBASE-4720:
--

@Mubarak That test fails for me too on a mac (I presume you are on a mac -- 
smile).  Need to dig in.

 Implement atomic update operations (checkAndPut, checkAndDelete) for REST 
 client/server 
 

 Key: HBASE-4720
 URL: https://issues.apache.org/jira/browse/HBASE-4720
 Project: HBase
  Issue Type: Improvement
Reporter: Daniel Lord
Assignee: Mubarak Seyed
 Fix For: 0.94.0

 Attachments: HBASE-4720.trunk.v1.patch, HBASE-4720.v1.patch, 
 HBASE-4720.v3.patch


 I have several large application/HBase clusters where an application node 
 will occasionally need to talk to HBase from a different cluster.  In order 
 to help ensure some of my consistency guarantees I have a sentinel table that 
 is updated atomically as users interact with the system.  This works quite 
 well for the regular hbase client but the REST client does not implement 
 the checkAndPut and checkAndDelete operations.  This exposes the application 
 to some race conditions that have to be worked around.  It would be ideal if 
 the same checkAndPut/checkAndDelete operations could be supported by the REST 
 client.





[jira] [Commented] (HBASE-4720) Implement atomic update operations (checkAndPut, checkAndDelete) for REST client/server

2011-12-20 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173861#comment-13173861
 ] 

stack commented on HBASE-4720:
--

On patch:

+ Suggest you not do stuff like below in the future because it bloats your patch, 
making it more susceptible to rot, and besides it is not directly related to what 
you are trying to fix, so it is distracting for reviewers (for the next time):

{code}
-  private static final HBaseRESTTestingUtility REST_TEST_UTIL = 
-new HBaseRESTTestingUtility();
+  private static final HBaseRESTTestingUtility REST_TEST_UTIL = new 
HBaseRESTTestingUtility();
{code}

+ Is this safe?  e.g. what if row has binary characters in it?  Should these be 
base64'd or something?

{code}
+path.append('/');
+path.append(checkandput);
+path.append('/');
+path.append(table);
+path.append('/');
+path.append(row);
+path.append('/');
+path.append(column);
{code}

I suppose it's not needed in a test (I missed that this is test code)

You have some lines that are way too long.

Else patch looks good to me on cursory review.
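One conventional answer to the binary-row-key question above is URL-safe base64, which replaces '+' and '/' with '-' and '_' so the token is safe inside a path segment. A sketch using java.util.Base64 (Java 8+); this is an illustration, not what the patch does:

```java
import java.util.Base64;

/** Sketch: URL-safe base64 round-trip for binary row keys in REST paths. */
public class RowKeyCodec {
  public static String encode(byte[] row) {
    // URL alphabet uses '-' and '_' instead of '+' and '/'
    return Base64.getUrlEncoder().withoutPadding().encodeToString(row);
  }

  public static byte[] decode(String token) {
    // the JDK decoder accepts unpadded input
    return Base64.getUrlDecoder().decode(token);
  }
}
```

Both sides of the REST API would have to agree on the encoding, of course.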

 Implement atomic update operations (checkAndPut, checkAndDelete) for REST 
 client/server 
 

 Key: HBASE-4720
 URL: https://issues.apache.org/jira/browse/HBASE-4720
 Project: HBase
  Issue Type: Improvement
Reporter: Daniel Lord
Assignee: Mubarak Seyed
 Fix For: 0.94.0

 Attachments: HBASE-4720.trunk.v1.patch, HBASE-4720.v1.patch, 
 HBASE-4720.v3.patch







[jira] [Commented] (HBASE-4218) Delta Encoding of KeyValues (aka prefix compression)

2011-12-20 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173865#comment-13173865
 ] 

Zhihong Yu commented on HBASE-4218:
---

Thanks for the nice work, Mikhail.
{code}
1 out of 1 hunk ignored -- saving rejects to file 
src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java.rej
1 out of 2 hunks FAILED -- saving rejects to file 
src/test/java/org/apache/hadoop/hbase/HFilePerformanceEvaluation.java.rej
{code}
Please fix the above conflicts by rebasing against TRUNK.

 Delta Encoding of KeyValues  (aka prefix compression)
 -

 Key: HBASE-4218
 URL: https://issues.apache.org/jira/browse/HBASE-4218
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.94.0
Reporter: Jacek Migdal
Assignee: Mikhail Bautin
  Labels: compression
 Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 
 D447.1.patch, D447.10.patch, D447.2.patch, D447.3.patch, D447.4.patch, 
 D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, D447.9.patch, 
 Delta_encoding_with_memstore_TS.patch, open-source.diff


 A compression for keys. Keys are sorted in HFile and they are usually very 
 similar. Because of that, it is possible to design better compression than 
 general purpose algorithms.
 It is an additional step designed to be used in memory. It aims to save 
 memory in the cache as well as to speed up seeks within HFileBlocks. It should 
 improve performance a lot if key lengths are larger than value lengths. For 
 example, it makes a lot of sense to use it when the value is a counter.
 Initial tests on real data (key length = ~ 90 bytes , value length = 8 bytes) 
 shows that I could achieve decent level of compression:
  key compression ratio: 92%
  total compression ratio: 85%
  LZO on the same data: 85%
  LZO after delta encoding: 91%
 While having much better performance (20-80% faster decompression than 
 LZO). Moreover, it should allow far more efficient seeking, which should 
 improve performance a bit.
 It seems that simple compression algorithms are good enough. Most of the 
 savings are due to prefix compression, int128 encoding, timestamp diffs and 
 bitfields to avoid duplication. That way, comparisons of compressed data can 
 be much faster than a byte comparator (thanks to prefix compression and 
 bitfields).
 In order to implement it in HBase two important changes in design will be 
 needed:
 -solidify interface to HFileBlock / HFileReader Scanner to provide seeking 
 and iterating; access to uncompressed buffer in HFileBlock will have bad 
 performance
 -extend comparators to support comparison assuming that N first bytes are 
 equal (or some fields are equal)
 Link to a discussion about something similar:
 http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windowssubj=Re+prefix+compression
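The core prefix-compression step described above (store each sorted key as a shared-prefix length plus the differing suffix) can be sketched in a few lines. A toy string-based version for illustration, not the HBASE-4218 encoder:

```java
import java.util.ArrayList;
import java.util.List;

/** Toy prefix compression for sorted keys; not the HBASE-4218 encoder. */
public class PrefixCompressor {
  /** Encode each key as "<sharedPrefixLen>:<suffix>" relative to the previous key. */
  public static List<String> encode(List<String> sortedKeys) {
    List<String> out = new ArrayList<>();
    String prev = "";
    for (String key : sortedKeys) {
      int common = 0;
      int max = Math.min(prev.length(), key.length());
      while (common < max && prev.charAt(common) == key.charAt(common)) {
        common++;  // length of the prefix shared with the previous key
      }
      out.add(common + ":" + key.substring(common));
      prev = key;
    }
    return out;
  }

  /** Rebuild the original keys from the (prefix length, suffix) pairs. */
  public static List<String> decode(List<String> encoded) {
    List<String> out = new ArrayList<>();
    String prev = "";
    for (String e : encoded) {
      int sep = e.indexOf(':');
      String key = prev.substring(0, Integer.parseInt(e.substring(0, sep)))
          + e.substring(sep + 1);
      out.add(key);
      prev = key;
    }
    return out;
  }
}
```

Because adjacent HFile keys share long prefixes, this alone recovers most of the savings quoted above; the real encoder adds timestamp diffs and bitfields on top.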





[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2011-12-20 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173867#comment-13173867
 ] 

Zhihong Yu commented on HBASE-5081:
---

To expedite the release of 0.92, I think we can make deleteNode synchronous 
first.

Once distributed log splitting works robustly, we can implement async 
deleteNode in a follow on JIRA.

 Distributed log splitting deleteNode races against splitLog retry 
 ---

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: distributed-log-splitting-screenshot.png







[jira] [Commented] (HBASE-5078) DistributedLogSplitter failing to split file because it has edits for lots of regions

2011-12-20 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173872#comment-13173872
 ] 

Lars Hofhansl commented on HBASE-5078:
--

Came here to +1... Too late :)
Interesting that opening 35 files takes > 25s.


 DistributedLogSplitter failing to split file because it has edits for lots of 
 regions
 -

 Key: HBASE-5078
 URL: https://issues.apache.org/jira/browse/HBASE-5078
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.92.0

 Attachments: 5078-v2.txt, 5078-v3.txt, 5078-v4.txt, 5078.txt







[jira] [Commented] (HBASE-4916) LoadTest MR Job

2011-12-20 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173874#comment-13173874
 ] 

Phabricator commented on HBASE-4916:


stack has commented on the revision HBASE-4916 [jira] LoadTest MR Job.

  I started to review the code, then gave up and looked more at what is attached.

  This looks better than PE and more easily extended IMO.  Good documentation 
(could do w/ a package-info with an overview of what it is and a simple howto for 
running it).  Nice that it adds itself to Driver.  I think we should commit it 
after some fixup/review.  I think it should go into the tools package rather 
than into the loadtest package, so it'd be tools.loadtest.

  Nice one.





INLINE COMMENTS
  
src/main/java/org/apache/hadoop/hbase/loadtest/CompositeOperationGenerator.java:1
 Missing license.
  
src/main/java/org/apache/hadoop/hbase/loadtest/CompositeOperationGenerator.java:11
 This looks promising
  
src/main/java/org/apache/hadoop/hbase/loadtest/CompositeOperationGenerator.java:78
 isEmpty is cheaper than size() == 0
  
src/main/java/org/apache/hadoop/hbase/loadtest/CompositeOperationGenerator.java:90
 These exceptions could come out here?
  src/main/java/org/apache/hadoop/hbase/loadtest/DataGenerator.java:36 What's a 
column?  A cf+qualifier written as cf:qualifier?  Or is this just a cf?
  src/main/java/org/apache/hadoop/hbase/loadtest/DataGenerator.java:71 Only 
longs allowed as keys?  Is key == row?
  src/main/java/org/apache/hadoop/hbase/loadtest/DataGenerator.java:75 This 
should be called constructPut rather than constructBulkPut
  src/main/java/org/apache/hadoop/hbase/loadtest/DataGenerator.java:90 long key 
'as String'...
  src/main/java/org/apache/hadoop/hbase/loadtest/GetGenerator.java:109 Should 
be logged at least rather than printStackTrace'd?   Should we let it out?
  src/main/java/org/apache/hadoop/hbase/loadtest/GetOperation.java:47 Logged 
rather than printed
  src/main/java/org/apache/hadoop/hbase/loadtest/KeyCounter.java:32 There is an 
upper bound?  Don't we need an upper bound so mappers don't overlap?  Or is 
that not wanted?
  src/main/java/org/apache/hadoop/hbase/loadtest/KeyCounter.java:72 Why assign 
here?  Why not declare and allocate in the one go?  Could make the data members 
final then.
  src/main/java/org/apache/hadoop/hbase/loadtest/KeyCounter.java:78 Should be 
seeded?

REVISION DETAIL
  https://reviews.facebook.net/D741


 LoadTest MR Job
 ---

 Key: HBASE-4916
 URL: https://issues.apache.org/jira/browse/HBASE-4916
 Project: HBase
  Issue Type: Sub-task
  Components: client, regionserver
Reporter: Nicolas Spiegelberg
Assignee: Christopher Gist
 Fix For: 0.94.0

 Attachments: HBASE-4916.D741.1.patch


 Add a script to start a streaming map-reduce job where each map tasks runs an 
 instance of the load tester for a partition of the key-space. Ensure that the 
 load tester takes a parameter indicating the start key for write operations.





[jira] [Commented] (HBASE-5078) DistributedLogSplitter failing to split file because it has edits for lots of regions

2011-12-20 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173876#comment-13173876
 ] 

stack commented on HBASE-5078:
--

Yeah, in-series with other stuff going on... (it's too long!)

 DistributedLogSplitter failing to split file because it has edits for lots of 
 regions
 -

 Key: HBASE-5078
 URL: https://issues.apache.org/jira/browse/HBASE-5078
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.92.0

 Attachments: 5078-v2.txt, 5078-v3.txt, 5078-v4.txt, 5078.txt







[jira] [Commented] (HBASE-5077) SplitLogWorker fails to let go of a task, kills the RS

2011-12-20 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173879#comment-13173879
 ] 

Hudson commented on HBASE-5077:
---

Integrated in HBase-TRUNK #2565 (See 
[https://builds.apache.org/job/HBase-TRUNK/2565/])
HBASE-5077 SplitLogWorker fails to let go of a task, kills the RS -- fix 
compile error
HBASE-5077 SplitLogWorker fails to let go of a task, kills the RS

stack : 
Files : 
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java

stack : 
Files : 
* /hbase/trunk/CHANGES.txt
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java


 SplitLogWorker fails to let go of a task, kills the RS
 --

 Key: HBASE-5077
 URL: https://issues.apache.org/jira/browse/HBASE-5077
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Critical
 Fix For: 0.92.0

 Attachments: HBASE-5077-v2.patch, HBASE-5077-v4.txt, HBASE-5077.patch







[jira] [Commented] (HBASE-5021) Enforce upper bound on timestamp

2011-12-20 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173880#comment-13173880
 ] 

Hudson commented on HBASE-5021:
---

Integrated in HBase-TRUNK #2565 (See 
[https://builds.apache.org/job/HBase-TRUNK/2565/])
HBASE-5021 Enforce upper bound on timestamp

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java


 Enforce upper bound on timestamp
 

 Key: HBASE-5021
 URL: https://issues.apache.org/jira/browse/HBASE-5021
 Project: HBase
  Issue Type: Improvement
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Critical
 Fix For: 0.94.0

 Attachments: 5021-addendum.txt, D849.1.patch, D849.2.patch, 
 D849.3.patch, HBASE-5021-trunk.patch


 We have been getting hit with performance problems on our time-series 
 database due to invalid timestamps being inserted by the application.  We are 
 working on adding proper checks to the app server, but production performance 
 could be severely impacted, with significant recovery time, if something slips 
 past.  Since timestamps are considered a fundamental part of the HBase schema 
 and multiple optimizations use timestamp information, we should allow the 
 option to sanity-check the upper bound on the server side in HBase.
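A toy sketch of such a server-side sanity check on the timestamp's upper bound. The slack constant and method names are assumptions for illustration, not HBase's actual implementation:

```java
// Hypothetical sketch: reject writes whose timestamp is too far in the
// future relative to the server clock. MAX_CLOCK_SKEW_MS is an illustrative
// knob, not a real HBase configuration value.
class TimestampBound {
  static final long MAX_CLOCK_SKEW_MS = 30_000;   // allowed future slack

  static void checkTimestamp(long ts, long nowMs) {
    if (ts > nowMs + MAX_CLOCK_SKEW_MS) {
      throw new IllegalArgumentException(
          "timestamp " + ts + " exceeds upper bound " + (nowMs + MAX_CLOCK_SKEW_MS));
    }
  }

  public static void main(String[] args) {
    long now = 1_000_000L;
    checkTimestamp(now, now);                 // current time: accepted
    checkTimestamp(now + 30_000, now);        // exactly at the bound: accepted
    boolean rejected = false;
    try {
      checkTimestamp(now + 60_000, now);      // a minute ahead: rejected
    } catch (IllegalArgumentException e) {
      rejected = true;
    }
    System.out.println("rejected=" + rejected);
  }
}
```

The check is cheap per mutation, so it could run on every put without measurable overhead.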





[jira] [Commented] (HBASE-5078) DistributedLogSplitter failing to split file because it has edits for lots of regions

2011-12-20 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173881#comment-13173881
 ] 

Hudson commented on HBASE-5078:
---

Integrated in HBase-TRUNK #2565 (See 
[https://builds.apache.org/job/HBase-TRUNK/2565/])
HBASE-5078 DistributedLogSplitter failing to split file because it has 
edits for lots of regions
HBASE-5078 DistributedLogSplitter failing to split file because it has edits 
for lots of regions

stack : 
Files : 
* /hbase/trunk/CHANGES.txt

stack : 
Files : 
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java


 DistributedLogSplitter failing to split file because it has edits for lots of 
 regions
 -

 Key: HBASE-5078
 URL: https://issues.apache.org/jira/browse/HBASE-5078
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.92.0

 Attachments: 5078-v2.txt, 5078-v3.txt, 5078-v4.txt, 5078.txt


 Testing 0.92.0RC, ran into interesting issue where a log file had edits for 
 many regions and just opening the file per region was taking so long, we were 
 never updating our progress and so the split of the log just kept failing; in 
 this case, the first 40 edits in a file required our opening 35 files -- 
 opening 35 files took longer than the hard-coded 25 seconds it's supposed to 
 take to acquire the task.
 First, here is master's view:
 {code}
 2011-12-20 17:54:09,184 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  ver = 0
 ...
 2011-12-20 17:54:09,233 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
 task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  acquired by sv4r27s44,7003,1324365396664
 ...
 2011-12-20 17:54:35,475 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403573033
  ver = 3
 {code}
 Master then gives it elsewhere.
 Over on the regionserver we see:
 {code}
 2011-12-20 17:54:09,233 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker 
 sv4r27s44,7003,1324365396664 acquired task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 2011-12-20 17:54:10,714 DEBUG 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: 
 Path=hdfs://sv4r11s38:7000/hbase/splitlog/sv4r27s44,7003,1324365396664_hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679/TestTable/6b6bfc2716dff952435ab26f018648b2/recovered.edits/0278862.temp, syncFs=true, hflush=false
 
 {code}
  and so on till:
 {code}
 2011-12-20 17:54:36,876 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  preempted from sv4r27s44,7003,1324365396664, current task state and 
 owner=owned sv4r28s44,7003,1324365396678
 
 2011-12-20 17:54:37,112 WARN 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: Failed to heartbeat the 
 task/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 {code}
 When the above happened, we'd only processed 40 edits.  As written, we only 
 heartbeat every 1024 edits.
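One way to address this, sketched below under the assumption that progress should be reported on elapsed time rather than edit count, so slow file opens still refresh the task before the 25-second timeout. The class and method names are illustrative, not HBase's actual API:

```java
// Hypothetical sketch: heartbeat whenever an interval has elapsed instead of
// after a fixed number of edits, so work that is slow per-edit still reports.
class TimedProgress {
  private final long intervalMs;
  private long lastReportMs;

  TimedProgress(long intervalMs, long startMs) {
    this.intervalMs = intervalMs;
    this.lastReportMs = startMs;
  }

  /** Call after each edit (or each expensive file open); returns true when a
   *  heartbeat (e.g. a ZK setData refreshing the task node) should be sent. */
  boolean maybeReport(long nowMs) {
    if (nowMs - lastReportMs >= intervalMs) {
      lastReportMs = nowMs;
      return true;
    }
    return false;
  }

  public static void main(String[] args) {
    // 40 edits, one every 750 ms: a fixed 1024-edit threshold never fires,
    // but a 3-second timer heartbeats steadily while the work crawls along.
    TimedProgress progress = new TimedProgress(3000, 0);
    int fired = 0;
    for (int edit = 1; edit <= 40; edit++) {
      if (progress.maybeReport(edit * 750L)) {
        fired++;
      }
    }
    System.out.println("heartbeats=" + fired);
  }
}
```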





[jira] [Updated] (HBASE-4698) Let the HFile Pretty Printer print all the key values for a specific row.

2011-12-20 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4698:
-

   Resolution: Fixed
Fix Version/s: 0.94.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Committed to TRUNK.  Thanks for the patch Liyin.

 Let the HFile Pretty Printer print all the key values for a specific row.
 -

 Key: HBASE-4698
 URL: https://issues.apache.org/jira/browse/HBASE-4698
 Project: HBase
  Issue Type: New Feature
Reporter: Liyin Tang
Assignee: Liyin Tang
 Fix For: 0.94.0

 Attachments: D111.1.patch, D111.1.patch, D111.1.patch, D111.2.patch, 
 D111.3.patch, D111.4.patch, HBASE-4689-trunk.patch


 When using the HFile Pretty Printer to debug HBase issues, 
 it would be very nice to allow the Pretty Printer to seek to a specific row, and 
 only print all the key values for this row.





[jira] [Commented] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time

2011-12-20 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173888#comment-13173888
 ] 

Phabricator commented on HBASE-5033:


lhofhansl has commented on the revision 
[jira][HBASE-5033][[89-fb]]Opening/Closing store in parallel to reduce region 
open/close time.

  The last proposal makes more sense. The number of threads for opening store 
files, within a number of stores opened in parallel, would be hard to grok in a 
real cluster.

REVISION DETAIL
  https://reviews.facebook.net/D933


 Opening/Closing store in parallel to reduce region open/close time
 --

 Key: HBASE-5033
 URL: https://issues.apache.org/jira/browse/HBASE-5033
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: D933.1.patch, D933.2.patch, D933.3.patch


 Region servers open/close each store, and each store file for every 
 store, in sequential fashion, which makes region open/close inefficient. 
 So this diff opens/closes each store in parallel in order to reduce 
 region open/close time. It would also help reduce the cluster restart time.
 1) Opening each store in parallel
 2) Loading each store file for every store in parallel
 3) Closing each store in parallel
 4) Closing each store file for every store in parallel.
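The parallel-open idea in the list above can be sketched with a plain ExecutorService. The store names and the openStore body are illustrative stand-ins for the real store-opening work:

```java
import java.util.*;
import java.util.concurrent.*;

// Hypothetical sketch: open each store on its own thread instead of
// sequentially; the region open completes once every store future resolves.
class ParallelStoreOpen {
  static String openStore(String name) throws InterruptedException {
    Thread.sleep(50);               // simulate HDFS metadata round-trips
    return name;
  }

  public static void main(String[] args) throws Exception {
    List<String> stores = Arrays.asList("cf1", "cf2", "cf3", "cf4");
    ExecutorService pool = Executors.newFixedThreadPool(stores.size());
    List<Future<String>> futures = new ArrayList<>();
    for (String s : stores) {
      futures.add(pool.submit(() -> openStore(s)));   // one task per store
    }
    List<String> opened = new ArrayList<>();
    for (Future<String> f : futures) {
      opened.add(f.get());          // wait for all stores before finishing
    }
    pool.shutdown();
    System.out.println("opened=" + opened.size());
  }
}
```

The same fan-out/join shape applies to closing stores and to loading each store's files, items 2 through 4 above.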





[jira] [Commented] (HBASE-5070) Constraints implementation and javadoc changes

2011-12-20 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173890#comment-13173890
 ] 

jirapos...@reviews.apache.org commented on HBASE-5070:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3273/#review4034
---

Ship it!


This patch should go in.  Good improvements.  Doesn't address the bigger issues 
of Configuration inline w/ HTD -- have you tried it, I mean, IIRC, though you 
might have one custom config only, the whole Configuration will be output per 
Constraint? -- and adding support to shell.  Those are in different issues?


src/docbkx/book.xml
https://reviews.apache.org/r/3273/#comment9139

Only thing missing is a since 0.94 note, which is when constraints will show up.



src/main/java/org/apache/hadoop/hbase/constraint/BaseConstraint.java
https://reviews.apache.org/r/3273/#comment9140

Better


- Michael


On 2011-12-20 19:14:46, Jesse Yates wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3273/
bq.  ---
bq.  
bq.  (Updated 2011-12-20 19:14:46)
bq.  
bq.  
bq.  Review request for hbase, Gary Helmling, Ted Yu, and Michael Stack.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Follow-up on changes to constraint as per stack's comments on HBASE-4605.
bq.  
bq.  
bq.  This addresses bug HBASE-5070.
bq.  https://issues.apache.org/jira/browse/HBASE-5070
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/docbkx/book.xml bd3f881 
bq.src/main/java/org/apache/hadoop/hbase/constraint/BaseConstraint.java 
7ce6d45 
bq.src/main/java/org/apache/hadoop/hbase/constraint/Constraint.java 2d8b4d7 
bq.src/main/java/org/apache/hadoop/hbase/constraint/Constraints.java 
7825466 
bq.src/main/java/org/apache/hadoop/hbase/constraint/package-info.java 
6145ed5 
bq.
src/test/java/org/apache/hadoop/hbase/constraint/CheckConfigurationConstraint.java
 c49098d 
bq.  
bq.  Diff: https://reviews.apache.org/r/3273/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  mvn clean test -P localTests -Dtest=*Constraint* - all tests pass.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jesse
bq.  
bq.



 Constraints implementation and javadoc changes
 --

 Key: HBASE-5070
 URL: https://issues.apache.org/jira/browse/HBASE-5070
 Project: HBase
  Issue Type: Task
Reporter: Zhihong Yu

 This is continuation of HBASE-4605
 See Stack's comments https://reviews.apache.org/r/2579/#review3980





[jira] [Updated] (HBASE-5060) HBase client is blocked forever

2011-12-20 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5060:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Was committed a few days ago.

 HBase client is blocked forever
 ---

 Key: HBASE-5060
 URL: https://issues.apache.org/jira/browse/HBASE-5060
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4
Reporter: gaojinchao
Assignee: gaojinchao
Priority: Critical
 Fix For: 0.92.0, 0.90.6

 Attachments: HBASE-5060_Branch90trial.patch, HBASE-5060_trunk.patch


 The client had a temporary network failure; after it recovered, 
 I found my client thread was blocked. 
 Looking at the stack and logs below, it appears that we use an invalid CatalogTracker in 
 function tableExists.
 Block stack:
 WriteHbaseThread33 prio=10 tid=0x7f76bc27a800 nid=0x2540 in 
 Object.wait() [0x7f76af4f3000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
  at java.lang.Object.wait(Native Method)
  at 
 org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:331)
  - locked 0x7f7a67817c98 (a 
 java.util.concurrent.atomic.AtomicBoolean)
  at 
 org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnectionDefault(CatalogTracker.java:366)
  at 
 org.apache.hadoop.hbase.catalog.MetaReader.tableExists(MetaReader.java:427)
  at 
 org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:164)
  at com.huawei.hdi.hbase.HbaseFileOperate.checkHtableState(Unknown 
 Source)
  at com.huawei.hdi.hbase.HbaseReOper.reCreateHtable(Unknown Source)
  - locked 0x7f7a4c5dc578 (a com.huawei.hdi.hbase.HbaseReOper)
  at com.huawei.hdi.hbase.HbaseFileOperate.writeToHbase(Unknown Source)
  at com.huawei.hdi.hbase.WriteHbaseThread.run(Unknown Source)
 In ZooKeeperNodeTracker, we don't throw the KeeperException to the higher level, 
 so at the CatalogTracker level we think the ZooKeeperNodeTracker started 
 successfully and continue to process.
 [WriteHbaseThread33]2011-12-16 17:07:33,153[WARN ]  | 
 hconnection-0x334129cf6890051-0x334129cf6890051-0x334129cf6890051 Unable to 
 get data of znode /hbase/root-region-server | 
 org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:557)
 org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
 = ConnectionLoss for /hbase/root-region-server
  at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
  at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
  at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:931)
  at 
 org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:549)
  at 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:73)
  at 
 org.apache.hadoop.hbase.catalog.CatalogTracker.start(CatalogTracker.java:136)
  at 
 org.apache.hadoop.hbase.client.HBaseAdmin.getCatalogTracker(HBaseAdmin.java:111)
  at 
 org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:162)
  at com.huawei.hdi.hbase.HbaseFileOperate.checkHtableState(Unknown 
 Source)
  at com.huawei.hdi.hbase.HbaseReOper.reCreateHtable(Unknown Source)
  at com.huawei.hdi.hbase.HbaseFileOperate.writeToHbase(Unknown Source)
  at com.huawei.hdi.hbase.WriteHbaseThread.run(Unknown Source)
 [WriteHbaseThread33]2011-12-16 17:07:33,361[ERROR]  | 
 hconnection-0x334129cf6890051-0x334129cf6890051-0x334129cf6890051 Received 
 unexpected KeeperException, re-throwing exception | 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.keeperException(ZooKeeperWatcher.java:385)
 org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
 = ConnectionLoss for /hbase/root-region-server
  at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
  at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
  at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:931)
  at 
 org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:549)
  at 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:73)
  at 
 org.apache.hadoop.hbase.catalog.CatalogTracker.start(CatalogTracker.java:136)
  at 
 org.apache.hadoop.hbase.client.HBaseAdmin.getCatalogTracker(HBaseAdmin.java:111)
  at 
 org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:162)
  at com.huawei.hdi.hbase.HbaseFileOperate.checkHtableState(Unknown 
 Source)
  at com.huawei.hdi.hbase.HbaseReOper.reCreateHtable(Unknown Source)
 

[jira] [Commented] (HBASE-4099) Authentication for ThriftServer clients

2011-12-20 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173897#comment-13173897
 ] 

stack commented on HBASE-4099:
--

These have been committed?

 Authentication for ThriftServer clients
 ---

 Key: HBASE-4099
 URL: https://issues.apache.org/jira/browse/HBASE-4099
 Project: HBase
  Issue Type: Sub-task
  Components: security
Reporter: Gary Helmling
 Attachments: HBASE-4099.patch


 The current implementation of HBase client authentication only works with the 
 Java API.  Alternate access gateways, like Thrift and REST, are left out and 
 will not work.
 For the ThriftServer to be able to fully interoperate with the security 
 implementation:
 # the ThriftServer should be able to log in from a keytab file with its own 
 server principal on startup
 # thrift clients should be able to authenticate securely when connecting to 
 the server
 # the ThriftServer should be able to act as a proxy for those clients so that 
 the RPCs it issues will be correctly authorized as the original client 
 identities
 There is already some support for step 3 in UserGroupInformation and related 
 classes.
 For step #2, we really need to look at what thrift itself supports.
 At a bare minimum, we need to implement step #1.  If we do this, even without 
 steps 2 & 3, this would at least allow deployments to use a ThriftServer per 
 application user, and have the server login as that user on startup.  Thrift 
 clients may not be directly authenticated, but authorization checks for HBase 
 could still be handled correctly this way.





[jira] [Commented] (HBASE-3523) Rewrite our client (client 2.0)

2011-12-20 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173906#comment-13173906
 ] 

stack commented on HBASE-3523:
--

Breathing some life back into this issue:

Reasons for new client, updates:

+ Jonathan Payne's accounting of unaccounted off-heap socket buffers per 
thread, which makes our client OOME when there are lots of threads (HBASE-4956)
+ complex, lots of layers, long-lived zk connection (not necessary on client?)
+ should work against multiple versions of hbase (but that might be another 
issue, an rpc issue... this issue could be distinct from rpc fixup?)

See also Lars' comment here: 
https://issues.apache.org/jira/browse/HBASE-5058?focusedCommentId=13173364page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13173364

 Rewrite our client (client 2.0)
 ---

 Key: HBASE-3523
 URL: https://issues.apache.org/jira/browse/HBASE-3523
 Project: HBase
  Issue Type: Brainstorming
Reporter: stack

 Is it just me or do others sense that there is pressure building to redo the 
 client?  If just me, ignore the below... I'll just keep notes in here.  
 Otherwise, what would the requirements for a client rewrite look like?
 + Let out InterruptedException
 + Enveloping of messages or space for metadata that can be passed by client 
 to server and by server to client; e.g. the region a.b.c moved to server 
 x.y.z. or scanner is finished or timeout
 + A different RPC? One with tighter serialization.
 + More sane timeout/retry policy.
 Does it have to support async communication?  Do callbacks?
 What else?





[jira] [Commented] (HBASE-5058) Allow HBaseAmin to use an existing connection

2011-12-20 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173908#comment-13173908
 ] 

stack commented on HBASE-5058:
--

@Lars We already have an issue, hbase-3523 Rewrite our client (client 2.0)

 Allow HBaseAmin to use an existing connection
 -

 Key: HBASE-5058
 URL: https://issues.apache.org/jira/browse/HBASE-5058
 Project: HBase
  Issue Type: Sub-task
  Components: client
Affects Versions: 0.94.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Minor
 Fix For: 0.94.0

 Attachments: 5058-v2.txt, 5058-v3.txt, 5058-v3.txt, 5058.txt


 What HBASE-4805 does for HTables, this should do for HBaseAdmin.
 Along with this the shared error handling and retrying between HBaseAdmin and 
 HConnectionManager can also be improved. I'll attach a first pass patch soon.





[jira] [Commented] (HBASE-4720) Implement atomic update operations (checkAndPut, checkAndDelete) for REST client/server

2011-12-20 Thread Mubarak Seyed (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173914#comment-13173914
 ] 

Mubarak Seyed commented on HBASE-4720:
--

Thanks Stack.

bq. Suggest you not do stuff like below in future because it bloats your patch 
making it more susceptible to rot and besides is not related directly to what 
you are trying to your fix so distracting for reviewers (for the next time):
I had formatted the code using 
[-HBASE-3678-|https://issues.apache.org/jira/browse/HBASE-3678] 
{{eclipse_formatter_apache.xml}}. I apologize for messing up the format. Can 
you please advise on code formatting? (I believe the HBase book also refers to 
HBASE-3678 for code formatting.)

bq. Is this safe? e.g. what if row has binary characters in it? Should these be 
base64'd or something?
Other test methods in 
{{src/test/java/org/apache/hadoop/hbase/rest/TestRowResources.java}} use the 
same approach to build the URI ({{deleteRow, deleteValue, putValuePB,}} etc.). I just 
copied the code from the other methods.

bq. You have some lines that are way too long.
{{eclipse-code-formatter.xml}} uses a line length of 80; please advise on line 
length. 

{code}
<setting id="org.eclipse.jdt.core.formatter.comment.line_length" value="80"/>
{code}

 Implement atomic update operations (checkAndPut, checkAndDelete) for REST 
 client/server 
 

 Key: HBASE-4720
 URL: https://issues.apache.org/jira/browse/HBASE-4720
 Project: HBase
  Issue Type: Improvement
Reporter: Daniel Lord
Assignee: Mubarak Seyed
 Fix For: 0.94.0

 Attachments: HBASE-4720.trunk.v1.patch, HBASE-4720.v1.patch, 
 HBASE-4720.v3.patch


 I have several large application/HBase clusters where an application node 
 will occasionally need to talk to HBase from a different cluster.  In order 
 to help ensure some of my consistency guarantees I have a sentinel table that 
 is updated atomically as users interact with the system.  This works quite 
 well for the regular hbase client but the REST client does not implement 
 the checkAndPut and checkAndDelete operations.  This exposes the application 
 to some race conditions that have to be worked around.  It would be ideal if 
 the same checkAndPut/checkAndDelete operations could be supported by the REST 
 client.
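For reference, the checkAndPut contract being requested is a per-row compare-and-set. This in-memory toy (not HBase's implementation) shows the semantics the REST client would need to expose:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of checkAndPut semantics: apply the put only if the
// current value matches the expected one, atomically. HBase does this
// server-side per row; this in-memory version just demonstrates the contract.
class CheckAndPutSketch {
  private final Map<String, String> row = new HashMap<>();

  synchronized boolean checkAndPut(String col, String expected, String value) {
    String cur = row.get(col);
    if (cur == null ? expected == null : cur.equals(expected)) {
      row.put(col, value);
      return true;
    }
    return false;     // someone else won the race; caller retries or gives up
  }

  public static void main(String[] args) {
    CheckAndPutSketch sentinel = new CheckAndPutSketch();
    boolean first = sentinel.checkAndPut("state", null, "locked");   // wins
    boolean second = sentinel.checkAndPut("state", null, "locked");  // loses
    System.out.println(first + " " + second);
  }
}
```

This is exactly the primitive the sentinel-table pattern above relies on: losers of the race observe a false return instead of silently overwriting.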





[jira] [Commented] (HBASE-4956) Control direct memory buffer consumption by HBaseClient

2011-12-20 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173917#comment-13173917
 ] 

Lars Hofhansl commented on HBASE-4956:
--

Just checked the HTable code in trunk... Puts (even single put operations) 
already go through a thread from the HTable's thread pool. A single get request 
is issued directly, but get(List<Get>) also uses the pool.
So by limiting the number of threads in a single thread pool and using the new 
HTable constructor from HBASE-4805, this can be controlled. Need to make sure one is 
using only get(List<Get>) in addition to a relatively small global threadpool.
At Salesforce we'll shoot for a pool with ~100 threads and a waiting queue of 
about 100 as well (for very large clusters that may be too small though).
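The bounded client pool described above can be sketched with a fixed thread count and a bounded queue. The sizes and the CallerRunsPolicy choice are illustrative assumptions, not settled recommendations:

```java
import java.util.concurrent.*;

// Hypothetical sketch: cap client-side concurrency so the number of threads
// (and thus per-thread direct buffers) stays small and predictable.
class BoundedClientPool {
  public static void main(String[] args) throws Exception {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        100, 100,                      // ~100 worker threads, as in the comment
        60, TimeUnit.SECONDS,
        new ArrayBlockingQueue<>(100), // waiting queue of ~100 tasks
        new ThreadPoolExecutor.CallerRunsPolicy()); // backpressure, not errors
    pool.allowCoreThreadTimeOut(true); // let idle workers die off

    // Such a pool would be handed to the HTable constructor added in
    // HBASE-4805 so all batched gets/puts share one bounded pool.
    Future<Integer> f = pool.submit(() -> 42);
    System.out.println("result=" + f.get());
    pool.shutdown();
  }
}
```

CallerRunsPolicy makes a saturated pool push work back onto the submitting thread, which naturally throttles the application instead of rejecting requests.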


 Control direct memory buffer consumption by HBaseClient
 ---

 Key: HBASE-4956
 URL: https://issues.apache.org/jira/browse/HBASE-4956
 Project: HBase
  Issue Type: New Feature
Reporter: Ted Yu

 As Jonathan explained here 
 https://groups.google.com/group/asynchbase/browse_thread/thread/c45bc7ba788b2357?pli=1
  , standard hbase client inadvertently consumes large amount of direct memory.
 We should consider using netty for NIO-related tasks.





[jira] [Commented] (HBASE-5077) SplitLogWorker fails to let go of a task, kills the RS

2011-12-20 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173933#comment-13173933
 ] 

Hudson commented on HBASE-5077:
---

Integrated in HBase-0.92 #205 (See 
[https://builds.apache.org/job/HBase-0.92/205/])
HBASE-5077 SplitLogWorker fails to let go of a task, kills the RS -- fix 
compile error
HBASE-5077 SplitLogWorker fails to let go of a task, kills the RS

stack : 
Files : 
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java

stack : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java


 SplitLogWorker fails to let go of a task, kills the RS
 --

 Key: HBASE-5077
 URL: https://issues.apache.org/jira/browse/HBASE-5077
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Critical
 Fix For: 0.92.0

 Attachments: HBASE-5077-v2.patch, HBASE-5077-v4.txt, HBASE-5077.patch


 I hope I didn't break spacetime continuum, I got this while testing 0.92.0:
 {quote}
 2011-12-20 03:06:19,838 FATAL 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: logic error - end task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
  done failed because task doesn't exist
 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
 NoNode for 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
 at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1228)
 at 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:372)
 at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:654)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.endTask(SplitLogWorker.java:372)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:280)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:197)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:165)
 at java.lang.Thread.run(Thread.java:662)
 {quote}
 I'll post more logs in a moment. What I can see is that the master shuffled 
 that task around a bit and one of the region servers died on this stack trace 
 while the others were able to interrupt themselves.





[jira] [Commented] (HBASE-5078) DistributedLogSplitter failing to split file because it has edits for lots of regions

2011-12-20 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13173934#comment-13173934
 ] 

Hudson commented on HBASE-5078:
---

Integrated in HBase-0.92 #205 (See 
[https://builds.apache.org/job/HBase-0.92/205/])
HBASE-5078 DistributedLogSplitter failing to split file because it has 
edits for lots of regions

stack : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java


 DistributedLogSplitter failing to split file because it has edits for lots of 
 regions
 -

 Key: HBASE-5078
 URL: https://issues.apache.org/jira/browse/HBASE-5078
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.92.0

 Attachments: 5078-v2.txt, 5078-v3.txt, 5078-v4.txt, 5078.txt


 Testing 0.92.0RC, ran into an interesting issue where a log file had edits for 
 so many regions that just opening a file per region took so long we never 
 updated our progress, and the split of the log kept failing; in this case, the 
 first 40 edits in the file required opening 35 files -- opening those 35 files 
 took longer than the hard-coded 25 seconds it's supposed to take to acquire 
 the task.
 First, here is master's view:
 {code}
 2011-12-20 17:54:09,184 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: task not yet acquired /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679 ver = 0
 ...
 2011-12-20 17:54:09,233 INFO org.apache.hadoop.hbase.master.SplitLogManager: task /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679 acquired by sv4r27s44,7003,1324365396664
 ...
 2011-12-20 17:54:35,475 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: task not yet acquired /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403573033 ver = 3
 {code}
 Master then gives it elsewhere.
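 The master-side behavior visible above -- a task whose znode version stops
 advancing is eventually handed to another worker -- can be sketched roughly
 like this; the class, field names, and the 25-second constant are illustrative
 of this report, not the actual SplitLogManager internals:
 {code}
// Illustrative sketch of the resubmit check described above; not the
// actual SplitLogManager implementation.
public class TaskTimeoutCheck {
    static final long TIMEOUT_MS = 25_000; // hard-coded acquisition window

    long lastUpdateMs;   // last time the task znode version changed
    int lastSeenVersion; // znode version observed at that time

    // Called periodically by the timeout monitor with the current znode
    // version; returning true means the task should be given elsewhere.
    boolean shouldResubmit(int currentVersion, long nowMs) {
        if (currentVersion != lastSeenVersion) {
            lastSeenVersion = currentVersion;
            lastUpdateMs = nowMs;
            return false; // worker heartbeated; still making progress
        }
        return nowMs - lastUpdateMs >= TIMEOUT_MS;
    }
}
 {code}
 Under this model, a worker that never bumps the znode version within the
 window looks dead to the master even if it is busily opening files.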
 Over on the regionserver we see:
 {code}
 2011-12-20 17:54:09,233 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker sv4r27s44,7003,1324365396664 acquired task /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679

 2011-12-20 17:54:10,714 DEBUG org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Path=hdfs://sv4r11s38:7000/hbase/splitlog/sv4r27s44,7003,1324365396664_hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679/TestTable/6b6bfc2716dff952435ab26f018648b2/recovered.edits/0278862.temp, syncFs=true, hflush=false
 
 {code}
  and so on till:
 {code}
 2011-12-20 17:54:36,876 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: task /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679 preempted from sv4r27s44,7003,1324365396664, current task state and owner=owned sv4r28s44,7003,1324365396678

 2011-12-20 17:54:37,112 WARN org.apache.hadoop.hbase.regionserver.SplitLogWorker: Failed to heartbeat the task/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 {code}
 When the above happened, we'd only processed 40 edits.  As written, we only 
 heartbeat every 1024 edits.
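 A minimal sketch of the failure mode (hypothetical names, not the actual
 HLogSplitter code): a count-based report never fires during 40 slow edits,
 while a time-based report keeps heartbeating even when each edit has to open
 a new recovered.edits file:
 {code}
// Illustrative sketch only; names do not match the real HLogSplitter API.
public class ProgressReporter {
    private final long intervalMs;   // should be well under the 25s task timeout
    private long lastReportMs;
    private int editsSinceReport = 0;

    public ProgressReporter(long intervalMs, long startMs) {
        this.intervalMs = intervalMs;
        this.lastReportMs = startMs;
    }

    // Count-based: fires only every 1024 edits. With 40 edits that each
    // open a file, it never fires and the master preempts the task.
    public boolean shouldReportByCount() {
        editsSinceReport++;
        return editsSinceReport % 1024 == 0;
    }

    // Time-based: fires once intervalMs has elapsed, no matter how few
    // (or how slow) the edits are.
    public boolean shouldReportByTime(long nowMs) {
        if (nowMs - lastReportMs >= intervalMs) {
            lastReportMs = nowMs;
            return true;
        }
        return false;
    }
}
 {code}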




