[jira] [Commented] (HBASE-5078) DistributedLogSplitter failing to split file because it has edits for lots of regions

2011-12-21 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173982#comment-13173982
 ] 

Hudson commented on HBASE-5078:
---

Integrated in HBase-0.92-security #47 (See 
[https://builds.apache.org/job/HBase-0.92-security/47/])
HBASE-5078 DistributedLogSplitter failing to split file because it has 
edits for lots of regions

stack : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java


 DistributedLogSplitter failing to split file because it has edits for lots of 
 regions
 -

 Key: HBASE-5078
 URL: https://issues.apache.org/jira/browse/HBASE-5078
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.92.0

 Attachments: 5078-v2.txt, 5078-v3.txt, 5078-v4.txt, 5078.txt


 Testing 0.92.0RC, I ran into an interesting issue where a log file had edits for 
 many regions, and just opening the file per region was taking so long that we 
 were never updating our progress, so the split of the log just kept failing; in 
 this case, the first 40 edits in a file required our opening 35 files -- 
 opening 35 files took longer than the hard-coded 25 seconds it's supposed to 
 take to acquire the task.
 First, here is master's view:
 {code}
 2011-12-20 17:54:09,184 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  ver = 0
 ...
 2011-12-20 17:54:09,233 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
 task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  acquired by sv4r27s44,7003,1324365396664
 ...
 2011-12-20 17:54:35,475 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403573033
  ver = 3
 {code}
 Master then gives it elsewhere.
 Over on the regionserver we see:
 {code}
 2011-12-20 17:54:09,233 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker 
 sv4r27s44,7003,1324365396664 acquired task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 2011-12-20 17:54:10,714 DEBUG 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: 
 Path=hdfs://sv4r11s38:7000/hbase/splitlog/sv4r27s44,7003,1324365396664_hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679/TestTable/6b6bfc2716dff952435ab26f018648b2/recovered.ed
 its/0278862.temp, syncFs=true, hflush=false
 
 {code}
  and so on till:
 {code}
 2011-12-20 17:54:36,876 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  preempted from sv4r27s44,7003,1324365396664, current task state and 
 owner=owned sv4r28s44,7003,1324365396678
 
 2011-12-20 17:54:37,112 WARN 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: Failed to heartbeat the 
 task/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 {code}
 When the above happened, we'd only processed 40 edits.  As written, we only 
 heartbeat every 1024 edits.
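 A minimal sketch of the cadence problem described above (not the committed patch; 
 Entry, WriterPool and CancelableProgressable are simplified stand-ins for the real 
 HLogSplitter types): progress is reported only every 1024 edits, so a worker that 
 spends its time opening per-region recovered.edits writers can blow past the 
 25-second window. Reporting progress around the slow writer-open step as well keeps 
 the task alive:
 {code}
import java.io.IOException;

// Hypothetical minimal types for illustration only.
interface Entry { String getRegionName(); }
interface WriterPool {
  boolean hasWriterFor(String region);
  void openWriterFor(String region) throws IOException;  // may be slow on a loaded HDFS
  void append(Entry e) throws IOException;
}
interface CancelableProgressable { boolean progress(); }  // false => task was preempted

class SplitSketch {
  static void splitEdits(Iterable<Entry> entries, WriterPool writers,
                         CancelableProgressable reporter) throws IOException {
    int editsSinceReport = 0;
    for (Entry entry : entries) {
      if (!writers.hasWriterFor(entry.getRegionName())) {
        // Opening a new recovered.edits file can take seconds; heartbeat around it
        // so the task is not preempted while we wait on the filesystem.
        if (!reporter.progress()) throw new IOException("split task preempted");
        writers.openWriterFor(entry.getRegionName());
      }
      writers.append(entry);
      if (++editsSinceReport >= 1024) {   // the pre-fix behaviour: heartbeat per 1024 edits
        editsSinceReport = 0;
        if (!reporter.progress()) throw new IOException("split task preempted");
      }
    }
  }
}
 {code}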

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5077) SplitLogWorker fails to let go of a task, kills the RS

2011-12-21 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173981#comment-13173981
 ] 

Hudson commented on HBASE-5077:
---

Integrated in HBase-0.92-security #47 (See 
[https://builds.apache.org/job/HBase-0.92-security/47/])
HBASE-5077 SplitLogWorker fails to let go of a task, kills the RS -- fix 
compile error
HBASE-5077 SplitLogWorker fails to let go of a task, kills the RS

stack : 
Files : 
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java

stack : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java


 SplitLogWorker fails to let go of a task, kills the RS
 --

 Key: HBASE-5077
 URL: https://issues.apache.org/jira/browse/HBASE-5077
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Critical
 Fix For: 0.92.0

 Attachments: HBASE-5077-v2.patch, HBASE-5077-v4.txt, HBASE-5077.patch


 I hope I didn't break the spacetime continuum; I got this while testing 0.92.0:
 {quote}
 2011-12-20 03:06:19,838 FATAL 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: logic error - end task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
  done failed because task doesn't exist
 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
 NoNode for 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
 at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1228)
 at 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:372)
 at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:654)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.endTask(SplitLogWorker.java:372)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:280)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:197)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:165)
 at java.lang.Thread.run(Thread.java:662)
 {quote}
 I'll post more logs in a moment. What I can see is that the master shuffled 
 that task around a bit and one of the region servers died on this stack trace 
 while the others were able to interrupt themselves.
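 A hedged sketch of the defensive idea suggested by the stack trace above (not 
 necessarily what the committed fix does): treat a missing task znode in endTask() 
 as "already preempted" rather than a fatal error, using the plain ZooKeeper API:
 {code}
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

class EndTaskSketch {
  // Illustration only: swallow NoNodeException when the task znode has already
  // been removed or reassigned by the master, instead of aborting the regionserver.
  static void endTask(ZooKeeper zk, String taskNode, byte[] finalState)
      throws KeeperException, InterruptedException {
    try {
      zk.setData(taskNode, finalState, -1);   // -1 = any version
    } catch (KeeperException.NoNodeException e) {
      // Node is gone, most likely preempted; log and move on.
      System.err.println("task " + taskNode + " no longer exists; treating as preempted");
    }
  }
}
 {code}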

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5064) use surefire tests parallelization

2011-12-21 Thread nkeywal (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173984#comment-13173984
 ] 

nkeywal commented on HBASE-5064:


#554 (No //, min number of threads)
Total time: 1:55:37.351s
Tests run: 781, Failures: 3, Errors: 2, Skipped: 9

Too many open files
TestInstantSchemaChange.testInstantSchemaJanitor

NumberFormatException
TestTableMapReduce.testMultiRegionTable
TestHFileOutputFormat.testMRIncrementalLoad
TestHFileOutputFormat.testMRIncrementalLoadWithSplit
TestHFileOutputFormat.testExcludeMinorCompaction

Hung
TestReplication

 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v2.patch, 5064.v3.patch, 
 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 
 5064.v7.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5064) use surefire tests parallelization

2011-12-21 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5064:
---

Status: Open  (was: Patch Available)

 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v2.patch, 5064.v3.patch, 
 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 
 5064.v7.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5064) use surefire tests parallelization

2011-12-21 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5064:
---

Attachment: 5064.v6.patch

 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v2.patch, 5064.v3.patch, 
 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 
 5064.v7.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5082) ColumnAggregationEndPoint- Causes null pointer in RS when we pass null column qualifier

2011-12-21 Thread ramkrishna.s.vasudevan (Created) (JIRA)
ColumnAggregationEndPoint- Causes null pointer in RS when we pass null column 
qualifier
---

 Key: HBASE-5082
 URL: https://issues.apache.org/jira/browse/HBASE-5082
 Project: HBase
  Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Priority: Minor


I was trying to use ColumnAggregationEndPoint.sum().

In my sample I just created a column family, did not use any qualifier, and 
inserted some data.

I tried to use ColumnAggregationEndPoint.sum(qualifier, null).  When I did 
this, inside the ColumnAggregationEndPoint we do scan.addColumn(), which adds 
the [null] array to the scan object.  Later, the ScanQueryMatcher throws a 
NullPointerException.
I understand that addColumn() is meant to specify the qualifier.
Do we need to document somewhere that the qualifier should not be null? I think 
coprocessors can be used even in places where we don't have qualifiers. If that 
is the case, this sample ColumnAggregationEndPoint may not work.
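A minimal illustration of the guard this suggests, using the standard client Scan 
API (a hypothetical helper, not part of the ColumnAggregationEndPoint sample): fall 
back to addFamily() when no qualifier is given, so a null qualifier never reaches 
the ScanQueryMatcher:
{code}
import org.apache.hadoop.hbase.client.Scan;

class ScanColumnSketch {
  // Only call addColumn() when a qualifier is actually present;
  // otherwise scan the whole family.
  static Scan buildScan(byte[] family, byte[] qualifier) {
    Scan scan = new Scan();
    if (qualifier == null || qualifier.length == 0) {
      scan.addFamily(family);              // avoids putting a null qualifier into the scan
    } else {
      scan.addColumn(family, qualifier);
    }
    return scan;
  }
}
{code}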

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4698) Let the HFile Pretty Printer print all the key values for a specific row.

2011-12-21 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174036#comment-13174036
 ] 

Hudson commented on HBASE-4698:
---

Integrated in HBase-TRUNK-security #39 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/39/])
HBASE-4698 Let the HFile Pretty Printer print all the key values for a 
specific row; REMOVE OVER-COMMIT, STUFF NOT IN 4698 PATCH
HBASE-4698 Let the HFile Pretty Printer print all the key values for a specific 
row

stack : 
Files : 
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java

stack : 
Files : 
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java


 Let the HFile Pretty Printer print all the key values for a specific row.
 -

 Key: HBASE-4698
 URL: https://issues.apache.org/jira/browse/HBASE-4698
 Project: HBase
  Issue Type: New Feature
Reporter: Liyin Tang
Assignee: Liyin Tang
 Fix For: 0.94.0

 Attachments: D111.1.patch, D111.1.patch, D111.1.patch, D111.2.patch, 
 D111.3.patch, D111.4.patch, HBASE-4689-trunk.patch


 When using HFile Pretty Printer to debug HBase issues, 
 it would be very nice to allow the Pretty Printer to seek to a specific row, and 
 only print all the key values for this row.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5072) Support Max Value for Per-Store Metrics

2011-12-21 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174037#comment-13174037
 ] 

Hudson commented on HBASE-5072:
---

Integrated in HBase-TRUNK-security #39 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/39/])
[jira] [HBASE-5072] Support Max Value for Per-Store Metrics

Summary:
We were bit in our multi-tenant cluster because one of our Stores
encountered a bug and grew its StoreFile count. We didn't notice this because
the StoreFile count currently reported by the RegionServer is an average of all
Stores in the region. For the per-Store metrics, we should also record the max
so we can notice outliers.

Test Plan: - mvn test -Dtest=TestRegionServerMetrics

Reviewers: JIRA, mbautin, Kannan

Reviewed By: Kannan

CC: stack, nspiegelberg, mbautin, Kannan

Differential Revision: 945

nspiegelberg : 
Files : 
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/SchemaMetrics.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerMetrics.java


 Support Max Value for Per-Store Metrics
 ---

 Key: HBASE-5072
 URL: https://issues.apache.org/jira/browse/HBASE-5072
 Project: HBase
  Issue Type: Improvement
  Components: metrics, regionserver
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Minor
 Fix For: 0.94.0

 Attachments: D945.1.patch, D945.2.patch, HBASE-5072.patch


 We were bit in our multi-tenant cluster because one of our Stores encountered 
 a bug and grew its StoreFile count.  We didn't notice this because the 
 StoreFile count currently reported by the RegionServer is an average of all 
 Stores in the region.  For the per-Store metrics, we should also record the 
 max so we can notice outliers.
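 A small illustration of the reporting idea (a generic sketch, not the SchemaMetrics 
 change itself): computing the max alongside the average makes a single misbehaving 
 Store visible where the average would hide it:
 {code}
import java.util.List;

class StoreFileMetricsSketch {
  // Report the maximum storefile count alongside the average so a
  // single runaway Store stands out.
  static int[] averageAndMax(List<Integer> storefileCountsPerStore) {
    int sum = 0, max = 0;
    for (int count : storefileCountsPerStore) {
      sum += count;
      max = Math.max(max, count);
    }
    int avg = storefileCountsPerStore.isEmpty() ? 0 : sum / storefileCountsPerStore.size();
    return new int[] { avg, max };   // e.g. avg=4, max=87 flags the outlier the average hides
  }
}
 {code}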

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5077) SplitLogWorker fails to let go of a task, kills the RS

2011-12-21 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174035#comment-13174035
 ] 

Hudson commented on HBASE-5077:
---

Integrated in HBase-TRUNK-security #39 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/39/])
HBASE-5077 SplitLogWorker fails to let go of a task, kills the RS -- fix 
compile error
HBASE-5077 SplitLogWorker fails to let go of a task, kills the RS

stack : 
Files : 
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java

stack : 
Files : 
* /hbase/trunk/CHANGES.txt
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java


 SplitLogWorker fails to let go of a task, kills the RS
 --

 Key: HBASE-5077
 URL: https://issues.apache.org/jira/browse/HBASE-5077
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Critical
 Fix For: 0.92.0

 Attachments: HBASE-5077-v2.patch, HBASE-5077-v4.txt, HBASE-5077.patch


 I hope I didn't break the spacetime continuum; I got this while testing 0.92.0:
 {quote}
 2011-12-20 03:06:19,838 FATAL 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: logic error - end task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
  done failed because task doesn't exist
 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
 NoNode for 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A9100%2Fhbase%2F.logs%2Fsv4r14s38%2C62023%2C1324345935047-splitting%2Fsv4r14s38%252C62023%252C1324345935047.1324349363814
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
 at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1228)
 at 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:372)
 at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:654)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.endTask(SplitLogWorker.java:372)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:280)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:197)
 at 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:165)
 at java.lang.Thread.run(Thread.java:662)
 {quote}
 I'll post more logs in a moment. What I can see is that the master shuffled 
 that task around a bit and one of the region servers died on this stack trace 
 while the others were able to interrupt themselves.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5078) DistributedLogSplitter failing to split file because it has edits for lots of regions

2011-12-21 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174039#comment-13174039
 ] 

Hudson commented on HBASE-5078:
---

Integrated in HBase-TRUNK-security #39 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/39/])
HBASE-5078 DistributedLogSplitter failing to split file because it has 
edits for lots of regions
HBASE-5078 DistributedLogSplitter failing to split file because it has edits 
for lots of regions

stack : 
Files : 
* /hbase/trunk/CHANGES.txt

stack : 
Files : 
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java


 DistributedLogSplitter failing to split file because it has edits for lots of 
 regions
 -

 Key: HBASE-5078
 URL: https://issues.apache.org/jira/browse/HBASE-5078
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.92.0

 Attachments: 5078-v2.txt, 5078-v3.txt, 5078-v4.txt, 5078.txt


 Testing 0.92.0RC, I ran into an interesting issue where a log file had edits for 
 many regions, and just opening the file per region was taking so long that we 
 were never updating our progress, so the split of the log just kept failing; in 
 this case, the first 40 edits in a file required our opening 35 files -- 
 opening 35 files took longer than the hard-coded 25 seconds it's supposed to 
 take to acquire the task.
 First, here is master's view:
 {code}
 2011-12-20 17:54:09,184 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  ver = 0
 ...
 2011-12-20 17:54:09,233 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
 task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  acquired by sv4r27s44,7003,1324365396664
 ...
 2011-12-20 17:54:35,475 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
 task not yet acquired 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403573033
  ver = 3
 {code}
 Master then gives it elsewhere.
 Over on the regionserver we see:
 {code}
 2011-12-20 17:54:09,233 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker 
 sv4r27s44,7003,1324365396664 acquired task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 2011-12-20 17:54:10,714 DEBUG 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: 
 Path=hdfs://sv4r11s38:7000/hbase/splitlog/sv4r27s44,7003,1324365396664_hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679/TestTable/6b6bfc2716dff952435ab26f018648b2/recovered.ed
 its/0278862.temp, syncFs=true, hflush=false
 
 {code}
  and so on till:
 {code}
 2011-12-20 17:54:36,876 INFO 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: task 
 /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
  preempted from sv4r27s44,7003,1324365396664, current task state and 
 owner=owned sv4r28s44,7003,1324365396678
 
 2011-12-20 17:54:37,112 WARN 
 org.apache.hadoop.hbase.regionserver.SplitLogWorker: Failed to heartbeat the 
 task/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 
 {code}
 When the above happened, we'd only processed 40 edits.  As written, we only 
 heartbeat every 1024 edits.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5009) Failure of creating split dir if it already exists prevents splits from happening further

2011-12-21 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174077#comment-13174077
 ] 

ramkrishna.s.vasudevan commented on HBASE-5009:
---

@Ted
Not sure how long it may take, but it should not be too long.  This will 
ensure that the background threads have completed.  If not, we may get more 
errors such as:
{code}
org.apache.hadoop.ipc.RemoteException: 
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on 
/hbase/Htable_UFDR_015/914f3eff1aa7649b7a99b8ce86f3f39c/splits/42423e22baa0c02897a27f92a265f2ad/value/1480356236187113033.914f3eff1aa7649b7a99b8ce86f3f39c
 File does not exist. [Lease.  Holder: 
DFSClient_hb_rs_193-195-18-165,20020,1324435953313_1324435953584_377812698_27, 
pendingcreates: 1]
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1786)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1777)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:1834)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:1820)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:738)
at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:547)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1116)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1112)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1110)

at org.apache.hadoop.ipc.Client.call(Client.java:954)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:245)
at $Proxy6.complete(Unknown Source)
at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
com.huawei.isap.ump.ha.client.RPCRetryAndSwitchInvoker.invokeMethod(RPCRetryAndSwitchInvoker.java:183)
at 
com.huawei.isap.ump.ha.client.RPCRetryAndSwitchInvoker.invokeMethod(RPCRetryAndSwitchInvoker.java:171)
at 
com.huawei.isap.ump.ha.client.RPCRetryAndSwitchInvoker.invoke(RPCRetryAndSwitchInvoker.java:76)
at $Proxy6.complete(Unknown Source)
at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:4307)
at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:4211)
at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:63)
at 
org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:88)
at org.apache.hadoop.hbase.io.Reference.write(Reference.java:133)
at 
org.apache.hadoop.hbase.regionserver.StoreFile.split(StoreFile.java:651)
at 
org.apache.hadoop.hbase.regionserver.SplitTransaction.splitStoreFile(SplitTransaction.java:498)
{code}


 Failure of creating split dir if it already exists prevents splits from 
 happening further
 -

 Key: HBASE-5009
 URL: https://issues.apache.org/jira/browse/HBASE-5009
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Attachments: HBASE-5009.patch, HBASE-5009_Branch90.patch


 The scenario is
 - The split of a region takes a long time
 - The deletion of the splitDir fails due to HDFS problems.
 - Subsequent splits also fail after that.
 {code}
 private static void createSplitDir(final FileSystem fs, final Path splitdir)
     throws IOException {
   if (fs.exists(splitdir)) throw new IOException("Splitdir already exits? " + splitdir);
   if (!fs.mkdirs(splitdir)) throw new IOException("Failed create of " + splitdir);
 }
 {code}
 Correct me if I am wrong. If it is an issue, can we change the behaviour of 
 throwing an exception?
 Please suggest.
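 For illustration, one possible alternative behaviour along the lines asked about 
 (a hedged sketch, not the committed patch): treat a pre-existing split directory 
 as stale state from a failed split and remove it before recreating it:
 {code}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class SplitDirSketch {
  // Clear a leftover split directory instead of failing every subsequent split.
  private static void createSplitDir(final FileSystem fs, final Path splitdir)
      throws IOException {
    if (fs.exists(splitdir) && !fs.delete(splitdir, true)) {   // recursive delete of stale dir
      throw new IOException("Failed delete of stale " + splitdir);
    }
    if (!fs.mkdirs(splitdir)) {
      throw new IOException("Failed create of " + splitdir);
    }
  }
}
 {code}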

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5064) use surefire tests parallelization

2011-12-21 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174080#comment-13174080
 ] 

Hadoop QA commented on HBASE-5064:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12508218/5064.v6.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 9 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -152 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 76 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.client.TestInstantSchemaChange
  org.apache.hadoop.hbase.regionserver.wal.TestLogRolling
  org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/565//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/565//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/565//console

This message is automatically generated.

 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v2.patch, 5064.v3.patch, 
 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 
 5064.v7.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5064) use surefire tests parallelization

2011-12-21 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5064:
---

Status: Open  (was: Patch Available)

 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v2.patch, 5064.v3.patch, 
 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 
 5064.v6.patch, 5064.v7.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5064) use surefire tests parallelization

2011-12-21 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5064:
---

Attachment: 5064.v6.patch

 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v2.patch, 5064.v3.patch, 
 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 
 5064.v6.patch, 5064.v7.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5064) use surefire tests parallelization

2011-12-21 Thread nkeywal (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174089#comment-13174089
 ] 

nkeywal commented on HBASE-5064:


#565 (No //, min number of threads)
Total time: 1:59:24.242s
Tests run: 788, Failures: 3, Errors: 3, Skipped: 9

Too many open files
TestInstantSchemaChange.testInstantSchemaJanitor

NumberFormatException
TestTableMapReduce.testMultiRegionTable
TestHFileOutputFormat.testMRIncrementalLoad
TestHFileOutputFormat.testMRIncrementalLoadWithSplit
TestHFileOutputFormat.testExcludeMinorCompaction

Hung
None it seems, and at least not TestReplication

The directory is already locked.
TestLogRolling.testLogRollOnDatanodeDeath


 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v2.patch, 5064.v3.patch, 
 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 
 5064.v6.patch, 5064.v7.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5064) use surefire tests parallelization

2011-12-21 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5064:
---

Status: Patch Available  (was: Open)

 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v2.patch, 5064.v3.patch, 
 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 
 5064.v6.patch, 5064.v7.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4698) Let the HFile Pretty Printer print all the key values for a specific row.

2011-12-21 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174116#comment-13174116
 ] 

Hudson commented on HBASE-4698:
---

Integrated in HBase-TRUNK #2566 (See 
[https://builds.apache.org/job/HBase-TRUNK/2566/])
HBASE-4698 Let the HFile Pretty Printer print all the key values for a 
specific row; REMOVE OVER-COMMIT, STUFF NOT IN 4698 PATCH
HBASE-4698 Let the HFile Pretty Printer print all the key values for a specific 
row

stack : 
Files : 
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java

stack : 
Files : 
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java


 Let the HFile Pretty Printer print all the key values for a specific row.
 -

 Key: HBASE-4698
 URL: https://issues.apache.org/jira/browse/HBASE-4698
 Project: HBase
  Issue Type: New Feature
Reporter: Liyin Tang
Assignee: Liyin Tang
 Fix For: 0.94.0

 Attachments: D111.1.patch, D111.1.patch, D111.1.patch, D111.2.patch, 
 D111.3.patch, D111.4.patch, HBASE-4689-trunk.patch


 When using HFile Pretty Printer to debug HBase issues, 
 it would be very nice to allow the Pretty Printer to seek to a specific row, and 
 only print all the key values for this row.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5064) use surefire tests parallelization

2011-12-21 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174131#comment-13174131
 ] 

stack commented on HBASE-5064:
--

So above tests are w/o //?  (I'm getting no response from Giri)

 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v2.patch, 5064.v3.patch, 
 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 
 5064.v6.patch, 5064.v7.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5009) Failure of creating split dir if it already exists prevents splits from happening further

2011-12-21 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5009:
--

Status: Patch Available  (was: Open)

 Failure of creating split dir if it already exists prevents splits from 
 happening further
 -

 Key: HBASE-5009
 URL: https://issues.apache.org/jira/browse/HBASE-5009
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Attachments: HBASE-5009.patch, HBASE-5009_Branch90.patch


 The scenario is
 - The split of a region takes a long time
 - The deletion of the splitDir fails due to HDFS problems.
 - Subsequent splits also fail after that.
 {code}
 private static void createSplitDir(final FileSystem fs, final Path splitdir)
     throws IOException {
   if (fs.exists(splitdir)) throw new IOException("Splitdir already exits? " + splitdir);
   if (!fs.mkdirs(splitdir)) throw new IOException("Failed create of " + splitdir);
 }
 {code}
 Correct me if I am wrong. If it is an issue, can we change the behaviour of 
 throwing an exception?
 Please suggest.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5009) Failure of creating split dir if it already exists prevents splits from happening further

2011-12-21 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174164#comment-13174164
 ] 

Hadoop QA commented on HBASE-5009:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12507675/HBASE-5009.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 javadoc.  The javadoc tool appears to have generated -152 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 75 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
 

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/567//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/567//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/567//console

This message is automatically generated.

 Failure of creating split dir if it already exists prevents splits from 
 happening further
 -

 Key: HBASE-5009
 URL: https://issues.apache.org/jira/browse/HBASE-5009
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Attachments: HBASE-5009.patch, HBASE-5009_Branch90.patch


 The scenario is
 - The split of a region takes a long time
 - The deletion of the splitDir fails due to HDFS problems.
 - Subsequent splits also fail after that.
 {code}
 private static void createSplitDir(final FileSystem fs, final Path splitdir)
     throws IOException {
   if (fs.exists(splitdir)) throw new IOException("Splitdir already exits? " + splitdir);
   if (!fs.mkdirs(splitdir)) throw new IOException("Failed create of " + splitdir);
 }
 {code}
 Correct me if I am wrong. If it is an issue, can we change the behaviour of 
 throwing an exception?
 Please suggest.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4951) master process can not be stopped when it is initializing

2011-12-21 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4951:
--

Attachment: HBASE-4951_branch.patch

Patch for branch 0.90.

 master process can not be stopped when it is initializing
 -

 Key: HBASE-4951
 URL: https://issues.apache.org/jira/browse/HBASE-4951
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.3
Reporter: xufeng
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.90.6

 Attachments: HBASE-4951.patch, HBASE-4951_branch.patch


 It is easy to reproduce with the following steps:
 Step 1: Start the master process (do not start any regionserver process in the cluster).
 The master will wait for regionservers to check in:
 org.apache.hadoop.hbase.master.ServerManager: Waiting on regionserver(s) to checkin
 Step 2: Stop the master with the shell command bin/hbase master stop.
 Result: the master process will never die because the catalogTracker.waitForRoot() 
 method blocks until the root region is assigned.
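 For illustration, a hedged sketch of the general remedy (not the committed patch; 
 CatalogTrackerLike and StoppableLike are hypothetical stand-ins for CatalogTracker 
 and the master's stop flag): wait with a bounded timeout and re-check the stop flag 
 each round, so a stop request can take effect while the master is still initializing:
 {code}
class WaitForRootSketch {
  // Poll with a bounded timeout instead of blocking forever, so
  // "hbase master stop" is honoured during initialization.
  static boolean waitForRootAssignment(CatalogTrackerLike tracker, StoppableLike master,
                                       long pollMillis) throws InterruptedException {
    while (!master.isStopped()) {
      if (tracker.waitForRoot(pollMillis) != null) {  // bounded wait per round
        return true;
      }
    }
    return false;  // master was asked to stop before the root region was assigned
  }

  // Hypothetical minimal interfaces for this sketch.
  interface CatalogTrackerLike { Object waitForRoot(long timeoutMillis) throws InterruptedException; }
  interface StoppableLike { boolean isStopped(); }
}
 {code}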

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4951) master process can not be stopped when it is initializing

2011-12-21 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174170#comment-13174170
 ] 

Hadoop QA commented on HBASE-4951:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12508251/HBASE-4951_branch.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/568//console

This message is automatically generated.

 master process can not be stopped when it is initializing
 -

 Key: HBASE-4951
 URL: https://issues.apache.org/jira/browse/HBASE-4951
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.3
Reporter: xufeng
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.90.6

 Attachments: HBASE-4951.patch, HBASE-4951_branch.patch


 It is easy to reproduce with the following steps:
 Step 1: Start the master process (do not start any regionserver process in the cluster).
 The master will wait for regionservers to check in:
 org.apache.hadoop.hbase.master.ServerManager: Waiting on regionserver(s) to checkin
 Step 2: Stop the master with the shell command bin/hbase master stop.
 Result: the master process will never die because the catalogTracker.waitForRoot() 
 method blocks until the root region is assigned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2011-12-21 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174180#comment-13174180
 ] 

Jimmy Xiang commented on HBASE-5081:


I am working on a patch now. I think synchronous deleteNode is clean.  It will 
give the retry a fresh start.
But it may take a while if there are too many files.  Yes, for the long term, we 
can think about how to do what stack says.

 Distributed log splitting deleteNode races against splitLog retry 
 ---

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: distributed-log-splitting-screenshot.png


 Recently, during 0.92 RC testing, we found distributed log splitting hanging 
 there forever.  Please see the attached screenshot.
 I looked into it and here is what I think happened:
 1. One regionserver died; the ServerShutdownHandler found it and started the 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted asynchronously;
 3. The ServerShutdownHandler retried the log splitting;
 4. During the retry, it created these three tasks again and put them in a 
 hashmap (tasks);
 5. The asynchronous deletion from step 2 finally happened for one task; in 
 the callback, it removed that task from the hashmap;
 6. One of the newly submitted tasks' zookeeper watcher found that the task is 
 unassigned and not in the hashmap, so it created a new orphan task;
 7. All three tasks failed, but the task created in step 6 is an orphan, so 
 the batch.err counter was one short, and the log splitting hangs there, 
 waiting forever for the last task to finish.
 So I think the problem is step 2.  The fix is to make the deletion synchronous 
 instead of asynchronous, so that the retry has a clean start.
 An async deleteNode will mess up the split log retry.  In an extreme situation, 
 if the async deleteNode doesn't happen soon enough, some node created during 
 the retry could be deleted.
 deleteNode should be synchronous.
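 A hedged sketch of the remedy proposed here and refined in the follow-up comments 
 below (a hypothetical helper, not the actual SplitLogManager code): remove the task 
 znode synchronously and only on success, so a retry never races with a pending 
 asynchronous delete:
 {code}
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

class DeleteNodeSketch {
  // Delete the task znode only when the split succeeded; a failed task's node is
  // left in place for the retry to pick up, avoiding the async-delete race.
  static void finishTask(ZooKeeper zk, String taskNode, boolean success)
      throws KeeperException, InterruptedException {
    if (success) {
      zk.delete(taskNode, -1);   // -1 = any version; synchronous, so the retry starts clean
    }
    // On failure the node stays; the retry sees the old node and resubmits it.
  }
}
 {code}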

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2011-12-21 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174187#comment-13174187
 ] 

Jimmy Xiang commented on HBASE-5081:


Can we deleteNode only if the task is successfully done?  If it is not completed, 
let the node stay there.  In that case, when the retry happens, it will see the 
old node there, but that is OK.  The new task in the hashmap won't be deleted 
either.

 Distributed log splitting deleteNode races against splitLog retry 
 ---

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: distributed-log-splitting-screenshot.png


 Recently, during 0.92 RC testing, we found distributed log splitting hanging 
 there forever.  Please see the attached screenshot.
 I looked into it and here is what I think happened:
 1. One regionserver died; the ServerShutdownHandler found it and started the 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted asynchronously;
 3. The ServerShutdownHandler retried the log splitting;
 4. During the retry, it created these three tasks again and put them in a 
 hashmap (tasks);
 5. The asynchronous deletion from step 2 finally happened for one task; in 
 the callback, it removed that task from the hashmap;
 6. One of the newly submitted tasks' zookeeper watcher found that the task is 
 unassigned and not in the hashmap, so it created a new orphan task;
 7. All three tasks failed, but the task created in step 6 is an orphan, so 
 the batch.err counter was one short, and the log splitting hangs there, 
 waiting forever for the last task to finish.
 So I think the problem is step 2.  The fix is to make the deletion synchronous 
 instead of asynchronous, so that the retry has a clean start.
 An async deleteNode will mess up the split log retry.  In an extreme situation, 
 if the async deleteNode doesn't happen soon enough, some node created during 
 the retry could be deleted.
 deleteNode should be synchronous.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2011-12-21 Thread Jimmy Xiang (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-5081:
---

Status: Patch Available  (was: Open)

This is a patch for 0.92.

 Distributed log splitting deleteNode races against splitLog retry 
 ---

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: distributed-log-splitting-screenshot.png, 
 patch_for_92.txt


 Recently, during 0.92 RC testing, we found distributed log splitting hanging 
 there forever.  Please see the attached screenshot.
 I looked into it and here is what I think happened:
 1. One regionserver died; the ServerShutdownHandler found it and started the 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted asynchronously;
 3. The ServerShutdownHandler retried the log splitting;
 4. During the retry, it created these three tasks again and put them in a 
 hashmap (tasks);
 5. The asynchronous deletion from step 2 finally happened for one task; in 
 the callback, it removed that task from the hashmap;
 6. One of the newly submitted tasks' zookeeper watcher found that the task is 
 unassigned and not in the hashmap, so it created a new orphan task;
 7. All three tasks failed, but the task created in step 6 is an orphan, so 
 the batch.err counter was one short, and the log splitting hangs there, 
 waiting forever for the last task to finish.
 So I think the problem is step 2.  The fix is to make the deletion synchronous 
 instead of asynchronous, so that the retry has a clean start.
 An async deleteNode will mess up the split log retry.  In an extreme situation, 
 if the async deleteNode doesn't happen soon enough, some node created during 
 the retry could be deleted.
 deleteNode should be synchronous.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2011-12-21 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174208#comment-13174208
 ] 

jirapos...@reviews.apache.org commented on HBASE-5081:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3292/
---

Review request for hbase, Ted Yu, Michael Stack, and Lars Hofhansl.


Summary
---

In this patch, after a task is done, we don't delete the node if the task 
failed, so that when it's retried later on there won't be a race problem.

It used to always delete the node.


This addresses bug HBASE-5081.
https://issues.apache.org/jira/browse/HBASE-5081


Diffs
-

  src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java 7b7316f 

Diff: https://reviews.apache.org/r/3292/diff


Testing
---

mvn -Dtest=TestDistributedLogSplitting clean test


Thanks,

Jimmy



 Distributed log splitting deleteNode races against splitLog retry 
 ---

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: distributed-log-splitting-screenshot.png, 
 patch_for_92.txt


 Recently, during 0.92 RC testing, we found distributed log splitting hanging 
 there forever.  Please see the attached screenshot.
 I looked into it and here is what I think happened:
 1. One regionserver died; the ServerShutdownHandler found it and started the 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted asynchronously;
 3. The ServerShutdownHandler retried the log splitting;
 4. During the retry, it created these three tasks again and put them in a 
 hashmap (tasks);
 5. The asynchronous deletion from step 2 finally happened for one task; in 
 the callback, it removed that task from the hashmap;
 6. One of the newly submitted tasks' zookeeper watcher found that the task is 
 unassigned and not in the hashmap, so it created a new orphan task;
 7. All three tasks failed, but the task created in step 6 is an orphan, so 
 the batch.err counter was one short, and the log splitting hangs there, 
 waiting forever for the last task to finish.
 So I think the problem is step 2.  The fix is to make the deletion synchronous 
 instead of asynchronous, so that the retry has a clean start.
 An async deleteNode will mess up the split log retry.  In an extreme situation, 
 if the async deleteNode doesn't happen soon enough, some node created during 
 the retry could be deleted.
 deleteNode should be synchronous.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5064) use surefire tests parallelization

2011-12-21 Thread nkeywal (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174207#comment-13174207
 ] 

nkeywal commented on HBASE-5064:


#566 (no parallelization, minimum number of threads)
Total time: 1:56:55.198s
Tests run: 789, Failures: 3, Errors: 2, Skipped: 9

Too many open files
TestInstantSchemaChange.testInstantSchemaJanitor

NumberFormatException
TestTableMapReduce.testMultiRegionTable
TestHFileOutputFormat.testMRIncrementalLoad
TestHFileOutputFormat.testMRIncrementalLoadWithSplit
TestHFileOutputFormat.testExcludeMinorCompaction

Hung
None

 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v2.patch, 5064.v3.patch, 
 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 
 5064.v6.patch, 5064.v7.patch


 To be tried multiple times on hadoop-qa before committing.
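 For context, surefire parallelization is usually switched on with configuration
 along these lines (illustrative values only; the attached patches may use a
 different setup):
{code}
<!-- Illustrative maven-surefire-plugin snippet; not taken from the attached patches. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <parallel>classes</parallel>
    <threadCount>4</threadCount>
  </configuration>
</plugin>
{code}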

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4720) Implement atomic update operations (checkAndPut, checkAndDelete) for REST client/server

2011-12-21 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174209#comment-13174209
 ] 

Zhihong Yu commented on HBASE-4720:
---

I posted a comment on the review board.

@Andrew:
Can you take a look at the latest patch?

 Implement atomic update operations (checkAndPut, checkAndDelete) for REST 
 client/server 
 

 Key: HBASE-4720
 URL: https://issues.apache.org/jira/browse/HBASE-4720
 Project: HBase
  Issue Type: Improvement
Reporter: Daniel Lord
Assignee: Mubarak Seyed
 Fix For: 0.94.0

 Attachments: HBASE-4720.trunk.v1.patch, HBASE-4720.v1.patch, 
 HBASE-4720.v3.patch


 I have several large application/HBase clusters where an application node 
 will occasionally need to talk to HBase from a different cluster.  In order 
 to help ensure some of my consistency guarantees I have a sentinel table that 
 is updated atomically as users interact with the system.  This works quite 
 well for the regular hbase client but the REST client does not implement 
 the checkAndPut and checkAndDelete operations.  This exposes the application 
 to some race conditions that have to be worked around.  It would be ideal if 
 the same checkAndPut/checkAndDelete operations could be supported by the REST 
 client.
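 For reference, the regular Java client already exposes these operations; here is a
 minimal sketch (table and column names are made up for illustration):
{code}
// Minimal sketch with made-up table/column names; the point of this issue is to
// expose the same checkAndPut/checkAndDelete semantics through the REST client.
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class SentinelUpdate {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(HBaseConfiguration.create(), "sentinel");
    Put put = new Put(Bytes.toBytes("row1"));
    put.add(Bytes.toBytes("f"), Bytes.toBytes("state"), Bytes.toBytes("owned"));
    // Atomically apply the Put only if the current value of f:state is "free".
    boolean applied = table.checkAndPut(Bytes.toBytes("row1"), Bytes.toBytes("f"),
        Bytes.toBytes("state"), Bytes.toBytes("free"), put);
    System.out.println("checkAndPut applied: " + applied);
    table.close();
  }
}
{code}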

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2011-12-21 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174214#comment-13174214
 ] 

jirapos...@reviews.apache.org commented on HBASE-5081:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3292/
---

(Updated 2011-12-21 17:01:22.901024)


Review request for hbase, Ted Yu, Michael Stack, and Lars Hofhansl.


Changes
---

Updated the comments.


Summary
---

In this patch, after a task is done, we don't delete the node if the task failed,
so that when it is retried later on, there won't be a race problem.

It used to delete the node always.


This addresses bug HBASE-5081.
https://issues.apache.org/jira/browse/HBASE-5081


Diffs (updated)
-

  src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java 7b7316f 

Diff: https://reviews.apache.org/r/3292/diff


Testing
---

mvn -Dtest=TestDistributedLogSplitting clean test


Thanks,

Jimmy



 Distributed log splitting deleteNode races against splitLog retry
 ---

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: distributed-log-splitting-screenshot.png, 
 patch_for_92.txt


 Recently, during 0.92 RC testing, we found that distributed log splitting hangs
 there forever.  Please see the attached screenshot.
 I looked into it, and here is what I think happened:
 1. One regionserver died; the ServerShutdownHandler found out and started the
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. The ServerShutdownHandler retried the log splitting;
 4. During the retry, it created these three tasks again and put them in a
 hashmap (tasks);
 5. The asynchronous deletion from step 2 finally happened for one task; in the
 callback, it removed that task from the hashmap;
 6. One of the newly submitted tasks' ZooKeeper watchers found that the task was
 unassigned and not in the hashmap, so it created a new orphan task.
 7. All three tasks failed, but the task created in step 6 is an orphan, so the
 batch.err counter was one short; the log splitting therefore hangs, waiting
 forever for the last task to finish, which is never going to happen.
 So I think the problem is step 2.  The fix is to make the deletion synchronous
 instead of asynchronous, so that the retry gets a clean start.
 An asynchronous deleteNode interferes with the split log retry.  In an extreme
 situation, if the async deleteNode doesn't happen soon enough, a node created
 during the retry could be deleted.
 deleteNode should be synchronous.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4916) LoadTest MR Job

2011-12-21 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174218#comment-13174218
 ] 

Phabricator commented on HBASE-4916:


stack has commented on the revision HBASE-4916 [jira] LoadTest MR Job.

  Looking back at PE, here is what it has that this tool doesn't and in some 
cases might be hard to add:

  + Runs against minicluster (Should be easy to add)
  + Can do sequential reads/writes (not so important, though sequential writes can be interesting)
  + Can do scan tests; random 10 items, random 100, etc.

  Stuff this tool can do that would need to be added to PE:

  + PE needs to do more work per mapper; at the moment it does dumb stuff like running 24 mappers per TT just to get the load up
  + Better load profiles; PE loading is regular

REVISION DETAIL
  https://reviews.facebook.net/D741


 LoadTest MR Job
 ---

 Key: HBASE-4916
 URL: https://issues.apache.org/jira/browse/HBASE-4916
 Project: HBase
  Issue Type: Sub-task
  Components: client, regionserver
Reporter: Nicolas Spiegelberg
Assignee: Christopher Gist
 Fix For: 0.94.0

 Attachments: HBASE-4916.D741.1.patch


 Add a script to start a streaming map-reduce job where each map task runs an
 instance of the load tester for a partition of the key-space.  Ensure that the
 load tester takes a parameter indicating the start key for write operations.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2011-12-21 Thread Jimmy Xiang (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-5081:
---

Attachment: patch_for_92_v2.txt

Updated the comments.

 Distributed log splitting deleteNode races against splitLog retry
 ---

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: distributed-log-splitting-screenshot.png, 
 patch_for_92.txt, patch_for_92_v2.txt


 Recently, during 0.92 RC testing, we found that distributed log splitting hangs
 there forever.  Please see the attached screenshot.
 I looked into it, and here is what I think happened:
 1. One regionserver died; the ServerShutdownHandler found out and started the
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. The ServerShutdownHandler retried the log splitting;
 4. During the retry, it created these three tasks again and put them in a
 hashmap (tasks);
 5. The asynchronous deletion from step 2 finally happened for one task; in the
 callback, it removed that task from the hashmap;
 6. One of the newly submitted tasks' ZooKeeper watchers found that the task was
 unassigned and not in the hashmap, so it created a new orphan task.
 7. All three tasks failed, but the task created in step 6 is an orphan, so the
 batch.err counter was one short; the log splitting therefore hangs, waiting
 forever for the last task to finish, which is never going to happen.
 So I think the problem is step 2.  The fix is to make the deletion synchronous
 instead of asynchronous, so that the retry gets a clean start.
 An asynchronous deleteNode interferes with the split log retry.  In an extreme
 situation, if the async deleteNode doesn't happen soon enough, a node created
 during the retry could be deleted.
 deleteNode should be synchronous.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5064) use surefire tests parallelization

2011-12-21 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5064:
---

Status: Patch Available  (was: Open)

 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v2.patch, 5064.v3.patch, 
 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 
 5064.v6.patch, 5064.v7.patch, 5064.v8.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5064) use surefire tests parallelization

2011-12-21 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5064:
---

Attachment: 5064.v8.patch

 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v2.patch, 5064.v3.patch, 
 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 
 5064.v6.patch, 5064.v7.patch, 5064.v8.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5064) use surefire tests parallelization

2011-12-21 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5064:
---

Status: Open  (was: Patch Available)

 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v2.patch, 5064.v3.patch, 
 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 
 5064.v6.patch, 5064.v7.patch, 5064.v8.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2011-12-21 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174223#comment-13174223
 ] 

jirapos...@reviews.apache.org commented on HBASE-5081:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3292/#review4046
---



src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java
https://reviews.apache.org/r/3292/#comment9151

Whitespace at the end, and 'is' should be 'if'.


- Michael


On 2011-12-21 17:01:22, Jimmy Xiang wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3292/
bq.  ---
bq.  
bq.  (Updated 2011-12-21 17:01:22)
bq.  
bq.  
bq.  Review request for hbase, Ted Yu, Michael Stack, and Lars Hofhansl.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  In this patch, after a task is done, we don't delete the node if the task 
is failed.  So that when it's retried later on, there won't be race problem.
bq.  
bq.  It used to delete the node always.
bq.  
bq.  
bq.  This addresses bug HBASE-5081.
bq.  https://issues.apache.org/jira/browse/HBASE-5081
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java 
7b7316f 
bq.  
bq.  Diff: https://reviews.apache.org/r/3292/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  mvn -Dtest=TestDistributedLogSplitting clean test
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jimmy
bq.  
bq.



 Distributed log splitting deleteNode races against splitLog retry
 ---

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: distributed-log-splitting-screenshot.png, 
 patch_for_92.txt, patch_for_92_v2.txt


 Recently, during 0.92 RC testing, we found that distributed log splitting hangs
 there forever.  Please see the attached screenshot.
 I looked into it, and here is what I think happened:
 1. One regionserver died; the ServerShutdownHandler found out and started the
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. The ServerShutdownHandler retried the log splitting;
 4. During the retry, it created these three tasks again and put them in a
 hashmap (tasks);
 5. The asynchronous deletion from step 2 finally happened for one task; in the
 callback, it removed that task from the hashmap;
 6. One of the newly submitted tasks' ZooKeeper watchers found that the task was
 unassigned and not in the hashmap, so it created a new orphan task.
 7. All three tasks failed, but the task created in step 6 is an orphan, so the
 batch.err counter was one short; the log splitting therefore hangs, waiting
 forever for the last task to finish, which is never going to happen.
 So I think the problem is step 2.  The fix is to make the deletion synchronous
 instead of asynchronous, so that the retry gets a clean start.
 An asynchronous deleteNode interferes with the split log retry.  In an extreme
 situation, if the async deleteNode doesn't happen soon enough, a node created
 during the retry could be deleted.
 deleteNode should be synchronous.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2011-12-21 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174224#comment-13174224
 ] 

jirapos...@reviews.apache.org commented on HBASE-5081:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3292/#review4045
---

Ship it!


Looks good to me.  How do I know it works?  Would it be hard to write a test?

- Michael


On 2011-12-21 17:01:22, Jimmy Xiang wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3292/
bq.  ---
bq.  
bq.  (Updated 2011-12-21 17:01:22)
bq.  
bq.  
bq.  Review request for hbase, Ted Yu, Michael Stack, and Lars Hofhansl.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  In this patch, after a task is done, we don't delete the node if the task 
is failed.  So that when it's retried later on, there won't be race problem.
bq.  
bq.  It used to delete the node always.
bq.  
bq.  
bq.  This addresses bug HBASE-5081.
bq.  https://issues.apache.org/jira/browse/HBASE-5081
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java 
7b7316f 
bq.  
bq.  Diff: https://reviews.apache.org/r/3292/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  mvn -Dtest=TestDistributedLogSplitting clean test
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jimmy
bq.  
bq.



 Distributed log splitting deleteNode races against splitLog retry
 ---

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: distributed-log-splitting-screenshot.png, 
 patch_for_92.txt, patch_for_92_v2.txt


 Recently, during 0.92 RC testing, we found that distributed log splitting hangs
 there forever.  Please see the attached screenshot.
 I looked into it, and here is what I think happened:
 1. One regionserver died; the ServerShutdownHandler found out and started the
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. The ServerShutdownHandler retried the log splitting;
 4. During the retry, it created these three tasks again and put them in a
 hashmap (tasks);
 5. The asynchronous deletion from step 2 finally happened for one task; in the
 callback, it removed that task from the hashmap;
 6. One of the newly submitted tasks' ZooKeeper watchers found that the task was
 unassigned and not in the hashmap, so it created a new orphan task.
 7. All three tasks failed, but the task created in step 6 is an orphan, so the
 batch.err counter was one short; the log splitting therefore hangs, waiting
 forever for the last task to finish, which is never going to happen.
 So I think the problem is step 2.  The fix is to make the deletion synchronous
 instead of asynchronous, so that the retry gets a clean start.
 An asynchronous deleteNode interferes with the split log retry.  In an extreme
 situation, if the async deleteNode doesn't happen soon enough, a node created
 during the retry could be deleted.
 deleteNode should be synchronous.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2011-12-21 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174225#comment-13174225
 ] 

jirapos...@reviews.apache.org commented on HBASE-5081:
--



bq.  On 2011-12-21 17:06:22, Michael Stack wrote:
bq.   src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java, line 
358
bq.   https://reviews.apache.org/r/3292/diff/2/?file=65660#file65660line358
bq.  
bq.   White space on end and 'is' should be 'if'

Let me fix it.


- Jimmy


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3292/#review4046
---


On 2011-12-21 17:01:22, Jimmy Xiang wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3292/
bq.  ---
bq.  
bq.  (Updated 2011-12-21 17:01:22)
bq.  
bq.  
bq.  Review request for hbase, Ted Yu, Michael Stack, and Lars Hofhansl.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  In this patch, after a task is done, we don't delete the node if the task 
is failed.  So that when it's retried later on, there won't be race problem.
bq.  
bq.  It used to delete the node always.
bq.  
bq.  
bq.  This addresses bug HBASE-5081.
bq.  https://issues.apache.org/jira/browse/HBASE-5081
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java 
7b7316f 
bq.  
bq.  Diff: https://reviews.apache.org/r/3292/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  mvn -Dtest=TestDistributedLogSplitting clean test
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jimmy
bq.  
bq.



 Distributed log splitting deleteNode races against splitLog retry
 ---

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: distributed-log-splitting-screenshot.png, 
 patch_for_92.txt, patch_for_92_v2.txt


 Recently, during 0.92 RC testing, we found that distributed log splitting hangs
 there forever.  Please see the attached screenshot.
 I looked into it, and here is what I think happened:
 1. One regionserver died; the ServerShutdownHandler found out and started the
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. The ServerShutdownHandler retried the log splitting;
 4. During the retry, it created these three tasks again and put them in a
 hashmap (tasks);
 5. The asynchronous deletion from step 2 finally happened for one task; in the
 callback, it removed that task from the hashmap;
 6. One of the newly submitted tasks' ZooKeeper watchers found that the task was
 unassigned and not in the hashmap, so it created a new orphan task.
 7. All three tasks failed, but the task created in step 6 is an orphan, so the
 batch.err counter was one short; the log splitting therefore hangs, waiting
 forever for the last task to finish, which is never going to happen.
 So I think the problem is step 2.  The fix is to make the deletion synchronous
 instead of asynchronous, so that the retry gets a clean start.
 An asynchronous deleteNode interferes with the split log retry.  In an extreme
 situation, if the async deleteNode doesn't happen soon enough, a node created
 during the retry could be deleted.
 deleteNode should be synchronous.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2011-12-21 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174228#comment-13174228
 ] 

jirapos...@reviews.apache.org commented on HBASE-5081:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3292/
---

(Updated 2011-12-21 17:12:36.185307)


Review request for hbase, Ted Yu, Michael Stack, and Lars Hofhansl.


Changes
---

Fixed the comments.


Summary
---

In this patch, after a task is done, we don't delete the node if the task failed,
so that when it is retried later on, there won't be a race problem.

It used to delete the node always.


This addresses bug HBASE-5081.
https://issues.apache.org/jira/browse/HBASE-5081


Diffs (updated)
-

  src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java 7b7316f 

Diff: https://reviews.apache.org/r/3292/diff


Testing
---

mvn -Dtest=TestDistributedLogSplitting clean test


Thanks,

Jimmy



 Distributed log splitting deleteNode races against splitLog retry
 ---

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: distributed-log-splitting-screenshot.png, 
 patch_for_92.txt, patch_for_92_v2.txt, patch_for_92_v3.txt


 Recently, during 0.92 RC testing, we found that distributed log splitting hangs
 there forever.  Please see the attached screenshot.
 I looked into it, and here is what I think happened:
 1. One regionserver died; the ServerShutdownHandler found out and started the
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. The ServerShutdownHandler retried the log splitting;
 4. During the retry, it created these three tasks again and put them in a
 hashmap (tasks);
 5. The asynchronous deletion from step 2 finally happened for one task; in the
 callback, it removed that task from the hashmap;
 6. One of the newly submitted tasks' ZooKeeper watchers found that the task was
 unassigned and not in the hashmap, so it created a new orphan task.
 7. All three tasks failed, but the task created in step 6 is an orphan, so the
 batch.err counter was one short; the log splitting therefore hangs, waiting
 forever for the last task to finish, which is never going to happen.
 So I think the problem is step 2.  The fix is to make the deletion synchronous
 instead of asynchronous, so that the retry gets a clean start.
 An asynchronous deleteNode interferes with the split log retry.  In an extreme
 situation, if the async deleteNode doesn't happen soon enough, a node created
 during the retry could be deleted.
 deleteNode should be synchronous.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2011-12-21 Thread Jimmy Xiang (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-5081:
---

Attachment: patch_for_92_v3.txt

 Distributed log splitting deleteNode races against splitLog retry
 ---

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: distributed-log-splitting-screenshot.png, 
 patch_for_92.txt, patch_for_92_v2.txt, patch_for_92_v3.txt


 Recently, during 0.92 RC testing, we found that distributed log splitting hangs
 there forever.  Please see the attached screenshot.
 I looked into it, and here is what I think happened:
 1. One regionserver died; the ServerShutdownHandler found out and started the
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. The ServerShutdownHandler retried the log splitting;
 4. During the retry, it created these three tasks again and put them in a
 hashmap (tasks);
 5. The asynchronous deletion from step 2 finally happened for one task; in the
 callback, it removed that task from the hashmap;
 6. One of the newly submitted tasks' ZooKeeper watchers found that the task was
 unassigned and not in the hashmap, so it created a new orphan task.
 7. All three tasks failed, but the task created in step 6 is an orphan, so the
 batch.err counter was one short; the log splitting therefore hangs, waiting
 forever for the last task to finish, which is never going to happen.
 So I think the problem is step 2.  The fix is to make the deletion synchronous
 instead of asynchronous, so that the retry gets a clean start.
 An asynchronous deleteNode interferes with the split log retry.  In an extreme
 situation, if the async deleteNode doesn't happen soon enough, a node created
 during the retry could be deleted.
 deleteNode should be synchronous.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2011-12-21 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174234#comment-13174234
 ] 

jirapos...@reviews.apache.org commented on HBASE-5081:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3292/#review4050
---


Interesting idea.  Let us know the result of testing in a real cluster.
Since the ZK node may live across a cluster restart, we should verify that the change
works in that scenario.


src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java
https://reviews.apache.org/r/3292/#comment9158

I think the new comments should be lifted to line 353 for clarity.



src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java
https://reviews.apache.org/r/3292/#comment9156

Should read 'if the task failed'



src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java
https://reviews.apache.org/r/3292/#comment9157

Indentation should be 2 spaces.


- Ted


On 2011-12-21 17:12:36, Jimmy Xiang wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3292/
bq.  ---
bq.  
bq.  (Updated 2011-12-21 17:12:36)
bq.  
bq.  
bq.  Review request for hbase, Ted Yu, Michael Stack, and Lars Hofhansl.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  In this patch, after a task is done, we don't delete the node if the task 
is failed.  So that when it's retried later on, there won't be race problem.
bq.  
bq.  It used to delete the node always.
bq.  
bq.  
bq.  This addresses bug HBASE-5081.
bq.  https://issues.apache.org/jira/browse/HBASE-5081
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java 
7b7316f 
bq.  
bq.  Diff: https://reviews.apache.org/r/3292/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  mvn -Dtest=TestDistributedLogSplitting clean test
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jimmy
bq.  
bq.



 Distributed log splitting deleteNode races against splitLog retry
 ---

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: distributed-log-splitting-screenshot.png, 
 patch_for_92.txt, patch_for_92_v2.txt, patch_for_92_v3.txt


 Recently, during 0.92 RC testing, we found that distributed log splitting hangs
 there forever.  Please see the attached screenshot.
 I looked into it, and here is what I think happened:
 1. One regionserver died; the ServerShutdownHandler found out and started the
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. The ServerShutdownHandler retried the log splitting;
 4. During the retry, it created these three tasks again and put them in a
 hashmap (tasks);
 5. The asynchronous deletion from step 2 finally happened for one task; in the
 callback, it removed that task from the hashmap;
 6. One of the newly submitted tasks' ZooKeeper watchers found that the task was
 unassigned and not in the hashmap, so it created a new orphan task.
 7. All three tasks failed, but the task created in step 6 is an orphan, so the
 batch.err counter was one short; the log splitting therefore hangs, waiting
 forever for the last task to finish, which is never going to happen.
 So I think the problem is step 2.  The fix is to make the deletion synchronous
 instead of asynchronous, so that the retry gets a clean start.
 An asynchronous deleteNode interferes with the split log retry.  In an extreme
 situation, if the async deleteNode doesn't happen soon enough, a node created
 during the retry could be deleted.
 deleteNode should be synchronous.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4720) Implement atomic update operations (checkAndPut, checkAndDelete) for REST client/server

2011-12-21 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174236#comment-13174236
 ] 

Zhihong Yu commented on HBASE-4720:
---

@Mubarak:
I think the rule you posted only governs comments.
I tried auto-formatting a long line containing three identical short statements;
there was no auto-wrapping.

 Implement atomic update operations (checkAndPut, checkAndDelete) for REST 
 client/server 
 

 Key: HBASE-4720
 URL: https://issues.apache.org/jira/browse/HBASE-4720
 Project: HBase
  Issue Type: Improvement
Reporter: Daniel Lord
Assignee: Mubarak Seyed
 Fix For: 0.94.0

 Attachments: HBASE-4720.trunk.v1.patch, HBASE-4720.v1.patch, 
 HBASE-4720.v3.patch


 I have several large application/HBase clusters where an application node 
 will occasionally need to talk to HBase from a different cluster.  In order 
 to help ensure some of my consistency guarantees I have a sentinel table that 
 is updated atomically as users interact with the system.  This works quite 
 well for the regular hbase client but the REST client does not implement 
 the checkAndPut and checkAndDelete operations.  This exposes the application 
 to some race conditions that have to be worked around.  It would be ideal if 
 the same checkAndPut/checkAndDelete operations could be supported by the REST 
 client.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4916) LoadTest MR Job

2011-12-21 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174249#comment-13174249
 ] 

Phabricator commented on HBASE-4916:


tedyu has commented on the revision HBASE-4916 [jira] LoadTest MR Job.

  Nice tool.

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/loadtest/GetGenerator.java:81 Should be seeded.
  src/main/java/org/apache/hadoop/hbase/loadtest/RandomDataGenerator.java:94 Please
seed it as is done at line 61.
  src/main/java/org/apache/hadoop/hbase/loadtest/KeyCounter.java:103 I think we 
should add some message to NoKeysException so that this scenario can be 
distinguished from that of line 90.

REVISION DETAIL
  https://reviews.facebook.net/D741


 LoadTest MR Job
 ---

 Key: HBASE-4916
 URL: https://issues.apache.org/jira/browse/HBASE-4916
 Project: HBase
  Issue Type: Sub-task
  Components: client, regionserver
Reporter: Nicolas Spiegelberg
Assignee: Christopher Gist
 Fix For: 0.94.0

 Attachments: HBASE-4916.D741.1.patch


 Add a script to start a streaming map-reduce job where each map task runs an
 instance of the load tester for a partition of the key-space.  Ensure that the
 load tester takes a parameter indicating the start key for write operations.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2011-12-21 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174250#comment-13174250
 ] 

jirapos...@reviews.apache.org commented on HBASE-5081:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3292/
---

(Updated 2011-12-21 17:40:59.028814)


Review request for hbase, Ted Yu, Michael Stack, and Lars Hofhansl.


Changes
---

Fixed the indentation issue and added more comments.


Summary
---

In this patch, after a task is done, we don't delete the node if the task failed,
so that when it is retried later on, there won't be a race problem.

It used to delete the node always.


This addresses bug HBASE-5081.
https://issues.apache.org/jira/browse/HBASE-5081


Diffs (updated)
-

  src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java 667a8b1 

Diff: https://reviews.apache.org/r/3292/diff


Testing
---

mvn -Dtest=TestDistributedLogSplitting clean test


Thanks,

Jimmy



 Distributed log splitting deleteNode races against splitLog retry
 ---

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: distributed-log-splitting-screenshot.png, 
 hbase-5081_patch_for_92_v4.txt, patch_for_92.txt, patch_for_92_v2.txt, 
 patch_for_92_v3.txt


 Recently, during 0.92 RC testing, we found that distributed log splitting hangs
 there forever.  Please see the attached screenshot.
 I looked into it, and here is what I think happened:
 1. One regionserver died; the ServerShutdownHandler found out and started the
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. The ServerShutdownHandler retried the log splitting;
 4. During the retry, it created these three tasks again and put them in a
 hashmap (tasks);
 5. The asynchronous deletion from step 2 finally happened for one task; in the
 callback, it removed that task from the hashmap;
 6. One of the newly submitted tasks' ZooKeeper watchers found that the task was
 unassigned and not in the hashmap, so it created a new orphan task.
 7. All three tasks failed, but the task created in step 6 is an orphan, so the
 batch.err counter was one short; the log splitting therefore hangs, waiting
 forever for the last task to finish, which is never going to happen.
 So I think the problem is step 2.  The fix is to make the deletion synchronous
 instead of asynchronous, so that the retry gets a clean start.
 An asynchronous deleteNode interferes with the split log retry.  In an extreme
 situation, if the async deleteNode doesn't happen soon enough, a node created
 during the retry could be deleted.
 deleteNode should be synchronous.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2011-12-21 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174251#comment-13174251
 ] 

Jimmy Xiang commented on HBASE-5081:


The patch is for both 0.92 and 0.94 actually.

 Distributed log splitting deleteNode races against splitLog retry
 ---

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: distributed-log-splitting-screenshot.png, 
 hbase-5081_patch_for_92_v4.txt, patch_for_92.txt, patch_for_92_v2.txt, 
 patch_for_92_v3.txt


 Recently, during 0.92 RC testing, we found that distributed log splitting hangs
 there forever.  Please see the attached screenshot.
 I looked into it, and here is what I think happened:
 1. One regionserver died; the ServerShutdownHandler found out and started the
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. The ServerShutdownHandler retried the log splitting;
 4. During the retry, it created these three tasks again and put them in a
 hashmap (tasks);
 5. The asynchronous deletion from step 2 finally happened for one task; in the
 callback, it removed that task from the hashmap;
 6. One of the newly submitted tasks' ZooKeeper watchers found that the task was
 unassigned and not in the hashmap, so it created a new orphan task.
 7. All three tasks failed, but the task created in step 6 is an orphan, so the
 batch.err counter was one short; the log splitting therefore hangs, waiting
 forever for the last task to finish, which is never going to happen.
 So I think the problem is step 2.  The fix is to make the deletion synchronous
 instead of asynchronous, so that the retry gets a clean start.
 An asynchronous deleteNode interferes with the split log retry.  In an extreme
 situation, if the async deleteNode doesn't happen soon enough, a node created
 during the retry could be deleted.
 deleteNode should be synchronous.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2011-12-21 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174252#comment-13174252
 ] 

jirapos...@reviews.apache.org commented on HBASE-5081:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3292/#review4052
---



src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java
https://reviews.apache.org/r/3292/#comment9159

Should read 'Asynchronously deleting'



src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java
https://reviews.apache.org/r/3292/#comment9160

Should read 'if the task failed and was not an orphan'


- Ted


On 2011-12-21 17:40:59, Jimmy Xiang wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3292/
bq.  ---
bq.  
bq.  (Updated 2011-12-21 17:40:59)
bq.  
bq.  
bq.  Review request for hbase, Ted Yu, Michael Stack, and Lars Hofhansl.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  In this patch, after a task is done, we don't delete the node if the task 
is failed.  So that when it's retried later on, there won't be race problem.
bq.  
bq.  It used to delete the node always.
bq.  
bq.  
bq.  This addresses bug HBASE-5081.
bq.  https://issues.apache.org/jira/browse/HBASE-5081
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java 
667a8b1 
bq.  
bq.  Diff: https://reviews.apache.org/r/3292/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  mvn -Dtest=TestDistributedLogSplitting clean test
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jimmy
bq.  
bq.



 Distributed log splitting deleteNode races against splitLog retry
 ---

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: distributed-log-splitting-screenshot.png, 
 hbase-5081_patch_for_92_v4.txt, patch_for_92.txt, patch_for_92_v2.txt, 
 patch_for_92_v3.txt


 Recently, during 0.92 RC testing, we found that distributed log splitting hangs
 there forever.  Please see the attached screenshot.
 I looked into it, and here is what I think happened:
 1. One regionserver died; the ServerShutdownHandler found out and started the
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. The ServerShutdownHandler retried the log splitting;
 4. During the retry, it created these three tasks again and put them in a
 hashmap (tasks);
 5. The asynchronous deletion from step 2 finally happened for one task; in the
 callback, it removed that task from the hashmap;
 6. One of the newly submitted tasks' ZooKeeper watchers found that the task was
 unassigned and not in the hashmap, so it created a new orphan task.
 7. All three tasks failed, but the task created in step 6 is an orphan, so the
 batch.err counter was one short; the log splitting therefore hangs, waiting
 forever for the last task to finish, which is never going to happen.
 So I think the problem is step 2.  The fix is to make the deletion synchronous
 instead of asynchronous, so that the retry gets a clean start.
 An asynchronous deleteNode interferes with the split log retry.  In an extreme
 situation, if the async deleteNode doesn't happen soon enough, a node created
 during the retry could be deleted.
 deleteNode should be synchronous.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5064) use surefire tests parallelization

2011-12-21 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174254#comment-13174254
 ] 

Zhihong Yu commented on HBASE-5064:
---

Build #566 looks good.
Shall we increase parallelism and run the test suite again?

 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v2.patch, 5064.v3.patch, 
 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 
 5064.v6.patch, 5064.v7.patch, 5064.v8.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4951) master process can not be stopped when it is initializing

2011-12-21 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-4951:
--

Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12508251/HBASE-4951_branch.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified tests.
    Please justify why no new tests are needed for this patch.
    Also please list what manual steps were performed to verify this patch.

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/568//console

This message is automatically generated.)

 master process can not be stopped when it is initializing
 -

 Key: HBASE-4951
 URL: https://issues.apache.org/jira/browse/HBASE-4951
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.3
Reporter: xufeng
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.90.6

 Attachments: HBASE-4951.patch, HBASE-4951_branch.patch


 It is easy to reproduce with the following steps:
 Step 1: Start the master process (do not start any regionserver process in the cluster).
 The master will wait for regionservers to check in:
 org.apache.hadoop.hbase.master.ServerManager: Waiting on regionserver(s) to checkin
 Step 2: Stop the master with the shell command bin/hbase master stop.
 Result: the master process never dies, because the catalogTracker.waitForRoot()
 method blocks until the root region is assigned.
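 A sketch of the general shape of a fix (hypothetical interfaces only, not the actual
 patch, which touches CatalogTracker/ZooKeeperNodeTracker):
{code}
// Hypothetical sketch: wait for the root region in short slices and bail out if a
// stop has been requested, instead of blocking indefinitely.
interface RootWaiter {
  Object waitForRoot(long timeoutMs) throws InterruptedException; // null on timeout
}

interface Stoppable {
  boolean isStopped();
}

class StoppableRootWait {
  static boolean waitForRootOrStop(RootWaiter tracker, Stoppable master)
      throws InterruptedException {
    while (tracker.waitForRoot(1000) == null) {  // wait in one-second slices
      if (master.isStopped()) {
        return false;                            // stop requested; abort startup
      }
    }
    return true;                                 // root region was assigned
  }
}
{code}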

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5083) Backup HMaster should have http infoport open with link to the active master

2011-12-21 Thread Jonathan Hsieh (Created) (JIRA)
Backup HMaster should have http infoport open with link to the active master


 Key: HBASE-5083
 URL: https://issues.apache.org/jira/browse/HBASE-5083
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.92.0
Reporter: Jonathan Hsieh


Without ssh'ing in and running jps/ps, it is difficult to see whether a backup HMaster
is up.  It would be good for a backup HMaster to serve a basic web page on the info
port so that users can see that it is up.  It should probably also provide a link to
the active master.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2011-12-21 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174259#comment-13174259
 ] 

Hadoop QA commented on HBASE-5081:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12508264/patch_for_92_v2.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified tests.
    Please justify why no new tests are needed for this patch.
    Also please list what manual steps were performed to verify this patch.

-1 javadoc.  The javadoc tool appears to have generated -152 warning messages.

+1 javac.  The applied patch does not increase the total number of javac compiler warnings.

-1 findbugs.  The patch appears to introduce 76 new Findbugs (version 1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.replication.TestReplication
  org.apache.hadoop.hbase.replication.TestMultiSlaveReplication
  org.apache.hadoop.hbase.replication.TestMasterReplication

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/570//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/570//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/570//console

This message is automatically generated.

 Distributed log splitting deleteNode races against splitLog retry
 ---

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: distributed-log-splitting-screenshot.png, 
 hbase-5081_patch_for_92_v4.txt, patch_for_92.txt, patch_for_92_v2.txt, 
 patch_for_92_v3.txt


 Recently, during 0.92 RC testing, we found that distributed log splitting hangs
 there forever.  Please see the attached screenshot.
 I looked into it, and here is what I think happened:
 1. One regionserver died; the ServerShutdownHandler found out and started the
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. The ServerShutdownHandler retried the log splitting;
 4. During the retry, it created these three tasks again and put them in a
 hashmap (tasks);
 5. The asynchronous deletion from step 2 finally happened for one task; in the
 callback, it removed that task from the hashmap;
 6. One of the newly submitted tasks' ZooKeeper watchers found that the task was
 unassigned and not in the hashmap, so it created a new orphan task.
 7. All three tasks failed, but the task created in step 6 is an orphan, so the
 batch.err counter was one short; the log splitting therefore hangs, waiting
 forever for the last task to finish, which is never going to happen.
 So I think the problem is step 2.  The fix is to make the deletion synchronous
 instead of asynchronous, so that the retry gets a clean start.
 An asynchronous deleteNode interferes with the split log retry.  In an extreme
 situation, if the async deleteNode doesn't happen soon enough, a node created
 during the retry could be deleted.
 deleteNode should be synchronous.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4951) master process can not be stopped when it is initializing

2011-12-21 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174260#comment-13174260
 ] 

Zhihong Yu commented on HBASE-4951:
---

Minor comments:
{code}
+if (LOG.isTraceEnabled()) {
+  LOG.info(".META. still not available, sleeping and retrying." +
{code}
Should LOG.isInfoEnabled() be called above?
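For example, a consistent guard/level pairing would look like this (sketch only):
{code}
// Sketch: the guard level should match the log call (both at INFO here).
if (LOG.isInfoEnabled()) {
  LOG.info(".META. still not available, sleeping and retrying.");
}
{code}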

The following two comments apply to ZooKeeperNodeTracker.java and 
CatalogTracker.java:
{code}
+// chk if the shutdown node is available. Because incase of cluster
+// shutdonw
+// this loop prevents the master from going down.
{code}
The above should read: check if the shutdown node exists. In case of cluster 
shutdown, ...
{code}
+  "Unexpected exception handling while checking if shutdown node 
exists.",
{code}
Extra word above: handling

 master process can not be stopped when it is initializing
 -

 Key: HBASE-4951
 URL: https://issues.apache.org/jira/browse/HBASE-4951
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.3
Reporter: xufeng
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.90.6

 Attachments: HBASE-4951.patch, HBASE-4951_branch.patch


 It is easy to reproduce with the following steps:
 step 1: start the master process (do not start any regionserver process in the cluster).
 The master will wait for regionservers to check in:
 org.apache.hadoop.hbase.master.ServerManager: Waiting on regionserver(s) to 
 checkin
 step 2: stop the master with the shell command bin/hbase master stop
 result: the master process will never die because the catalogTracker.waitForRoot() 
 method blocks until the root region is assigned.
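One possible shape of a fix, sketched here only as an illustration and not taken from the attached patches, is to wait for root in short, bounded slices and re-check a stop flag between attempts; the timed waitForRoot overload and the stopped flag are assumptions:

{code}
// A minimal sketch, assuming a timed waitForRoot(timeout) overload that
// returns null on timeout, and a 'stopped' flag on the master; both are
// assumptions made for illustration only.
while (!this.stopped && this.catalogTracker.waitForRoot(1000) == null) {
  // keep waiting in short slices so a stop request is noticed promptly
}
if (this.stopped) {
  LOG.info("Stop requested while waiting for the root region; aborting startup.");
  return;
}
{code}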

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5081) Distributed log splitting deleteNode races againsth splitLog retry

2011-12-21 Thread Jimmy Xiang (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-5081:
---

Attachment: hbase-5081_patch_v5.txt

 Distributed log splitting deleteNode races againsth splitLog retry 
 ---

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: distributed-log-splitting-screenshot.png, 
 hbase-5081_patch_for_92_v4.txt, hbase-5081_patch_v5.txt, patch_for_92.txt, 
 patch_for_92_v2.txt, patch_for_92_v3.txt


 Recently, during 0.92 rc testing, we found distributed log splitting hangs 
 there forever.  Please see attached screen shot.
 I looked into it and here is what happened I think:
 1. One rs died, the servershutdownhandler found it out and started the 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. Servershutdownhandler retried the log splitting;
 4. During the retrial, it created these three tasks again, and put them in a 
 hashmap (tasks);
 5. The asynchronously deletion in step 2 finally happened for one task, in 
 the callback, it removed one
 task in the hashmap;
 6. One of the newly submitted tasks' zookeeper watcher found out that task is 
 unassigned, and it is not
 in the hashmap, so it created a new orphan task.
 7.  All three tasks failed, but that task created in step 6 is an orphan so 
 the batch.err counter was one short,
 so the log splitting hangs there and keeps waiting for the last task to 
 finish which is never going to happen.
 So I think the problem is step 2.  The fix is to make deletion sync, instead 
 of async, so that the retry will have
 a clean start.
 Async deleteNode will mess up with split log retrial.  In extreme situation, 
 if async deleteNode doesn't happen
 soon enough, some node created during the retrial could be deleted.
 deleteNode should be sync.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races againsth splitLog retry

2011-12-21 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174261#comment-13174261
 ] 

jirapos...@reviews.apache.org commented on HBASE-5081:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3292/
---

(Updated 2011-12-21 18:06:55.460515)


Review request for hbase, Ted Yu, Michael Stack, and Lars Hofhansl.


Changes
---

Changed some comments


Summary
---

In this patch, after a task is done, we don't delete the node if the task 
failed, so that when it is retried later on there won't be a race problem.

Previously, the node was always deleted.


This addresses bug HBASE-5081.
https://issues.apache.org/jira/browse/HBASE-5081


Diffs (updated)
-

  src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java 667a8b1 

Diff: https://reviews.apache.org/r/3292/diff


Testing
---

mvn -Dtest=TestDistributedLogSplitting clean test


Thanks,

Jimmy



 Distributed log splitting deleteNode races againsth splitLog retry 
 ---

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: distributed-log-splitting-screenshot.png, 
 hbase-5081_patch_for_92_v4.txt, hbase-5081_patch_v5.txt, patch_for_92.txt, 
 patch_for_92_v2.txt, patch_for_92_v3.txt


 Recently, during 0.92 rc testing, we found distributed log splitting hangs 
 there forever.  Please see attached screen shot.
 I looked into it and here is what happened I think:
 1. One rs died, the servershutdownhandler found it out and started the 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. Servershutdownhandler retried the log splitting;
 4. During the retrial, it created these three tasks again, and put them in a 
 hashmap (tasks);
 5. The asynchronously deletion in step 2 finally happened for one task, in 
 the callback, it removed one
 task in the hashmap;
 6. One of the newly submitted tasks' zookeeper watcher found out that task is 
 unassigned, and it is not
 in the hashmap, so it created a new orphan task.
 7.  All three tasks failed, but that task created in step 6 is an orphan so 
 the batch.err counter was one short,
 so the log splitting hangs there and keeps waiting for the last task to 
 finish which is never going to happen.
 So I think the problem is step 2.  The fix is to make deletion sync, instead 
 of async, so that the retry will have
 a clean start.
 Async deleteNode will mess up with split log retrial.  In extreme situation, 
 if async deleteNode doesn't happen
 soon enough, some node created during the retrial could be deleted.
 deleteNode should be sync.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races againsth splitLog retry

2011-12-21 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174262#comment-13174262
 ] 

jirapos...@reviews.apache.org commented on HBASE-5081:
--



bq.  On 2011-12-21 17:44:53, Ted Yu wrote:
bq.   src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java, line 
368
bq.   https://reviews.apache.org/r/3292/diff/4/?file=65674#file65674line368
bq.  
bq.   Should read 'if the task failed and was not an orphan'

The original is fine.


- Jimmy


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3292/#review4052
---


On 2011-12-21 18:06:55, Jimmy Xiang wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3292/
bq.  ---
bq.  
bq.  (Updated 2011-12-21 18:06:55)
bq.  
bq.  
bq.  Review request for hbase, Ted Yu, Michael Stack, and Lars Hofhansl.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  In this patch, after a task is done, we don't delete the node if the task 
is failed.  So that when it's retried later on, there won't be race problem.
bq.  
bq.  It used to delete the node always.
bq.  
bq.  
bq.  This addresses bug HBASE-5081.
bq.  https://issues.apache.org/jira/browse/HBASE-5081
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java 
667a8b1 
bq.  
bq.  Diff: https://reviews.apache.org/r/3292/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  mvn -Dtest=TestDistributedLogSplitting clean test
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jimmy
bq.  
bq.



 Distributed log splitting deleteNode races againsth splitLog retry 
 ---

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: distributed-log-splitting-screenshot.png, 
 hbase-5081_patch_for_92_v4.txt, hbase-5081_patch_v5.txt, patch_for_92.txt, 
 patch_for_92_v2.txt, patch_for_92_v3.txt


 Recently, during 0.92 rc testing, we found distributed log splitting hangs 
 there forever.  Please see attached screen shot.
 I looked into it and here is what happened I think:
 1. One rs died, the servershutdownhandler found it out and started the 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. Servershutdownhandler retried the log splitting;
 4. During the retrial, it created these three tasks again, and put them in a 
 hashmap (tasks);
 5. The asynchronously deletion in step 2 finally happened for one task, in 
 the callback, it removed one
 task in the hashmap;
 6. One of the newly submitted tasks' zookeeper watcher found out that task is 
 unassigned, and it is not
 in the hashmap, so it created a new orphan task.
 7.  All three tasks failed, but that task created in step 6 is an orphan so 
 the batch.err counter was one short,
 so the log splitting hangs there and keeps waiting for the last task to 
 finish which is never going to happen.
 So I think the problem is step 2.  The fix is to make deletion sync, instead 
 of async, so that the retry will have
 a clean start.
 Async deleteNode will mess up with split log retrial.  In extreme situation, 
 if async deleteNode doesn't happen
 soon enough, some node created during the retrial could be deleted.
 deleteNode should be sync.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5021) Enforce upper bound on timestamp

2011-12-21 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174271#comment-13174271
 ] 

Phabricator commented on HBASE-5021:


Kannan has commented on the revision [jira] [HBase-5021] Enforce upper bound 
on timestamp.

  Liyin reported that the TestHeapSize didn't pass in the daily builds.

REVISION DETAIL
  https://reviews.facebook.net/D849

COMMIT
  https://reviews.facebook.net/rHBASE1221532


 Enforce upper bound on timestamp
 

 Key: HBASE-5021
 URL: https://issues.apache.org/jira/browse/HBASE-5021
 Project: HBase
  Issue Type: Improvement
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Critical
 Fix For: 0.94.0

 Attachments: 5021-addendum.txt, D849.1.patch, D849.2.patch, 
 D849.3.patch, HBASE-5021-trunk.patch


 We have been getting hit with performance problems on our time-series 
 database due to invalid timestamps being inserted.  We are 
 working on adding proper checks to the app server, but production performance 
 could be severely impacted, with significant recovery time, if something slips 
 past.  Since timestamps are considered a fundamental part of the HBase schema 
 and multiple optimizations use timestamp information, we should allow the 
 option to sanity-check the upper bound on the server side in HBase.
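The essence of such a server-side sanity check could look like the sketch below; this is an illustration of the idea, not the code from the attached patches, and the configuration key and surrounding variables are assumptions:

{code}
// Reject writes whose timestamp is too far in the future relative to the
// region server clock; the configuration key here is hypothetical and used
// only for illustration.
long maxFutureSlop = conf.getLong("hbase.hregion.keyvalue.timestamp.slop.millisecs",
    Long.MAX_VALUE); // default: no upper bound enforced
long now = System.currentTimeMillis();
for (KeyValue kv : put.getFamilyMap().get(family)) {
  if (kv.getTimestamp() > now + maxFutureSlop) {
    throw new DoNotRetryIOException("Timestamp " + kv.getTimestamp()
        + " is beyond the allowed upper bound " + (now + maxFutureSlop));
  }
}
{code}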

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5064) use surefire tests parallelization

2011-12-21 Thread nkeywal (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174277#comment-13174277
 ] 

nkeywal commented on HBASE-5064:


No, we need the hadoopqa setting increased before activating // (parallelization).
Here, I am launching multiple tests to identify most of the random issues
we can have; this will help to distinguish what is random vs. what is due
to //.
However, I can split this patch into smaller ones:
1) one to activate the latest version of surefire
2) one for the minimal thread settings
3) one for the // itself.

From all the tests done so far, 1 & 2 seem ok today.




 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v2.patch, 5064.v3.patch, 
 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 
 5064.v6.patch, 5064.v7.patch, 5064.v8.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races againsth splitLog retry

2011-12-21 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174313#comment-13174313
 ] 

Hadoop QA commented on HBASE-5081:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12508259/patch_for_92.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 javadoc.  The javadoc tool appears to have generated -152 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 76 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.client.TestInstantSchemaChange
  org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
  org.apache.hadoop.hbase.master.TestSplitLogManager

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/569//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/569//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/569//console

This message is automatically generated.

 Distributed log splitting deleteNode races againsth splitLog retry 
 ---

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: distributed-log-splitting-screenshot.png, 
 hbase-5081_patch_for_92_v4.txt, hbase-5081_patch_v5.txt, patch_for_92.txt, 
 patch_for_92_v2.txt, patch_for_92_v3.txt


 Recently, during 0.92 rc testing, we found distributed log splitting hangs 
 there forever.  Please see attached screen shot.
 I looked into it and here is what happened I think:
 1. One rs died, the servershutdownhandler found it out and started the 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. Servershutdownhandler retried the log splitting;
 4. During the retrial, it created these three tasks again, and put them in a 
 hashmap (tasks);
 5. The asynchronously deletion in step 2 finally happened for one task, in 
 the callback, it removed one
 task in the hashmap;
 6. One of the newly submitted tasks' zookeeper watcher found out that task is 
 unassigned, and it is not
 in the hashmap, so it created a new orphan task.
 7.  All three tasks failed, but that task created in step 6 is an orphan so 
 the batch.err counter was one short,
 so the log splitting hangs there and keeps waiting for the last task to 
 finish which is never going to happen.
 So I think the problem is step 2.  The fix is to make deletion sync, instead 
 of async, so that the retry will have
 a clean start.
 Async deleteNode will mess up with split log retrial.  In extreme situation, 
 if async deleteNode doesn't happen
 soon enough, some node created during the retrial could be deleted.
 deleteNode should be sync.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races againsth splitLog retry

2011-12-21 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174324#comment-13174324
 ] 

Zhihong Yu commented on HBASE-5081:
---

I can reproduce the following test failures on MacBook:
{code}
Failed tests:   testTaskErr(org.apache.hadoop.hbase.master.TestSplitLogManager)
  testUnassignedTimeout(org.apache.hadoop.hbase.master.TestSplitLogManager)
{code}
Please adjust TestSplitLogManager accordingly.

 Distributed log splitting deleteNode races againsth splitLog retry 
 ---

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: distributed-log-splitting-screenshot.png, 
 hbase-5081_patch_for_92_v4.txt, hbase-5081_patch_v5.txt, patch_for_92.txt, 
 patch_for_92_v2.txt, patch_for_92_v3.txt


 Recently, during 0.92 rc testing, we found distributed log splitting hangs 
 there forever.  Please see attached screen shot.
 I looked into it and here is what happened I think:
 1. One rs died, the servershutdownhandler found it out and started the 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. Servershutdownhandler retried the log splitting;
 4. During the retrial, it created these three tasks again, and put them in a 
 hashmap (tasks);
 5. The asynchronously deletion in step 2 finally happened for one task, in 
 the callback, it removed one
 task in the hashmap;
 6. One of the newly submitted tasks' zookeeper watcher found out that task is 
 unassigned, and it is not
 in the hashmap, so it created a new orphan task.
 7.  All three tasks failed, but that task created in step 6 is an orphan so 
 the batch.err counter was one short,
 so the log splitting hangs there and keeps waiting for the last task to 
 finish which is never going to happen.
 So I think the problem is step 2.  The fix is to make deletion sync, instead 
 of async, so that the retry will have
 a clean start.
 Async deleteNode will mess up with split log retrial.  In extreme situation, 
 if async deleteNode doesn't happen
 soon enough, some node created during the retrial could be deleted.
 deleteNode should be sync.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5070) Constraints implementation and javadoc changes

2011-12-21 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174339#comment-13174339
 ] 

jirapos...@reviews.apache.org commented on HBASE-5070:
--



bq.  On 2011-12-21 06:09:56, Michael Stack wrote:
bq.   This patch should go in.  Good improvements.  Doesn't address the bigger 
issues of Configuration inline w/ HTD -- have you tried it, I mean, IIRC, 
though you might have one custom config only, the whole Configuration will be 
output per Constraint? -- and adding support to shell.  Those are in different 
issues?

I'm going to try checking out the conf stuff for the Constraints in the 
shell today. Keep in mind that the conf stored per Constraint should just be 
configuration info for that Constraint, as opposed to the entire hbase conf (so 
we don't see all the values from hbase-site.xml, etc.); by default it only contains 
the enabled and priority configuration values. I'm expecting that to be pretty 
small, but I'll let you know.

Also, there is HBASE-4879 to add shell support to Constraints as the original 
was already pretty massive (and taking a while) and the shell stuff could be 
cleanly separated out. I was working on it, but have temporarily diverted from 
it to work on the maven modularization (4336) as per Ted's request.


- Jesse


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3273/#review4034
---


On 2011-12-20 19:14:46, Jesse Yates wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3273/
bq.  ---
bq.  
bq.  (Updated 2011-12-20 19:14:46)
bq.  
bq.  
bq.  Review request for hbase, Gary Helmling, Ted Yu, and Michael Stack.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Follow-up on changes to constraint as per stack's comments on HBASE-4605.
bq.  
bq.  
bq.  This addresses bug HBASE-5070.
bq.  https://issues.apache.org/jira/browse/HBASE-5070
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/docbkx/book.xml bd3f881 
bq.src/main/java/org/apache/hadoop/hbase/constraint/BaseConstraint.java 
7ce6d45 
bq.src/main/java/org/apache/hadoop/hbase/constraint/Constraint.java 2d8b4d7 
bq.src/main/java/org/apache/hadoop/hbase/constraint/Constraints.java 
7825466 
bq.src/main/java/org/apache/hadoop/hbase/constraint/package-info.java 
6145ed5 
bq.
src/test/java/org/apache/hadoop/hbase/constraint/CheckConfigurationConstraint.java
 c49098d 
bq.  
bq.  Diff: https://reviews.apache.org/r/3273/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  mvn clean test -P localTests -Dtest=*Constraint* - all tests pass.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jesse
bq.  
bq.



 Constraints implementation and javadoc changes
 --

 Key: HBASE-5070
 URL: https://issues.apache.org/jira/browse/HBASE-5070
 Project: HBase
  Issue Type: Task
Reporter: Zhihong Yu

 This is a continuation of HBASE-4605.
 See Stack's comments https://reviews.apache.org/r/2579/#review3980

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time

2011-12-21 Thread Phabricator (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HBASE-5033:
---

Attachment: D933.4.patch

Liyin updated the revision [jira] [HBASE-5033] [89-fb] Opening/Closing store in 
parallel to reduce region open/close time.
Reviewers: Kannan, mbautin, Karthik, JIRA

  Implement the proposal described in the previous comments.

REVISION DETAIL
  https://reviews.facebook.net/D933

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/HConstants.java
  src/main/java/org/apache/hadoop/hbase/master/HMaster.java
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
  src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
  src/main/java/org/apache/hadoop/hbase/util/Threads.java


 Opening/Closing store in parallel to reduce region open/close time
 --

 Key: HBASE-5033
 URL: https://issues.apache.org/jira/browse/HBASE-5033
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: D933.1.patch, D933.2.patch, D933.3.patch, D933.4.patch


 Region servers open/close each store, and each store file within every store, 
 sequentially, which makes region open/close inefficient.
 So this diff opens/closes each store in parallel in order to reduce region 
 open/close time; it also helps to reduce cluster restart time 
 (a generic sketch of the pattern follows below):
 1) Opening each store in parallel
 2) Loading each store file for every store in parallel
 3) Closing each store in parallel
 4) Closing each store file for every store in parallel.
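A generic sketch of the pattern (not the HBase internals; names are illustrative):

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Generic shape of opening independent units (e.g. stores) in parallel and
// waiting for all of them to finish before the region is considered open.
public class ParallelOpenSketch {
  interface Store { void open() throws Exception; }

  static void openAllInParallel(List<Store> stores, int threads) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<Void>> results = new ArrayList<Future<Void>>();
      for (final Store store : stores) {
        results.add(pool.submit(new Callable<Void>() {
          public Void call() throws Exception {
            store.open();   // each store (and its files) can be opened independently
            return null;
          }
        }));
      }
      for (Future<Void> f : results) {
        f.get();            // wait for completion and propagate the first failure
      }
    } finally {
      pool.shutdown();
    }
  }
}
{code}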

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5084) Allow different HTable instances to share one ExecutorService

2011-12-21 Thread Zhihong Yu (Created) (JIRA)
Allow different HTable instances to share one ExecutorService
-

 Key: HBASE-5084
 URL: https://issues.apache.org/jira/browse/HBASE-5084
 Project: HBase
  Issue Type: Task
Reporter: Zhihong Yu


This came out of Lily 1.1.1 release:

Use a shared ExecutorService for all HTable instances, leading to better (or 
actual) thread reuse

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4720) Implement atomic update operations (checkAndPut, checkAndDelete) for REST client/server

2011-12-21 Thread Mubarak Seyed (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174345#comment-13174345
 ] 

Mubarak Seyed commented on HBASE-4720:
--

@Ted:
{{eclipse_formatter_apache.xml}} does not work for long lines. Do you mind 
posting your eclipse formatter rule file? Thanks.

 Implement atomic update operations (checkAndPut, checkAndDelete) for REST 
 client/server 
 

 Key: HBASE-4720
 URL: https://issues.apache.org/jira/browse/HBASE-4720
 Project: HBase
  Issue Type: Improvement
Reporter: Daniel Lord
Assignee: Mubarak Seyed
 Fix For: 0.94.0

 Attachments: HBASE-4720.trunk.v1.patch, HBASE-4720.v1.patch, 
 HBASE-4720.v3.patch


 I have several large application/HBase clusters where an application node 
 will occasionally need to talk to HBase from a different cluster.  In order 
 to help ensure some of my consistency guarantees I have a sentinel table that 
 is updated atomically as users interact with the system.  This works quite 
 well for the regular hbase client but the REST client does not implement 
 the checkAndPut and checkAndDelete operations.  This exposes the application 
 to some race conditions that have to be worked around.  It would be ideal if 
 the same checkAndPut/checkAndDelete operations could be supported by the REST 
 client.
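For reference, this is roughly what the atomic primitive looks like with the regular Java client (table, row, and column names below are illustrative); the request here is to expose the equivalent through the REST client:

{code}
HTable table = new HTable(conf, "sentinel");             // illustrative table name
byte[] row = Bytes.toBytes("row-1");
byte[] family = Bytes.toBytes("f");
byte[] qualifier = Bytes.toBytes("state");

Put put = new Put(row);
put.add(family, qualifier, Bytes.toBytes("locked"));

// Atomically apply the Put only if the cell currently equals "unlocked".
boolean applied = table.checkAndPut(row, family, qualifier,
    Bytes.toBytes("unlocked"), put);
{code}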

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4720) Implement atomic update operations (checkAndPut, checkAndDelete) for REST client/server

2011-12-21 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174351#comment-13174351
 ] 

Zhihong Yu commented on HBASE-4720:
---

I used the formatter provided by Nicolas.

Please adjust line length manually.

 Implement atomic update operations (checkAndPut, checkAndDelete) for REST 
 client/server 
 

 Key: HBASE-4720
 URL: https://issues.apache.org/jira/browse/HBASE-4720
 Project: HBase
  Issue Type: Improvement
Reporter: Daniel Lord
Assignee: Mubarak Seyed
 Fix For: 0.94.0

 Attachments: HBASE-4720.trunk.v1.patch, HBASE-4720.v1.patch, 
 HBASE-4720.v3.patch


 I have several large application/HBase clusters where an application node 
 will occasionally need to talk to HBase from a different cluster.  In order 
 to help ensure some of my consistency guarantees I have a sentinel table that 
 is updated atomically as users interact with the system.  This works quite 
 well for the regular hbase client but the REST client does not implement 
 the checkAndPut and checkAndDelete operations.  This exposes the application 
 to some race conditions that have to be worked around.  It would be ideal if 
 the same checkAndPut/checkAndDelete operations could be supported by the REST 
 client.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5084) Allow different HTable instances to share one ExecutorService

2011-12-21 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174350#comment-13174350
 ] 

Zhihong Yu commented on HBASE-5084:
---

We already have:
{code}
  public HTable(final byte[] tableName, final HConnection connection, 
  final ExecutorService pool) throws IOException {
{code}
where connection cannot be null.

Shall we relax the restriction on connection not being null ?
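A minimal sketch of the intended usage, assuming the constructor above; the connection factory call and table names are illustrative and may differ by version:

{code}
// One shared thread pool (and one shared connection) for many HTable instances,
// instead of each HTable creating its own pool.
ExecutorService pool = Executors.newFixedThreadPool(20);
HConnection connection = HConnectionManager.getConnection(conf);

HTable users  = new HTable(Bytes.toBytes("users"),  connection, pool);
HTable events = new HTable(Bytes.toBytes("events"), connection, pool);
// Both tables now reuse the same threads for their batch operations.
{code}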

 Allow different HTable instances to share one ExecutorService
 -

 Key: HBASE-5084
 URL: https://issues.apache.org/jira/browse/HBASE-5084
 Project: HBase
  Issue Type: Task
Reporter: Zhihong Yu

 This came out of Lily 1.1.1 release:
 Use a shared ExecutorService for all HTable instances, leading to better (or 
 actual) thread reuse

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races againsth splitLog retry

2011-12-21 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174357#comment-13174357
 ] 

Jimmy Xiang commented on HBASE-5081:


I am thinking about sync delete for the failure case.  What do you think?

I am adjusting the failing test now.



 Distributed log splitting deleteNode races againsth splitLog retry 
 ---

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: distributed-log-splitting-screenshot.png, 
 hbase-5081_patch_for_92_v4.txt, hbase-5081_patch_v5.txt, patch_for_92.txt, 
 patch_for_92_v2.txt, patch_for_92_v3.txt


 Recently, during 0.92 rc testing, we found distributed log splitting hangs 
 there forever.  Please see attached screen shot.
 I looked into it and here is what happened I think:
 1. One rs died, the servershutdownhandler found it out and started the 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. Servershutdownhandler retried the log splitting;
 4. During the retrial, it created these three tasks again, and put them in a 
 hashmap (tasks);
 5. The asynchronously deletion in step 2 finally happened for one task, in 
 the callback, it removed one
 task in the hashmap;
 6. One of the newly submitted tasks' zookeeper watcher found out that task is 
 unassigned, and it is not
 in the hashmap, so it created a new orphan task.
 7.  All three tasks failed, but that task created in step 6 is an orphan so 
 the batch.err counter was one short,
 so the log splitting hangs there and keeps waiting for the last task to 
 finish which is never going to happen.
 So I think the problem is step 2.  The fix is to make deletion sync, instead 
 of async, so that the retry will have
 a clean start.
 Async deleteNode will mess up with split log retrial.  In extreme situation, 
 if async deleteNode doesn't happen
 soon enough, some node created during the retrial could be deleted.
 deleteNode should be sync.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4916) LoadTest MR Job

2011-12-21 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174365#comment-13174365
 ] 

Phabricator commented on HBASE-4916:


jdcryans has commented on the revision HBASE-4916 [jira] LoadTest MR Job.

  I tried using it on my 0.92 cluster (had to backport HexStringSplit, but no 
biggie), and I encountered a few usability issues.

   - Why is -zk mandatory? It seems that by default it should use the one 
provided in HBaseConfiguration.

   - There is no help, not even for basic usage. When I tried to use it the 
first time without arguments, all I got was the following, and it doesn't tell me 
what the switch is:

  jdcryans@sv4r11s38:~/hbase$ ./bin/hbase 
org.apache.hadoop.hbase.mapreduce.LoadTest
  ZooKeeper quorum must be specified

   - The workloads need their own documentation too, currently unless you dig 
in the code you won't know how to properly tune them.

   - VersionWorkloadGenerator and MixedWorkloadGenerator don't work with their 
default values; they both hit the following in their reducers, and the reason is 
that delayNS ends up at 0:

  11/12/21 19:45:43 INFO mapred.JobClient: Task Id : 
attempt_201112142134_0063_r_08_2, Status : FAILED
  java.lang.IllegalArgumentException
at 
java.util.concurrent.ScheduledThreadPoolExecutor.scheduleAtFixedRate(ScheduledThreadPoolExecutor.java:420)
at 
org.apache.hadoop.hbase.loadtest.Workload$Executor.start(Workload.java:229)
at 
org.apache.hadoop.hbase.mapreduce.LoadTest$Reduce.reduce(LoadTest.java:158)
at 
org.apache.hadoop.hbase.mapreduce.LoadTest$Reduce.reduce(LoadTest.java:124)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
at 
org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:572)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:414)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1154)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
   - Finally, and I'm not sure if you can do something about it, some tasks 
fail in this way on my cluster:

  11/12/21 19:45:31 INFO mapred.JobClient: Task Id : 
attempt_201112142134_0063_r_08_1, Status : FAILED
  java.lang.Throwable: Child Error
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:242)
  Caused by: java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:229)

  And in the stderr:

  Error: Exception thrown by the agent : java.rmi.server.ExportException: Port 
already in use: 8991; nested exception is:
  java.net.BindException: Address already in use

  It would be nice if it was bubbled up, but I understand if you don't have 
control over it.
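On the delayNS point above: ScheduledThreadPoolExecutor.scheduleAtFixedRate throws IllegalArgumentException when the period is not positive, which matches the failure shown. A minimal guard, with illustrative names, might look like:

{code}
// scheduleAtFixedRate rejects period <= 0 with IllegalArgumentException,
// which is what happens when delayNS ends up at 0.
long delayNS = computeDelayNS();                  // illustrative helper
if (delayNS <= 0) {
  delayNS = 1;                                    // or reject the configuration with a clear error
}
executor.scheduleAtFixedRate(workloadTask, 0, delayNS, TimeUnit.NANOSECONDS);
{code}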

REVISION DETAIL
  https://reviews.facebook.net/D741


 LoadTest MR Job
 ---

 Key: HBASE-4916
 URL: https://issues.apache.org/jira/browse/HBASE-4916
 Project: HBase
  Issue Type: Sub-task
  Components: client, regionserver
Reporter: Nicolas Spiegelberg
Assignee: Christopher Gist
 Fix For: 0.94.0

 Attachments: HBASE-4916.D741.1.patch


 Add a script to start a streaming map-reduce job where each map tasks runs an 
 instance of the load tester for a partition of the key-space. Ensure that the 
 load tester takes a parameter indicating the start key for write operations.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5081) Distributed log splitting deleteNode races againsth splitLog retry

2011-12-21 Thread Jimmy Xiang (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-5081:
---

Status: Open  (was: Patch Available)

 Distributed log splitting deleteNode races againsth splitLog retry 
 ---

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: distributed-log-splitting-screenshot.png, 
 hbase-5081-patch-v6.txt, hbase-5081_patch_for_92_v4.txt, 
 hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
 patch_for_92_v3.txt


 Recently, during 0.92 rc testing, we found distributed log splitting hangs 
 there forever.  Please see attached screen shot.
 I looked into it and here is what happened I think:
 1. One rs died, the servershutdownhandler found it out and started the 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. Servershutdownhandler retried the log splitting;
 4. During the retrial, it created these three tasks again, and put them in a 
 hashmap (tasks);
 5. The asynchronously deletion in step 2 finally happened for one task, in 
 the callback, it removed one
 task in the hashmap;
 6. One of the newly submitted tasks' zookeeper watcher found out that task is 
 unassigned, and it is not
 in the hashmap, so it created a new orphan task.
 7.  All three tasks failed, but that task created in step 6 is an orphan so 
 the batch.err counter was one short,
 so the log splitting hangs there and keeps waiting for the last task to 
 finish which is never going to happen.
 So I think the problem is step 2.  The fix is to make deletion sync, instead 
 of async, so that the retry will have
 a clean start.
 Async deleteNode will mess up with split log retrial.  In extreme situation, 
 if async deleteNode doesn't happen
 soon enough, some node created during the retrial could be deleted.
 deleteNode should be sync.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5081) Distributed log splitting deleteNode races againsth splitLog retry

2011-12-21 Thread Jimmy Xiang (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-5081:
---

Status: Patch Available  (was: Open)

 Distributed log splitting deleteNode races againsth splitLog retry 
 ---

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: distributed-log-splitting-screenshot.png, 
 hbase-5081-patch-v6.txt, hbase-5081_patch_for_92_v4.txt, 
 hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
 patch_for_92_v3.txt


 Recently, during 0.92 rc testing, we found distributed log splitting hangs 
 there forever.  Please see attached screen shot.
 I looked into it and here is what happened I think:
 1. One rs died, the servershutdownhandler found it out and started the 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. Servershutdownhandler retried the log splitting;
 4. During the retrial, it created these three tasks again, and put them in a 
 hashmap (tasks);
 5. The asynchronously deletion in step 2 finally happened for one task, in 
 the callback, it removed one
 task in the hashmap;
 6. One of the newly submitted tasks' zookeeper watcher found out that task is 
 unassigned, and it is not
 in the hashmap, so it created a new orphan task.
 7.  All three tasks failed, but that task created in step 6 is an orphan so 
 the batch.err counter was one short,
 so the log splitting hangs there and keeps waiting for the last task to 
 finish which is never going to happen.
 So I think the problem is step 2.  The fix is to make deletion sync, instead 
 of async, so that the retry will have
 a clean start.
 Async deleteNode will mess up with split log retrial.  In extreme situation, 
 if async deleteNode doesn't happen
 soon enough, some node created during the retrial could be deleted.
 deleteNode should be sync.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5081) Distributed log splitting deleteNode races againsth splitLog retry

2011-12-21 Thread Jimmy Xiang (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-5081:
---

Attachment: hbase-5081-patch-v6.txt

Fixed the unit test.

 Distributed log splitting deleteNode races againsth splitLog retry 
 ---

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: distributed-log-splitting-screenshot.png, 
 hbase-5081-patch-v6.txt, hbase-5081_patch_for_92_v4.txt, 
 hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
 patch_for_92_v3.txt


 Recently, during 0.92 rc testing, we found distributed log splitting hangs 
 there forever.  Please see attached screen shot.
 I looked into it and here is what happened I think:
 1. One rs died, the servershutdownhandler found it out and started the 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. Servershutdownhandler retried the log splitting;
 4. During the retrial, it created these three tasks again, and put them in a 
 hashmap (tasks);
 5. The asynchronously deletion in step 2 finally happened for one task, in 
 the callback, it removed one
 task in the hashmap;
 6. One of the newly submitted tasks' zookeeper watcher found out that task is 
 unassigned, and it is not
 in the hashmap, so it created a new orphan task.
 7.  All three tasks failed, but that task created in step 6 is an orphan so 
 the batch.err counter was one short,
 so the log splitting hangs there and keeps waiting for the last task to 
 finish which is never going to happen.
 So I think the problem is step 2.  The fix is to make deletion sync, instead 
 of async, so that the retry will have
 a clean start.
 Async deleteNode will mess up with split log retrial.  In extreme situation, 
 if async deleteNode doesn't happen
 soon enough, some node created during the retrial could be deleted.
 deleteNode should be sync.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4218) Delta Encoding of KeyValues (aka prefix compression)

2011-12-21 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174401#comment-13174401
 ] 

Phabricator commented on HBASE-4218:


tedyu has commented on the revision [jira] [HBASE-4218] HFile data block 
encoding (delta encoding).

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoder.java:42 
Should read 'have been created'
  
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoderImpl.java:49
 I think delta should be removed here to be consistent with new naming 
convention
  I like the javadoc in HColumnDescriptor.java @ line 601 - it is more detailed.

REVISION DETAIL
  https://reviews.facebook.net/D447


 Delta Encoding of KeyValues  (aka prefix compression)
 -

 Key: HBASE-4218
 URL: https://issues.apache.org/jira/browse/HBASE-4218
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.94.0
Reporter: Jacek Migdal
Assignee: Mikhail Bautin
  Labels: compression
 Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 
 D447.1.patch, D447.10.patch, D447.2.patch, D447.3.patch, D447.4.patch, 
 D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, D447.9.patch, 
 Delta_encoding_with_memstore_TS.patch, open-source.diff


 A compression for keys. Keys are sorted in HFile and they are usually very 
 similar. Because of that, it is possible to design better compression than 
 general-purpose algorithms.
 It is an additional step designed to be used in memory. It aims to save 
 memory in cache as well as speeding seeks within HFileBlocks. It should 
 improve performance a lot, if key lengths are larger than value lengths. For 
 example, it makes a lot of sense to use it when value is a counter.
 Initial tests on real data (key length = ~ 90 bytes , value length = 8 bytes) 
 shows that I could achieve decent level of compression:
  key compression ratio: 92%
  total compression ratio: 85%
  LZO on the same data: 85%
  LZO after delta encoding: 91%
 While having much better performance (20-80% faster decompression ratio than 
 LZO). Moreover, it should allow far more efficient seeking which should 
 improve performance a bit.
 It seems that simple compression algorithms are good enough. Most of the 
 savings are due to prefix compression, int128 encoding, timestamp diffs and 
 bitfields to avoid duplication. That way, comparisons of compressed data can 
 be much faster than a byte comparator (thanks to prefix compression and 
 bitfields).
 In order to implement it in HBase two important changes in design will be 
 needed:
 -solidify interface to HFileBlock / HFileReader Scanner to provide seeking 
 and iterating; access to uncompressed buffer in HFileBlock will have bad 
 performance
 -extend comparators to support comparison assuming that N first bytes are 
 equal (or some fields are equal)
 Link to a discussion about something similar:
 http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windowssubj=Re+prefix+compression
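As a rough illustration of the prefix-compression idea described above (not the encoding implemented in the attached patches), sorted keys can be stored as a shared-prefix length plus the remaining suffix:

{code}
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.List;

// Toy prefix encoder for a sorted list of keys: each key is written as
// (length of prefix shared with the previous key, length of suffix, suffix).
public class PrefixEncodeSketch {
  public static byte[] encode(List<byte[]> sortedKeys) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    byte[] previous = new byte[0];
    for (byte[] key : sortedKeys) {
      int common = 0;
      int max = Math.min(previous.length, key.length);
      while (common < max && previous[common] == key[common]) {
        common++;
      }
      out.writeInt(common);                 // shared prefix length
      out.writeInt(key.length - common);    // suffix length
      out.write(key, common, key.length - common);
      previous = key;
    }
    out.flush();
    return bytes.toByteArray();
  }
}
{code}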

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races againsth splitLog retry

2011-12-21 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174402#comment-13174402
 ] 

Hadoop QA commented on HBASE-5081:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12508297/hbase-5081-patch-v6.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -152 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 76 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.replication.TestReplication

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/573//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/573//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/573//console

This message is automatically generated.

 Distributed log splitting deleteNode races againsth splitLog retry 
 ---

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: distributed-log-splitting-screenshot.png, 
 hbase-5081-patch-v6.txt, hbase-5081_patch_for_92_v4.txt, 
 hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
 patch_for_92_v3.txt


 Recently, during 0.92 rc testing, we found distributed log splitting hangs 
 there forever.  Please see attached screen shot.
 I looked into it and here is what happened I think:
 1. One rs died, the servershutdownhandler found it out and started the 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. Servershutdownhandler retried the log splitting;
 4. During the retrial, it created these three tasks again, and put them in a 
 hashmap (tasks);
 5. The asynchronously deletion in step 2 finally happened for one task, in 
 the callback, it removed one
 task in the hashmap;
 6. One of the newly submitted tasks' zookeeper watcher found out that task is 
 unassigned, and it is not
 in the hashmap, so it created a new orphan task.
 7.  All three tasks failed, but that task created in step 6 is an orphan so 
 the batch.err counter was one short,
 so the log splitting hangs there and keeps waiting for the last task to 
 finish which is never going to happen.
 So I think the problem is step 2.  The fix is to make deletion sync, instead 
 of async, so that the retry will have
 a clean start.
 Async deleteNode will mess up with split log retrial.  In extreme situation, 
 if async deleteNode doesn't happen
 soon enough, some node created during the retrial could be deleted.
 deleteNode should be sync.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5085) fix test-patch script from setting the ulimit

2011-12-21 Thread Giridharan Kesavan (Created) (JIRA)
fix test-patch script from setting the ulimit
-

 Key: HBASE-5085
 URL: https://issues.apache.org/jira/browse/HBASE-5085
 Project: HBase
  Issue Type: Bug
Reporter: Giridharan Kesavan
Assignee: Giridharan Kesavan


The test-patch.sh script sets ulimit -n 1024 just after triggering the patch; 
this setting overrides the underlying system's ulimit and hence fails the 
hbase tests.



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5085) fix test-patch script from setting the ulimit

2011-12-21 Thread Giridharan Kesavan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giridharan Kesavan updated HBASE-5085:
--

Attachment: hbase-5085.patch

The patch removes the ulimit setting in the test-patch script.

 fix test-patch script from setting the ulimit
 -

 Key: HBASE-5085
 URL: https://issues.apache.org/jira/browse/HBASE-5085
 Project: HBase
  Issue Type: Bug
Reporter: Giridharan Kesavan
Assignee: Giridharan Kesavan
 Attachments: hbase-5085.patch


 The test-patch.sh script sets ulimit -n 1024 just after triggering the patch; 
 this setting overrides the underlying system's ulimit and hence fails the 
 hbase tests.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races againsth splitLog retry

2011-12-21 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174428#comment-13174428
 ] 

Jimmy Xiang commented on HBASE-5081:


Uploaded patch v6 to the review board.

+assertTrue(ZKUtil.checkExists(zkw, tasknode) != -1);

The original is this:

+assertTrue(ZKUtil.checkExists(zkw, tasknode) == -1);

Without this patch, the original assertion is satisfied only when there is no race.
With this patch, we don't delete any failed task node, so the node should still
be there now.

I am still working on a patch to delete the node synchronously in this scenario.




 Distributed log splitting deleteNode races againsth splitLog retry 
 ---

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: distributed-log-splitting-screenshot.png, 
 hbase-5081-patch-v6.txt, hbase-5081_patch_for_92_v4.txt, 
 hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
 patch_for_92_v3.txt


 Recently, during 0.92 RC testing, we found that distributed log splitting hangs 
 forever.  Please see the attached screenshot.
 I looked into it and here is what I think happened:
 1. One RS died; the ServerShutdownHandler found it out and started the 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted asynchronously;
 3. ServerShutdownHandler retried the log splitting;
 4. During the retry, it created these three tasks again and put them in a 
 hashmap (tasks);
 5. The asynchronous deletion from step 2 finally happened for one task; in 
 the callback, it removed that
 task from the hashmap;
 6. One of the newly submitted tasks' zookeeper watcher found that the task is 
 unassigned, and it is not
 in the hashmap, so it created a new orphan task.
 7. All three tasks failed, but the task created in step 6 is an orphan, so 
 the batch.err counter was one short,
 so the log splitting hangs and keeps waiting for the last task to 
 finish, which is never going to happen.
 So I think the problem is step 2.  The fix is to make the deletion sync instead 
 of async, so that the retry will have
 a clean start.
 Async deleteNode will mess up the split log retry.  In an extreme situation, 
 if the async deleteNode doesn't happen
 soon enough, a node created during the retry could be deleted.
 deleteNode should be sync.
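
 A rough sketch of the sync-versus-async distinction being drawn here, using the plain ZooKeeper client API; this only illustrates the shape of the two calls, not the actual SplitLogManager code.

{code}
import org.apache.zookeeper.AsyncCallback;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

// Sketch only: shows why an async delete can race with a retry that
// re-creates the same znode.
class DeleteNodeSketch {

  // Asynchronous delete: returns immediately; the delete happens later and the
  // callback runs in another thread.  If a retry re-creates the task node in
  // the meantime, the late delete (or the callback's bookkeeping) can clobber
  // the new task.
  static void deleteAsync(ZooKeeper zk, String path) {
    zk.delete(path, -1, new AsyncCallback.VoidCallback() {
      @Override
      public void processResult(int rc, String p, Object ctx) {
        // bookkeeping for the deleted task would happen here, possibly long
        // after a retry has already re-created the node
      }
    }, null);
  }

  // Synchronous delete: does not return until the znode is gone, so a
  // subsequent retry starts from a clean state.
  static void deleteSync(ZooKeeper zk, String path)
      throws KeeperException, InterruptedException {
    zk.delete(path, -1);
  }
}
{code}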

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5085) fix test-patch script from setting the ulimit

2011-12-21 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174431#comment-13174431
 ] 

Zhihong Yu commented on HBASE-5085:
---

+1 on patch.
Let's see what result Hadoop QA would give us based on this patch.

 fix test-patch script from setting the ulimit
 -

 Key: HBASE-5085
 URL: https://issues.apache.org/jira/browse/HBASE-5085
 Project: HBase
  Issue Type: Bug
Reporter: Giridharan Kesavan
Assignee: Giridharan Kesavan
 Attachments: hbase-5085.patch


 test-patch.sh script sets the ulimit -n 1024 just after triggering the patch; 
 setting this overrides the underlying system's ulimit and hence fails the 
 HBase tests.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2011-12-21 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174433#comment-13174433
 ] 

jirapos...@reviews.apache.org commented on HBASE-5081:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3292/#review4059
---


Test change looks good to me.

- Lars


On 2011-12-21 21:30:56, Jimmy Xiang wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3292/
bq.  ---
bq.  
bq.  (Updated 2011-12-21 21:30:56)
bq.  
bq.  
bq.  Review request for hbase, Ted Yu, Michael Stack, and Lars Hofhansl.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  In this patch, after a task is done, we don't delete the node if the task 
failed, so that when it's retried later on, there won't be a race problem.
bq.  
bq.  It used to delete the node always.
bq.  
bq.  
bq.  This addresses bug HBASE-5081.
bq.  https://issues.apache.org/jira/browse/HBASE-5081
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java 
7b7316f 
bq.src/test/java/org/apache/hadoop/hbase/master/TestSplitLogManager.java 
5c9d7dd 
bq.  
bq.  Diff: https://reviews.apache.org/r/3292/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  mvn -Dtest=TestDistributedLogSplitting clean test
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jimmy
bq.  
bq.



 Distributed log splitting deleteNode races against splitLog retry 
 ---

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: distributed-log-splitting-screenshot.png, 
 hbase-5081-patch-v6.txt, hbase-5081_patch_for_92_v4.txt, 
 hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
 patch_for_92_v3.txt


 Recently, during 0.92 RC testing, we found that distributed log splitting hangs 
 forever.  Please see the attached screenshot.
 I looked into it and here is what I think happened:
 1. One RS died; the ServerShutdownHandler found it out and started the 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted asynchronously;
 3. ServerShutdownHandler retried the log splitting;
 4. During the retry, it created these three tasks again and put them in a 
 hashmap (tasks);
 5. The asynchronous deletion from step 2 finally happened for one task; in 
 the callback, it removed that
 task from the hashmap;
 6. One of the newly submitted tasks' zookeeper watcher found that the task is 
 unassigned, and it is not
 in the hashmap, so it created a new orphan task.
 7. All three tasks failed, but the task created in step 6 is an orphan, so 
 the batch.err counter was one short,
 so the log splitting hangs and keeps waiting for the last task to 
 finish, which is never going to happen.
 So I think the problem is step 2.  The fix is to make the deletion sync instead 
 of async, so that the retry will have
 a clean start.
 Async deleteNode will mess up the split log retry.  In an extreme situation, 
 if the async deleteNode doesn't happen
 soon enough, a node created during the retry could be deleted.
 deleteNode should be sync.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5085) fix test-patch script from setting the ulimit

2011-12-21 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5085:
--

Status: Patch Available  (was: Open)

 fix test-patch script from setting the ulimit
 -

 Key: HBASE-5085
 URL: https://issues.apache.org/jira/browse/HBASE-5085
 Project: HBase
  Issue Type: Bug
Reporter: Giridharan Kesavan
Assignee: Giridharan Kesavan
 Attachments: hbase-5085.patch


 test-patch.sh script sets the ulimit -n 1024 just after triggering the patch; 
 setting this overrides the underlying system's ulimit and hence fails the 
 HBase tests.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2011-12-21 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5081:
--

Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12508264/patch_for_92_v2.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 javadoc.  The javadoc tool appears to have generated -152 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 76 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.replication.TestReplication
  org.apache.hadoop.hbase.replication.TestMultiSlaveReplication
  org.apache.hadoop.hbase.replication.TestMasterReplication

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/570//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/570//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/570//console

This message is automatically generated.)

 Distributed log splitting deleteNode races against splitLog retry 
 ---

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: distributed-log-splitting-screenshot.png, 
 hbase-5081-patch-v6.txt, hbase-5081_patch_for_92_v4.txt, 
 hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
 patch_for_92_v3.txt


 Recently, during 0.92 RC testing, we found that distributed log splitting hangs 
 forever.  Please see the attached screenshot.
 I looked into it and here is what I think happened:
 1. One RS died; the ServerShutdownHandler found it out and started the 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted asynchronously;
 3. ServerShutdownHandler retried the log splitting;
 4. During the retry, it created these three tasks again and put them in a 
 hashmap (tasks);
 5. The asynchronous deletion from step 2 finally happened for one task; in 
 the callback, it removed that
 task from the hashmap;
 6. One of the newly submitted tasks' zookeeper watcher found that the task is 
 unassigned, and it is not
 in the hashmap, so it created a new orphan task.
 7. All three tasks failed, but the task created in step 6 is an orphan, so 
 the batch.err counter was one short,
 so the log splitting hangs and keeps waiting for the last task to 
 finish, which is never going to happen.
 So I think the problem is step 2.  The fix is to make the deletion sync instead 
 of async, so that the retry will have
 a clean start.
 Async deleteNode will mess up the split log retry.  In an extreme situation, 
 if the async deleteNode doesn't happen
 soon enough, a node created during the retry could be deleted.
 deleteNode should be sync.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2011-12-21 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174435#comment-13174435
 ] 

Hadoop QA commented on HBASE-5081:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12508273/hbase-5081_patch_v5.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 javadoc.  The javadoc tool appears to have generated -152 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 76 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.coprocessor.TestCoprocessorEndpoint
  org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
  org.apache.hadoop.hbase.master.TestSplitLogManager

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/572//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/572//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/572//console

This message is automatically generated.

 Distributed log splitting deleteNode races against splitLog retry 
 ---

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: distributed-log-splitting-screenshot.png, 
 hbase-5081-patch-v6.txt, hbase-5081_patch_for_92_v4.txt, 
 hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
 patch_for_92_v3.txt


 Recently, during 0.92 RC testing, we found that distributed log splitting hangs 
 forever.  Please see the attached screenshot.
 I looked into it and here is what I think happened:
 1. One RS died; the ServerShutdownHandler found it out and started the 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted asynchronously;
 3. ServerShutdownHandler retried the log splitting;
 4. During the retry, it created these three tasks again and put them in a 
 hashmap (tasks);
 5. The asynchronous deletion from step 2 finally happened for one task; in 
 the callback, it removed that
 task from the hashmap;
 6. One of the newly submitted tasks' zookeeper watcher found that the task is 
 unassigned, and it is not
 in the hashmap, so it created a new orphan task.
 7. All three tasks failed, but the task created in step 6 is an orphan, so 
 the batch.err counter was one short,
 so the log splitting hangs and keeps waiting for the last task to 
 finish, which is never going to happen.
 So I think the problem is step 2.  The fix is to make the deletion sync instead 
 of async, so that the retry will have
 a clean start.
 Async deleteNode will mess up the split log retry.  In an extreme situation, 
 if the async deleteNode doesn't happen
 soon enough, a node created during the retry could be deleted.
 deleteNode should be sync.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2011-12-21 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174438#comment-13174438
 ] 

Zhihong Yu commented on HBASE-5081:
---

Do we have a test case where zk nodes (including the failed task node) survive 
cluster restart and distributed log splitting successfully finishes after 
cluster restart?

If we use patch v6 instead of the synchronous node deletion patch (upcoming), 
we should check the value of resubmit_threshold. If the value is 0, we should 
still delete the failed task node because there would be no resubmission.
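
A rough sketch of the check being suggested; the field and method names (resubmitThreshold, deleteNode) are assumptions for illustration, not the actual SplitLogManager members.

{code}
// Sketch only.  Field and method names are assumptions for illustration;
// they are not the actual SplitLogManager API.
class FailedTaskCleanupSketch {
  private final int resubmitThreshold;

  FailedTaskCleanupSketch(int resubmitThreshold) {
    this.resubmitThreshold = resubmitThreshold;
  }

  void onTaskDone(String taskNode, boolean failed) {
    if (failed && resubmitThreshold == 0) {
      // With zero resubmissions allowed, nothing will ever reclaim this node,
      // so it should still be deleted even though failed nodes are normally kept.
      deleteNode(taskNode);
    }
    // Otherwise keep the failed node so the retry can pick it up (patch v6 behavior).
  }

  private void deleteNode(String taskNode) {
    // placeholder for the actual (synchronous) znode deletion
  }
}
{code}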

 Distributed log splitting deleteNode races against splitLog retry 
 ---

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: distributed-log-splitting-screenshot.png, 
 hbase-5081-patch-v6.txt, hbase-5081_patch_for_92_v4.txt, 
 hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
 patch_for_92_v3.txt


 Recently, during 0.92 RC testing, we found that distributed log splitting hangs 
 forever.  Please see the attached screenshot.
 I looked into it and here is what I think happened:
 1. One RS died; the ServerShutdownHandler found it out and started the 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted asynchronously;
 3. ServerShutdownHandler retried the log splitting;
 4. During the retry, it created these three tasks again and put them in a 
 hashmap (tasks);
 5. The asynchronous deletion from step 2 finally happened for one task; in 
 the callback, it removed that
 task from the hashmap;
 6. One of the newly submitted tasks' zookeeper watcher found that the task is 
 unassigned, and it is not
 in the hashmap, so it created a new orphan task.
 7. All three tasks failed, but the task created in step 6 is an orphan, so 
 the batch.err counter was one short,
 so the log splitting hangs and keeps waiting for the last task to 
 finish, which is never going to happen.
 So I think the problem is step 2.  The fix is to make the deletion sync instead 
 of async, so that the retry will have
 a clean start.
 Async deleteNode will mess up the split log retry.  In an extreme situation, 
 if the async deleteNode doesn't happen
 soon enough, a node created during the retry could be deleted.
 deleteNode should be sync.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2011-12-21 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5081:
--

Comment: was deleted

(was: 
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3292/
---

(Updated 2011-12-21 17:01:22.901024)


Review request for hbase, Ted Yu, Michael Stack, and Lars Hofhansl.


Changes
---

Updated the comments.


Summary
---

In this patch, after a task is done, we don't delete the node if the task 
failed, so that when it's retried later on, there won't be a race problem.

It used to delete the node always.


This addresses bug HBASE-5081.
https://issues.apache.org/jira/browse/HBASE-5081


Diffs (updated)
-

  src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java 7b7316f 

Diff: https://reviews.apache.org/r/3292/diff


Testing
---

mvn -Dtest=TestDistributedLogSplitting clean test


Thanks,

Jimmy

)

 Distributed log splitting deleteNode races against splitLog retry 
 ---

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: distributed-log-splitting-screenshot.png, 
 hbase-5081-patch-v6.txt, hbase-5081_patch_for_92_v4.txt, 
 hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
 patch_for_92_v3.txt


 Recently, during 0.92 RC testing, we found that distributed log splitting hangs 
 forever.  Please see the attached screenshot.
 I looked into it and here is what I think happened:
 1. One RS died; the ServerShutdownHandler found it out and started the 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted asynchronously;
 3. ServerShutdownHandler retried the log splitting;
 4. During the retry, it created these three tasks again and put them in a 
 hashmap (tasks);
 5. The asynchronous deletion from step 2 finally happened for one task; in 
 the callback, it removed that
 task from the hashmap;
 6. One of the newly submitted tasks' zookeeper watcher found that the task is 
 unassigned, and it is not
 in the hashmap, so it created a new orphan task.
 7. All three tasks failed, but the task created in step 6 is an orphan, so 
 the batch.err counter was one short,
 so the log splitting hangs and keeps waiting for the last task to 
 finish, which is never going to happen.
 So I think the problem is step 2.  The fix is to make the deletion sync instead 
 of async, so that the retry will have
 a clean start.
 Async deleteNode will mess up the split log retry.  In an extreme situation, 
 if the async deleteNode doesn't happen
 soon enough, a node created during the retry could be deleted.
 deleteNode should be sync.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2011-12-21 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5081:
--

Comment: was deleted

(was: 

bq.  On 2011-12-21 17:06:22, Michael Stack wrote:
bq.   src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java, line 
358
bq.   https://reviews.apache.org/r/3292/diff/2/?file=65660#file65660line358
bq.  
bq.   White space on end and 'is' should be 'if'

Let me fix it.


- Jimmy


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3292/#review4046
---


On 2011-12-21 17:01:22, Jimmy Xiang wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3292/
bq.  ---
bq.  
bq.  (Updated 2011-12-21 17:01:22)
bq.  
bq.  
bq.  Review request for hbase, Ted Yu, Michael Stack, and Lars Hofhansl.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  In this patch, after a task is done, we don't delete the node if the task 
failed, so that when it's retried later on, there won't be a race problem.
bq.  
bq.  It used to delete the node always.
bq.  
bq.  
bq.  This addresses bug HBASE-5081.
bq.  https://issues.apache.org/jira/browse/HBASE-5081
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java 
7b7316f 
bq.  
bq.  Diff: https://reviews.apache.org/r/3292/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  mvn -Dtest=TestDistributedLogSplitting clean test
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jimmy
bq.  
bq.

)

 Distributed log splitting deleteNode races against splitLog retry 
 ---

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: distributed-log-splitting-screenshot.png, 
 hbase-5081-patch-v6.txt, hbase-5081_patch_for_92_v4.txt, 
 hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
 patch_for_92_v3.txt


 Recently, during 0.92 RC testing, we found that distributed log splitting hangs 
 forever.  Please see the attached screenshot.
 I looked into it and here is what I think happened:
 1. One RS died; the ServerShutdownHandler found it out and started the 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted asynchronously;
 3. ServerShutdownHandler retried the log splitting;
 4. During the retry, it created these three tasks again and put them in a 
 hashmap (tasks);
 5. The asynchronous deletion from step 2 finally happened for one task; in 
 the callback, it removed that
 task from the hashmap;
 6. One of the newly submitted tasks' zookeeper watcher found that the task is 
 unassigned, and it is not
 in the hashmap, so it created a new orphan task.
 7. All three tasks failed, but the task created in step 6 is an orphan, so 
 the batch.err counter was one short,
 so the log splitting hangs and keeps waiting for the last task to 
 finish, which is never going to happen.
 So I think the problem is step 2.  The fix is to make the deletion sync instead 
 of async, so that the retry will have
 a clean start.
 Async deleteNode will mess up the split log retry.  In an extreme situation, 
 if the async deleteNode doesn't happen
 soon enough, a node created during the retry could be deleted.
 deleteNode should be sync.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time

2011-12-21 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174456#comment-13174456
 ] 

Phabricator commented on HBASE-5033:


lhofhansl has commented on the revision 
[jira][HBASE-5033][[89-fb]]Opening/Closing store in parallel to reduce region 
open/close time.

  I guess I had envisioned that we'd somehow share the same threadpool, but 
this works. LGTM.

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/HConstants.java:193 I think we don't 
use camel case in these properties.

REVISION DETAIL
  https://reviews.facebook.net/D933


 Opening/Closing store in parallel to reduce region open/close time
 --

 Key: HBASE-5033
 URL: https://issues.apache.org/jira/browse/HBASE-5033
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: D933.1.patch, D933.2.patch, D933.3.patch, D933.4.patch


 Region servers open/close each store, and each store file for every 
 store, in sequential fashion, which can make region open/close 
 inefficient. 
 So this diff opens/closes each store in parallel in order to reduce 
 region open/close time. It would also help reduce the cluster restart time.
 1) Opening each store in parallel
 2) Loading each store file for every store in parallel
 3) Closing each store in parallel
 4) Closing each store file for every store in parallel.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4439) Move ClientScanner out of HTable

2011-12-21 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174459#comment-13174459
 ] 

Lars Hofhansl commented on HBASE-4439:
--

I'm still interested in committing this. I'll rebase to trunk. I'll remove the 
extra copy of the Scan in ClientScanner (but add it as a caveat in the javadoc 
instead). HTable still makes a copy of the Scan (so we can mark HBASE-1774 as a 
dup of this).

 Move ClientScanner out of HTable
 

 Key: HBASE-4439
 URL: https://issues.apache.org/jira/browse/HBASE-4439
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.94.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Minor
 Fix For: 0.94.0

 Attachments: 4439-v1.txt, 4439.txt


 See HBASE-1935 for motivation.
 ClientScanner should be able to exist outside of HTable.
 While we're at it, we can also add an abstract client scanner to ease 
 development of new client-side scanners (such as parallel scanners, or per 
 region scanners).
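
 A rough sketch of the shape such an abstract base could take, so new scanners only implement the single-row next() and close(); this is an illustration, not the AbstractClientScanner that the patch itself adds.

{code}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;

// Sketch only: batch next(n) implemented on top of the subclass's next().
abstract class AbstractClientScannerSketch implements ResultScanner {
  @Override
  public Result[] next(int nbRows) throws IOException {
    List<Result> rows = new ArrayList<Result>(nbRows);
    for (int i = 0; i < nbRows; i++) {
      Result r = next();          // delegate to the subclass's single-row next()
      if (r == null) break;       // scanner exhausted
      rows.add(r);
    }
    return rows.toArray(new Result[rows.size()]);
  }
}
{code}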

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time

2011-12-21 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174460#comment-13174460
 ] 

Phabricator commented on HBASE-5033:


tedyu has commented on the revision [jira][HBASE-5033][[89-fb]]Opening/Closing 
store in parallel to reduce region open/close time.

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java:807 What's 
the likelihood that store opening and closing activities happen at the same 
time?
  I suggest reusing storeOpenerThreadPool here.

REVISION DETAIL
  https://reviews.facebook.net/D933
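
A rough sketch of the shared-pool idea under discussion; the pool size and the Callable-based store openers are stand-ins for illustration, not the actual HRegion/Store API.

{code}
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch only: one bounded pool shared by store open and close, instead of
// separate storeOpenerThreadPool / storeCloserThreadPool instances.
class SharedStorePoolSketch {
  private final ExecutorService storePool = Executors.newFixedThreadPool(4);

  void openStoresInParallel(List<Callable<Void>> storeOpeners) throws Exception {
    // Submit one opener per store; invokeAll blocks until all stores are
    // opened, preserving the "region is open only when all stores are open"
    // invariant.
    List<Future<Void>> results = storePool.invokeAll(storeOpeners);
    for (Future<Void> f : results) {
      f.get(); // propagate any store-open failure
    }
  }
}
{code}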


 Opening/Closing store in parallel to reduce region open/close time
 --

 Key: HBASE-5033
 URL: https://issues.apache.org/jira/browse/HBASE-5033
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: D933.1.patch, D933.2.patch, D933.3.patch, D933.4.patch


 Region servers open/close each store, and each store file for every 
 store, in sequential fashion, which can make region open/close 
 inefficient. 
 So this diff opens/closes each store in parallel in order to reduce 
 region open/close time. It would also help reduce the cluster restart time.
 1) Opening each store in parallel
 2) Loading each store file for every store in parallel
 3) Closing each store in parallel
 4) Closing each store file for every store in parallel.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2011-12-21 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5081:
--

Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12508273/hbase-5081_patch_v5.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 javadoc.  The javadoc tool appears to have generated -152 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 76 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.coprocessor.TestCoprocessorEndpoint
  org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
  org.apache.hadoop.hbase.master.TestSplitLogManager

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/572//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/572//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/572//console

This message is automatically generated.)

 Distributed log splitting deleteNode races against splitLog retry 
 ---

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: distributed-log-splitting-screenshot.png, 
 hbase-5081-patch-v6.txt, hbase-5081_patch_for_92_v4.txt, 
 hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
 patch_for_92_v3.txt


 Recently, during 0.92 RC testing, we found that distributed log splitting hangs 
 forever.  Please see the attached screenshot.
 I looked into it and here is what I think happened:
 1. One RS died; the ServerShutdownHandler found it out and started the 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted asynchronously;
 3. ServerShutdownHandler retried the log splitting;
 4. During the retry, it created these three tasks again and put them in a 
 hashmap (tasks);
 5. The asynchronous deletion from step 2 finally happened for one task; in 
 the callback, it removed that
 task from the hashmap;
 6. One of the newly submitted tasks' zookeeper watcher found that the task is 
 unassigned, and it is not
 in the hashmap, so it created a new orphan task.
 7. All three tasks failed, but the task created in step 6 is an orphan, so 
 the batch.err counter was one short,
 so the log splitting hangs and keeps waiting for the last task to 
 finish, which is never going to happen.
 So I think the problem is step 2.  The fix is to make the deletion sync instead 
 of async, so that the retry will have
 a clean start.
 Async deleteNode will mess up the split log retry.  In an extreme situation, 
 if the async deleteNode doesn't happen
 soon enough, a node created during the retry could be deleted.
 deleteNode should be sync.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4916) LoadTest MR Job

2011-12-21 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174473#comment-13174473
 ] 

Phabricator commented on HBASE-4916:


jdcryans has commented on the revision HBASE-4916 [jira] LoadTest MR Job.

  Doing further testing, I got this while running an online merge (that took 
1721 seconds to run because it counts parent regions):

  Exception in thread pool-2-thread-186 java.lang.NullPointerException
at 
org.apache.hadoop.hbase.loadtest.RandomDataGenerator.verify(RandomDataGenerator.java:106)
at 
org.apache.hadoop.hbase.loadtest.GetOperation.postAction(GetOperation.java:45)
at org.apache.hadoop.hbase.loadtest.Operation.run(Operation.java:126)
at 
org.apache.hadoop.hbase.loadtest.Workload$Executor$BoundedExecutor$1.run(Workload.java:269)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

REVISION DETAIL
  https://reviews.facebook.net/D741


 LoadTest MR Job
 ---

 Key: HBASE-4916
 URL: https://issues.apache.org/jira/browse/HBASE-4916
 Project: HBase
  Issue Type: Sub-task
  Components: client, regionserver
Reporter: Nicolas Spiegelberg
Assignee: Christopher Gist
 Fix For: 0.94.0

 Attachments: HBASE-4916.D741.1.patch


 Add a script to start a streaming map-reduce job where each map task runs an 
 instance of the load tester for a partition of the key-space. Ensure that the 
 load tester takes a parameter indicating the start key for write operations.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-1774) HTable$ClientScanner modifies its input parameters

2011-12-21 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174481#comment-13174481
 ] 

Lars Hofhansl commented on HBASE-1774:
--

Unfortunately making a copy is no longer possible in trunk because of 
ScanMetrics, which are collected on the Scan object itself. If we make a copy 
the copy will have its metrics populated. Hrrmmmfff.

Could keep a reference to the copy, but then the application may have to 
walk a chain of these references. I think we should just document this at this 
point. I'm in this code right now anyway (HBASE-4439)... I'll add JavaDoc for 
this.

 HTable$ClientScanner modifies its input parameters
 --

 Key: HBASE-1774
 URL: https://issues.apache.org/jira/browse/HBASE-1774
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.20.0
Reporter: Jim Kellerman
Assignee: Jim Kellerman
Priority: Critical

 HTable$ClientScanner modifies the Scan that is passed to it on construction.
 I would consider this to be bad programming practice because if I wanted to 
 use the same Scan object to scan multiple tables, I would not expect one 
 table scan to affect the other, but it does.
 If input parameters are going to be modified either now or later it should be 
 called out *loudly* in the javadoc. The only way I found this behavior was by 
 creating an application that did scan multiple tables using the same Scan 
 object and having 'weird stuff' happen.
 In my opinion, if you want to modify a field in an input parameter, you 
 should:
 - make a copy of the original object
 - optionally return a reference to the copy.
 There is no javadoc about this behavior. The only thing I found was a comment 
 in HTable$ClientScanner:
 {code}
 // HEADSUP: The scan internal start row can change as we move through 
 table.
 {code}
 Is there a use case that requires this behavior? If so, I would recommend 
 that ResultScanner  (and the classes that implement it) provide an accessor 
 to the mutable copy of the input Scan and leave the input argument alone.
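
 A rough sketch of the defensive-copy recommendation above, assuming the Scan copy constructor in the client API; note the ScanMetrics complication raised in the comment earlier in this thread, which is why trunk may just document the behavior instead.

{code}
import java.io.IOException;

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

// Sketch only: defensively copy the caller's Scan before handing it to a
// scanner, so internal start-row updates don't leak back to the caller.
class ScanCopySketch {
  static ResultScanner scanWithCopy(HTable table, Scan userScan) throws IOException {
    Scan internal = new Scan(userScan); // copy; the caller's Scan stays untouched
    return table.getScanner(internal);
  }
}
{code}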

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4439) Move ClientScanner out of HTable

2011-12-21 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174486#comment-13174486
 ] 

Lars Hofhansl commented on HBASE-4439:
--

See my comment in HBASE-1774. Turns out as it stands Scan objects cannot be 
copied as the ScanMetrics are collected into the Scan object.

 Move ClientScanner out of HTable
 

 Key: HBASE-4439
 URL: https://issues.apache.org/jira/browse/HBASE-4439
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.94.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Minor
 Fix For: 0.94.0

 Attachments: 4439-v1.txt, 4439.txt


 See HBASE-1935 for motivation.
 ClientScanner should be able to exist outside of HTable.
 While we're at it, we can also add an abstract client scanner to ease 
 development of new client-side scanners (such as parallel scanners, or per 
 region scanners).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4720) Implement atomic update operations (checkAndPut, checkAndDelete) for REST client/server

2011-12-21 Thread Mubarak Seyed (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174517#comment-13174517
 ] 

Mubarak Seyed commented on HBASE-4720:
--

I addressed most of the review comments; I will wait until I get more review 
comments from Andrew (and others).

{code}
parameter update is not being used in other places as well.

RowResoure.java:

Response update(final CellSetModel model, final boolean replace)

Response updateBinary(final byte[] message, final HttpHeaders headers,
  final boolean replace) 

ScannerResource.java:
-
Response update(final ScannerModel model, final boolean replace,
  final UriInfo uriInfo) {
{code}

 Implement atomic update operations (checkAndPut, checkAndDelete) for REST 
 client/server 
 

 Key: HBASE-4720
 URL: https://issues.apache.org/jira/browse/HBASE-4720
 Project: HBase
  Issue Type: Improvement
Reporter: Daniel Lord
Assignee: Mubarak Seyed
 Fix For: 0.94.0

 Attachments: HBASE-4720.trunk.v1.patch, HBASE-4720.v1.patch, 
 HBASE-4720.v3.patch


 I have several large application/HBase clusters where an application node 
 will occasionally need to talk to HBase from a different cluster.  In order 
 to help ensure some of my consistency guarantees I have a sentinel table that 
 is updated atomically as users interact with the system.  This works quite 
 well for the regular hbase client but the REST client does not implement 
 the checkAndPut and checkAndDelete operations.  This exposes the application 
 to some race conditions that have to be worked around.  It would be ideal if 
 the same checkAndPut/checkAndDelete operations could be supported by the REST 
 client.
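
 For reference, a rough sketch of the regular-client atomic update that the REST client would need to mirror; the table handle, row, family, and qualifier names are made up for illustration.

{code}
import java.io.IOException;

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch only: apply the new sentinel value only if the current value still
// matches the expected one (the atomic check-and-put the REST client lacks).
class CheckAndPutSketch {
  static boolean bumpSentinel(HTable sentinelTable, byte[] row,
      byte[] expected, byte[] next) throws IOException {
    Put put = new Put(row);
    put.add(Bytes.toBytes("f"), Bytes.toBytes("state"), next);
    // Atomic: the Put is applied only if f:state still equals `expected`.
    return sentinelTable.checkAndPut(row, Bytes.toBytes("f"),
        Bytes.toBytes("state"), expected, put);
  }
}
{code}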

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4616) Update hregion encoded name to reduce logic and prevent region collisions in META

2011-12-21 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174518#comment-13174518
 ] 

stack commented on HBASE-4616:
--

Regarding UUID, if it's an md5 anyway, let's call it what it is.
I tried to find a hash that was smaller in size, but it seems
that even murmur hash v3 is 120 bits now. If md5, we can get the hash
by doing 'md5sum tablename' on the command line if we have to; is
there a uuid command-line tool as readily available?
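
As a rough sketch, the "call it what it is" option is just an MD5 hex digest of the table name, equivalent to running md5sum over the name on the command line; this is only an illustration, not the encoding the patch uses.

{code}
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch only: MD5 hex digest of a table name.
class TableNameMd5Sketch {
  static String md5Hex(byte[] tableName) throws NoSuchAlgorithmException {
    byte[] digest = MessageDigest.getInstance("MD5").digest(tableName);
    StringBuilder hex = new StringBuilder(digest.length * 2);
    for (byte b : digest) {
      hex.append(String.format("%02x", b));
    }
    return hex.toString();
  }
}
{code}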

I was also thinking we could get base64 of the md5 so the size in meta is less,
but that's probably more pain than it's worth.

Or, what would it take to keep a file mapping tablenames to int codes in
the filesystem and use the zero-padded int code in .META. instead?
All tables under value 10 would be catalog tables... .META. would
be 0, etc.  It could sit beside the hbase.version file at the head of the
filesystem; we could call the file .hbase.tablenames.map or some such.
UUID/md5 is nice in that it makes it so we don't need this, but they
are wide compared to a short.  Maybe UUID/md5 is the way to go -- less
complexity, and the width is not really a concern since we don't write
the tablename into keys (unless we are replicating, I suppose; hmm... maybe
UUID/md5 is safer because two clusters could have the same code for different
table names -- that'd get really confusing... if you stay w/ md5s, you
have the same behavior we have now regarding cross-dc and table names).

I think we still need to make a list of pros/cons of removing tablename from meta.

In src/main/java/org/apache/hadoop/hbase/HRegionInfo.java, why do we import
MetaReader?  Using MetaReader inside HRI is not on, I'd say.  That's a base hbase
class using a subpackage class that does client operations... HRI should
be usable w/o need of a cluster/network.  Ditto CatalogTracker.

Don't do this; expand it: +import org.apache.hadoop.hbase.util.*;

How is this new format going to play w/ old format?  We have to convert
all the old formats as part of migration into new format?  What about
the fact that the dir in the filesystem that holds region data is named
for a hash of the HRI name?  We can't rename all dirs in fs as part of
migration?  Or you think we should?

You could add some javadoc here:

* New region name format:
-   *<tablename>,,<startkey>,<regionIdTimestamp>.<encodedName>.
+   *<tablename>,,<endkey>,<regionIdTimestamp>.<encodedName>.
* where,
*<encodedName> is a hex version of the MD5 hash of
-   *<tablename>,<startkey>,<regionIdTimestamp>
+   *<tablename>,<endkey>,<regionIdTimestamp>

Isn't tablename the md5 of tablename now?

What is this comment about?

+  // It should say, the tablename encoded in the region ends with 0x01,
+  // but the last region's tablename ends with 0x02
+  public static final int END_OF_TABLE_NAME = 1;
+  public static final int END_OF_TABLE_NAME_FOR_EMPTY_ENDKEY = 
END_OF_TABLE_NAME + 1;

Do you think these should be printable characters?  Else, they'll show in 
terminal as a box or something?  For example, if this end-of-tablename marker 
were a '.', and the last table had a ':', that'd sort properly and it wouldn't 
look so bad when you listed regions (it would be confusing having '.' for every 
region name but the last, but less confusing than two unprintables that render 
one way for 1 and another for 2).  Or you could have comma as end char and a 
period for the last region in a table (that makes a bit more 'sense')

Lines like this are too long:


+this.regionName = createRegionName(this.tableName, startKey, endKey, 
Long.toString(regionId).getBytes(), true);

Fix your formatting in places like below:

+ boolean newFormat){
+  // We need to be able to add a single byte easily to the regionName as 
we are building it up
+  // It's being used by appending delimiters and special markers.
+
+byte [] oneByte = new byte[1];
+//This cannot return any weird string chars as all uuid chars are a hex 
char.

The first comment is too indented.  There should be a space between the close paren and 
the open curly brace on the function, etc.

Use Bytes.toBytes instead of the below, if only for the fact that it's a shorter statement:

+byte[] uuidTableName = 
UUID.nameUUIDFromBytes(tableName).toString().getBytes();

If null, throw exception?  Don't keep going?

+int allocation = uuidTableName ==  null ? 2 : uuidTableName.length + 2;$

Explain why you are adding '2' in a comment?

Why write +allocation += endKey == null ? 1 : endKey.length + 1; so?

Why not as if (endKey != null) allocation += endKey.length;
and then allocation += 1; // COMMENT EXPLAINING WHY THE +1?

Yeah, if some of these items are null, like id or tablename, shouldn't it be an error?

This doesn't seem necessary, the oneByte that is:

+  oneByte[0] = END_OF_TABLE_NAME_FOR_EMPTY_ENDKEY;$

Just put the END_OF_TABLE_NAME_FOR_EMPTY_ENDKEY into the  
byteArrayDataOutput.put(oneByte);

Ditto 

[jira] [Created] (HBASE-5086) Reopening a region on the a RS can leave it in PENDING_OPEN

2011-12-21 Thread Jean-Daniel Cryans (Created) (JIRA)
Reopening a region on the a RS can leave it in PENDING_OPEN
---

 Key: HBASE-5086
 URL: https://issues.apache.org/jira/browse/HBASE-5086
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
 Fix For: 0.92.1


I got this twice during the same test.

If the region servers are slow enough and you run an online alter, it's 
possible for the RS to change the znode status to CLOSED and have the master 
send an OPEN before the region server is able to remove the region from its 
list of RITs.

This is what the master sees:

{quote}
2011-12-21 22:24:09,498 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Starting unassignment of region 
test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. (offlining)
2011-12-21 22:24:09,498 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:62003-0x134589d3db033f7 Creating unassigned node for 
43123e2e3fc83ec25fe2a76b4f09077f in a CLOSING state
2011-12-21 22:24:09,524 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Sent CLOSE to sv4r25s44,62023,1324494325099 for region 
test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
2011-12-21 22:24:15,656 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling transition=RS_ZK_REGION_CLOSED, server=sv4r25s44,62023,1324494325099, 
region=43123e2e3fc83ec25fe2a76b4f09077f
2011-12-21 22:24:15,656 DEBUG 
org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
event for 43123e2e3fc83ec25fe2a76b4f09077f
2011-12-21 22:24:15,656 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Forcing OFFLINE; 
was=test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. 
state=CLOSED, ts=1324506255629, server=sv4r25s44,62023,1324494325099
2011-12-21 22:24:15,656 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:62003-0x134589d3db033f7 Creating (or updating) unassigned node for 
43123e2e3fc83ec25fe2a76b4f09077f with OFFLINE state
2011-12-21 22:24:15,663 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Found an existing plan for 
test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. destination 
server is + sv4r25s44,62023,1324494325099
2011-12-21 22:24:15,663 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Using pre-existing plan for region 
test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.; 
plan=hri=test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f., src=, 
dest=sv4r25s44,62023,1324494325099
2011-12-21 22:24:15,663 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Assigning region test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. 
to sv4r25s44,62023,1324494325099
2011-12-21 22:24:15,664 ERROR org.apache.hadoop.hbase.master.AssignmentManager: 
Failed assignment in: sv4r25s44,62023,1324494325099 due to 
org.apache.hadoop.hbase.regionserver.RegionAlreadyInTransitionException: 
Received:OPEN for the 
region:test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. ,which we 
are already trying to CLOSE.
{quote}

After that the master abandons.

And the region server:
{quote}
2011-12-21 22:24:09,523 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Received close region: 
test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
2011-12-21 22:24:09,523 DEBUG 
org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Processing 
close of test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
2011-12-21 22:24:09,524 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
Closing test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.: 
disabling compactions & flushes
2011-12-21 22:24:09,524 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
Running close preflush of 
test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
2011-12-21 22:24:09,524 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
Started memstore flush for 
test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f., current region 
memstore size 40.5m
2011-12-21 22:24:09,524 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
Finished snapshotting 
test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f., commencing wait 
for mvcc, flushsize=42482936
2011-12-21 22:24:13,368 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
Renaming flushed file at 
hdfs://sv4r11s38:9100/hbase/test1/43123e2e3fc83ec25fe2a76b4f09077f/.tmp/87d6944c54c7417e9a34a9f9542bcb72
 to 
hdfs://sv4r11s38:9100/hbase/test1/43123e2e3fc83ec25fe2a76b4f09077f/actions/87d6944c54c7417e9a34a9f9542bcb72
2011-12-21 22:24:13,568 INFO org.apache.hadoop.hbase.regionserver.Store: Added 
hdfs://sv4r11s38:9100/hbase/test1/43123e2e3fc83ec25fe2a76b4f09077f/actions/87d6944c54c7417e9a34a9f9542bcb72,
 entries=54209, sequenceid=31451012, memsize=40.5m, filesize=31.4m
2011-12-21 22:24:14,381 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
Finished memstore flush of 

[jira] [Updated] (HBASE-5086) Reopening a region on a RS can leave it in PENDING_OPEN

2011-12-21 Thread Jean-Daniel Cryans (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-5086:
--

Summary: Reopening a region on a RS can leave it in PENDING_OPEN  (was: 
Reopening a region on the a RS can leave it in PENDING_OPEN)

 Reopening a region on a RS can leave it in PENDING_OPEN
 ---

 Key: HBASE-5086
 URL: https://issues.apache.org/jira/browse/HBASE-5086
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
 Fix For: 0.92.1


 I got this twice during the same test.
 If the region servers are slow enough and you run an online alter, it's 
 possible for the RS to change the znode status to CLOSED and have the master 
 send an OPEN before the region server is able to remove the region from its 
 list of RITs.
 This is what the master sees:
 {quote}
 2011-12-21 22:24:09,498 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. 
 (offlining)
 2011-12-21 22:24:09,498 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db033f7 Creating unassigned node for 
 43123e2e3fc83ec25fe2a76b4f09077f in a CLOSING state
 2011-12-21 22:24:09,524 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
 sv4r25s44,62023,1324494325099 for region 
 test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
 2011-12-21 22:24:15,656 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSED, server=sv4r25s44,62023,1324494325099, 
 region=43123e2e3fc83ec25fe2a76b4f09077f
 2011-12-21 22:24:15,656 DEBUG 
 org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
 event for 43123e2e3fc83ec25fe2a76b4f09077f
 2011-12-21 22:24:15,656 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
 was=test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. 
 state=CLOSED, ts=1324506255629, server=sv4r25s44,62023,1324494325099
 2011-12-21 22:24:15,656 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db033f7 Creating (or updating) unassigned node for 
 43123e2e3fc83ec25fe2a76b4f09077f with OFFLINE state
 2011-12-21 22:24:15,663 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Found an existing plan for 
 test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. destination 
 server is + sv4r25s44,62023,1324494325099
 2011-12-21 22:24:15,663 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for 
 region test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.; 
 plan=hri=test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f., 
 src=, dest=sv4r25s44,62023,1324494325099
 2011-12-21 22:24:15,663 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
 test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. to 
 sv4r25s44,62023,1324494325099
 2011-12-21 22:24:15,664 ERROR 
 org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment in: 
 sv4r25s44,62023,1324494325099 due to 
 org.apache.hadoop.hbase.regionserver.RegionAlreadyInTransitionException: 
 Received:OPEN for the 
 region:test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. ,which 
 we are already trying to CLOSE.
 {quote}
 After that the master abandons.
 And the region server:
 {quote}
 2011-12-21 22:24:09,523 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Received close region: 
 test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
 2011-12-21 22:24:09,523 DEBUG 
 org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Processing 
 close of test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
 2011-12-21 22:24:09,524 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
 Closing test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.: 
 disabling compactions & flushes
 2011-12-21 22:24:09,524 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
 Running close preflush of 
 test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
 2011-12-21 22:24:09,524 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
 Started memstore flush for 
 test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f., current 
 region memstore size 40.5m
 2011-12-21 22:24:09,524 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
 Finished snapshotting 
 test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f., commencing 
 wait for mvcc, flushsize=42482936
 2011-12-21 22:24:13,368 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 Renaming flushed file at 
 hdfs://sv4r11s38:9100/hbase/test1/43123e2e3fc83ec25fe2a76b4f09077f/.tmp/87d6944c54c7417e9a34a9f9542bcb72
  to 
 

[jira] [Updated] (HBASE-4940) hadoop-metrics.properties can include configuration of the rest context for ganglia

2011-12-21 Thread Mubarak Seyed (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mubarak Seyed updated HBASE-4940:
-

Attachment: HBASE-4940.trunk.v1.patch

The attached file (HBASE-4940.trunk.v1.patch)
is a patch for TRUNK. Thanks.

 hadoop-metrics.properties can include configuration of the rest context for 
 ganglia
 -

 Key: HBASE-4940
 URL: https://issues.apache.org/jira/browse/HBASE-4940
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Affects Versions: 0.90.5
 Environment: HBase-0.90.1
Reporter: Mubarak Seyed
Priority: Minor
  Labels: hbase-rest
 Fix For: 0.94.0

 Attachments: HBASE-4940.patch, HBASE-4940.trunk.v1.patch


 It appears that hadoop-metrics.properties is missing configuration for the rest 
 context. It would be good to add the rest context, commented out, so that anyone 
 running the REST server who wants to monitor it with the ganglia context can 
 uncomment those lines and use them for REST server monitoring via ganglia.
 {code}
 # Configuration of the rest context for ganglia
 #rest.class=org.apache.hadoop.metrics.ganglia.GangliaContext
 #rest.period=10
 #rest.servers=ganglia-metad-hostname:port
 {code}
 Working on the patch, will submit it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2011-12-21 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174532#comment-13174532
 ] 

jirapos...@reviews.apache.org commented on HBASE-5081:
--



bq.  On 2011-12-21 17:05:47, Michael Stack wrote:
bq.   Looks good to me.  How do I know it works?  Would it be hard to write a 
test?
bq.  
bq.  Jimmy Xiang wrote:
bq.  This is a race issue that is hard to reproduce. Let me think about how to 
come up with a unit test.

Jimmy, please paste your last patch to JIRA.  I'd like to include it in our 
0.92RC even though it doesn't have a test.


- Michael


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3292/#review4045
---


On 2011-12-22 00:31:23, Jimmy Xiang wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3292/
bq.  ---
bq.  
bq.  (Updated 2011-12-22 00:31:23)
bq.  
bq.  
bq.  Review request for hbase, Ted Yu, Michael Stack, and Lars Hofhansl.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  In this patch, after a task is done, we don't delete the node if the task 
failed, so that when it's retried later on there won't be a race problem.
bq.  
bq.  It used to always delete the node.
bq.  
bq.  
bq.  This addresses bug HBASE-5081.
bq.  https://issues.apache.org/jira/browse/HBASE-5081
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java 
667a8b1 
bq.src/test/java/org/apache/hadoop/hbase/master/TestSplitLogManager.java 
32ad7e8 
bq.  
bq.  Diff: https://reviews.apache.org/r/3292/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  mvn -Dtest=TestDistributedLogSplitting clean test
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jimmy
bq.  
bq.



 Distributed log splitting deleteNode races against splitLog retry 
 ---

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: distributed-log-splitting-screenshot.png, 
 hbase-5081-patch-v6.txt, hbase-5081-patch-v7.txt, 
 hbase-5081_patch_for_92_v4.txt, hbase-5081_patch_v5.txt, patch_for_92.txt, 
 patch_for_92_v2.txt, patch_for_92_v3.txt


 Recently, during 0.92 RC testing, we found that distributed log splitting hangs 
 there forever.  Please see the attached screenshot.
 I looked into it, and here is what I think happened:
 1. One RS died; the ServerShutdownHandler found this out and started the 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. ServerShutdownHandler retried the log splitting;
 4. During the retry, it created these three tasks again and put them in a 
 hashmap (tasks);
 5. The asynchronous deletion from step 2 finally happened for one task; in 
 the callback, it removed that task from the hashmap;
 6. One of the newly submitted tasks' zookeeper watchers found that the task was 
 unassigned and not in the hashmap, so it created a new orphan task;
 7. All three tasks failed, but the task created in step 6 is an orphan, so 
 the batch.err counter was one short, and the log splitting hangs there, 
 waiting forever for the last task to finish.
 So I think the problem is step 2.  The fix is to make the deletion synchronous 
 instead of asynchronous, so that the retry has a clean start.
 An async deleteNode will mess up the split log retry.  In an extreme situation, 
 if the async deleteNode doesn't happen soon enough, some node created during 
 the retry could be deleted.
 deleteNode should be synchronous.
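 To make the proposed direction concrete, here is a minimal sketch of a synchronous 
 task-node delete. This is not the actual HBASE-5081 patch; it only illustrates the 
 general pattern with the plain ZooKeeper client API, and the class and method names 
 are made up for illustration:

{code}
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

// Hypothetical helper, not from the real SplitLogManager: delete a failed
// split-log task znode and block until ZooKeeper confirms the removal, so a
// subsequent retry can never race against a still-pending async callback.
public class SyncTaskNodeCleaner {
  private final ZooKeeper zk;

  public SyncTaskNodeCleaner(ZooKeeper zk) {
    this.zk = zk;
  }

  public void deleteTaskNodeSync(String path)
      throws KeeperException, InterruptedException {
    try {
      zk.delete(path, -1);  // -1 matches any version of the znode
    } catch (KeeperException.NoNodeException e) {
      // Node already gone; the retry still starts from a clean state.
    }
  }
}
{code}

 Only once a call like this returns would the retry recreate the task node, which is 
 exactly the guarantee the asynchronous delete in step 2 above does not provide.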





[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2011-12-21 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174533#comment-13174533
 ] 

jirapos...@reviews.apache.org commented on HBASE-5081:
--



bq.  On 2011-12-21 17:05:47, Michael Stack wrote:
bq.   Looks good to me.  How do I know it works?  Would it be hard to write a 
test?
bq.  
bq.  Jimmy Xiang wrote:
bq.  This is a race issue that is hard to reproduce. Let me think about how to 
come up with a unit test.
bq.  
bq.  Michael Stack wrote:
bq.  Jimmy, please paste your last patch to JIRA.  I'd like to include it in 
our 0.92RC even though it doesn't have a test.

Yes, it is in Jira: hbase-5081-patch-v7.txt
Thanks.


- Jimmy


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3292/#review4045
---


On 2011-12-22 00:31:23, Jimmy Xiang wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3292/
bq.  ---
bq.  
bq.  (Updated 2011-12-22 00:31:23)
bq.  
bq.  
bq.  Review request for hbase, Ted Yu, Michael Stack, and Lars Hofhansl.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  In this patch, after a task is done, we don't delete the node if the task 
failed, so that when it's retried later on there won't be a race problem.
bq.  
bq.  It used to always delete the node.
bq.  
bq.  
bq.  This addresses bug HBASE-5081.
bq.  https://issues.apache.org/jira/browse/HBASE-5081
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java 
667a8b1 
bq.src/test/java/org/apache/hadoop/hbase/master/TestSplitLogManager.java 
32ad7e8 
bq.  
bq.  Diff: https://reviews.apache.org/r/3292/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  mvn -Dtest=TestDistributedLogSplitting clean test
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jimmy
bq.  
bq.



 Distributed log splitting deleteNode races against splitLog retry 
 ---

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: distributed-log-splitting-screenshot.png, 
 hbase-5081-patch-v6.txt, hbase-5081-patch-v7.txt, 
 hbase-5081_patch_for_92_v4.txt, hbase-5081_patch_v5.txt, patch_for_92.txt, 
 patch_for_92_v2.txt, patch_for_92_v3.txt


 Recently, during 0.92 RC testing, we found that distributed log splitting hangs 
 there forever.  Please see the attached screenshot.
 I looked into it, and here is what I think happened:
 1. One RS died; the ServerShutdownHandler found this out and started the 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. ServerShutdownHandler retried the log splitting;
 4. During the retry, it created these three tasks again and put them in a 
 hashmap (tasks);
 5. The asynchronous deletion from step 2 finally happened for one task; in 
 the callback, it removed that task from the hashmap;
 6. One of the newly submitted tasks' zookeeper watchers found that the task was 
 unassigned and not in the hashmap, so it created a new orphan task;
 7. All three tasks failed, but the task created in step 6 is an orphan, so 
 the batch.err counter was one short, and the log splitting hangs there, 
 waiting forever for the last task to finish.
 So I think the problem is step 2.  The fix is to make the deletion synchronous 
 instead of asynchronous, so that the retry has a clean start.
 An async deleteNode will mess up the split log retry.  In an extreme situation, 
 if the async deleteNode doesn't happen soon enough, some node created during 
 the retry could be deleted.
 deleteNode should be synchronous.





[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2011-12-21 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174534#comment-13174534
 ] 

Hadoop QA commented on HBASE-5081:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12508328/hbase-5081-patch-v7.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -152 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 76 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.replication.TestReplication

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/575//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/575//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/575//console

This message is automatically generated.

 Distributed log splitting deleteNode races against splitLog retry 
 ---

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: distributed-log-splitting-screenshot.png, 
 hbase-5081-patch-v6.txt, hbase-5081-patch-v7.txt, 
 hbase-5081_patch_for_92_v4.txt, hbase-5081_patch_v5.txt, patch_for_92.txt, 
 patch_for_92_v2.txt, patch_for_92_v3.txt


 Recently, during 0.92 RC testing, we found that distributed log splitting hangs 
 there forever.  Please see the attached screenshot.
 I looked into it, and here is what I think happened:
 1. One RS died; the ServerShutdownHandler found this out and started the 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. ServerShutdownHandler retried the log splitting;
 4. During the retry, it created these three tasks again and put them in a 
 hashmap (tasks);
 5. The asynchronous deletion from step 2 finally happened for one task; in 
 the callback, it removed that task from the hashmap;
 6. One of the newly submitted tasks' zookeeper watchers found that the task was 
 unassigned and not in the hashmap, so it created a new orphan task;
 7. All three tasks failed, but the task created in step 6 is an orphan, so 
 the batch.err counter was one short, and the log splitting hangs there, 
 waiting forever for the last task to finish.
 So I think the problem is step 2.  The fix is to make the deletion synchronous 
 instead of asynchronous, so that the retry has a clean start.
 An async deleteNode will mess up the split log retry.  In an extreme situation, 
 if the async deleteNode doesn't happen soon enough, some node created during 
 the retry could be deleted.
 deleteNode should be synchronous.





[jira] [Created] (HBASE-5087) Up the 0.92RC zk to 3.4.1RC0

2011-12-21 Thread stack (Created) (JIRA)
Up the 0.92RC zk to 3.4.1RC0


 Key: HBASE-5087
 URL: https://issues.apache.org/jira/browse/HBASE-5087
 Project: HBase
  Issue Type: Task
Reporter: stack
 Fix For: 0.92.0


ZK just found a bad bug in 3.4.1, ZOOKEEPER-1333.  They put up a fix and a new RC, 
3.4.1

(Andrew, you saw Todd's query asking if it'd be possible to hold to zk 3.3.4 and 
just have 3.4.1 for secure installs?)





[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2011-12-21 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174539#comment-13174539
 ] 

jirapos...@reviews.apache.org commented on HBASE-5081:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3292/#review4064
---



src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java
https://reviews.apache.org/r/3292/#comment9163

Did the first approach not work?



src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java
https://reviews.apache.org/r/3292/#comment9164

It seems we can do this unconditionally now (no need for the 
safeToDeleteNodeAsync flag). The worst scenario is trying to remove a node that 
has already been removed


- Lars


On 2011-12-22 00:31:23, Jimmy Xiang wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3292/
bq.  ---
bq.  
bq.  (Updated 2011-12-22 00:31:23)
bq.  
bq.  
bq.  Review request for hbase, Ted Yu, Michael Stack, and Lars Hofhansl.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  In this patch, after a task is done, we don't delete the node if the task 
failed, so that when it's retried later on there won't be a race problem.
bq.  
bq.  It used to always delete the node.
bq.  
bq.  
bq.  This addresses bug HBASE-5081.
bq.  https://issues.apache.org/jira/browse/HBASE-5081
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java 
667a8b1 
bq.src/test/java/org/apache/hadoop/hbase/master/TestSplitLogManager.java 
32ad7e8 
bq.  
bq.  Diff: https://reviews.apache.org/r/3292/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  mvn -Dtest=TestDistributedLogSplitting clean test
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jimmy
bq.  
bq.



 Distributed log splitting deleteNode races against splitLog retry 
 ---

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: distributed-log-splitting-screenshot.png, 
 hbase-5081-patch-v6.txt, hbase-5081-patch-v7.txt, 
 hbase-5081_patch_for_92_v4.txt, hbase-5081_patch_v5.txt, patch_for_92.txt, 
 patch_for_92_v2.txt, patch_for_92_v3.txt


 Recently, during 0.92 RC testing, we found that distributed log splitting hangs 
 there forever.  Please see the attached screenshot.
 I looked into it, and here is what I think happened:
 1. One RS died; the ServerShutdownHandler found this out and started the 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. ServerShutdownHandler retried the log splitting;
 4. During the retry, it created these three tasks again and put them in a 
 hashmap (tasks);
 5. The asynchronous deletion from step 2 finally happened for one task; in 
 the callback, it removed that task from the hashmap;
 6. One of the newly submitted tasks' zookeeper watchers found that the task was 
 unassigned and not in the hashmap, so it created a new orphan task;
 7. All three tasks failed, but the task created in step 6 is an orphan, so 
 the batch.err counter was one short, and the log splitting hangs there, 
 waiting forever for the last task to finish.
 So I think the problem is step 2.  The fix is to make the deletion synchronous 
 instead of asynchronous, so that the retry has a clean start.
 An async deleteNode will mess up the split log retry.  In an extreme situation, 
 if the async deleteNode doesn't happen soon enough, some node created during 
 the retry could be deleted.
 deleteNode should be synchronous.





[jira] [Commented] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time

2011-12-21 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174541#comment-13174541
 ] 

Phabricator commented on HBASE-5033:


Kannan has commented on the revision 
[jira][HBASE-5033][[89-fb]]Opening/Closing store in parallel to reduce region 
open/close time.

  Liyin -- this looks very good. Some minor comments are inlined.

  Also: initially, I was not sure whether this parallelization was worth doing for 
close. But as you pointed out -- with EvictOnClose turned on, this 
parallelization might be useful for the close scenario too.

  On a separate note, with EvictOnClose, it seems like there are two cases 
related to closing a region.

  a) cluster shutdown related region closes
  b) load balancing related region closes.

  For the cluster shutdown case, even if EvictOnClose is true, I think we 
should bypass the evictions to avoid the cost of cleaning up the blocks 
individually from the block cache, since the server is shutting down anyway. For 
the load balancing case, or the normal compaction-related file deletion case, it 
makes sense to keep doing the evictions on close. I don't think the code currently 
distinguishes the two cases... (but I haven't looked carefully to double-check).

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java:529-541 might 
be good to make this block a helper function. This block of code is repeated 4 
times with minor differences in params...
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java:528 since 30 
is repeated a few times, would be better to make a const for this too. And I 
would set the default much lower... to say 8 or something. Not sure how many 
disks are typical.
  src/main/java/org/apache/hadoop/hbase/regionserver/Store.java:294 Openner -> 
Opener
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java:527 plural 
form here (maxStoreOpenerThread -> maxStoreOpenerThreads) and similarly in the 
other places.
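
To make the helper-function suggestion above concrete, here is a small sketch of the 
kind of refactor being described: one place that builds a bounded thread pool from a 
configured size, with the default pulled into a named constant. The names and the 
default below are illustrative only and are not taken from the actual D933 patch:

{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not from the real HRegion: centralizes the repeated
// "build a store opener/closer pool" block behind one method and one constant.
final class StoreThreadPools {
  // Suggested default kept small, per the review comment (e.g. 8).
  static final int DEFAULT_STORE_THREADS = 8;

  static ExecutorService newStorePool(final String poolName, int configuredThreads) {
    int threads = configuredThreads > 0 ? configuredThreads : DEFAULT_STORE_THREADS;
    final AtomicInteger count = new AtomicInteger(1);
    ThreadFactory factory = new ThreadFactory() {
      public Thread newThread(Runnable r) {
        // Name threads after the pool so open/close workers are easy to spot in dumps.
        return new Thread(r, poolName + "-" + count.getAndIncrement());
      }
    };
    return Executors.newFixedThreadPool(threads, factory);
  }

  private StoreThreadPools() {
  }
}
{code}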


REVISION DETAIL
  https://reviews.facebook.net/D933


 Opening/Closing store in parallel to reduce region open/close time
 --

 Key: HBASE-5033
 URL: https://issues.apache.org/jira/browse/HBASE-5033
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: D933.1.patch, D933.2.patch, D933.3.patch, D933.4.patch


 Region servers open/close each store, and each store file for every store, 
 sequentially, which can make opening/closing regions inefficient.
 So this diff opens/closes each store in parallel in order to reduce region 
 open/close time. It would also help reduce the cluster restart time.
 1) Opening each store in parallel
 2) Loading each store file for every store in parallel
 3) Closing each store in parallel
 4) Closing each store file for every store in parallel.
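 As a rough illustration of point 1 (this is not the actual D933 patch; Store and 
 openStore are stand-ins for the real HBase types), a region's stores can be opened 
 through a completion service so that the slowest store, rather than the sum of all 
 stores, bounds the open time:

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch of parallel store opening; names are illustrative only.
public class ParallelStoreOpen {
  interface Store {}

  static Store openStore(String family) {
    // Stand-in for the per-store open (instantiating the store, loading its files).
    return new Store() {};
  }

  static List<Store> openAllStores(List<String> families, int poolSize)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(poolSize);
    try {
      ExecutorCompletionService<Store> done =
          new ExecutorCompletionService<Store>(pool);
      for (final String family : families) {
        done.submit(new Callable<Store>() {
          public Store call() {
            return openStore(family);
          }
        });
      }
      List<Store> stores = new ArrayList<Store>(families.size());
      for (int i = 0; i < families.size(); i++) {
        // take() returns as each open finishes; get() propagates any failure.
        stores.add(done.take().get());
      }
      return stores;
    } finally {
      pool.shutdown();
    }
  }
}
{code}

 The same pattern would apply to points 2 through 4, with the per-store-file loads 
 and the close paths submitted to a similar bounded pool.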





[jira] [Resolved] (HBASE-5087) Up the 0.92RC zk to 3.4.1RC0

2011-12-21 Thread stack (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-5087.
--

  Resolution: Fixed
Assignee: stack
Hadoop Flags: Incompatible change

Committed as an incompatible change to 0.92 and trunk (though 3.4.1 runs against 
3.3.x... marking it so to give people pause).

 Up the 0.92RC zk to 3.4.1RC0
 

 Key: HBASE-5087
 URL: https://issues.apache.org/jira/browse/HBASE-5087
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: stack
 Fix For: 0.92.0

 Attachments: 5087.txt


 ZK just found a bad bug in 3.4.1, ZOOKEEPER-1333.  They put up a fix and a new RC, 
 3.4.1
 (Andrew, you saw Todd's query asking if it'd be possible to hold to zk 3.3.4 and 
 just have 3.4.1 for secure installs?)




