[jira] [Commented] (HBASE-3924) Improve Shell's CLI help

2011-12-29 Thread Harsh J (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177087#comment-13177087
 ] 

Harsh J commented on HBASE-3924:


Lars,

Sorry for the late get back. Thanks for committing your changes in! :)

 Improve Shell's CLI help
 

 Key: HBASE-3924
 URL: https://issues.apache.org/jira/browse/HBASE-3924
 Project: HBase
  Issue Type: Improvement
  Components: shell
Affects Versions: 0.90.3
Reporter: Lars George
Assignee: Harsh J
Priority: Trivial
 Fix For: 0.94.0

 Attachments: 3924.txt, HBASE-3924.patch


 In the hirb.rb source we have
 {noformat}
 # so they don't go through to irb.  Output shell 'usage' if user types 
 '--help'
 cmdline_help = HERE # HERE document output as shell usage
 HBase Shell command-line options:
  formatFormatter for outputting results: console | html.
 Default: console
  -d | --debug  Set DEBUG log levels.
 HERE
 found = []
 format = 'console'
 script2run = nil
 log_level = org.apache.log4j.Level::ERROR
 for arg in ARGV
  if arg =~ /^--format=(.+)/i
format = $1
if format =~ /^html$/i
  raise NoMethodError.new(Not yet implemented)
elsif format =~ /^console$/i
  # This is default
else
  raise ArgumentError.new(Unsupported format  + arg)
end
found.push(arg)
  elsif arg == '-h' || arg == '--help'
puts cmdline_help
exit
  elsif arg == '-d' || arg == '--debug'
log_level = org.apache.log4j.Level::DEBUG
$fullBackTrace = true
puts Setting DEBUG log level...
  else
# Presume it a script. Save it off for running later below
# after we've set up some environment.
script2run = arg
found.push(arg)
# Presume that any other args are meant for the script.
break
  end
 end
 {noformat}
 We should enhance the help printed when using -h/--help to look like this?
 {noformat}
 cmdline_help = HERE # HERE document output as shell usage
 HBase Shell command-line options:
  --format={console|html}Formatter for outputting results.
 Default: console
  -d | --debug  Set DEBUG log levels.
  -h | --help   This help.
  script-filename [script-options]
 HERE
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5064) use surefire tests parallelization

2011-12-29 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177282#comment-13177282
 ] 

Zhihong Yu commented on HBASE-5064:
---

I saw patch v19 coming and started to run test suited with it - I didn't keep 
the output from TestMiniClusterLoadParallel

w.r.t. parameter for the hadoop-qa build, we should set default value (of 2) 
for surefire.secondPartThreadCount if the parameter isn't set.


 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 
 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 
 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 
 5064.v18.patch, 5064.v19.patch, 5064.v2.patch, 5064.v3.patch, 5064.v4.patch, 
 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 
 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 
 5064.v8.patch, 5064.v8.patch, 5064.v9.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5064) use surefire tests parallelization

2011-12-29 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5064:
---

Attachment: 5064.v19.patch

 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 
 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 
 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 
 5064.v18.patch, 5064.v19.patch, 5064.v19.patch, 5064.v2.patch, 5064.v3.patch, 
 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 
 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 
 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 5064.v9.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5103) Fix improper master znode deserialization

2011-12-29 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177331#comment-13177331
 ] 

Zhihong Yu commented on HBASE-5103:
---

Unit tests are passing.

Integrated to 0.92 and TRUNK.

Thanks for the patch Jonathan.

Thanks for the review Lars.

 Fix improper master znode deserialization
 -

 Key: HBASE-5103
 URL: https://issues.apache.org/jira/browse/HBASE-5103
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
Priority: Minor
 Fix For: 0.92.0, 0.94.0

 Attachments: hbase-5103.patch


 In ActiveMasterManager#blockUntilBecomingActiveMaster the master znode is 
 created as a versioned serialized version of ServerName
 {code}
  if (ZKUtil.createEphemeralNodeAndWatch(this.watcher,
   this.watcher.masterAddressZNode, sn.getVersionedBytes())) {
 {code}
 There are a few user visible places where it is used but not deserialized 
 properly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5100) Rollback of split could cause closed region to be opened again

2011-12-29 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177333#comment-13177333
 ] 

Hadoop QA commented on HBASE-5100:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12508859/5100.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 javadoc.  The javadoc tool appears to have generated -151 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 76 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.coprocessor.TestMasterObserver
  org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/627//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/627//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/627//console

This message is automatically generated.

 Rollback of split could cause closed region to be opened again
 --

 Key: HBASE-5100
 URL: https://issues.apache.org/jira/browse/HBASE-5100
 Project: HBase
  Issue Type: Bug
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.92.0, 0.94.0

 Attachments: 5100.txt, hbase-5100.patch


 If master sending close region to rs and region's split transaction 
 concurrently happen,
 it may cause closed region to opened. 
 See the detailed code in SplitTransaction#createDaughters
 {code}
 ListStoreFile hstoreFilesToSplit = null;
 try{
   hstoreFilesToSplit = this.parent.close(false);
   if (hstoreFilesToSplit == null) {
 // The region was closed by a concurrent thread.  We can't continue
 // with the split, instead we must just abandon the split.  If we
 // reopen or split this could cause problems because the region has
 // probably already been moved to a different server, or is in the
 // process of moving to a different server.
 throw new IOException(Failed to close region: already closed by  +
   another thread);
   }
 } finally {
   this.journal.add(JournalEntry.CLOSED_PARENT_REGION);
 }
 {code}
 when rolling back, the JournalEntry.CLOSED_PARENT_REGION causes 
 this.parent.initialize();
 Although this region is not onlined in the regionserver, it may bring some 
 potential problem.
 For example, in our environment, the closed parent region is rolled back 
 sucessfully , and then starting compaction and split again.
 The parent region is f892dd6107b6b4130199582abc78e9c1
 master log
 {code}
 2011-12-26 00:24:42,693 INFO org.apache.hadoop.hbase.master.HMaster: balance 
 hri=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
  src=dw87.kgb.sqa.cm4,60020,1324827866085, 
 dest=dw80.kgb.sqa.cm4,60020,1324827865780
 2011-12-26 00:24:42,693 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
  (offlining)
 2011-12-26 00:24:42,694 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
 serverName=dw87.kgb.sqa.cm4,60020,1324827866085, load=(requests=0, regions=0, 
 usedHeap=0, maxHeap=0) for region 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
 2011-12-26 00:24:42,699 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned 
 node: /hbase-tbfs/unassigned/f892dd6107b6b4130199582abc78e9c1 
 (region=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
  server=dw87.kgb.sqa.cm4,60020,1324827866085, state=RS_ZK_REGION_CLOSING)
 2011-12-26 00:24:42,699 DEBUG 
 

[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root

2011-12-29 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177327#comment-13177327
 ] 

jirapos...@reviews.apache.org commented on HBASE-5099:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3323/#review4148
---


The code is much closer to what was in my mind.
The new test needs improvement.


src/main/java/org/apache/hadoop/hbase/master/HMaster.java
https://reviews.apache.org/r/3323/#comment9361

Line is 88 chars long.
If ExecutorService is imported, the line should be much shorter.



src/main/java/org/apache/hadoop/hbase/master/HMaster.java
https://reviews.apache.org/r/3323/#comment9364

I suggest recording the amount of time taken by the recovery and log it 
after executor.awaitTermination() returns.



src/main/java/org/apache/hadoop/hbase/master/HMaster.java
https://reviews.apache.org/r/3323/#comment9362

Timeout value should be included in the exception message.



src/test/java/org/apache/hadoop/hbase/master/TestMasterRecovery.java
https://reviews.apache.org/r/3323/#comment9365

Year is not needed.



src/test/java/org/apache/hadoop/hbase/master/TestMasterRecovery.java
https://reviews.apache.org/r/3323/#comment9363

This shouldn't be small test since mini cluster is involved.



src/test/java/org/apache/hadoop/hbase/master/TestMasterRecovery.java
https://reviews.apache.org/r/3323/#comment9366

Please add short javadoc for this new class.



src/test/java/org/apache/hadoop/hbase/master/TestMasterRecovery.java
https://reviews.apache.org/r/3323/#comment9368

We shouldn't pass 1 here since that means 1 master.



src/test/java/org/apache/hadoop/hbase/master/TestMasterRecovery.java
https://reviews.apache.org/r/3323/#comment9367

Since the test doesn't involve standby master, I think we should use a 
different name.


- Ted


On 2011-12-29 18:38:54, Jimmy Xiang wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3323/
bq.  ---
bq.  
bq.  (Updated 2011-12-29 18:38:54)
bq.  
bq.  
bq.  Review request for hbase, Ted Yu and Michael Stack.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Per discussion with Ted (on issues), I put up a patch to run 
tryRecoveringExpiredZKSession() in a separate thread and time it out and fail 
the recovery if it is stuck somewhere.
bq.  
bq.  I added a test to test the abort method.  However, for the mini cluster, 
becomeActiveMaster() doesn't succeed so the abort method ends up always 
aborted.  So the actually success recovery is not tested.
bq.  
bq.  
bq.  This addresses bug HBASE-5099.
bq.  https://issues.apache.org/jira/browse/HBASE-5099
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/master/HMaster.java a5935a6 
bq.src/test/java/org/apache/hadoop/hbase/master/TestMasterRecovery.java 
PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/3323/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  mvn -PlocalTests -Dtest=TestMaster* clean test
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jimmy
bq.  
bq.



 ZK event thread waiting for root region while server shutdown handler waiting 
 for event thread to finish distributed log splitting to recover the region 
 sever the root region is on
 

 Key: HBASE-5099
 URL: https://issues.apache.org/jira/browse/HBASE-5099
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: ZK-event-thread-waiting-for-root.png, 
 distributed-log-splitting-hangs.png, hbase-5099-v2.patch, 
 hbase-5099-v3.patch, hbase-5099.patch


 A RS died.  The ServerShutdownHandler kicked in and started the logspliting.  
 SpliLogManager
 installed the tasks asynchronously, then started to wait for them to complete.
 The task znodes were not created actually.  The requests were just queued.
 At this time, the zookeeper connection expired.  HMaster tried to recover the 
 expired ZK session.
 During the recovery, a new zookeeper connection was created.  However, this 
 master became the
 new master again.  It tried to assign root and meta.
 Because the dead RS got the old root region, the master needs to wait for the 
 log splitting to complete.
 This waiting holds the zookeeper event thread.  So the async create split 
 task is never retried since
 there is only one 

[jira] [Commented] (HBASE-5100) Rollback of split could cause closed region to be opened again

2011-12-29 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177339#comment-13177339
 ] 

Zhihong Yu commented on HBASE-5100:
---

TestMasterObserver passed locally on my MacBook.

 Rollback of split could cause closed region to be opened again
 --

 Key: HBASE-5100
 URL: https://issues.apache.org/jira/browse/HBASE-5100
 Project: HBase
  Issue Type: Bug
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.92.0, 0.94.0

 Attachments: 5100.txt, hbase-5100.patch


 If master sending close region to rs and region's split transaction 
 concurrently happen,
 it may cause closed region to opened. 
 See the detailed code in SplitTransaction#createDaughters
 {code}
 ListStoreFile hstoreFilesToSplit = null;
 try{
   hstoreFilesToSplit = this.parent.close(false);
   if (hstoreFilesToSplit == null) {
 // The region was closed by a concurrent thread.  We can't continue
 // with the split, instead we must just abandon the split.  If we
 // reopen or split this could cause problems because the region has
 // probably already been moved to a different server, or is in the
 // process of moving to a different server.
 throw new IOException(Failed to close region: already closed by  +
   another thread);
   }
 } finally {
   this.journal.add(JournalEntry.CLOSED_PARENT_REGION);
 }
 {code}
 when rolling back, the JournalEntry.CLOSED_PARENT_REGION causes 
 this.parent.initialize();
 Although this region is not onlined in the regionserver, it may bring some 
 potential problem.
 For example, in our environment, the closed parent region is rolled back 
 sucessfully , and then starting compaction and split again.
 The parent region is f892dd6107b6b4130199582abc78e9c1
 master log
 {code}
 2011-12-26 00:24:42,693 INFO org.apache.hadoop.hbase.master.HMaster: balance 
 hri=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
  src=dw87.kgb.sqa.cm4,60020,1324827866085, 
 dest=dw80.kgb.sqa.cm4,60020,1324827865780
 2011-12-26 00:24:42,693 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
  (offlining)
 2011-12-26 00:24:42,694 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
 serverName=dw87.kgb.sqa.cm4,60020,1324827866085, load=(requests=0, regions=0, 
 usedHeap=0, maxHeap=0) for region 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
 2011-12-26 00:24:42,699 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned 
 node: /hbase-tbfs/unassigned/f892dd6107b6b4130199582abc78e9c1 
 (region=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
  server=dw87.kgb.sqa.cm4,60020,1324827866085, state=RS_ZK_REGION_CLOSING)
 2011-12-26 00:24:42,699 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSING, server=dw87.kgb.sqa.cm4,60020,1324827866085, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,348 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSED, server=dw87.kgb.sqa.cm4,60020,1324827866085, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,349 DEBUG 
 org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
 event for f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,349 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
 was=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
  state=CLOSED, ts=1324830285347
 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x13447f283f40e73 Creating (or updating) unassigned node for 
 f892dd6107b6b4130199582abc78e9c1 with OFFLINE state
 2011-12-26 00:24:45,354 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=M_ZK_REGION_OFFLINE, server=dw75.kgb.sqa.cm4:6, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,354 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Found an existing plan for 
 

[jira] [Updated] (HBASE-5103) Fix improper master znode deserialization

2011-12-29 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5103:
--

Fix Version/s: 0.94.0
   0.92.0
 Hadoop Flags: Reviewed
  Summary: Fix improper master znode deserialization  (was: Fix places 
where master znode is improperly deserialized.)

 Fix improper master znode deserialization
 -

 Key: HBASE-5103
 URL: https://issues.apache.org/jira/browse/HBASE-5103
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
Priority: Minor
 Fix For: 0.92.0, 0.94.0

 Attachments: hbase-5103.patch


 In ActiveMasterManager#blockUntilBecomingActiveMaster the master znode is 
 created as a versioned serialized version of ServerName
 {code}
  if (ZKUtil.createEphemeralNodeAndWatch(this.watcher,
   this.watcher.masterAddressZNode, sn.getVersionedBytes())) {
 {code}
 There are a few user visible places where it is used but not deserialized 
 properly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5010) Filter HFiles based on TTL

2011-12-29 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177335#comment-13177335
 ] 

Phabricator commented on HBASE-5010:


mbautin has commented on the revision [jira] [HBASE-5010] [89-fb] Filter 
HFiles based on TTL.

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java:209 Yes, 
that's correct.
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java:224 Yes, 
this is the way it was in the original fix (89-fb version).

REVISION DETAIL
  https://reviews.facebook.net/D909


 Filter HFiles based on TTL
 --

 Key: HBASE-5010
 URL: https://issues.apache.org/jira/browse/HBASE-5010
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Zhihong Yu
 Attachments: 5010.patch, D1017.1.patch, D1017.2.patch, D909.1.patch, 
 D909.2.patch, D909.3.patch, D909.4.patch, D909.5.patch


 In ScanWildcardColumnTracker we have
 {code:java}
  
   this.oldestStamp = EnvironmentEdgeManager.currentTimeMillis() - ttl;
   ...
   private boolean isExpired(long timestamp) {
 return timestamp  oldestStamp;
   }
 {code}
 but this time range filtering does not participate in HFile selection. In one 
 real case this caused next() calls to time out because all KVs in a table got 
 expired, but next() had to iterate over the whole table to find that out. We 
 should be able to filter out those HFiles right away. I think a reasonable 
 approach is to add a default timerange filter to every scan for a CF with a 
 finite TTL and utilize existing filtering in 
 StoreFile.Reader.passesTimerangeFilter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (HBASE-5064) use surefire tests parallelization

2011-12-29 Thread Zhihong Yu (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177282#comment-13177282
 ] 

Zhihong Yu edited comment on HBASE-5064 at 12/29/11 5:20 PM:
-

I saw patch v19 coming and started to run test suite with it - I didn't keep 
the output from TestMiniClusterLoadParallel

w.r.t. parameter for the hadoop-qa build, we should set default value (of 2) 
for surefire.secondPartThreadCount if the parameter isn't set.


  was (Author: zhi...@ebaysf.com):
I saw patch v19 coming and started to run test suited with it - I didn't 
keep the output from TestMiniClusterLoadParallel

w.r.t. parameter for the hadoop-qa build, we should set default value (of 2) 
for surefire.secondPartThreadCount if the parameter isn't set.

  
 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 
 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 
 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 
 5064.v18.patch, 5064.v19.patch, 5064.v2.patch, 5064.v3.patch, 5064.v4.patch, 
 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 
 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 
 5064.v8.patch, 5064.v8.patch, 5064.v9.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5064) use surefire tests parallelization

2011-12-29 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5064:
---

Attachment: 5064.v18.patch

 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 
 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 
 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 
 5064.v18.patch, 5064.v2.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 
 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 
 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 
 5064.v8.patch, 5064.v9.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root re

2011-12-29 Thread Jimmy Xiang (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-5099:
---

Attachment: hbase-5099-v3.patch

 ZK event thread waiting for root region while server shutdown handler waiting 
 for event thread to finish distributed log splitting to recover the region 
 sever the root region is on
 

 Key: HBASE-5099
 URL: https://issues.apache.org/jira/browse/HBASE-5099
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: ZK-event-thread-waiting-for-root.png, 
 distributed-log-splitting-hangs.png, hbase-5099-v2.patch, 
 hbase-5099-v3.patch, hbase-5099.patch


 A RS died.  The ServerShutdownHandler kicked in and started the logspliting.  
 SpliLogManager
 installed the tasks asynchronously, then started to wait for them to complete.
 The task znodes were not created actually.  The requests were just queued.
 At this time, the zookeeper connection expired.  HMaster tried to recover the 
 expired ZK session.
 During the recovery, a new zookeeper connection was created.  However, this 
 master became the
 new master again.  It tried to assign root and meta.
 Because the dead RS got the old root region, the master needs to wait for the 
 log splitting to complete.
 This waiting holds the zookeeper event thread.  So the async create split 
 task is never retried since
 there is only one event thread, which is waiting for the root region assigned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5064) use surefire tests parallelization

2011-12-29 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5064:
---

Status: Patch Available  (was: Open)

 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 
 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 
 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 
 5064.v18.patch, 5064.v19.patch, 5064.v2.patch, 5064.v3.patch, 5064.v4.patch, 
 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 
 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 
 5064.v8.patch, 5064.v8.patch, 5064.v9.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5064) use surefire tests parallelization

2011-12-29 Thread nkeywal (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177274#comment-13177274
 ] 

nkeywal commented on HBASE-5064:


@ted: The 3 last tests seems ok on hadoop-qa, even if I'am sure that we will 
discover some issues. A 4th test is in progress. I will do a 5th also. Anything 
meaningful in the logs for TestMiniClusterLoadParallel?

Note that the v18 patch is with 4 tests in //, I believe we will need to commit 
it with 2 or 3 to be sure that a 1 year old laptop can run the tests smoothly. 
I use 4 here to maximize // and minimize test time. We can set a parameter to 
the hadoop-qa build to use 4 or more if we want to...

 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 
 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 
 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 
 5064.v18.patch, 5064.v19.patch, 5064.v2.patch, 5064.v3.patch, 5064.v4.patch, 
 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 
 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 
 5064.v8.patch, 5064.v8.patch, 5064.v9.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5103) Fix places where master znode is improperly deserialized.

2011-12-29 Thread Jonathan Hsieh (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-5103:
--

Status: Patch Available  (was: Open)

 Fix places where master znode is improperly deserialized.
 -

 Key: HBASE-5103
 URL: https://issues.apache.org/jira/browse/HBASE-5103
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
Priority: Minor
 Attachments: hbase-5103.patch


 In ActiveMasterManager#blockUntilBecomingActiveMaster the master znode is 
 created as a versioned serialized version of ServerName
 {code}
  if (ZKUtil.createEphemeralNodeAndWatch(this.watcher,
   this.watcher.masterAddressZNode, sn.getVersionedBytes())) {
 {code}
 There are a few user visible places where it is used but not deserialized 
 properly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5064) use surefire tests parallelization

2011-12-29 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5064:
---

Status: Open  (was: Patch Available)

 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 
 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 
 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 
 5064.v18.patch, 5064.v19.patch, 5064.v19.patch, 5064.v2.patch, 5064.v3.patch, 
 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 
 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 
 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 5064.v9.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5064) use surefire tests parallelization

2011-12-29 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5064:
---

Attachment: 5064.v19.patch

 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 
 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 
 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 
 5064.v18.patch, 5064.v19.patch, 5064.v2.patch, 5064.v3.patch, 5064.v4.patch, 
 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 
 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 
 5064.v8.patch, 5064.v8.patch, 5064.v9.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5100) Rollback of split could cause closed region to be opened again

2011-12-29 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5100:
--

Attachment: 5100.txt

 Rollback of split could cause closed region to be opened again
 --

 Key: HBASE-5100
 URL: https://issues.apache.org/jira/browse/HBASE-5100
 Project: HBase
  Issue Type: Bug
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.92.0, 0.94.0

 Attachments: 5100.txt, hbase-5100.patch


 If master sending close region to rs and region's split transaction 
 concurrently happen,
 it may cause closed region to opened. 
 See the detailed code in SplitTransaction#createDaughters
 {code}
 ListStoreFile hstoreFilesToSplit = null;
 try{
   hstoreFilesToSplit = this.parent.close(false);
   if (hstoreFilesToSplit == null) {
 // The region was closed by a concurrent thread.  We can't continue
 // with the split, instead we must just abandon the split.  If we
 // reopen or split this could cause problems because the region has
 // probably already been moved to a different server, or is in the
 // process of moving to a different server.
 throw new IOException(Failed to close region: already closed by  +
   another thread);
   }
 } finally {
   this.journal.add(JournalEntry.CLOSED_PARENT_REGION);
 }
 {code}
 when rolling back, the JournalEntry.CLOSED_PARENT_REGION causes 
 this.parent.initialize();
 Although this region is not onlined in the regionserver, it may bring some 
 potential problem.
 For example, in our environment, the closed parent region is rolled back 
 sucessfully , and then starting compaction and split again.
 The parent region is f892dd6107b6b4130199582abc78e9c1
 master log
 {code}
 2011-12-26 00:24:42,693 INFO org.apache.hadoop.hbase.master.HMaster: balance 
 hri=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
  src=dw87.kgb.sqa.cm4,60020,1324827866085, 
 dest=dw80.kgb.sqa.cm4,60020,1324827865780
 2011-12-26 00:24:42,693 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
  (offlining)
 2011-12-26 00:24:42,694 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
 serverName=dw87.kgb.sqa.cm4,60020,1324827866085, load=(requests=0, regions=0, 
 usedHeap=0, maxHeap=0) for region 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
 2011-12-26 00:24:42,699 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned 
 node: /hbase-tbfs/unassigned/f892dd6107b6b4130199582abc78e9c1 
 (region=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
  server=dw87.kgb.sqa.cm4,60020,1324827866085, state=RS_ZK_REGION_CLOSING)
 2011-12-26 00:24:42,699 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSING, server=dw87.kgb.sqa.cm4,60020,1324827866085, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,348 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSED, server=dw87.kgb.sqa.cm4,60020,1324827866085, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,349 DEBUG 
 org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
 event for f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,349 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
 was=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
  state=CLOSED, ts=1324830285347
 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x13447f283f40e73 Creating (or updating) unassigned node for 
 f892dd6107b6b4130199582abc78e9c1 with OFFLINE state
 2011-12-26 00:24:45,354 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=M_ZK_REGION_OFFLINE, server=dw75.kgb.sqa.cm4:6, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,354 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Found an existing plan for 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
  destination server is + 

[jira] [Updated] (HBASE-5100) Rollback of split could cause closed region to be opened again

2011-12-29 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5100:
--

Attachment: (was: 5100.txt)

 Rollback of split could cause closed region to be opened again
 --

 Key: HBASE-5100
 URL: https://issues.apache.org/jira/browse/HBASE-5100
 Project: HBase
  Issue Type: Bug
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.92.0, 0.94.0

 Attachments: 5100.txt, hbase-5100.patch


 If master sending close region to rs and region's split transaction 
 concurrently happen,
 it may cause closed region to opened. 
 See the detailed code in SplitTransaction#createDaughters
 {code}
 ListStoreFile hstoreFilesToSplit = null;
 try{
   hstoreFilesToSplit = this.parent.close(false);
   if (hstoreFilesToSplit == null) {
 // The region was closed by a concurrent thread.  We can't continue
 // with the split, instead we must just abandon the split.  If we
 // reopen or split this could cause problems because the region has
 // probably already been moved to a different server, or is in the
 // process of moving to a different server.
 throw new IOException(Failed to close region: already closed by  +
   another thread);
   }
 } finally {
   this.journal.add(JournalEntry.CLOSED_PARENT_REGION);
 }
 {code}
 when rolling back, the JournalEntry.CLOSED_PARENT_REGION causes 
 this.parent.initialize();
 Although this region is not onlined in the regionserver, it may bring some 
 potential problem.
 For example, in our environment, the closed parent region is rolled back 
 sucessfully , and then starting compaction and split again.
 The parent region is f892dd6107b6b4130199582abc78e9c1
 master log
 {code}
 2011-12-26 00:24:42,693 INFO org.apache.hadoop.hbase.master.HMaster: balance 
 hri=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
  src=dw87.kgb.sqa.cm4,60020,1324827866085, 
 dest=dw80.kgb.sqa.cm4,60020,1324827865780
 2011-12-26 00:24:42,693 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
  (offlining)
 2011-12-26 00:24:42,694 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
 serverName=dw87.kgb.sqa.cm4,60020,1324827866085, load=(requests=0, regions=0, 
 usedHeap=0, maxHeap=0) for region 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
 2011-12-26 00:24:42,699 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned 
 node: /hbase-tbfs/unassigned/f892dd6107b6b4130199582abc78e9c1 
 (region=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
  server=dw87.kgb.sqa.cm4,60020,1324827866085, state=RS_ZK_REGION_CLOSING)
 2011-12-26 00:24:42,699 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSING, server=dw87.kgb.sqa.cm4,60020,1324827866085, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,348 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSED, server=dw87.kgb.sqa.cm4,60020,1324827866085, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,349 DEBUG 
 org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
 event for f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,349 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
 was=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
  state=CLOSED, ts=1324830285347
 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x13447f283f40e73 Creating (or updating) unassigned node for 
 f892dd6107b6b4130199582abc78e9c1 with OFFLINE state
 2011-12-26 00:24:45,354 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=M_ZK_REGION_OFFLINE, server=dw75.kgb.sqa.cm4:6, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,354 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Found an existing plan for 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
  destination server is + 

[jira] [Commented] (HBASE-5064) use surefire tests parallelization

2011-12-29 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177260#comment-13177260
 ] 

Zhihong Yu commented on HBASE-5064:
---

Trying out patch v18 on Linux I got:
{code}
Tests in error:
  loadTest[0](org.apache.hadoop.hbase.util.TestMiniClusterLoadParallel): test 
timed out after 12 milliseconds
{code}
open files limit is 32768 and max user processes are unlimited.

 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 
 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 
 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 
 5064.v18.patch, 5064.v19.patch, 5064.v2.patch, 5064.v3.patch, 5064.v4.patch, 
 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 
 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 
 5064.v8.patch, 5064.v8.patch, 5064.v9.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5064) use surefire tests parallelization

2011-12-29 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177210#comment-13177210
 ] 

Hadoop QA commented on HBASE-5064:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12508842/5064.v18.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 12 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -151 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 76 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/623//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/623//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/623//console

This message is automatically generated.

 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 
 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 
 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 
 5064.v2.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 
 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 
 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 
 5064.v9.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5100) Rollback of split would cause closed region to opened

2011-12-29 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5100:
--

Attachment: 5100.txt

 Rollback of split would cause closed region to opened 
 --

 Key: HBASE-5100
 URL: https://issues.apache.org/jira/browse/HBASE-5100
 Project: HBase
  Issue Type: Bug
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: 5100.txt, hbase-5100.patch


 If master sending close region to rs and region's split transaction 
 concurrently happen,
 it may cause closed region to opened. 
 See the detailed code in SplitTransaction#createDaughters
 {code}
 ListStoreFile hstoreFilesToSplit = null;
 try{
   hstoreFilesToSplit = this.parent.close(false);
   if (hstoreFilesToSplit == null) {
 // The region was closed by a concurrent thread.  We can't continue
 // with the split, instead we must just abandon the split.  If we
 // reopen or split this could cause problems because the region has
 // probably already been moved to a different server, or is in the
 // process of moving to a different server.
 throw new IOException(Failed to close region: already closed by  +
   another thread);
   }
 } finally {
   this.journal.add(JournalEntry.CLOSED_PARENT_REGION);
 }
 {code}
 when rolling back, the JournalEntry.CLOSED_PARENT_REGION causes 
 this.parent.initialize();
 Although this region is not onlined in the regionserver, it may bring some 
 potential problem.
 For example, in our environment, the closed parent region is rolled back 
 sucessfully , and then starting compaction and split again.
 The parent region is f892dd6107b6b4130199582abc78e9c1
 master log
 {code}
 2011-12-26 00:24:42,693 INFO org.apache.hadoop.hbase.master.HMaster: balance 
 hri=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
  src=dw87.kgb.sqa.cm4,60020,1324827866085, 
 dest=dw80.kgb.sqa.cm4,60020,1324827865780
 2011-12-26 00:24:42,693 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
  (offlining)
 2011-12-26 00:24:42,694 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
 serverName=dw87.kgb.sqa.cm4,60020,1324827866085, load=(requests=0, regions=0, 
 usedHeap=0, maxHeap=0) for region 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
 2011-12-26 00:24:42,699 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned 
 node: /hbase-tbfs/unassigned/f892dd6107b6b4130199582abc78e9c1 
 (region=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
  server=dw87.kgb.sqa.cm4,60020,1324827866085, state=RS_ZK_REGION_CLOSING)
 2011-12-26 00:24:42,699 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSING, server=dw87.kgb.sqa.cm4,60020,1324827866085, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,348 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSED, server=dw87.kgb.sqa.cm4,60020,1324827866085, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,349 DEBUG 
 org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
 event for f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,349 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
 was=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
  state=CLOSED, ts=1324830285347
 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x13447f283f40e73 Creating (or updating) unassigned node for 
 f892dd6107b6b4130199582abc78e9c1 with OFFLINE state
 2011-12-26 00:24:45,354 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=M_ZK_REGION_OFFLINE, server=dw75.kgb.sqa.cm4:6, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,354 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Found an existing plan for 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
  destination server is + serverName=dw80.kgb.sqa.cm4,60020,1324827865780, 
 load=(requests=0, regions=1, 

[jira] [Commented] (HBASE-5103) Fix places where master znode is improperly deserialized.

2011-12-29 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177277#comment-13177277
 ] 

Zhihong Yu commented on HBASE-5103:
---

+1 on patch.
Nice catch.

 Fix places where master znode is improperly deserialized.
 -

 Key: HBASE-5103
 URL: https://issues.apache.org/jira/browse/HBASE-5103
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
Priority: Minor
 Attachments: hbase-5103.patch


 In ActiveMasterManager#blockUntilBecomingActiveMaster the master znode is 
 created as a versioned serialized version of ServerName
 {code}
  if (ZKUtil.createEphemeralNodeAndWatch(this.watcher,
   this.watcher.masterAddressZNode, sn.getVersionedBytes())) {
 {code}
 There are a few user visible places where it is used but not deserialized 
 properly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root

2011-12-29 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177342#comment-13177342
 ] 

Zhihong Yu commented on HBASE-5099:
---

bq. We shouldn't pass 1 here since that means 1 master.
The new test is different from TestMasterFailover so I guess 1 could be fine 
here.

How about naming the new test TestZKSessionRecoveryInMaster ?

 ZK event thread waiting for root region while server shutdown handler waiting 
 for event thread to finish distributed log splitting to recover the region 
 sever the root region is on
 

 Key: HBASE-5099
 URL: https://issues.apache.org/jira/browse/HBASE-5099
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: ZK-event-thread-waiting-for-root.png, 
 distributed-log-splitting-hangs.png, hbase-5099-v2.patch, 
 hbase-5099-v3.patch, hbase-5099.patch


 A RS died.  The ServerShutdownHandler kicked in and started the logspliting.  
 SpliLogManager
 installed the tasks asynchronously, then started to wait for them to complete.
 The task znodes were not created actually.  The requests were just queued.
 At this time, the zookeeper connection expired.  HMaster tried to recover the 
 expired ZK session.
 During the recovery, a new zookeeper connection was created.  However, this 
 master became the
 new master again.  It tried to assign root and meta.
 Because the dead RS got the old root region, the master needs to wait for the 
 log splitting to complete.
 This waiting holds the zookeeper event thread.  So the async create split 
 task is never retried since
 there is only one event thread, which is waiting for the root region assigned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5100) Rollback of split could cause closed region to be opened again

2011-12-29 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5100:
--

Fix Version/s: 0.94.0
   0.92.0
  Summary: Rollback of split could cause closed region to be opened 
again  (was: Rollback of split would cause closed region to opened )

 Rollback of split could cause closed region to be opened again
 --

 Key: HBASE-5100
 URL: https://issues.apache.org/jira/browse/HBASE-5100
 Project: HBase
  Issue Type: Bug
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.92.0, 0.94.0

 Attachments: 5100.txt, hbase-5100.patch


 If master sending close region to rs and region's split transaction 
 concurrently happen,
 it may cause closed region to opened. 
 See the detailed code in SplitTransaction#createDaughters
 {code}
 ListStoreFile hstoreFilesToSplit = null;
 try{
   hstoreFilesToSplit = this.parent.close(false);
   if (hstoreFilesToSplit == null) {
 // The region was closed by a concurrent thread.  We can't continue
 // with the split, instead we must just abandon the split.  If we
 // reopen or split this could cause problems because the region has
 // probably already been moved to a different server, or is in the
 // process of moving to a different server.
 throw new IOException(Failed to close region: already closed by  +
   another thread);
   }
 } finally {
   this.journal.add(JournalEntry.CLOSED_PARENT_REGION);
 }
 {code}
 when rolling back, the JournalEntry.CLOSED_PARENT_REGION causes 
 this.parent.initialize();
 Although this region is not onlined in the regionserver, it may bring some 
 potential problem.
 For example, in our environment, the closed parent region is rolled back 
 sucessfully , and then starting compaction and split again.
 The parent region is f892dd6107b6b4130199582abc78e9c1
 master log
 {code}
 2011-12-26 00:24:42,693 INFO org.apache.hadoop.hbase.master.HMaster: balance 
 hri=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
  src=dw87.kgb.sqa.cm4,60020,1324827866085, 
 dest=dw80.kgb.sqa.cm4,60020,1324827865780
 2011-12-26 00:24:42,693 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
  (offlining)
 2011-12-26 00:24:42,694 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
 serverName=dw87.kgb.sqa.cm4,60020,1324827866085, load=(requests=0, regions=0, 
 usedHeap=0, maxHeap=0) for region 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
 2011-12-26 00:24:42,699 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned 
 node: /hbase-tbfs/unassigned/f892dd6107b6b4130199582abc78e9c1 
 (region=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
  server=dw87.kgb.sqa.cm4,60020,1324827866085, state=RS_ZK_REGION_CLOSING)
 2011-12-26 00:24:42,699 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSING, server=dw87.kgb.sqa.cm4,60020,1324827866085, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,348 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSED, server=dw87.kgb.sqa.cm4,60020,1324827866085, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,349 DEBUG 
 org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
 event for f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,349 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
 was=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
  state=CLOSED, ts=1324830285347
 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x13447f283f40e73 Creating (or updating) unassigned node for 
 f892dd6107b6b4130199582abc78e9c1 with OFFLINE state
 2011-12-26 00:24:45,354 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=M_ZK_REGION_OFFLINE, server=dw75.kgb.sqa.cm4:6, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,354 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Found an existing plan for 
 

[jira] [Commented] (HBASE-5100) Rollback of split would cause closed region to opened

2011-12-29 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177268#comment-13177268
 ] 

Zhihong Yu commented on HBASE-5100:
---

I thought I like Java until I understood what createDaughters() should be doing 
- with Chunhui's patch, i.e.

The case is that we need to handle two types of exceptions: the one coming out 
of parent.close(false) call and the IOE thrown from within the try block.

I got some idea when I was driving this morning. 5100.txt is my proposed form 
where I tried to disambiguate the two types of exceptions by removing finally 
block.
closedByOtherException is a singleton so the overhead of my proposal vs. 
introducing a boolean is negligible.

I ran split-related tests and they passed:
{code}
  806  mt -Dtest=TestCoprocessorInterface
  807  mt -Dtest=TestSplitTransaction
  808  mt -Dtest=TestSplitTransactionOnCluster
  809  mt -Dtest=TestEndToEndSplitTransaction
  810  mt -Dtest=TestHRegion
{code}
Please share your comments.

 Rollback of split would cause closed region to opened 
 --

 Key: HBASE-5100
 URL: https://issues.apache.org/jira/browse/HBASE-5100
 Project: HBase
  Issue Type: Bug
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: 5100.txt, hbase-5100.patch


 If master sending close region to rs and region's split transaction 
 concurrently happen,
 it may cause closed region to opened. 
 See the detailed code in SplitTransaction#createDaughters
 {code}
 ListStoreFile hstoreFilesToSplit = null;
 try{
   hstoreFilesToSplit = this.parent.close(false);
   if (hstoreFilesToSplit == null) {
 // The region was closed by a concurrent thread.  We can't continue
 // with the split, instead we must just abandon the split.  If we
 // reopen or split this could cause problems because the region has
 // probably already been moved to a different server, or is in the
 // process of moving to a different server.
 throw new IOException(Failed to close region: already closed by  +
   another thread);
   }
 } finally {
   this.journal.add(JournalEntry.CLOSED_PARENT_REGION);
 }
 {code}
 when rolling back, the JournalEntry.CLOSED_PARENT_REGION causes 
 this.parent.initialize();
 Although this region is not onlined in the regionserver, it may bring some 
 potential problem.
 For example, in our environment, the closed parent region is rolled back 
 sucessfully , and then starting compaction and split again.
 The parent region is f892dd6107b6b4130199582abc78e9c1
 master log
 {code}
 2011-12-26 00:24:42,693 INFO org.apache.hadoop.hbase.master.HMaster: balance 
 hri=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
  src=dw87.kgb.sqa.cm4,60020,1324827866085, 
 dest=dw80.kgb.sqa.cm4,60020,1324827865780
 2011-12-26 00:24:42,693 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
  (offlining)
 2011-12-26 00:24:42,694 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
 serverName=dw87.kgb.sqa.cm4,60020,1324827866085, load=(requests=0, regions=0, 
 usedHeap=0, maxHeap=0) for region 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
 2011-12-26 00:24:42,699 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned 
 node: /hbase-tbfs/unassigned/f892dd6107b6b4130199582abc78e9c1 
 (region=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
  server=dw87.kgb.sqa.cm4,60020,1324827866085, state=RS_ZK_REGION_CLOSING)
 2011-12-26 00:24:42,699 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSING, server=dw87.kgb.sqa.cm4,60020,1324827866085, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,348 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSED, server=dw87.kgb.sqa.cm4,60020,1324827866085, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,349 DEBUG 
 org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
 event for f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,349 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
 

[jira] [Updated] (HBASE-5064) use surefire tests parallelization

2011-12-29 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5064:
---

Status: Open  (was: Patch Available)

 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 
 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 
 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v2.patch, 5064.v3.patch, 
 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 
 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 
 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 5064.v9.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)

2011-12-29 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-4218:
--

Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12508818/D447.15.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 68 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/621//console

This message is automatically generated.)

 Data Block Encoding of KeyValues  (aka delta encoding / prefix compression)
 ---

 Key: HBASE-4218
 URL: https://issues.apache.org/jira/browse/HBASE-4218
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.94.0
Reporter: Jacek Migdal
Assignee: Mikhail Bautin
  Labels: compression
 Fix For: 0.94.0

 Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 
 0001-Delta-encoding.patch, D447.1.patch, D447.10.patch, D447.11.patch, 
 D447.12.patch, D447.13.patch, D447.14.patch, D447.15.patch, D447.2.patch, 
 D447.3.patch, D447.4.patch, D447.5.patch, D447.6.patch, D447.7.patch, 
 D447.8.patch, D447.9.patch, Data-block-encoding-2011-12-23.patch, 
 Delta-encoding.patch-2011-12-22_11_52_07.patch, 
 Delta_encoding_with_memstore_TS.patch, open-source.diff


 A compression for keys. Keys are sorted in HFile and they are usually very 
 similar. Because of that, it is possible to design better compression than 
 general purpose algorithms,
 It is an additional step designed to be used in memory. It aims to save 
 memory in cache as well as speeding seeks within HFileBlocks. It should 
 improve performance a lot, if key lengths are larger than value lengths. For 
 example, it makes a lot of sense to use it when value is a counter.
 Initial tests on real data (key length = ~ 90 bytes , value length = 8 bytes) 
 shows that I could achieve decent level of compression:
  key compression ratio: 92%
  total compression ratio: 85%
  LZO on the same data: 85%
  LZO after delta encoding: 91%
 While having much better performance (20-80% faster decompression ratio than 
 LZO). Moreover, it should allow far more efficient seeking which should 
 improve performance a bit.
 It seems that a simple compression algorithms are good enough. Most of the 
 savings are due to prefix compression, int128 encoding, timestamp diffs and 
 bitfields to avoid duplication. That way, comparisons of compressed data can 
 be much faster than a byte comparator (thanks to prefix compression and 
 bitfields).
 In order to implement it in HBase two important changes in design will be 
 needed:
 -solidify interface to HFileBlock / HFileReader Scanner to provide seeking 
 and iterating; access to uncompressed buffer in HFileBlock will have bad 
 performance
 -extend comparators to support comparison assuming that N first bytes are 
 equal (or some fields are equal)
 Link to a discussion about something similar:
 http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windowssubj=Re+prefix+compression

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5103) Fix a places where master znode is improperly deserialized.

2011-12-29 Thread Jonathan Hsieh (Created) (JIRA)
Fix a places where master znode is improperly deserialized.
---

 Key: HBASE-5103
 URL: https://issues.apache.org/jira/browse/HBASE-5103
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh



In ActiveMasterManager#blockUntilBecomingActiveMaster the master znode is 
created as a versioned serialized version of ServerName
{code}
 if (ZKUtil.createEphemeralNodeAndWatch(this.watcher,
  this.watcher.masterAddressZNode, sn.getVersionedBytes())) {
{code}

There are a few user visible places where it is used but not deserialized 
properly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5064) use surefire tests parallelization

2011-12-29 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5064:
---

Status: Patch Available  (was: Open)

 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 
 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 
 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 
 5064.v18.patch, 5064.v2.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 
 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 
 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 
 5064.v8.patch, 5064.v9.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4934) Display Master server and Regionserver start time on respective info servers.

2011-12-29 Thread Ming Ma (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177343#comment-13177343
 ] 

Ming Ma commented on HBASE-4934:


Jonathan, there is a special case when expired ZK session is recovered by the 
master. masterActiveTime is set in finishInitialization which won't be invoked 
in that scenario. Perhaps moving the setting of masterActiveTime to 
becomeActiveMaster method will solve the problem?

 Display Master server and Regionserver start time on respective info servers.
 -

 Key: HBASE-4934
 URL: https://issues.apache.org/jira/browse/HBASE-4934
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.92.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
Priority: Minor
 Fix For: 0.92.0

 Attachments: hbase-4934.patch, hmaster.png, hregion.png


 With operations like rolling restart or master failovers, it is difficult to 
 tell if a server is the old instance or the new restarted instance.  
 Adding a start date stamp on the info web pages would be helpful for 
 determining this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5064) use surefire tests parallelization

2011-12-29 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5064:
---

Attachment: 5064.v17.patch

 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 
 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 
 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v2.patch, 5064.v3.patch, 
 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 
 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 
 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 5064.v9.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5064) use surefire tests parallelization

2011-12-29 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5064:
---

Status: Patch Available  (was: Open)

 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 
 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 
 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 
 5064.v18.patch, 5064.v19.patch, 5064.v19.patch, 5064.v2.patch, 5064.v3.patch, 
 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 
 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 
 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 5064.v9.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5103) Fix places where master znode is improperly deserialized.

2011-12-29 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177308#comment-13177308
 ] 

Lars Hofhansl commented on HBASE-5103:
--

+1

 Fix places where master znode is improperly deserialized.
 -

 Key: HBASE-5103
 URL: https://issues.apache.org/jira/browse/HBASE-5103
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
Priority: Minor
 Attachments: hbase-5103.patch


 In ActiveMasterManager#blockUntilBecomingActiveMaster the master znode is 
 created as a versioned serialized version of ServerName
 {code}
  if (ZKUtil.createEphemeralNodeAndWatch(this.watcher,
   this.watcher.masterAddressZNode, sn.getVersionedBytes())) {
 {code}
 There are a few user visible places where it is used but not deserialized 
 properly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5101) Add a max number of regions per regionserver limit

2011-12-29 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177206#comment-13177206
 ] 

Jonathan Hsieh commented on HBASE-5101:
---

@Ted
First cut -- simply some configured number of max regions.  Maybe it would stop 
splitting at a point where it can handle if some number of rs's going down and 
its regions are reassigned.

There could possibly be a limit on tables as well -- and we could put cap on 
the # of region servers/table and # of regions/region server.  

 Add a max number of regions per regionserver limit
 --

 Key: HBASE-5101
 URL: https://issues.apache.org/jira/browse/HBASE-5101
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Jonathan Hsieh

 In a testing environment, a cluster got to a state with more than 1500 
 regions per region server, and essentially became stuck and unavailable.  We 
 could add a limit to the number of regions that a region server can serve to 
 prevent this from happening.  This looks like it could be implemented in the 
 core or as a coprocessor.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5064) use surefire tests parallelization

2011-12-29 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177278#comment-13177278
 ] 

Hadoop QA commented on HBASE-5064:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12508854/5064.v19.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 12 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -151 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 76 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.client.TestAdmin
  org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/625//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/625//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/625//console

This message is automatically generated.

 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 
 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 
 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 
 5064.v18.patch, 5064.v19.patch, 5064.v2.patch, 5064.v3.patch, 5064.v4.patch, 
 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 
 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 
 5064.v8.patch, 5064.v8.patch, 5064.v9.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5064) use surefire tests parallelization

2011-12-29 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5064:
---

Status: Open  (was: Patch Available)

 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 
 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 
 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 
 5064.v18.patch, 5064.v19.patch, 5064.v19.patch, 5064.v19.patch, 
 5064.v2.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 
 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 
 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 
 5064.v9.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5103) Fix places where master znode is improperly deserialized.

2011-12-29 Thread Jonathan Hsieh (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-5103:
--

Attachment: hbase-5103.patch

Applies on trunk and 0.92.

 Fix places where master znode is improperly deserialized.
 -

 Key: HBASE-5103
 URL: https://issues.apache.org/jira/browse/HBASE-5103
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
Priority: Minor
 Attachments: hbase-5103.patch


 In ActiveMasterManager#blockUntilBecomingActiveMaster the master znode is 
 created as a versioned serialized version of ServerName
 {code}
  if (ZKUtil.createEphemeralNodeAndWatch(this.watcher,
   this.watcher.masterAddressZNode, sn.getVersionedBytes())) {
 {code}
 There are a few user visible places where it is used but not deserialized 
 properly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5064) use surefire tests parallelization

2011-12-29 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177287#comment-13177287
 ] 

Zhihong Yu commented on HBASE-5064:
---

TestMiniClusterLoadParallel failed again for patch v19.
From its output:
{code}
2011-12-29 09:24:49,249 WARN  [HBaseReaderThread_7] 
zookeeper.RecoverableZooKeeper(159): Possibly transient ZooKeeper exception: 
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = 
ConnectionLoss for /hbase
2011-12-29 09:24:49,249 INFO  [HBaseReaderThread_7] util.RetryCounter(53): The 
1 times to retry  after sleeping 2000 ms
...
{code}

There're 3 failures so far:
{code}
Running org.apache.hadoop.hbase.client.TestAdmin
Tests run: 35, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 885.494 sec 
 FAILURE!
--
Running org.apache.hadoop.hbase.util.TestMiniClusterLoadParallel
Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 192.578 sec  
FAILURE!
--
Running org.apache.hadoop.hbase.regionserver.TestRpcMetrics
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.106 sec  
FAILURE!
{code}

 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 
 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 
 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 
 5064.v18.patch, 5064.v19.patch, 5064.v19.patch, 5064.v2.patch, 5064.v3.patch, 
 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 
 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 
 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 5064.v9.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5064) use surefire tests parallelization

2011-12-29 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177185#comment-13177185
 ] 

Hadoop QA commented on HBASE-5064:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12508839/5064.v17.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 12 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -151 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 76 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.io.hfile.TestLruBlockCache

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/622//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/622//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/622//console

This message is automatically generated.

 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 
 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 
 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v2.patch, 5064.v3.patch, 
 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 
 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 
 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 5064.v9.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-5083) Backup HMaster should have http infoport open with link to the active master

2011-12-29 Thread Jonathan Hsieh (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh reassigned HBASE-5083:
-

Assignee: Jonathan Hsieh

 Backup HMaster should have http infoport open with link to the active master
 

 Key: HBASE-5083
 URL: https://issues.apache.org/jira/browse/HBASE-5083
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.92.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh

 Without ssh'ing and jps/ps'ing, it is difficult to see if a backup hmaster is 
 up.  It seems like it would be good for a backup hmaster to have a basic web 
 page up on the info port so that users could see that it is up.  Also it 
 should probably either provide a link to the active master or automatically 
 forward to the active master.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5064) use surefire tests parallelization

2011-12-29 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5064:
---

Attachment: 5064.v19.patch

 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 
 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 
 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 
 5064.v18.patch, 5064.v19.patch, 5064.v19.patch, 5064.v19.patch, 
 5064.v2.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 
 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 
 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 
 5064.v9.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5064) use surefire tests parallelization

2011-12-29 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5064:
---

Attachment: 5064.v18.patch

 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 
 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 
 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 
 5064.v2.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 
 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 
 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 
 5064.v9.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root re

2011-12-29 Thread Jimmy Xiang (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-5099:
---

Status: Patch Available  (was: Open)

 ZK event thread waiting for root region while server shutdown handler waiting 
 for event thread to finish distributed log splitting to recover the region 
 sever the root region is on
 

 Key: HBASE-5099
 URL: https://issues.apache.org/jira/browse/HBASE-5099
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: ZK-event-thread-waiting-for-root.png, 
 distributed-log-splitting-hangs.png, hbase-5099-v2.patch, 
 hbase-5099-v3.patch, hbase-5099.patch


 A RS died.  The ServerShutdownHandler kicked in and started the logspliting.  
 SpliLogManager
 installed the tasks asynchronously, then started to wait for them to complete.
 The task znodes were not created actually.  The requests were just queued.
 At this time, the zookeeper connection expired.  HMaster tried to recover the 
 expired ZK session.
 During the recovery, a new zookeeper connection was created.  However, this 
 master became the
 new master again.  It tried to assign root and meta.
 Because the dead RS got the old root region, the master needs to wait for the 
 log splitting to complete.
 This waiting holds the zookeeper event thread.  So the async create split 
 task is never retried since
 there is only one event thread, which is waiting for the root region assigned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5064) use surefire tests parallelization

2011-12-29 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5064:
---

Status: Patch Available  (was: Open)

 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 
 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 
 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 
 5064.v2.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 
 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 
 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 
 5064.v9.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root

2011-12-29 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177351#comment-13177351
 ] 

jirapos...@reviews.apache.org commented on HBASE-5099:
--



bq.  On 2011-12-29 19:02:17, Ted Yu wrote:
bq.   src/main/java/org/apache/hadoop/hbase/master/HMaster.java, line 1475
bq.   https://reviews.apache.org/r/3323/diff/3/?file=65929#file65929line1475
bq.  
bq.   Timeout value should be included in the exception message.

It may not because of timeout.


bq.  On 2011-12-29 19:02:17, Ted Yu wrote:
bq.   src/test/java/org/apache/hadoop/hbase/master/TestMasterRecovery.java, 
line 2
bq.   https://reviews.apache.org/r/3323/diff/3/?file=65930#file65930line2
bq.  
bq.   Year is not needed.

I copied it from another test case.  I can remove it.


bq.  On 2011-12-29 19:02:17, Ted Yu wrote:
bq.   src/test/java/org/apache/hadoop/hbase/master/TestMasterRecovery.java, 
line 34
bq.   https://reviews.apache.org/r/3323/diff/3/?file=65930#file65930line34
bq.  
bq.   This shouldn't be small test since mini cluster is involved.

Ok.


bq.  On 2011-12-29 19:02:17, Ted Yu wrote:
bq.   src/test/java/org/apache/hadoop/hbase/master/TestMasterRecovery.java, 
line 41
bq.   https://reviews.apache.org/r/3323/diff/3/?file=65930#file65930line41
bq.  
bq.   We shouldn't pass 1 here since that means 1 master.

It means 1 region server.  Actually, I do want 1 master so it can recover 
itself.  Otherwise, the backup master will take over and the active master 
doesn't have a chance to recover in this scenario.


bq.  On 2011-12-29 19:02:17, Ted Yu wrote:
bq.   src/main/java/org/apache/hadoop/hbase/master/HMaster.java, line 1452
bq.   https://reviews.apache.org/r/3323/diff/3/?file=65929#file65929line1452
bq.  
bq.   Line is 88 chars long.
bq.   If ExecutorService is imported, the line should be much shorter.

We can't do this since there is already an ExecutorService (from hbase) 
imported.  I can't use the hbase ExecutorService because it doesn't fit.


bq.  On 2011-12-29 19:02:17, Ted Yu wrote:
bq.   src/test/java/org/apache/hadoop/hbase/master/TestMasterRecovery.java, 
line 50
bq.   https://reviews.apache.org/r/3323/diff/3/?file=65930#file65930line50
bq.  
bq.   Since the test doesn't involve standby master, I think we should use 
a different name.

There is test case called TestMasterFailover which involves standby master.  
That's why I called it TestMasterRecovery.  How about 
TestMasterZKSessionRecovery?


- Jimmy


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3323/#review4148
---


On 2011-12-29 18:38:54, Jimmy Xiang wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3323/
bq.  ---
bq.  
bq.  (Updated 2011-12-29 18:38:54)
bq.  
bq.  
bq.  Review request for hbase, Ted Yu and Michael Stack.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Per discussion with Ted (on issues), I put up a patch to run 
tryRecoveringExpiredZKSession() in a separate thread and time it out and fail 
the recovery if it is stuck somewhere.
bq.  
bq.  I added a test to test the abort method.  However, for the mini cluster, 
becomeActiveMaster() doesn't succeed so the abort method ends up always 
aborted.  So the actually success recovery is not tested.
bq.  
bq.  
bq.  This addresses bug HBASE-5099.
bq.  https://issues.apache.org/jira/browse/HBASE-5099
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/master/HMaster.java a5935a6 
bq.src/test/java/org/apache/hadoop/hbase/master/TestMasterRecovery.java 
PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/3323/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  mvn -PlocalTests -Dtest=TestMaster* clean test
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jimmy
bq.  
bq.



 ZK event thread waiting for root region while server shutdown handler waiting 
 for event thread to finish distributed log splitting to recover the region 
 sever the root region is on
 

 Key: HBASE-5099
 URL: https://issues.apache.org/jira/browse/HBASE-5099
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: ZK-event-thread-waiting-for-root.png, 
 distributed-log-splitting-hangs.png, hbase-5099-v2.patch, 
 hbase-5099-v3.patch, hbase-5099.patch


 A RS died.  

[jira] [Commented] (HBASE-5055) Build against hadoop 0.22 broken

2011-12-29 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177349#comment-13177349
 ] 

Zhihong Yu commented on HBASE-5055:
---

Good catch Ming.

Addendum integrated to 0.92 branch.

 Build against hadoop 0.22 broken
 

 Key: HBASE-5055
 URL: https://issues.apache.org/jira/browse/HBASE-5055
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Zhihong Yu
Assignee: stack
Priority: Blocker
 Fix For: 0.92.0, 0.94.0

 Attachments: 5055.txt, HBASE-5055-0.92.patch


 I got the following when compiling TRUNK against hadoop 0.22:
 {code}
 [ERROR] Failed to execute goal 
 org.apache.maven.plugins:maven-compiler-plugin:2.0.2:compile 
 (default-compile) on project hbase: Compilation failure: Compilation failure:
 [ERROR] 
 /Users/zhihyu/trunk-hbase/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java:[37,39]
  cannot find symbol
 [ERROR] symbol  : class DFSInputStream
 [ERROR] location: class org.apache.hadoop.hdfs.DFSClient
 [ERROR] 
 [ERROR] 
 /Users/zhihyu/trunk-hbase/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java:[109,37]
  cannot find symbol
 [ERROR] symbol  : class DFSInputStream
 [ERROR] location: class 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.WALReader.WALReaderFSDataInputStream
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5064) use surefire tests parallelization

2011-12-29 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5064:
---

Status: Open  (was: Patch Available)

 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 
 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 
 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 
 5064.v18.patch, 5064.v2.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 
 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 
 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 
 5064.v8.patch, 5064.v9.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5104) FilterList doesn't work right with filters (such as ColumPrefixFilter) which use the SEEK_NEXT_USING_HINT

2011-12-29 Thread Kannan Muthukkaruppan (Created) (JIRA)
FilterList doesn't work right with filters (such as ColumPrefixFilter) which 
use the SEEK_NEXT_USING_HINT
-

 Key: HBASE-5104
 URL: https://issues.apache.org/jira/browse/HBASE-5104
 Project: HBase
  Issue Type: Bug
Reporter: Kannan Muthukkaruppan
Assignee: Madhuwanti Vaidya


Thanks Jiakai Liu for reporting this issue and doing the initial investigation. 
Email from Jiakai below:

Assuming that we have an index column family with the following entries:
tag0:001:thread1
...
tag1:001:thread1
tag1:002:thread2
...
tag1:010:thread10
...
tag2:001:thread1
tag2:005:thread5
...

To get threads with tag1 in range [5, 10), I tried the following code:

ColumnPrefixFilter filter1 = new ColumnPrefixFilter(Bytes.toBytes(tag1));
ColumnPaginationFilter filter2 = new ColumnPaginationFilter(5 /* limit */, 
5 /* offset */);

FilterList filters = new FilterList(Operator.MUST_PASS_ALL);
filters.addFilter(filter1);
filters.addFilter(filter2);

Get get = new Get(USER);
get.addFamily(COLUMN_FAMILY);
get.setMaxVersions(1);
get.setFilter(filters);

Somehow it didn't work as expected. It returned the entries as if the filter1 
were not set.

Turns out the ColumnPrefixFilter returns SEEK_NEXT_USING_HINT in some cases. 
The FilterList filter does not handle this return code properly (treat it as 
INCLUDE).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5064) use surefire tests parallelization

2011-12-29 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5064:
---

Status: Patch Available  (was: Open)

 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 
 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 
 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v2.patch, 5064.v3.patch, 
 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 
 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 
 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 5064.v9.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5101) Add a max number of regions per regionserver limit

2011-12-29 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177352#comment-13177352
 ] 

Todd Lipcon commented on HBASE-5101:


Isnt this hbase.regionserver.regionSplitLimit (HBASE-2844)?

 Add a max number of regions per regionserver limit
 --

 Key: HBASE-5101
 URL: https://issues.apache.org/jira/browse/HBASE-5101
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Jonathan Hsieh

 In a testing environment, a cluster got to a state with more than 1500 
 regions per region server, and essentially became stuck and unavailable.  We 
 could add a limit to the number of regions that a region server can serve to 
 prevent this from happening.  This looks like it could be implemented in the 
 core or as a coprocessor.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root

2011-12-29 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177311#comment-13177311
 ] 

jirapos...@reviews.apache.org commented on HBASE-5099:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3323/
---

(Updated 2011-12-29 18:38:54.473524)


Review request for hbase, Ted Yu and Michael Stack.


Changes
---

Updated per Ted's comment.  Also separated the test class and added a possible 
testcase.


Summary
---

Per discussion with Ted (on issues), I put up a patch to run 
tryRecoveringExpiredZKSession() in a separate thread and time it out and fail 
the recovery if it is stuck somewhere.

I added a test to test the abort method.  However, for the mini cluster, 
becomeActiveMaster() doesn't succeed so the abort method ends up always 
aborted.  So the actually success recovery is not tested.


This addresses bug HBASE-5099.
https://issues.apache.org/jira/browse/HBASE-5099


Diffs (updated)
-

  src/main/java/org/apache/hadoop/hbase/master/HMaster.java a5935a6 
  src/test/java/org/apache/hadoop/hbase/master/TestMasterRecovery.java 
PRE-CREATION 

Diff: https://reviews.apache.org/r/3323/diff


Testing
---

mvn -PlocalTests -Dtest=TestMaster* clean test


Thanks,

Jimmy



 ZK event thread waiting for root region while server shutdown handler waiting 
 for event thread to finish distributed log splitting to recover the region 
 sever the root region is on
 

 Key: HBASE-5099
 URL: https://issues.apache.org/jira/browse/HBASE-5099
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: ZK-event-thread-waiting-for-root.png, 
 distributed-log-splitting-hangs.png, hbase-5099-v2.patch, hbase-5099.patch


 A RS died.  The ServerShutdownHandler kicked in and started the logspliting.  
 SpliLogManager
 installed the tasks asynchronously, then started to wait for them to complete.
 The task znodes were not created actually.  The requests were just queued.
 At this time, the zookeeper connection expired.  HMaster tried to recover the 
 expired ZK session.
 During the recovery, a new zookeeper connection was created.  However, this 
 master became the
 new master again.  It tried to assign root and meta.
 Because the dead RS got the old root region, the master needs to wait for the 
 log splitting to complete.
 This waiting holds the zookeeper event thread.  So the async create split 
 task is never retried since
 there is only one event thread, which is waiting for the root region assigned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5101) Add a max number of regions per regionserver limit

2011-12-29 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177224#comment-13177224
 ] 

Zhihong Yu commented on HBASE-5101:
---

@Jon:
I think HBASE-4365 tried to tackle the same problem in a different way.
Meaning, a new RegionSplitPolicy implementation should be created.
Shall we continue along that route ?

 Add a max number of regions per regionserver limit
 --

 Key: HBASE-5101
 URL: https://issues.apache.org/jira/browse/HBASE-5101
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Jonathan Hsieh

 In a testing environment, a cluster got to a state with more than 1500 
 regions per region server, and essentially became stuck and unavailable.  We 
 could add a limit to the number of regions that a region server can serve to 
 prevent this from happening.  This looks like it could be implemented in the 
 core or as a coprocessor.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root re

2011-12-29 Thread Jimmy Xiang (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-5099:
---

Status: Open  (was: Patch Available)

 ZK event thread waiting for root region while server shutdown handler waiting 
 for event thread to finish distributed log splitting to recover the region 
 sever the root region is on
 

 Key: HBASE-5099
 URL: https://issues.apache.org/jira/browse/HBASE-5099
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: ZK-event-thread-waiting-for-root.png, 
 distributed-log-splitting-hangs.png, hbase-5099-v2.patch, hbase-5099.patch


 A RS died.  The ServerShutdownHandler kicked in and started the logspliting.  
 SpliLogManager
 installed the tasks asynchronously, then started to wait for them to complete.
 The task znodes were not created actually.  The requests were just queued.
 At this time, the zookeeper connection expired.  HMaster tried to recover the 
 expired ZK session.
 During the recovery, a new zookeeper connection was created.  However, this 
 master became the
 new master again.  It tried to assign root and meta.
 Because the dead RS got the old root region, the master needs to wait for the 
 log splitting to complete.
 This waiting holds the zookeeper event thread.  So the async create split 
 task is never retried since
 there is only one event thread, which is waiting for the root region assigned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5064) use surefire tests parallelization

2011-12-29 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177228#comment-13177228
 ] 

Hadoop QA commented on HBASE-5064:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12508844/5064.v18.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 12 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -151 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 76 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/624//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/624//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/624//console

This message is automatically generated.

 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 
 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 
 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 
 5064.v18.patch, 5064.v2.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 
 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 
 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 
 5064.v8.patch, 5064.v9.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5103) Fix places where master znode is improperly deserialized.

2011-12-29 Thread Jonathan Hsieh (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-5103:
--

Description: 
In ActiveMasterManager#blockUntilBecomingActiveMaster the master znode is 
created as a versioned serialized version of ServerName
{code}
 if (ZKUtil.createEphemeralNodeAndWatch(this.watcher,
  this.watcher.masterAddressZNode, sn.getVersionedBytes())) {
{code}

There are a few user visible places where it is used but not deserialized 
properly.

  was:

In ActiveMasterManager#blockUntilBecomingActiveMaster the master znode is 
created as a versioned serialized version of ServerName
{code}
 if (ZKUtil.createEphemeralNodeAndWatch(this.watcher,
  this.watcher.masterAddressZNode, sn.getVersionedBytes())) {
{code}

There are a few user visible places where it is used but not deserialized 
properly.

   Priority: Minor  (was: Major)
Summary: Fix places where master znode is improperly deserialized.  
(was: Fix a places where master znode is improperly deserialized.)

 Fix places where master znode is improperly deserialized.
 -

 Key: HBASE-5103
 URL: https://issues.apache.org/jira/browse/HBASE-5103
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
Priority: Minor

 In ActiveMasterManager#blockUntilBecomingActiveMaster the master znode is 
 created as a versioned serialized version of ServerName
 {code}
  if (ZKUtil.createEphemeralNodeAndWatch(this.watcher,
   this.watcher.masterAddressZNode, sn.getVersionedBytes())) {
 {code}
 There are a few user visible places where it is used but not deserialized 
 properly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5103) Fix places where master znode is improperly deserialized.

2011-12-29 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177312#comment-13177312
 ] 

Hadoop QA commented on HBASE-5103:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12508858/hbase-5103.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 javadoc.  The javadoc tool appears to have generated -151 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 76 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/626//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/626//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/626//console

This message is automatically generated.

 Fix places where master znode is improperly deserialized.
 -

 Key: HBASE-5103
 URL: https://issues.apache.org/jira/browse/HBASE-5103
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
Priority: Minor
 Attachments: hbase-5103.patch


 In ActiveMasterManager#blockUntilBecomingActiveMaster the master znode is 
 created as a versioned serialized version of ServerName
 {code}
  if (ZKUtil.createEphemeralNodeAndWatch(this.watcher,
   this.watcher.masterAddressZNode, sn.getVersionedBytes())) {
 {code}
 There are a few user visible places where it is used but not deserialized 
 properly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5064) use surefire tests parallelization

2011-12-29 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5064:
---

Status: Open  (was: Patch Available)

 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 
 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 
 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 
 5064.v18.patch, 5064.v2.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 
 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 
 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 
 5064.v8.patch, 5064.v9.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5064) use surefire tests parallelization

2011-12-29 Thread nkeywal (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177285#comment-13177285
 ] 

nkeywal commented on HBASE-5064:


You want 2 process in // only for hadoop-qa? It seems that the machine never 
runs two builds in //, so we should maximize the number of process. That makes 
the build last ~35 minutes instead of 2 hours.


 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 
 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 
 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 
 5064.v18.patch, 5064.v19.patch, 5064.v19.patch, 5064.v2.patch, 5064.v3.patch, 
 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 
 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 
 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 5064.v9.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5010) Filter HFiles based on TTL

2011-12-29 Thread Phabricator (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HBASE-5010:
---

Attachment: D909.6.patch

mbautin updated the revision [jira] [HBASE-5010] [89-fb] Filter HFiles based 
on TTL.
Reviewers: Kannan, Liyin, JIRA

  Addressing the issue that Kannan pointed out: getScanner does not use its 
arguments. It turned out that getScanner was called with this.scan and 
this.columns in all callsites, so I have removed those arguments.

REVISION DETAIL
  https://reviews.facebook.net/D909

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/io/hfile/LruBlockCache.java
  src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueScanner.java
  src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java
  src/main/java/org/apache/hadoop/hbase/regionserver/NonLazyKeyValueScanner.java
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
  src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
  src/main/java/org/apache/hadoop/hbase/regionserver/TimeRangeTracker.java
  src/main/java/org/apache/hadoop/hbase/regionserver/metrics/SchemaMetrics.java
  src/main/java/org/apache/hadoop/hbase/util/Threads.java
  
src/test/java/org/apache/hadoop/hbase/io/hfile/TestScannerSelectionUsingTTL.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java
  
src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestMemStore.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestQueryMatcher.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java


 Filter HFiles based on TTL
 --

 Key: HBASE-5010
 URL: https://issues.apache.org/jira/browse/HBASE-5010
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Zhihong Yu
 Attachments: 5010.patch, D1017.1.patch, D1017.2.patch, D909.1.patch, 
 D909.2.patch, D909.3.patch, D909.4.patch, D909.5.patch, D909.6.patch


 In ScanWildcardColumnTracker we have
 {code:java}
  
   this.oldestStamp = EnvironmentEdgeManager.currentTimeMillis() - ttl;
   ...
   private boolean isExpired(long timestamp) {
 return timestamp  oldestStamp;
   }
 {code}
 but this time range filtering does not participate in HFile selection. In one 
 real case this caused next() calls to time out because all KVs in a table got 
 expired, but next() had to iterate over the whole table to find that out. We 
 should be able to filter out those HFiles right away. I think a reasonable 
 approach is to add a default timerange filter to every scan for a CF with a 
 finite TTL and utilize existing filtering in 
 StoreFile.Reader.passesTimerangeFilter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5064) use surefire tests parallelization

2011-12-29 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177359#comment-13177359
 ] 

Hadoop QA commented on HBASE-5064:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12508863/5064.v19.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 12 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -151 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 76 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.client.TestFromClientSide
  org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/629//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/629//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/629//console

This message is automatically generated.

 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 
 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 
 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 
 5064.v18.patch, 5064.v19.patch, 5064.v19.patch, 5064.v19.patch, 
 5064.v2.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 
 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 
 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 
 5064.v9.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5104) FilterList doesn't work right with filters (such as ColumPrefixFilter) which use the SEEK_NEXT_USING_HINT

2011-12-29 Thread Kannan Muthukkaruppan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kannan Muthukkaruppan updated HBASE-5104:
-

Attachment: testFilterList.rb

A test case illustrating the problem attached.

 FilterList doesn't work right with filters (such as ColumPrefixFilter) which 
 use the SEEK_NEXT_USING_HINT
 -

 Key: HBASE-5104
 URL: https://issues.apache.org/jira/browse/HBASE-5104
 Project: HBase
  Issue Type: Bug
Reporter: Kannan Muthukkaruppan
Assignee: Madhuwanti Vaidya
 Attachments: testFilterList.rb


 Thanks Jiakai Liu for reporting this issue and doing the initial 
 investigation. Email from Jiakai below:
 Assuming that we have an index column family with the following entries:
 tag0:001:thread1
 ...
 tag1:001:thread1
 tag1:002:thread2
 ...
 tag1:010:thread10
 ...
 tag2:001:thread1
 tag2:005:thread5
 ...
 To get threads with tag1 in range [5, 10), I tried the following code:
 ColumnPrefixFilter filter1 = new 
 ColumnPrefixFilter(Bytes.toBytes(tag1));
 ColumnPaginationFilter filter2 = new ColumnPaginationFilter(5 /* limit 
 */, 5 /* offset */);
 FilterList filters = new FilterList(Operator.MUST_PASS_ALL);
 filters.addFilter(filter1);
 filters.addFilter(filter2);
 Get get = new Get(USER);
 get.addFamily(COLUMN_FAMILY);
 get.setMaxVersions(1);
 get.setFilter(filters);
 Somehow it didn't work as expected. It returned the entries as if the filter1 
 were not set.
 Turns out the ColumnPrefixFilter returns SEEK_NEXT_USING_HINT in some cases. 
 The FilterList filter does not handle this return code properly (treat it as 
 INCLUDE).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5010) Filter HFiles based on TTL

2011-12-29 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177361#comment-13177361
 ] 

Phabricator commented on HBASE-5010:


Kannan has accepted the revision [jira] [HBASE-5010] [89-fb] Filter HFiles 
based on TTL.

  awesome optimization and nice work!

REVISION DETAIL
  https://reviews.facebook.net/D909


 Filter HFiles based on TTL
 --

 Key: HBASE-5010
 URL: https://issues.apache.org/jira/browse/HBASE-5010
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Zhihong Yu
 Attachments: 5010.patch, D1017.1.patch, D1017.2.patch, D909.1.patch, 
 D909.2.patch, D909.3.patch, D909.4.patch, D909.5.patch, D909.6.patch


 In ScanWildcardColumnTracker we have
 {code:java}
  
   this.oldestStamp = EnvironmentEdgeManager.currentTimeMillis() - ttl;
   ...
   private boolean isExpired(long timestamp) {
 return timestamp  oldestStamp;
   }
 {code}
 but this time range filtering does not participate in HFile selection. In one 
 real case this caused next() calls to time out because all KVs in a table got 
 expired, but next() had to iterate over the whole table to find that out. We 
 should be able to filter out those HFiles right away. I think a reasonable 
 approach is to add a default timerange filter to every scan for a CF with a 
 finite TTL and utilize existing filtering in 
 StoreFile.Reader.passesTimerangeFilter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root

2011-12-29 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177363#comment-13177363
 ] 

Zhihong Yu commented on HBASE-5099:
---

TestMasterZKSessionRecovery is a good name for the new test.

 ZK event thread waiting for root region while server shutdown handler waiting 
 for event thread to finish distributed log splitting to recover the region 
 sever the root region is on
 

 Key: HBASE-5099
 URL: https://issues.apache.org/jira/browse/HBASE-5099
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: ZK-event-thread-waiting-for-root.png, 
 distributed-log-splitting-hangs.png, hbase-5099-v2.patch, 
 hbase-5099-v3.patch, hbase-5099.patch


 A RS died.  The ServerShutdownHandler kicked in and started the logspliting.  
 SpliLogManager
 installed the tasks asynchronously, then started to wait for them to complete.
 The task znodes were not created actually.  The requests were just queued.
 At this time, the zookeeper connection expired.  HMaster tried to recover the 
 expired ZK session.
 During the recovery, a new zookeeper connection was created.  However, this 
 master became the
 new master again.  It tried to assign root and meta.
 Because the dead RS got the old root region, the master needs to wait for the 
 log splitting to complete.
 This waiting holds the zookeeper event thread.  So the async create split 
 task is never retried since
 there is only one event thread, which is waiting for the root region assigned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5105) TestImportTsv failed with hadoop 0.22

2011-12-29 Thread Ming Ma (Created) (JIRA)
TestImportTsv failed with hadoop 0.22
-

 Key: HBASE-5105
 URL: https://issues.apache.org/jira/browse/HBASE-5105
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.92.0
Reporter: Ming Ma


java.io.FileNotFoundException: File does not exist: 
/home/henkins/.m2/repository/org/apache/hadoop/hadoop-mapred/0.22-SNAPSHOT/hadoop-mapred-0.22-SNAPSHOT.jar
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:742)
at 
org.apache.hadoop.mapreduce.filecache.TrackerDistributedCacheManager.getFileStatus(TrackerDistributedCacheManager.java:331)
at 
org.apache.hadoop.mapreduce.filecache.TrackerDistributedCacheManager.determineTimestamps(TrackerDistributedCacheManager.java:711)
at 
org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:245)
at 
org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:283)
at 
org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:350)
at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1045)
at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1042)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1153)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1042)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1062)
at 
org.apache.hadoop.hbase.mapreduce.TestImportTsv.doMROnTableTest(TestImportTsv.java:215)
at 
org.apache.hadoop.hbase.mapreduce.TestImportTsv.testMROnTable(TestImportTsv.java:165)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5105) TestImportTsv failed with hadoop 0.22

2011-12-29 Thread Ming Ma (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HBASE-5105:
---

Attachment: HBASE-5105-0.92.patch

Fix the test code to create miniMapReduceCluster and pass the right 
configuration to Job object.

 TestImportTsv failed with hadoop 0.22
 -

 Key: HBASE-5105
 URL: https://issues.apache.org/jira/browse/HBASE-5105
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.92.0
Reporter: Ming Ma
 Attachments: HBASE-5105-0.92.patch


 java.io.FileNotFoundException: File does not exist: 
 /home/henkins/.m2/repository/org/apache/hadoop/hadoop-mapred/0.22-SNAPSHOT/hadoop-mapred-0.22-SNAPSHOT.jar
   at 
 org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:742)
   at 
 org.apache.hadoop.mapreduce.filecache.TrackerDistributedCacheManager.getFileStatus(TrackerDistributedCacheManager.java:331)
   at 
 org.apache.hadoop.mapreduce.filecache.TrackerDistributedCacheManager.determineTimestamps(TrackerDistributedCacheManager.java:711)
   at 
 org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:245)
   at 
 org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:283)
   at 
 org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:350)
   at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1045)
   at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1042)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1153)
   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1042)
   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1062)
   at 
 org.apache.hadoop.hbase.mapreduce.TestImportTsv.doMROnTableTest(TestImportTsv.java:215)
   at 
 org.apache.hadoop.hbase.mapreduce.TestImportTsv.testMROnTable(TestImportTsv.java:165)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5105) TestImportTsv failed with hadoop 0.22

2011-12-29 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5105:
--

Status: Patch Available  (was: Open)

 TestImportTsv failed with hadoop 0.22
 -

 Key: HBASE-5105
 URL: https://issues.apache.org/jira/browse/HBASE-5105
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.92.0
Reporter: Ming Ma
 Attachments: HBASE-5105-0.92.patch


 java.io.FileNotFoundException: File does not exist: 
 /home/henkins/.m2/repository/org/apache/hadoop/hadoop-mapred/0.22-SNAPSHOT/hadoop-mapred-0.22-SNAPSHOT.jar
   at 
 org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:742)
   at 
 org.apache.hadoop.mapreduce.filecache.TrackerDistributedCacheManager.getFileStatus(TrackerDistributedCacheManager.java:331)
   at 
 org.apache.hadoop.mapreduce.filecache.TrackerDistributedCacheManager.determineTimestamps(TrackerDistributedCacheManager.java:711)
   at 
 org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:245)
   at 
 org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:283)
   at 
 org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:350)
   at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1045)
   at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1042)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1153)
   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1042)
   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1062)
   at 
 org.apache.hadoop.hbase.mapreduce.TestImportTsv.doMROnTableTest(TestImportTsv.java:215)
   at 
 org.apache.hadoop.hbase.mapreduce.TestImportTsv.testMROnTable(TestImportTsv.java:165)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root

2011-12-29 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177369#comment-13177369
 ] 

jirapos...@reviews.apache.org commented on HBASE-5099:
--



bq.  On 2011-12-28 22:52:56, Ted Yu wrote:
bq.   src/main/java/org/apache/hadoop/hbase/master/HMaster.java, line 1463
bq.   https://reviews.apache.org/r/3323/diff/2/?file=65902#file65902line1463
bq.  
bq.   The call() at line 1429 would only throw 3 types of exceptions: IE, 
IOE and KE.
bq.   
bq.   So the cause should be one the three types.
bq.   I suggest using instanceof to check against each of them and if a 
match is found, cast as that exception type and rethrow.
bq.   
bq.   If there is no match, I suggest using this ctor to throw a new 
IOException:
bq.   IOException(String message, Throwable cause)

How about just throw ExecutionException?  It will be much simpler and the 
caller doesn't care about which exception it is in this case.


- Jimmy


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3323/#review4138
---


On 2011-12-29 18:38:54, Jimmy Xiang wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3323/
bq.  ---
bq.  
bq.  (Updated 2011-12-29 18:38:54)
bq.  
bq.  
bq.  Review request for hbase, Ted Yu and Michael Stack.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Per discussion with Ted (on issues), I put up a patch to run 
tryRecoveringExpiredZKSession() in a separate thread and time it out and fail 
the recovery if it is stuck somewhere.
bq.  
bq.  I added a test to test the abort method.  However, for the mini cluster, 
becomeActiveMaster() doesn't succeed so the abort method ends up always 
aborted.  So the actually success recovery is not tested.
bq.  
bq.  
bq.  This addresses bug HBASE-5099.
bq.  https://issues.apache.org/jira/browse/HBASE-5099
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/master/HMaster.java a5935a6 
bq.src/test/java/org/apache/hadoop/hbase/master/TestMasterRecovery.java 
PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/3323/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  mvn -PlocalTests -Dtest=TestMaster* clean test
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jimmy
bq.  
bq.



 ZK event thread waiting for root region while server shutdown handler waiting 
 for event thread to finish distributed log splitting to recover the region 
 sever the root region is on
 

 Key: HBASE-5099
 URL: https://issues.apache.org/jira/browse/HBASE-5099
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: ZK-event-thread-waiting-for-root.png, 
 distributed-log-splitting-hangs.png, hbase-5099-v2.patch, 
 hbase-5099-v3.patch, hbase-5099.patch


 A RS died.  The ServerShutdownHandler kicked in and started the logspliting.  
 SpliLogManager
 installed the tasks asynchronously, then started to wait for them to complete.
 The task znodes were not created actually.  The requests were just queued.
 At this time, the zookeeper connection expired.  HMaster tried to recover the 
 expired ZK session.
 During the recovery, a new zookeeper connection was created.  However, this 
 master became the
 new master again.  It tried to assign root and meta.
 Because the dead RS got the old root region, the master needs to wait for the 
 log splitting to complete.
 This waiting holds the zookeeper event thread.  So the async create split 
 task is never retried since
 there is only one event thread, which is waiting for the root region assigned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5105) TestImportTsv failed with hadoop 0.22

2011-12-29 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177370#comment-13177370
 ] 

Zhihong Yu commented on HBASE-5105:
---

I applied the patch.

TestImportTsv passes against both hadoop 0.22 and hadoop 1.0

Will integrate if there is no objection.

 TestImportTsv failed with hadoop 0.22
 -

 Key: HBASE-5105
 URL: https://issues.apache.org/jira/browse/HBASE-5105
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.92.0
Reporter: Ming Ma
 Attachments: HBASE-5105-0.92.patch


 java.io.FileNotFoundException: File does not exist: 
 /home/henkins/.m2/repository/org/apache/hadoop/hadoop-mapred/0.22-SNAPSHOT/hadoop-mapred-0.22-SNAPSHOT.jar
   at 
 org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:742)
   at 
 org.apache.hadoop.mapreduce.filecache.TrackerDistributedCacheManager.getFileStatus(TrackerDistributedCacheManager.java:331)
   at 
 org.apache.hadoop.mapreduce.filecache.TrackerDistributedCacheManager.determineTimestamps(TrackerDistributedCacheManager.java:711)
   at 
 org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:245)
   at 
 org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:283)
   at 
 org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:350)
   at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1045)
   at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1042)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1153)
   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1042)
   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1062)
   at 
 org.apache.hadoop.hbase.mapreduce.TestImportTsv.doMROnTableTest(TestImportTsv.java:215)
   at 
 org.apache.hadoop.hbase.mapreduce.TestImportTsv.testMROnTable(TestImportTsv.java:165)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root

2011-12-29 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177373#comment-13177373
 ] 

Zhihong Yu commented on HBASE-5099:
---

bq. How about just throw ExecutionException? 
That is fine. tryRecoveringExpiredZKSession() is private.

 ZK event thread waiting for root region while server shutdown handler waiting 
 for event thread to finish distributed log splitting to recover the region 
 sever the root region is on
 

 Key: HBASE-5099
 URL: https://issues.apache.org/jira/browse/HBASE-5099
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: ZK-event-thread-waiting-for-root.png, 
 distributed-log-splitting-hangs.png, hbase-5099-v2.patch, 
 hbase-5099-v3.patch, hbase-5099.patch


 A RS died.  The ServerShutdownHandler kicked in and started the logspliting.  
 SpliLogManager
 installed the tasks asynchronously, then started to wait for them to complete.
 The task znodes were not created actually.  The requests were just queued.
 At this time, the zookeeper connection expired.  HMaster tried to recover the 
 expired ZK session.
 During the recovery, a new zookeeper connection was created.  However, this 
 master became the
 new master again.  It tried to assign root and meta.
 Because the dead RS got the old root region, the master needs to wait for the 
 log splitting to complete.
 This waiting holds the zookeeper event thread.  So the async create split 
 task is never retried since
 there is only one event thread, which is waiting for the root region assigned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root re

2011-12-29 Thread Jimmy Xiang (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-5099:
---

Attachment: hbase-5099-v4.patch

 ZK event thread waiting for root region while server shutdown handler waiting 
 for event thread to finish distributed log splitting to recover the region 
 sever the root region is on
 

 Key: HBASE-5099
 URL: https://issues.apache.org/jira/browse/HBASE-5099
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: ZK-event-thread-waiting-for-root.png, 
 distributed-log-splitting-hangs.png, hbase-5099-v2.patch, 
 hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099.patch


 A RS died.  The ServerShutdownHandler kicked in and started the logspliting.  
 SpliLogManager
 installed the tasks asynchronously, then started to wait for them to complete.
 The task znodes were not created actually.  The requests were just queued.
 At this time, the zookeeper connection expired.  HMaster tried to recover the 
 expired ZK session.
 During the recovery, a new zookeeper connection was created.  However, this 
 master became the
 new master again.  It tried to assign root and meta.
 Because the dead RS got the old root region, the master needs to wait for the 
 log splitting to complete.
 This waiting holds the zookeeper event thread.  So the async create split 
 task is never retried since
 there is only one event thread, which is waiting for the root region assigned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root

2011-12-29 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177375#comment-13177375
 ] 

Hadoop QA commented on HBASE-5099:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12508868/hbase-5099-v3.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 2 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -151 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 76 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/628//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/628//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/628//console

This message is automatically generated.

 ZK event thread waiting for root region while server shutdown handler waiting 
 for event thread to finish distributed log splitting to recover the region 
 sever the root region is on
 

 Key: HBASE-5099
 URL: https://issues.apache.org/jira/browse/HBASE-5099
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: ZK-event-thread-waiting-for-root.png, 
 distributed-log-splitting-hangs.png, hbase-5099-v2.patch, 
 hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099.patch


 A RS died.  The ServerShutdownHandler kicked in and started the logspliting.  
 SpliLogManager
 installed the tasks asynchronously, then started to wait for them to complete.
 The task znodes were not created actually.  The requests were just queued.
 At this time, the zookeeper connection expired.  HMaster tried to recover the 
 expired ZK session.
 During the recovery, a new zookeeper connection was created.  However, this 
 master became the
 new master again.  It tried to assign root and meta.
 Because the dead RS got the old root region, the master needs to wait for the 
 log splitting to complete.
 This waiting holds the zookeeper event thread.  So the async create split 
 task is never retried since
 there is only one event thread, which is waiting for the root region assigned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root

2011-12-29 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177376#comment-13177376
 ] 

jirapos...@reviews.apache.org commented on HBASE-5099:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3323/
---

(Updated 2011-12-29 20:41:26.086617)


Review request for hbase, Ted Yu and Michael Stack.


Changes
---

Renamed the test class.  tryRecoveringExpiredZKSession() throws 
ExecutionException now.


Summary
---

Per discussion with Ted (on issues), I put up a patch to run 
tryRecoveringExpiredZKSession() in a separate thread and time it out and fail 
the recovery if it is stuck somewhere.

I added a test to test the abort method.  However, for the mini cluster, 
becomeActiveMaster() doesn't succeed so the abort method ends up always 
aborted.  So the actually success recovery is not tested.


This addresses bug HBASE-5099.
https://issues.apache.org/jira/browse/HBASE-5099


Diffs (updated)
-

  src/main/java/org/apache/hadoop/hbase/master/HMaster.java a5935a6 
  src/test/java/org/apache/hadoop/hbase/master/TestMasterZKSessionRecovery.java 
PRE-CREATION 

Diff: https://reviews.apache.org/r/3323/diff


Testing
---

mvn -PlocalTests -Dtest=TestMaster* clean test


Thanks,

Jimmy



 ZK event thread waiting for root region while server shutdown handler waiting 
 for event thread to finish distributed log splitting to recover the region 
 sever the root region is on
 

 Key: HBASE-5099
 URL: https://issues.apache.org/jira/browse/HBASE-5099
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: ZK-event-thread-waiting-for-root.png, 
 distributed-log-splitting-hangs.png, hbase-5099-v2.patch, 
 hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099.patch


 A RS died.  The ServerShutdownHandler kicked in and started the logspliting.  
 SpliLogManager
 installed the tasks asynchronously, then started to wait for them to complete.
 The task znodes were not created actually.  The requests were just queued.
 At this time, the zookeeper connection expired.  HMaster tried to recover the 
 expired ZK session.
 During the recovery, a new zookeeper connection was created.  However, this 
 master became the
 new master again.  It tried to assign root and meta.
 Because the dead RS got the old root region, the master needs to wait for the 
 log splitting to complete.
 This waiting holds the zookeeper event thread.  So the async create split 
 task is never retried since
 there is only one event thread, which is waiting for the root region assigned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5101) Add a max number of regions per regionserver limit

2011-12-29 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177379#comment-13177379
 ] 

Jonathan Hsieh commented on HBASE-5101:
---

@Todd

Looks like it, closing as a dupe.

 Add a max number of regions per regionserver limit
 --

 Key: HBASE-5101
 URL: https://issues.apache.org/jira/browse/HBASE-5101
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh

 In a testing environment, a cluster got to a state with more than 1500 
 regions per region server, and essentially became stuck and unavailable.  We 
 could add a limit to the number of regions that a region server can serve to 
 prevent this from happening.  This looks like it could be implemented in the 
 core or as a coprocessor.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HBASE-5101) Add a max number of regions per regionserver limit

2011-12-29 Thread Jonathan Hsieh (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh resolved HBASE-5101.
---

Resolution: Duplicate
  Assignee: Jonathan Hsieh

This is a dupe of HBASE-2844, and later discussion lead to something similar to 
HBASE-2956.  

 Add a max number of regions per regionserver limit
 --

 Key: HBASE-5101
 URL: https://issues.apache.org/jira/browse/HBASE-5101
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh

 In a testing environment, a cluster got to a state with more than 1500 
 regions per region server, and essentially became stuck and unavailable.  We 
 could add a limit to the number of regions that a region server can serve to 
 prevent this from happening.  This looks like it could be implemented in the 
 core or as a coprocessor.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root

2011-12-29 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177386#comment-13177386
 ] 

jirapos...@reviews.apache.org commented on HBASE-5099:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3323/#review4152
---


I like this version.

Minor comments below.


src/main/java/org/apache/hadoop/hbase/master/HMaster.java
https://reviews.apache.org/r/3323/#comment9377

Line should wrap after = sign.



src/main/java/org/apache/hadoop/hbase/master/HMaster.java
https://reviews.apache.org/r/3323/#comment9376

The catch block is redundant.



src/test/java/org/apache/hadoop/hbase/master/TestMasterZKSessionRecovery.java
https://reviews.apache.org/r/3323/#comment9378

Javadoc for test class.



src/test/java/org/apache/hadoop/hbase/master/TestMasterZKSessionRecovery.java
https://reviews.apache.org/r/3323/#comment9379

How about giving hbase.master.zksession.recover.timeout a much small 
value for the mini cluster so that the timeout value on line 73 can be lowered ?


- Ted


On 2011-12-29 20:41:26, Jimmy Xiang wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3323/
bq.  ---
bq.  
bq.  (Updated 2011-12-29 20:41:26)
bq.  
bq.  
bq.  Review request for hbase, Ted Yu and Michael Stack.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Per discussion with Ted (on issues), I put up a patch to run 
tryRecoveringExpiredZKSession() in a separate thread and time it out and fail 
the recovery if it is stuck somewhere.
bq.  
bq.  I added a test to test the abort method.  However, for the mini cluster, 
becomeActiveMaster() doesn't succeed so the abort method ends up always 
aborted.  So the actually success recovery is not tested.
bq.  
bq.  
bq.  This addresses bug HBASE-5099.
bq.  https://issues.apache.org/jira/browse/HBASE-5099
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/master/HMaster.java a5935a6 
bq.
src/test/java/org/apache/hadoop/hbase/master/TestMasterZKSessionRecovery.java 
PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/3323/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  mvn -PlocalTests -Dtest=TestMaster* clean test
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jimmy
bq.  
bq.



 ZK event thread waiting for root region while server shutdown handler waiting 
 for event thread to finish distributed log splitting to recover the region 
 sever the root region is on
 

 Key: HBASE-5099
 URL: https://issues.apache.org/jira/browse/HBASE-5099
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: ZK-event-thread-waiting-for-root.png, 
 distributed-log-splitting-hangs.png, hbase-5099-v2.patch, 
 hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099.patch


 A RS died.  The ServerShutdownHandler kicked in and started the logspliting.  
 SpliLogManager
 installed the tasks asynchronously, then started to wait for them to complete.
 The task znodes were not created actually.  The requests were just queued.
 At this time, the zookeeper connection expired.  HMaster tried to recover the 
 expired ZK session.
 During the recovery, a new zookeeper connection was created.  However, this 
 master became the
 new master again.  It tried to assign root and meta.
 Because the dead RS got the old root region, the master needs to wait for the 
 log splitting to complete.
 This waiting holds the zookeeper event thread.  So the async create split 
 task is never retried since
 there is only one event thread, which is waiting for the root region assigned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5064) use surefire tests parallelization

2011-12-29 Thread nkeywal (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177388#comment-13177388
 ] 

nkeywal commented on HBASE-5064:


From the last 5 execution, it seems we have NumberFormatException + some 
random failure that I would tend to attribute to the usual flakiness (I may be 
wrong :-)). The FileNotFound exception was not reproduced.

@ted: from the time taken by TestAdmin (twice the time on hadoop-qa, and very 
close to the test timeout of 900 seconds), I think that the machine is heavily 
loaded, or lacks some memory and swaps. That makes me think that we should 
default the // degree to 2 (and keep 4 on hadoop-qa).

 use surefire tests parallelization
 --

 Key: HBASE-5064
 URL: https://issues.apache.org/jira/browse/HBASE-5064
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 
 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 
 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 
 5064.v18.patch, 5064.v19.patch, 5064.v19.patch, 5064.v19.patch, 
 5064.v2.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 
 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 
 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 
 5064.v9.patch


 To be tried multiple times on hadoop-qa before committing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root

2011-12-29 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177394#comment-13177394
 ] 

jirapos...@reviews.apache.org commented on HBASE-5099:
--



bq.  On 2011-12-29 20:55:31, Ted Yu wrote:
bq.   src/main/java/org/apache/hadoop/hbase/master/HMaster.java, line 1463
bq.   https://reviews.apache.org/r/3323/diff/4/?file=65969#file65969line1463
bq.  
bq.   The catch block is redundant.

That's right.  I forgot it :)


bq.  On 2011-12-29 20:55:31, Ted Yu wrote:
bq.   src/main/java/org/apache/hadoop/hbase/master/HMaster.java, line 1453
bq.   https://reviews.apache.org/r/3323/diff/4/?file=65969#file65969line1453
bq.  
bq.   Line should wrap after = sign.

too long?  sure.


bq.  On 2011-12-29 20:55:31, Ted Yu wrote:
bq.   
src/test/java/org/apache/hadoop/hbase/master/TestMasterZKSessionRecovery.java, 
line 41
bq.   https://reviews.apache.org/r/3323/diff/4/?file=65970#file65970line41
bq.  
bq.   How about giving hbase.master.zksession.recover.timeout a much 
small value for the mini cluster so that the timeout value on line 73 can be 
lowered ?

sure


- Jimmy


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3323/#review4152
---


On 2011-12-29 20:41:26, Jimmy Xiang wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3323/
bq.  ---
bq.  
bq.  (Updated 2011-12-29 20:41:26)
bq.  
bq.  
bq.  Review request for hbase, Ted Yu and Michael Stack.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Per discussion with Ted (on issues), I put up a patch to run 
tryRecoveringExpiredZKSession() in a separate thread and time it out and fail 
the recovery if it is stuck somewhere.
bq.  
bq.  I added a test to test the abort method.  However, for the mini cluster, 
becomeActiveMaster() doesn't succeed so the abort method ends up always 
aborted.  So the actually success recovery is not tested.
bq.  
bq.  
bq.  This addresses bug HBASE-5099.
bq.  https://issues.apache.org/jira/browse/HBASE-5099
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/master/HMaster.java a5935a6 
bq.
src/test/java/org/apache/hadoop/hbase/master/TestMasterZKSessionRecovery.java 
PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/3323/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  mvn -PlocalTests -Dtest=TestMaster* clean test
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jimmy
bq.  
bq.



 ZK event thread waiting for root region while server shutdown handler waiting 
 for event thread to finish distributed log splitting to recover the region 
 sever the root region is on
 

 Key: HBASE-5099
 URL: https://issues.apache.org/jira/browse/HBASE-5099
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: ZK-event-thread-waiting-for-root.png, 
 distributed-log-splitting-hangs.png, hbase-5099-v2.patch, 
 hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099.patch


 A RS died.  The ServerShutdownHandler kicked in and started the logspliting.  
 SpliLogManager
 installed the tasks asynchronously, then started to wait for them to complete.
 The task znodes were not created actually.  The requests were just queued.
 At this time, the zookeeper connection expired.  HMaster tried to recover the 
 expired ZK session.
 During the recovery, a new zookeeper connection was created.  However, this 
 master became the
 new master again.  It tried to assign root and meta.
 Because the dead RS got the old root region, the master needs to wait for the 
 log splitting to complete.
 This waiting holds the zookeeper event thread.  So the async create split 
 task is never retried since
 there is only one event thread, which is waiting for the root region assigned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5103) Fix improper master znode deserialization

2011-12-29 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177396#comment-13177396
 ] 

Jonathan Hsieh commented on HBASE-5103:
---

should this get marked as 0.92.1 instead of 0.92.0?


 Fix improper master znode deserialization
 -

 Key: HBASE-5103
 URL: https://issues.apache.org/jira/browse/HBASE-5103
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
Priority: Minor
 Fix For: 0.92.0, 0.94.0

 Attachments: hbase-5103.patch


 In ActiveMasterManager#blockUntilBecomingActiveMaster the master znode is 
 created as a versioned serialized version of ServerName
 {code}
  if (ZKUtil.createEphemeralNodeAndWatch(this.watcher,
   this.watcher.masterAddressZNode, sn.getVersionedBytes())) {
 {code}
 There are a few user visible places where it is used but not deserialized 
 properly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root

2011-12-29 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177397#comment-13177397
 ] 

jirapos...@reviews.apache.org commented on HBASE-5099:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3323/
---

(Updated 2011-12-29 21:16:22.655431)


Review request for hbase, Ted Yu and Michael Stack.


Changes
---

Minor changes per review comments.


Summary
---

Per discussion with Ted (on issues), I put up a patch to run 
tryRecoveringExpiredZKSession() in a separate thread and time it out and fail 
the recovery if it is stuck somewhere.

I added a test to test the abort method.  However, for the mini cluster, 
becomeActiveMaster() doesn't succeed so the abort method ends up always 
aborted.  So the actually success recovery is not tested.


This addresses bug HBASE-5099.
https://issues.apache.org/jira/browse/HBASE-5099


Diffs (updated)
-

  src/main/java/org/apache/hadoop/hbase/master/HMaster.java a5935a6 
  src/test/java/org/apache/hadoop/hbase/master/TestMasterZKSessionRecovery.java 
PRE-CREATION 

Diff: https://reviews.apache.org/r/3323/diff


Testing
---

mvn -PlocalTests -Dtest=TestMaster* clean test


Thanks,

Jimmy



 ZK event thread waiting for root region while server shutdown handler waiting 
 for event thread to finish distributed log splitting to recover the region 
 sever the root region is on
 

 Key: HBASE-5099
 URL: https://issues.apache.org/jira/browse/HBASE-5099
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: ZK-event-thread-waiting-for-root.png, 
 distributed-log-splitting-hangs.png, hbase-5099-v2.patch, 
 hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099.patch


 A RS died.  The ServerShutdownHandler kicked in and started the logspliting.  
 SpliLogManager
 installed the tasks asynchronously, then started to wait for them to complete.
 The task znodes were not created actually.  The requests were just queued.
 At this time, the zookeeper connection expired.  HMaster tried to recover the 
 expired ZK session.
 During the recovery, a new zookeeper connection was created.  However, this 
 master became the
 new master again.  It tried to assign root and meta.
 Because the dead RS got the old root region, the master needs to wait for the 
 log splitting to complete.
 This waiting holds the zookeeper event thread.  So the async create split 
 task is never retried since
 there is only one event thread, which is waiting for the root region assigned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root re

2011-12-29 Thread Jimmy Xiang (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-5099:
---

Attachment: hbase-5099-v5.patch

 ZK event thread waiting for root region while server shutdown handler waiting 
 for event thread to finish distributed log splitting to recover the region 
 sever the root region is on
 

 Key: HBASE-5099
 URL: https://issues.apache.org/jira/browse/HBASE-5099
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: ZK-event-thread-waiting-for-root.png, 
 distributed-log-splitting-hangs.png, hbase-5099-v2.patch, 
 hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099-v5.patch, 
 hbase-5099.patch


 A RS died.  The ServerShutdownHandler kicked in and started the logspliting.  
 SpliLogManager
 installed the tasks asynchronously, then started to wait for them to complete.
 The task znodes were not created actually.  The requests were just queued.
 At this time, the zookeeper connection expired.  HMaster tried to recover the 
 expired ZK session.
 During the recovery, a new zookeeper connection was created.  However, this 
 master became the
 new master again.  It tried to assign root and meta.
 Because the dead RS got the old root region, the master needs to wait for the 
 log splitting to complete.
 This waiting holds the zookeeper event thread.  So the async create split 
 task is never retried since
 there is only one event thread, which is waiting for the root region assigned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5103) Fix improper master znode deserialization

2011-12-29 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177399#comment-13177399
 ] 

Hudson commented on HBASE-5103:
---

Integrated in HBase-TRUNK #2590 (See 
[https://builds.apache.org/job/HBase-TRUNK/2590/])
HBASE-5103  Fix improper master znode deserialization (Jonathan Hsieh)

tedyu : 
Files : 
* /hbase/trunk/CHANGES.txt
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java


 Fix improper master znode deserialization
 -

 Key: HBASE-5103
 URL: https://issues.apache.org/jira/browse/HBASE-5103
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
Priority: Minor
 Fix For: 0.92.0, 0.94.0

 Attachments: hbase-5103.patch


 In ActiveMasterManager#blockUntilBecomingActiveMaster the master znode is 
 created as a versioned serialized version of ServerName
 {code}
  if (ZKUtil.createEphemeralNodeAndWatch(this.watcher,
   this.watcher.masterAddressZNode, sn.getVersionedBytes())) {
 {code}
 There are a few user visible places where it is used but not deserialized 
 properly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5103) Fix improper master znode deserialization

2011-12-29 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177398#comment-13177398
 ] 

Zhihong Yu commented on HBASE-5103:
---

0.92 hasn't been released yet :-)
Without HBASE-5081 and HBASE-5099 fixed, I don't think 0.92 would be ready.

 Fix improper master znode deserialization
 -

 Key: HBASE-5103
 URL: https://issues.apache.org/jira/browse/HBASE-5103
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
Priority: Minor
 Fix For: 0.92.0, 0.94.0

 Attachments: hbase-5103.patch


 In ActiveMasterManager#blockUntilBecomingActiveMaster the master znode is 
 created as a versioned serialized version of ServerName
 {code}
  if (ZKUtil.createEphemeralNodeAndWatch(this.watcher,
   this.watcher.masterAddressZNode, sn.getVersionedBytes())) {
 {code}
 There are a few user visible places where it is used but not deserialized 
 properly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root

2011-12-29 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177404#comment-13177404
 ] 

Zhihong Yu commented on HBASE-5099:
---

@Jimmy:
Do you mind producing a patch for 0.92 ?
I got the following when I tried to apply patch v5 to 0.92:
{code}
1 out of 3 hunks FAILED -- saving rejects to file 
src/main/java/org/apache/hadoop/hbase/master/HMaster.java.rej
{code}

Also there is no test category in 0.92:
{code}
[ERROR] 
/Users/zhihyu/92hbase/src/test/java/org/apache/hadoop/hbase/master/TestMasterZKSessionRecovery.java:[27,30]
 cannot find symbol
[ERROR] symbol  : class MediumTests
[ERROR] location: package org.apache.hadoop.hbase
{code}

This is a whitespace at line 41 in TestMasterZKSessionRecovery - I can remove 
it at time of integration.

Nice work.

 ZK event thread waiting for root region while server shutdown handler waiting 
 for event thread to finish distributed log splitting to recover the region 
 sever the root region is on
 

 Key: HBASE-5099
 URL: https://issues.apache.org/jira/browse/HBASE-5099
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: ZK-event-thread-waiting-for-root.png, 
 distributed-log-splitting-hangs.png, hbase-5099-v2.patch, 
 hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099-v5.patch, 
 hbase-5099.patch


 A RS died.  The ServerShutdownHandler kicked in and started the logspliting.  
 SpliLogManager
 installed the tasks asynchronously, then started to wait for them to complete.
 The task znodes were not created actually.  The requests were just queued.
 At this time, the zookeeper connection expired.  HMaster tried to recover the 
 expired ZK session.
 During the recovery, a new zookeeper connection was created.  However, this 
 master became the
 new master again.  It tried to assign root and meta.
 Because the dead RS got the old root region, the master needs to wait for the 
 log splitting to complete.
 This waiting holds the zookeeper event thread.  So the async create split 
 task is never retried since
 there is only one event thread, which is waiting for the root region assigned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5104) FilterList doesn't work right with filters (such as ColumPrefixFilter) which use the SEEK_NEXT_USING_HINT

2011-12-29 Thread Kannan Muthukkaruppan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177405#comment-13177405
 ] 

Kannan Muthukkaruppan commented on HBASE-5104:
--

Additional note: The use of ColumnPaginationFilter AND ColumnPrefixFilter for 
the intended use case (i.e. to get next set of 5 thread ids for tag2) probably 
will not work even after this bug is fixed. I think once the bug is fixed for 
the above data set, the program should return 0 kvs.

Here's why:
  (select * from Tab where filter1 and filter2)
should be same as:
  (select * from Tab where filter1) INTERSECT (select * from Tab where filter2)

When separately applied, filter1 will return rows with tag1 prefix and filter2 
will return rows with tag0 prefix (for Jiakai's example above) and the 
INTERSECTION will be the empty set.

The real confusion here seems to be because of the use of a filter for 
pagination. This seems odd. In normal SQL for example, pagination is not part 
of the WHERE clause but a separate special clause (as if it was being applied 
on the results of a sub-query).

 (select * from Tab 
  where  column  LIKE 'tag1%
  LIMIT 5 OFFSET 5)

Possible ways of supporting this use case:

1) Don't use AND (via FilterList), but enhance ColumnPrefixFilter to support 
another constructor which supports limit/offset.
2) Support pagination (limit/offset) at the Scan/Get API level (rather than as 
a filter) [Like SQL].

Thoughts?






 FilterList doesn't work right with filters (such as ColumPrefixFilter) which 
 use the SEEK_NEXT_USING_HINT
 -

 Key: HBASE-5104
 URL: https://issues.apache.org/jira/browse/HBASE-5104
 Project: HBase
  Issue Type: Bug
Reporter: Kannan Muthukkaruppan
Assignee: Madhuwanti Vaidya
 Attachments: testFilterList.rb


 Thanks Jiakai Liu for reporting this issue and doing the initial 
 investigation. Email from Jiakai below:
 Assuming that we have an index column family with the following entries:
 tag0:001:thread1
 ...
 tag1:001:thread1
 tag1:002:thread2
 ...
 tag1:010:thread10
 ...
 tag2:001:thread1
 tag2:005:thread5
 ...
 To get threads with tag1 in range [5, 10), I tried the following code:
 ColumnPrefixFilter filter1 = new 
 ColumnPrefixFilter(Bytes.toBytes(tag1));
 ColumnPaginationFilter filter2 = new ColumnPaginationFilter(5 /* limit 
 */, 5 /* offset */);
 FilterList filters = new FilterList(Operator.MUST_PASS_ALL);
 filters.addFilter(filter1);
 filters.addFilter(filter2);
 Get get = new Get(USER);
 get.addFamily(COLUMN_FAMILY);
 get.setMaxVersions(1);
 get.setFilter(filters);
 Somehow it didn't work as expected. It returned the entries as if the filter1 
 were not set.
 Turns out the ColumnPrefixFilter returns SEEK_NEXT_USING_HINT in some cases. 
 The FilterList filter does not handle this return code properly (treat it as 
 INCLUDE).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5104) FilterList doesn't work right with filters (such as ColumPrefixFilter) which use the SEEK_NEXT_USING_HINT

2011-12-29 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177409#comment-13177409
 ] 

Zhihong Yu commented on HBASE-5104:
---

Option #1 is easier to implement.

I prefer option #2 which is really interesting feature.

So we have two more JIRAs to work on :-)

 FilterList doesn't work right with filters (such as ColumPrefixFilter) which 
 use the SEEK_NEXT_USING_HINT
 -

 Key: HBASE-5104
 URL: https://issues.apache.org/jira/browse/HBASE-5104
 Project: HBase
  Issue Type: Bug
Reporter: Kannan Muthukkaruppan
Assignee: Madhuwanti Vaidya
 Attachments: testFilterList.rb


 Thanks Jiakai Liu for reporting this issue and doing the initial 
 investigation. Email from Jiakai below:
 Assuming that we have an index column family with the following entries:
 tag0:001:thread1
 ...
 tag1:001:thread1
 tag1:002:thread2
 ...
 tag1:010:thread10
 ...
 tag2:001:thread1
 tag2:005:thread5
 ...
 To get threads with tag1 in range [5, 10), I tried the following code:
 ColumnPrefixFilter filter1 = new 
 ColumnPrefixFilter(Bytes.toBytes(tag1));
 ColumnPaginationFilter filter2 = new ColumnPaginationFilter(5 /* limit 
 */, 5 /* offset */);
 FilterList filters = new FilterList(Operator.MUST_PASS_ALL);
 filters.addFilter(filter1);
 filters.addFilter(filter2);
 Get get = new Get(USER);
 get.addFamily(COLUMN_FAMILY);
 get.setMaxVersions(1);
 get.setFilter(filters);
 Somehow it didn't work as expected. It returned the entries as if the filter1 
 were not set.
 Turns out the ColumnPrefixFilter returns SEEK_NEXT_USING_HINT in some cases. 
 The FilterList filter does not handle this return code properly (treat it as 
 INCLUDE).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root

2011-12-29 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177410#comment-13177410
 ] 

Zhihong Yu commented on HBASE-5099:
---

Will integrate tomorrow morning if I don't get objections.

 ZK event thread waiting for root region while server shutdown handler waiting 
 for event thread to finish distributed log splitting to recover the region 
 sever the root region is on
 

 Key: HBASE-5099
 URL: https://issues.apache.org/jira/browse/HBASE-5099
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: ZK-event-thread-waiting-for-root.png, 
 distributed-log-splitting-hangs.png, hbase-5099-v2.patch, 
 hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099-v5.patch, 
 hbase-5099.patch


 A RS died.  The ServerShutdownHandler kicked in and started the logspliting.  
 SpliLogManager
 installed the tasks asynchronously, then started to wait for them to complete.
 The task znodes were not created actually.  The requests were just queued.
 At this time, the zookeeper connection expired.  HMaster tried to recover the 
 expired ZK session.
 During the recovery, a new zookeeper connection was created.  However, this 
 master became the
 new master again.  It tried to assign root and meta.
 Because the dead RS got the old root region, the master needs to wait for the 
 log splitting to complete.
 This waiting holds the zookeeper event thread.  So the async create split 
 task is never retried since
 there is only one event thread, which is waiting for the root region assigned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root re

2011-12-29 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5099:
--

Attachment: 5099.92

Patch for 0.92 branch where all TestMasterXX tests passed.

 ZK event thread waiting for root region while server shutdown handler waiting 
 for event thread to finish distributed log splitting to recover the region 
 sever the root region is on
 

 Key: HBASE-5099
 URL: https://issues.apache.org/jira/browse/HBASE-5099
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: 5099.92, ZK-event-thread-waiting-for-root.png, 
 distributed-log-splitting-hangs.png, hbase-5099-v2.patch, 
 hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099-v5.patch, 
 hbase-5099.patch


 A RS died.  The ServerShutdownHandler kicked in and started the logspliting.  
 SpliLogManager
 installed the tasks asynchronously, then started to wait for them to complete.
 The task znodes were not created actually.  The requests were just queued.
 At this time, the zookeeper connection expired.  HMaster tried to recover the 
 expired ZK session.
 During the recovery, a new zookeeper connection was created.  However, this 
 master became the
 new master again.  It tried to assign root and meta.
 Because the dead RS got the old root region, the master needs to wait for the 
 log splitting to complete.
 This waiting holds the zookeeper event thread.  So the async create split 
 task is never retried since
 there is only one event thread, which is waiting for the root region assigned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root

2011-12-29 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177417#comment-13177417
 ] 

Jimmy Xiang commented on HBASE-5099:


@Ted, thanks!

 ZK event thread waiting for root region while server shutdown handler waiting 
 for event thread to finish distributed log splitting to recover the region 
 sever the root region is on
 

 Key: HBASE-5099
 URL: https://issues.apache.org/jira/browse/HBASE-5099
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: 5099.92, ZK-event-thread-waiting-for-root.png, 
 distributed-log-splitting-hangs.png, hbase-5099-v2.patch, 
 hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099-v5.patch, 
 hbase-5099.patch


 A RS died.  The ServerShutdownHandler kicked in and started the logspliting.  
 SpliLogManager
 installed the tasks asynchronously, then started to wait for them to complete.
 The task znodes were not created actually.  The requests were just queued.
 At this time, the zookeeper connection expired.  HMaster tried to recover the 
 expired ZK session.
 During the recovery, a new zookeeper connection was created.  However, this 
 master became the
 new master again.  It tried to assign root and meta.
 Because the dead RS got the old root region, the master needs to wait for the 
 log splitting to complete.
 This waiting holds the zookeeper event thread.  So the async create split 
 task is never retried since
 there is only one event thread, which is waiting for the root region assigned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root

2011-12-29 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177421#comment-13177421
 ] 

Zhihong Yu commented on HBASE-5099:
---

@Jimmy:
Can you put the patch on a real cluster and see if the 300 second timeout is 
good for zk session recovery ?

Thanks

 ZK event thread waiting for root region while server shutdown handler waiting 
 for event thread to finish distributed log splitting to recover the region 
 sever the root region is on
 

 Key: HBASE-5099
 URL: https://issues.apache.org/jira/browse/HBASE-5099
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.92.0, 0.94.0

 Attachments: 5099.92, ZK-event-thread-waiting-for-root.png, 
 distributed-log-splitting-hangs.png, hbase-5099-v2.patch, 
 hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099-v5.patch, 
 hbase-5099.patch


 A RS died.  The ServerShutdownHandler kicked in and started the logspliting.  
 SpliLogManager
 installed the tasks asynchronously, then started to wait for them to complete.
 The task znodes were not created actually.  The requests were just queued.
 At this time, the zookeeper connection expired.  HMaster tried to recover the 
 expired ZK session.
 During the recovery, a new zookeeper connection was created.  However, this 
 master became the
 new master again.  It tried to assign root and meta.
 Because the dead RS got the old root region, the master needs to wait for the 
 log splitting to complete.
 This waiting holds the zookeeper event thread.  So the async create split 
 task is never retried since
 there is only one event thread, which is waiting for the root region assigned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root re

2011-12-29 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5099:
--

Fix Version/s: 0.94.0
   0.92.0
 Hadoop Flags: Reviewed

 ZK event thread waiting for root region while server shutdown handler waiting 
 for event thread to finish distributed log splitting to recover the region 
 sever the root region is on
 

 Key: HBASE-5099
 URL: https://issues.apache.org/jira/browse/HBASE-5099
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.92.0, 0.94.0

 Attachments: 5099.92, ZK-event-thread-waiting-for-root.png, 
 distributed-log-splitting-hangs.png, hbase-5099-v2.patch, 
 hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099-v5.patch, 
 hbase-5099.patch


 A RS died.  The ServerShutdownHandler kicked in and started the logspliting.  
 SpliLogManager
 installed the tasks asynchronously, then started to wait for them to complete.
 The task znodes were not created actually.  The requests were just queued.
 At this time, the zookeeper connection expired.  HMaster tried to recover the 
 expired ZK session.
 During the recovery, a new zookeeper connection was created.  However, this 
 master became the
 new master again.  It tried to assign root and meta.
 Because the dead RS got the old root region, the master needs to wait for the 
 log splitting to complete.
 This waiting holds the zookeeper event thread.  So the async create split 
 task is never retried since
 there is only one event thread, which is waiting for the root region assigned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5104) FilterList doesn't work right with filters (such as ColumPrefixFilter) which use the SEEK_NEXT_USING_HINT

2011-12-29 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177425#comment-13177425
 ] 

Lars Hofhansl commented on HBASE-5104:
--

It seems to me that ColumnPaginationFilter should wrap another filter (similar 
to how FilterList wraps other filters), and then apply the pagination on the 
result (the wrapper filter of course could be another filter list).


 FilterList doesn't work right with filters (such as ColumPrefixFilter) which 
 use the SEEK_NEXT_USING_HINT
 -

 Key: HBASE-5104
 URL: https://issues.apache.org/jira/browse/HBASE-5104
 Project: HBase
  Issue Type: Bug
Reporter: Kannan Muthukkaruppan
Assignee: Madhuwanti Vaidya
 Attachments: testFilterList.rb


 Thanks Jiakai Liu for reporting this issue and doing the initial 
 investigation. Email from Jiakai below:
 Assuming that we have an index column family with the following entries:
 tag0:001:thread1
 ...
 tag1:001:thread1
 tag1:002:thread2
 ...
 tag1:010:thread10
 ...
 tag2:001:thread1
 tag2:005:thread5
 ...
 To get threads with tag1 in range [5, 10), I tried the following code:
 ColumnPrefixFilter filter1 = new 
 ColumnPrefixFilter(Bytes.toBytes(tag1));
 ColumnPaginationFilter filter2 = new ColumnPaginationFilter(5 /* limit 
 */, 5 /* offset */);
 FilterList filters = new FilterList(Operator.MUST_PASS_ALL);
 filters.addFilter(filter1);
 filters.addFilter(filter2);
 Get get = new Get(USER);
 get.addFamily(COLUMN_FAMILY);
 get.setMaxVersions(1);
 get.setFilter(filters);
 Somehow it didn't work as expected. It returned the entries as if the filter1 
 were not set.
 Turns out the ColumnPrefixFilter returns SEEK_NEXT_USING_HINT in some cases. 
 The FilterList filter does not handle this return code properly (treat it as 
 INCLUDE).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5104) FilterList doesn't work right with filters (such as ColumPrefixFilter) which use the SEEK_NEXT_USING_HINT

2011-12-29 Thread Kannan Muthukkaruppan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177430#comment-13177430
 ] 

Kannan Muthukkaruppan commented on HBASE-5104:
--

Lars: I think viewing it as a general filter is problematic. Note: a FilterList 
wraps other filters, but it is perfectly legal for it to be wrapped by outer 
level filters... (say if you want to build a complex boolean expression). 
Whereas the pagination functionality doesn't lend itself to this flexibility. 
It is more like a special post-processing -only filter that you apply at the 
end stage. 


 FilterList doesn't work right with filters (such as ColumPrefixFilter) which 
 use the SEEK_NEXT_USING_HINT
 -

 Key: HBASE-5104
 URL: https://issues.apache.org/jira/browse/HBASE-5104
 Project: HBase
  Issue Type: Bug
Reporter: Kannan Muthukkaruppan
Assignee: Madhuwanti Vaidya
 Attachments: testFilterList.rb


 Thanks Jiakai Liu for reporting this issue and doing the initial 
 investigation. Email from Jiakai below:
 Assuming that we have an index column family with the following entries:
 tag0:001:thread1
 ...
 tag1:001:thread1
 tag1:002:thread2
 ...
 tag1:010:thread10
 ...
 tag2:001:thread1
 tag2:005:thread5
 ...
 To get threads with tag1 in range [5, 10), I tried the following code:
 ColumnPrefixFilter filter1 = new 
 ColumnPrefixFilter(Bytes.toBytes(tag1));
 ColumnPaginationFilter filter2 = new ColumnPaginationFilter(5 /* limit 
 */, 5 /* offset */);
 FilterList filters = new FilterList(Operator.MUST_PASS_ALL);
 filters.addFilter(filter1);
 filters.addFilter(filter2);
 Get get = new Get(USER);
 get.addFamily(COLUMN_FAMILY);
 get.setMaxVersions(1);
 get.setFilter(filters);
 Somehow it didn't work as expected. It returned the entries as if the filter1 
 were not set.
 Turns out the ColumnPrefixFilter returns SEEK_NEXT_USING_HINT in some cases. 
 The FilterList filter does not handle this return code properly (treat it as 
 INCLUDE).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root

2011-12-29 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177431#comment-13177431
 ] 

jirapos...@reviews.apache.org commented on HBASE-5099:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3323/#review4154
---



src/main/java/org/apache/hadoop/hbase/master/HMaster.java
https://reviews.apache.org/r/3323/#comment9383

Should we issue executor.shutdownNow() here if the worker did not finish, 
in order to attempt to interrupt and long running worker?


- Lars


On 2011-12-29 21:16:22, Jimmy Xiang wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3323/
bq.  ---
bq.  
bq.  (Updated 2011-12-29 21:16:22)
bq.  
bq.  
bq.  Review request for hbase, Ted Yu and Michael Stack.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Per discussion with Ted (on issues), I put up a patch to run 
tryRecoveringExpiredZKSession() in a separate thread and time it out and fail 
the recovery if it is stuck somewhere.
bq.  
bq.  I added a test to test the abort method.  However, for the mini cluster, 
becomeActiveMaster() doesn't succeed so the abort method ends up always 
aborted.  So the actually success recovery is not tested.
bq.  
bq.  
bq.  This addresses bug HBASE-5099.
bq.  https://issues.apache.org/jira/browse/HBASE-5099
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/master/HMaster.java a5935a6 
bq.
src/test/java/org/apache/hadoop/hbase/master/TestMasterZKSessionRecovery.java 
PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/3323/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  mvn -PlocalTests -Dtest=TestMaster* clean test
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jimmy
bq.  
bq.



 ZK event thread waiting for root region while server shutdown handler waiting 
 for event thread to finish distributed log splitting to recover the region 
 sever the root region is on
 

 Key: HBASE-5099
 URL: https://issues.apache.org/jira/browse/HBASE-5099
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.92.0, 0.94.0

 Attachments: 5099.92, ZK-event-thread-waiting-for-root.png, 
 distributed-log-splitting-hangs.png, hbase-5099-v2.patch, 
 hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099-v5.patch, 
 hbase-5099.patch


 A RS died.  The ServerShutdownHandler kicked in and started the logspliting.  
 SpliLogManager
 installed the tasks asynchronously, then started to wait for them to complete.
 The task znodes were not created actually.  The requests were just queued.
 At this time, the zookeeper connection expired.  HMaster tried to recover the 
 expired ZK session.
 During the recovery, a new zookeeper connection was created.  However, this 
 master became the
 new master again.  It tried to assign root and meta.
 Because the dead RS got the old root region, the master needs to wait for the 
 log splitting to complete.
 This waiting holds the zookeeper event thread.  So the async create split 
 task is never retried since
 there is only one event thread, which is waiting for the root region assigned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5104) FilterList doesn't work right with ColumnPaginationFilter

2011-12-29 Thread Kannan Muthukkaruppan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kannan Muthukkaruppan updated HBASE-5104:
-

Summary: FilterList doesn't work right with ColumnPaginationFilter  (was: 
FilterList doesn't work right with filters (such as ColumPrefixFilter) which 
use the SEEK_NEXT_USING_HINT)

 FilterList doesn't work right with ColumnPaginationFilter
 -

 Key: HBASE-5104
 URL: https://issues.apache.org/jira/browse/HBASE-5104
 Project: HBase
  Issue Type: Bug
Reporter: Kannan Muthukkaruppan
Assignee: Madhuwanti Vaidya
 Attachments: testFilterList.rb


 Thanks Jiakai Liu for reporting this issue and doing the initial 
 investigation. Email from Jiakai below:
 Assuming that we have an index column family with the following entries:
 tag0:001:thread1
 ...
 tag1:001:thread1
 tag1:002:thread2
 ...
 tag1:010:thread10
 ...
 tag2:001:thread1
 tag2:005:thread5
 ...
 To get threads with tag1 in range [5, 10), I tried the following code:
 ColumnPrefixFilter filter1 = new 
 ColumnPrefixFilter(Bytes.toBytes(tag1));
 ColumnPaginationFilter filter2 = new ColumnPaginationFilter(5 /* limit 
 */, 5 /* offset */);
 FilterList filters = new FilterList(Operator.MUST_PASS_ALL);
 filters.addFilter(filter1);
 filters.addFilter(filter2);
 Get get = new Get(USER);
 get.addFamily(COLUMN_FAMILY);
 get.setMaxVersions(1);
 get.setFilter(filters);
 Somehow it didn't work as expected. It returned the entries as if the filter1 
 were not set.
 Turns out the ColumnPrefixFilter returns SEEK_NEXT_USING_HINT in some cases. 
 The FilterList filter does not handle this return code properly (treat it as 
 INCLUDE).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root

2011-12-29 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177433#comment-13177433
 ] 

Zhihong Yu commented on HBASE-5099:
---

I think executor.shutdown() is appropriate here since we want the original body 
of tryRecoveringExpiredZKSession() to complete.
There is new timeout governing the wait for shutdown to complete.

 ZK event thread waiting for root region while server shutdown handler waiting 
 for event thread to finish distributed log splitting to recover the region 
 sever the root region is on
 

 Key: HBASE-5099
 URL: https://issues.apache.org/jira/browse/HBASE-5099
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.92.0, 0.94.0

 Attachments: 5099.92, ZK-event-thread-waiting-for-root.png, 
 distributed-log-splitting-hangs.png, hbase-5099-v2.patch, 
 hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099-v5.patch, 
 hbase-5099.patch


 A RS died.  The ServerShutdownHandler kicked in and started the logspliting.  
 SpliLogManager
 installed the tasks asynchronously, then started to wait for them to complete.
 The task znodes were not created actually.  The requests were just queued.
 At this time, the zookeeper connection expired.  HMaster tried to recover the 
 expired ZK session.
 During the recovery, a new zookeeper connection was created.  However, this 
 master became the
 new master again.  It tried to assign root and meta.
 Because the dead RS got the old root region, the master needs to wait for the 
 log splitting to complete.
 This waiting holds the zookeeper event thread.  So the async create split 
 task is never retried since
 there is only one event thread, which is waiting for the root region assigned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5104) FilterList doesn't work right with ColumnPaginationFilter

2011-12-29 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177435#comment-13177435
 ] 

Lars Hofhansl commented on HBASE-5104:
--

@Kannan: Right, that's what I had in mind. You build up your filter (using 
whatever FilterList nesting you need) and then wrap the end result into a 
ColumnPaginationFilter (which would optionally take a filter). That way it can 
be applied at the end-stage while not requiring any API changes to Get/Scan.

I see what your saying, though, it rarely makes sense to wrap a 
ColumnPaginationFilter into a FilterList... We could just document it as such.


 FilterList doesn't work right with ColumnPaginationFilter
 -

 Key: HBASE-5104
 URL: https://issues.apache.org/jira/browse/HBASE-5104
 Project: HBase
  Issue Type: Bug
Reporter: Kannan Muthukkaruppan
Assignee: Madhuwanti Vaidya
 Attachments: testFilterList.rb


 Thanks Jiakai Liu for reporting this issue and doing the initial 
 investigation. Email from Jiakai below:
 Assuming that we have an index column family with the following entries:
 tag0:001:thread1
 ...
 tag1:001:thread1
 tag1:002:thread2
 ...
 tag1:010:thread10
 ...
 tag2:001:thread1
 tag2:005:thread5
 ...
 To get threads with tag1 in range [5, 10), I tried the following code:
 ColumnPrefixFilter filter1 = new 
 ColumnPrefixFilter(Bytes.toBytes(tag1));
 ColumnPaginationFilter filter2 = new ColumnPaginationFilter(5 /* limit 
 */, 5 /* offset */);
 FilterList filters = new FilterList(Operator.MUST_PASS_ALL);
 filters.addFilter(filter1);
 filters.addFilter(filter2);
 Get get = new Get(USER);
 get.addFamily(COLUMN_FAMILY);
 get.setMaxVersions(1);
 get.setFilter(filters);
 Somehow it didn't work as expected. It returned the entries as if the filter1 
 were not set.
 Turns out the ColumnPrefixFilter returns SEEK_NEXT_USING_HINT in some cases. 
 The FilterList filter does not handle this return code properly (treat it as 
 INCLUDE).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root

2011-12-29 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177438#comment-13177438
 ] 

jirapos...@reviews.apache.org commented on HBASE-5099:
--



bq.  On 2011-12-29 22:29:34, Lars Hofhansl wrote:
bq.   src/main/java/org/apache/hadoop/hbase/master/HMaster.java, line 1464
bq.   https://reviews.apache.org/r/3323/diff/5/?file=65971#file65971line1464
bq.  
bq.   Should we issue executor.shutdownNow() here if the worker did not 
finish, in order to attempt to interrupt and long running worker?

Yes, that will be safer.


- Jimmy


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3323/#review4154
---


On 2011-12-29 21:16:22, Jimmy Xiang wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3323/
bq.  ---
bq.  
bq.  (Updated 2011-12-29 21:16:22)
bq.  
bq.  
bq.  Review request for hbase, Ted Yu and Michael Stack.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Per discussion with Ted (on issues), I put up a patch to run 
tryRecoveringExpiredZKSession() in a separate thread and time it out and fail 
the recovery if it is stuck somewhere.
bq.  
bq.  I added a test to test the abort method.  However, for the mini cluster, 
becomeActiveMaster() doesn't succeed so the abort method ends up always 
aborted.  So the actually success recovery is not tested.
bq.  
bq.  
bq.  This addresses bug HBASE-5099.
bq.  https://issues.apache.org/jira/browse/HBASE-5099
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/master/HMaster.java a5935a6 
bq.
src/test/java/org/apache/hadoop/hbase/master/TestMasterZKSessionRecovery.java 
PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/3323/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  mvn -PlocalTests -Dtest=TestMaster* clean test
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jimmy
bq.  
bq.



 ZK event thread waiting for root region while server shutdown handler waiting 
 for event thread to finish distributed log splitting to recover the region 
 sever the root region is on
 

 Key: HBASE-5099
 URL: https://issues.apache.org/jira/browse/HBASE-5099
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.92.0, 0.94.0

 Attachments: 5099.92, ZK-event-thread-waiting-for-root.png, 
 distributed-log-splitting-hangs.png, hbase-5099-v2.patch, 
 hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099-v5.patch, 
 hbase-5099.patch


 A RS died.  The ServerShutdownHandler kicked in and started the logspliting.  
 SpliLogManager
 installed the tasks asynchronously, then started to wait for them to complete.
 The task znodes were not created actually.  The requests were just queued.
 At this time, the zookeeper connection expired.  HMaster tried to recover the 
 expired ZK session.
 During the recovery, a new zookeeper connection was created.  However, this 
 master became the
 new master again.  It tried to assign root and meta.
 Because the dead RS got the old root region, the master needs to wait for the 
 log splitting to complete.
 This waiting holds the zookeeper event thread.  So the async create split 
 task is never retried since
 there is only one event thread, which is waiting for the root region assigned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5100) Rollback of split could cause closed region to be opened again

2011-12-29 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177437#comment-13177437
 ] 

Hadoop QA commented on HBASE-5100:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12508876/5100.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 javadoc.  The javadoc tool appears to have generated -151 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 76 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/631//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/631//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/631//console

This message is automatically generated.

 Rollback of split could cause closed region to be opened again
 --

 Key: HBASE-5100
 URL: https://issues.apache.org/jira/browse/HBASE-5100
 Project: HBase
  Issue Type: Bug
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.92.0, 0.94.0

 Attachments: 5100.txt, hbase-5100.patch


 If master sending close region to rs and region's split transaction 
 concurrently happen,
 it may cause closed region to opened. 
 See the detailed code in SplitTransaction#createDaughters
 {code}
 ListStoreFile hstoreFilesToSplit = null;
 try{
   hstoreFilesToSplit = this.parent.close(false);
   if (hstoreFilesToSplit == null) {
 // The region was closed by a concurrent thread.  We can't continue
 // with the split, instead we must just abandon the split.  If we
 // reopen or split this could cause problems because the region has
 // probably already been moved to a different server, or is in the
 // process of moving to a different server.
 throw new IOException(Failed to close region: already closed by  +
   another thread);
   }
 } finally {
   this.journal.add(JournalEntry.CLOSED_PARENT_REGION);
 }
 {code}
 when rolling back, the JournalEntry.CLOSED_PARENT_REGION causes 
 this.parent.initialize();
 Although this region is not onlined in the regionserver, it may bring some 
 potential problem.
 For example, in our environment, the closed parent region is rolled back 
 sucessfully , and then starting compaction and split again.
 The parent region is f892dd6107b6b4130199582abc78e9c1
 master log
 {code}
 2011-12-26 00:24:42,693 INFO org.apache.hadoop.hbase.master.HMaster: balance 
 hri=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
  src=dw87.kgb.sqa.cm4,60020,1324827866085, 
 dest=dw80.kgb.sqa.cm4,60020,1324827865780
 2011-12-26 00:24:42,693 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
  (offlining)
 2011-12-26 00:24:42,694 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
 serverName=dw87.kgb.sqa.cm4,60020,1324827866085, load=(requests=0, regions=0, 
 usedHeap=0, maxHeap=0) for region 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
 2011-12-26 00:24:42,699 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned 
 node: /hbase-tbfs/unassigned/f892dd6107b6b4130199582abc78e9c1 
 (region=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
  server=dw87.kgb.sqa.cm4,60020,1324827866085, state=RS_ZK_REGION_CLOSING)
 2011-12-26 00:24:42,699 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSING, 

[jira] [Updated] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root re

2011-12-29 Thread Jimmy Xiang (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-5099:
---

Attachment: hbase-5099-v6.patch

Updated per Lars's comment.  It will be safer.

 ZK event thread waiting for root region while server shutdown handler waiting 
 for event thread to finish distributed log splitting to recover the region 
 sever the root region is on
 

 Key: HBASE-5099
 URL: https://issues.apache.org/jira/browse/HBASE-5099
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.92.0, 0.94.0

 Attachments: 5099.92, ZK-event-thread-waiting-for-root.png, 
 distributed-log-splitting-hangs.png, hbase-5099-v2.patch, 
 hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099-v5.patch, 
 hbase-5099-v6.patch, hbase-5099.patch


 A RS died.  The ServerShutdownHandler kicked in and started the logspliting.  
 SpliLogManager
 installed the tasks asynchronously, then started to wait for them to complete.
 The task znodes were not created actually.  The requests were just queued.
 At this time, the zookeeper connection expired.  HMaster tried to recover the 
 expired ZK session.
 During the recovery, a new zookeeper connection was created.  However, this 
 master became the
 new master again.  It tried to assign root and meta.
 Because the dead RS got the old root region, the master needs to wait for the 
 log splitting to complete.
 This waiting holds the zookeeper event thread.  So the async create split 
 task is never retried since
 there is only one event thread, which is waiting for the root region assigned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




  1   2   >