[jira] [Commented] (HBASE-3924) Improve Shell's CLI help
[ https://issues.apache.org/jira/browse/HBASE-3924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177087#comment-13177087 ] Harsh J commented on HBASE-3924: Lars, Sorry for the late get back. Thanks for committing your changes in! :) Improve Shell's CLI help Key: HBASE-3924 URL: https://issues.apache.org/jira/browse/HBASE-3924 Project: HBase Issue Type: Improvement Components: shell Affects Versions: 0.90.3 Reporter: Lars George Assignee: Harsh J Priority: Trivial Fix For: 0.94.0 Attachments: 3924.txt, HBASE-3924.patch In the hirb.rb source we have {noformat} # so they don't go through to irb. Output shell 'usage' if user types '--help' cmdline_help = HERE # HERE document output as shell usage HBase Shell command-line options: formatFormatter for outputting results: console | html. Default: console -d | --debug Set DEBUG log levels. HERE found = [] format = 'console' script2run = nil log_level = org.apache.log4j.Level::ERROR for arg in ARGV if arg =~ /^--format=(.+)/i format = $1 if format =~ /^html$/i raise NoMethodError.new(Not yet implemented) elsif format =~ /^console$/i # This is default else raise ArgumentError.new(Unsupported format + arg) end found.push(arg) elsif arg == '-h' || arg == '--help' puts cmdline_help exit elsif arg == '-d' || arg == '--debug' log_level = org.apache.log4j.Level::DEBUG $fullBackTrace = true puts Setting DEBUG log level... else # Presume it a script. Save it off for running later below # after we've set up some environment. script2run = arg found.push(arg) # Presume that any other args are meant for the script. break end end {noformat} We should enhance the help printed when using -h/--help to look like this? {noformat} cmdline_help = HERE # HERE document output as shell usage HBase Shell command-line options: --format={console|html}Formatter for outputting results. Default: console -d | --debug Set DEBUG log levels. -h | --help This help. script-filename [script-options] HERE {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5064) use surefire tests parallelization
[ https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177282#comment-13177282 ] Zhihong Yu commented on HBASE-5064: --- I saw patch v19 coming and started to run test suited with it - I didn't keep the output from TestMiniClusterLoadParallel w.r.t. parameter for the hadoop-qa build, we should set default value (of 2) for surefire.secondPartThreadCount if the parameter isn't set. use surefire tests parallelization -- Key: HBASE-5064 URL: https://issues.apache.org/jira/browse/HBASE-5064 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 5064.v18.patch, 5064.v19.patch, 5064.v2.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 5064.v9.patch To be tried multiple times on hadoop-qa before committing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5064) use surefire tests parallelization
[ https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5064: --- Attachment: 5064.v19.patch use surefire tests parallelization -- Key: HBASE-5064 URL: https://issues.apache.org/jira/browse/HBASE-5064 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 5064.v18.patch, 5064.v19.patch, 5064.v19.patch, 5064.v2.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 5064.v9.patch To be tried multiple times on hadoop-qa before committing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5103) Fix improper master znode deserialization
[ https://issues.apache.org/jira/browse/HBASE-5103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177331#comment-13177331 ] Zhihong Yu commented on HBASE-5103: --- Unit tests are passing. Integrated to 0.92 and TRUNK. Thanks for the patch Jonathan. Thanks for the review Lars. Fix improper master znode deserialization - Key: HBASE-5103 URL: https://issues.apache.org/jira/browse/HBASE-5103 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Priority: Minor Fix For: 0.92.0, 0.94.0 Attachments: hbase-5103.patch In ActiveMasterManager#blockUntilBecomingActiveMaster the master znode is created as a versioned serialized version of ServerName {code} if (ZKUtil.createEphemeralNodeAndWatch(this.watcher, this.watcher.masterAddressZNode, sn.getVersionedBytes())) { {code} There are a few user visible places where it is used but not deserialized properly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5100) Rollback of split could cause closed region to be opened again
[ https://issues.apache.org/jira/browse/HBASE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177333#comment-13177333 ] Hadoop QA commented on HBASE-5100: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12508859/5100.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -151 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 76 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.coprocessor.TestMasterObserver org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/627//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/627//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/627//console This message is automatically generated. Rollback of split could cause closed region to be opened again -- Key: HBASE-5100 URL: https://issues.apache.org/jira/browse/HBASE-5100 Project: HBase Issue Type: Bug Reporter: chunhui shen Assignee: chunhui shen Fix For: 0.92.0, 0.94.0 Attachments: 5100.txt, hbase-5100.patch If master sending close region to rs and region's split transaction concurrently happen, it may cause closed region to opened. See the detailed code in SplitTransaction#createDaughters {code} ListStoreFile hstoreFilesToSplit = null; try{ hstoreFilesToSplit = this.parent.close(false); if (hstoreFilesToSplit == null) { // The region was closed by a concurrent thread. We can't continue // with the split, instead we must just abandon the split. If we // reopen or split this could cause problems because the region has // probably already been moved to a different server, or is in the // process of moving to a different server. throw new IOException(Failed to close region: already closed by + another thread); } } finally { this.journal.add(JournalEntry.CLOSED_PARENT_REGION); } {code} when rolling back, the JournalEntry.CLOSED_PARENT_REGION causes this.parent.initialize(); Although this region is not onlined in the regionserver, it may bring some potential problem. For example, in our environment, the closed parent region is rolled back sucessfully , and then starting compaction and split again. The parent region is f892dd6107b6b4130199582abc78e9c1 master log {code} 2011-12-26 00:24:42,693 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1., src=dw87.kgb.sqa.cm4,60020,1324827866085, dest=dw80.kgb.sqa.cm4,60020,1324827865780 2011-12-26 00:24:42,693 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. (offlining) 2011-12-26 00:24:42,694 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to serverName=dw87.kgb.sqa.cm4,60020,1324827866085, load=(requests=0, regions=0, usedHeap=0, maxHeap=0) for region writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. 2011-12-26 00:24:42,699 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned node: /hbase-tbfs/unassigned/f892dd6107b6b4130199582abc78e9c1 (region=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1., server=dw87.kgb.sqa.cm4,60020,1324827866085, state=RS_ZK_REGION_CLOSING) 2011-12-26 00:24:42,699 DEBUG
[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root
[ https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177327#comment-13177327 ] jirapos...@reviews.apache.org commented on HBASE-5099: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3323/#review4148 --- The code is much closer to what was in my mind. The new test needs improvement. src/main/java/org/apache/hadoop/hbase/master/HMaster.java https://reviews.apache.org/r/3323/#comment9361 Line is 88 chars long. If ExecutorService is imported, the line should be much shorter. src/main/java/org/apache/hadoop/hbase/master/HMaster.java https://reviews.apache.org/r/3323/#comment9364 I suggest recording the amount of time taken by the recovery and log it after executor.awaitTermination() returns. src/main/java/org/apache/hadoop/hbase/master/HMaster.java https://reviews.apache.org/r/3323/#comment9362 Timeout value should be included in the exception message. src/test/java/org/apache/hadoop/hbase/master/TestMasterRecovery.java https://reviews.apache.org/r/3323/#comment9365 Year is not needed. src/test/java/org/apache/hadoop/hbase/master/TestMasterRecovery.java https://reviews.apache.org/r/3323/#comment9363 This shouldn't be small test since mini cluster is involved. src/test/java/org/apache/hadoop/hbase/master/TestMasterRecovery.java https://reviews.apache.org/r/3323/#comment9366 Please add short javadoc for this new class. src/test/java/org/apache/hadoop/hbase/master/TestMasterRecovery.java https://reviews.apache.org/r/3323/#comment9368 We shouldn't pass 1 here since that means 1 master. src/test/java/org/apache/hadoop/hbase/master/TestMasterRecovery.java https://reviews.apache.org/r/3323/#comment9367 Since the test doesn't involve standby master, I think we should use a different name. - Ted On 2011-12-29 18:38:54, Jimmy Xiang wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/3323/ bq. --- bq. bq. (Updated 2011-12-29 18:38:54) bq. bq. bq. Review request for hbase, Ted Yu and Michael Stack. bq. bq. bq. Summary bq. --- bq. bq. Per discussion with Ted (on issues), I put up a patch to run tryRecoveringExpiredZKSession() in a separate thread and time it out and fail the recovery if it is stuck somewhere. bq. bq. I added a test to test the abort method. However, for the mini cluster, becomeActiveMaster() doesn't succeed so the abort method ends up always aborted. So the actually success recovery is not tested. bq. bq. bq. This addresses bug HBASE-5099. bq. https://issues.apache.org/jira/browse/HBASE-5099 bq. bq. bq. Diffs bq. - bq. bq.src/main/java/org/apache/hadoop/hbase/master/HMaster.java a5935a6 bq.src/test/java/org/apache/hadoop/hbase/master/TestMasterRecovery.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/3323/diff bq. bq. bq. Testing bq. --- bq. bq. mvn -PlocalTests -Dtest=TestMaster* clean test bq. bq. bq. Thanks, bq. bq. Jimmy bq. bq. ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root region is on Key: HBASE-5099 URL: https://issues.apache.org/jira/browse/HBASE-5099 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Attachments: ZK-event-thread-waiting-for-root.png, distributed-log-splitting-hangs.png, hbase-5099-v2.patch, hbase-5099-v3.patch, hbase-5099.patch A RS died. The ServerShutdownHandler kicked in and started the logspliting. SpliLogManager installed the tasks asynchronously, then started to wait for them to complete. The task znodes were not created actually. The requests were just queued. At this time, the zookeeper connection expired. HMaster tried to recover the expired ZK session. During the recovery, a new zookeeper connection was created. However, this master became the new master again. It tried to assign root and meta. Because the dead RS got the old root region, the master needs to wait for the log splitting to complete. This waiting holds the zookeeper event thread. So the async create split task is never retried since there is only one
[jira] [Commented] (HBASE-5100) Rollback of split could cause closed region to be opened again
[ https://issues.apache.org/jira/browse/HBASE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177339#comment-13177339 ] Zhihong Yu commented on HBASE-5100: --- TestMasterObserver passed locally on my MacBook. Rollback of split could cause closed region to be opened again -- Key: HBASE-5100 URL: https://issues.apache.org/jira/browse/HBASE-5100 Project: HBase Issue Type: Bug Reporter: chunhui shen Assignee: chunhui shen Fix For: 0.92.0, 0.94.0 Attachments: 5100.txt, hbase-5100.patch If master sending close region to rs and region's split transaction concurrently happen, it may cause closed region to opened. See the detailed code in SplitTransaction#createDaughters {code} ListStoreFile hstoreFilesToSplit = null; try{ hstoreFilesToSplit = this.parent.close(false); if (hstoreFilesToSplit == null) { // The region was closed by a concurrent thread. We can't continue // with the split, instead we must just abandon the split. If we // reopen or split this could cause problems because the region has // probably already been moved to a different server, or is in the // process of moving to a different server. throw new IOException(Failed to close region: already closed by + another thread); } } finally { this.journal.add(JournalEntry.CLOSED_PARENT_REGION); } {code} when rolling back, the JournalEntry.CLOSED_PARENT_REGION causes this.parent.initialize(); Although this region is not onlined in the regionserver, it may bring some potential problem. For example, in our environment, the closed parent region is rolled back sucessfully , and then starting compaction and split again. The parent region is f892dd6107b6b4130199582abc78e9c1 master log {code} 2011-12-26 00:24:42,693 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1., src=dw87.kgb.sqa.cm4,60020,1324827866085, dest=dw80.kgb.sqa.cm4,60020,1324827865780 2011-12-26 00:24:42,693 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. (offlining) 2011-12-26 00:24:42,694 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to serverName=dw87.kgb.sqa.cm4,60020,1324827866085, load=(requests=0, regions=0, usedHeap=0, maxHeap=0) for region writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. 2011-12-26 00:24:42,699 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned node: /hbase-tbfs/unassigned/f892dd6107b6b4130199582abc78e9c1 (region=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1., server=dw87.kgb.sqa.cm4,60020,1324827866085, state=RS_ZK_REGION_CLOSING) 2011-12-26 00:24:42,699 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSING, server=dw87.kgb.sqa.cm4,60020,1324827866085, region=f892dd6107b6b4130199582abc78e9c1 2011-12-26 00:24:45,348 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSED, server=dw87.kgb.sqa.cm4,60020,1324827866085, region=f892dd6107b6b4130199582abc78e9c1 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED event for f892dd6107b6b4130199582abc78e9c1 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. state=CLOSED, ts=1324830285347 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x13447f283f40e73 Creating (or updating) unassigned node for f892dd6107b6b4130199582abc78e9c1 with OFFLINE state 2011-12-26 00:24:45,354 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=dw75.kgb.sqa.cm4:6, region=f892dd6107b6b4130199582abc78e9c1 2011-12-26 00:24:45,354 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Found an existing plan for
[jira] [Updated] (HBASE-5103) Fix improper master znode deserialization
[ https://issues.apache.org/jira/browse/HBASE-5103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5103: -- Fix Version/s: 0.94.0 0.92.0 Hadoop Flags: Reviewed Summary: Fix improper master znode deserialization (was: Fix places where master znode is improperly deserialized.) Fix improper master znode deserialization - Key: HBASE-5103 URL: https://issues.apache.org/jira/browse/HBASE-5103 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Priority: Minor Fix For: 0.92.0, 0.94.0 Attachments: hbase-5103.patch In ActiveMasterManager#blockUntilBecomingActiveMaster the master znode is created as a versioned serialized version of ServerName {code} if (ZKUtil.createEphemeralNodeAndWatch(this.watcher, this.watcher.masterAddressZNode, sn.getVersionedBytes())) { {code} There are a few user visible places where it is used but not deserialized properly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5010) Filter HFiles based on TTL
[ https://issues.apache.org/jira/browse/HBASE-5010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177335#comment-13177335 ] Phabricator commented on HBASE-5010: mbautin has commented on the revision [jira] [HBASE-5010] [89-fb] Filter HFiles based on TTL. INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java:209 Yes, that's correct. src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java:224 Yes, this is the way it was in the original fix (89-fb version). REVISION DETAIL https://reviews.facebook.net/D909 Filter HFiles based on TTL -- Key: HBASE-5010 URL: https://issues.apache.org/jira/browse/HBASE-5010 Project: HBase Issue Type: Bug Reporter: Mikhail Bautin Assignee: Zhihong Yu Attachments: 5010.patch, D1017.1.patch, D1017.2.patch, D909.1.patch, D909.2.patch, D909.3.patch, D909.4.patch, D909.5.patch In ScanWildcardColumnTracker we have {code:java} this.oldestStamp = EnvironmentEdgeManager.currentTimeMillis() - ttl; ... private boolean isExpired(long timestamp) { return timestamp oldestStamp; } {code} but this time range filtering does not participate in HFile selection. In one real case this caused next() calls to time out because all KVs in a table got expired, but next() had to iterate over the whole table to find that out. We should be able to filter out those HFiles right away. I think a reasonable approach is to add a default timerange filter to every scan for a CF with a finite TTL and utilize existing filtering in StoreFile.Reader.passesTimerangeFilter. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (HBASE-5064) use surefire tests parallelization
[ https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177282#comment-13177282 ] Zhihong Yu edited comment on HBASE-5064 at 12/29/11 5:20 PM: - I saw patch v19 coming and started to run test suite with it - I didn't keep the output from TestMiniClusterLoadParallel w.r.t. parameter for the hadoop-qa build, we should set default value (of 2) for surefire.secondPartThreadCount if the parameter isn't set. was (Author: zhi...@ebaysf.com): I saw patch v19 coming and started to run test suited with it - I didn't keep the output from TestMiniClusterLoadParallel w.r.t. parameter for the hadoop-qa build, we should set default value (of 2) for surefire.secondPartThreadCount if the parameter isn't set. use surefire tests parallelization -- Key: HBASE-5064 URL: https://issues.apache.org/jira/browse/HBASE-5064 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 5064.v18.patch, 5064.v19.patch, 5064.v2.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 5064.v9.patch To be tried multiple times on hadoop-qa before committing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5064) use surefire tests parallelization
[ https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5064: --- Attachment: 5064.v18.patch use surefire tests parallelization -- Key: HBASE-5064 URL: https://issues.apache.org/jira/browse/HBASE-5064 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 5064.v18.patch, 5064.v2.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 5064.v9.patch To be tried multiple times on hadoop-qa before committing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root re
[ https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-5099: --- Attachment: hbase-5099-v3.patch ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root region is on Key: HBASE-5099 URL: https://issues.apache.org/jira/browse/HBASE-5099 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Attachments: ZK-event-thread-waiting-for-root.png, distributed-log-splitting-hangs.png, hbase-5099-v2.patch, hbase-5099-v3.patch, hbase-5099.patch A RS died. The ServerShutdownHandler kicked in and started the logspliting. SpliLogManager installed the tasks asynchronously, then started to wait for them to complete. The task znodes were not created actually. The requests were just queued. At this time, the zookeeper connection expired. HMaster tried to recover the expired ZK session. During the recovery, a new zookeeper connection was created. However, this master became the new master again. It tried to assign root and meta. Because the dead RS got the old root region, the master needs to wait for the log splitting to complete. This waiting holds the zookeeper event thread. So the async create split task is never retried since there is only one event thread, which is waiting for the root region assigned. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5064) use surefire tests parallelization
[ https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5064: --- Status: Patch Available (was: Open) use surefire tests parallelization -- Key: HBASE-5064 URL: https://issues.apache.org/jira/browse/HBASE-5064 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 5064.v18.patch, 5064.v19.patch, 5064.v2.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 5064.v9.patch To be tried multiple times on hadoop-qa before committing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5064) use surefire tests parallelization
[ https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177274#comment-13177274 ] nkeywal commented on HBASE-5064: @ted: The 3 last tests seems ok on hadoop-qa, even if I'am sure that we will discover some issues. A 4th test is in progress. I will do a 5th also. Anything meaningful in the logs for TestMiniClusterLoadParallel? Note that the v18 patch is with 4 tests in //, I believe we will need to commit it with 2 or 3 to be sure that a 1 year old laptop can run the tests smoothly. I use 4 here to maximize // and minimize test time. We can set a parameter to the hadoop-qa build to use 4 or more if we want to... use surefire tests parallelization -- Key: HBASE-5064 URL: https://issues.apache.org/jira/browse/HBASE-5064 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 5064.v18.patch, 5064.v19.patch, 5064.v2.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 5064.v9.patch To be tried multiple times on hadoop-qa before committing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5103) Fix places where master znode is improperly deserialized.
[ https://issues.apache.org/jira/browse/HBASE-5103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-5103: -- Status: Patch Available (was: Open) Fix places where master znode is improperly deserialized. - Key: HBASE-5103 URL: https://issues.apache.org/jira/browse/HBASE-5103 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Priority: Minor Attachments: hbase-5103.patch In ActiveMasterManager#blockUntilBecomingActiveMaster the master znode is created as a versioned serialized version of ServerName {code} if (ZKUtil.createEphemeralNodeAndWatch(this.watcher, this.watcher.masterAddressZNode, sn.getVersionedBytes())) { {code} There are a few user visible places where it is used but not deserialized properly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5064) use surefire tests parallelization
[ https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5064: --- Status: Open (was: Patch Available) use surefire tests parallelization -- Key: HBASE-5064 URL: https://issues.apache.org/jira/browse/HBASE-5064 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 5064.v18.patch, 5064.v19.patch, 5064.v19.patch, 5064.v2.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 5064.v9.patch To be tried multiple times on hadoop-qa before committing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5064) use surefire tests parallelization
[ https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5064: --- Attachment: 5064.v19.patch use surefire tests parallelization -- Key: HBASE-5064 URL: https://issues.apache.org/jira/browse/HBASE-5064 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 5064.v18.patch, 5064.v19.patch, 5064.v2.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 5064.v9.patch To be tried multiple times on hadoop-qa before committing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5100) Rollback of split could cause closed region to be opened again
[ https://issues.apache.org/jira/browse/HBASE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5100: -- Attachment: 5100.txt Rollback of split could cause closed region to be opened again -- Key: HBASE-5100 URL: https://issues.apache.org/jira/browse/HBASE-5100 Project: HBase Issue Type: Bug Reporter: chunhui shen Assignee: chunhui shen Fix For: 0.92.0, 0.94.0 Attachments: 5100.txt, hbase-5100.patch If master sending close region to rs and region's split transaction concurrently happen, it may cause closed region to opened. See the detailed code in SplitTransaction#createDaughters {code} ListStoreFile hstoreFilesToSplit = null; try{ hstoreFilesToSplit = this.parent.close(false); if (hstoreFilesToSplit == null) { // The region was closed by a concurrent thread. We can't continue // with the split, instead we must just abandon the split. If we // reopen or split this could cause problems because the region has // probably already been moved to a different server, or is in the // process of moving to a different server. throw new IOException(Failed to close region: already closed by + another thread); } } finally { this.journal.add(JournalEntry.CLOSED_PARENT_REGION); } {code} when rolling back, the JournalEntry.CLOSED_PARENT_REGION causes this.parent.initialize(); Although this region is not onlined in the regionserver, it may bring some potential problem. For example, in our environment, the closed parent region is rolled back sucessfully , and then starting compaction and split again. The parent region is f892dd6107b6b4130199582abc78e9c1 master log {code} 2011-12-26 00:24:42,693 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1., src=dw87.kgb.sqa.cm4,60020,1324827866085, dest=dw80.kgb.sqa.cm4,60020,1324827865780 2011-12-26 00:24:42,693 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. (offlining) 2011-12-26 00:24:42,694 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to serverName=dw87.kgb.sqa.cm4,60020,1324827866085, load=(requests=0, regions=0, usedHeap=0, maxHeap=0) for region writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. 2011-12-26 00:24:42,699 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned node: /hbase-tbfs/unassigned/f892dd6107b6b4130199582abc78e9c1 (region=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1., server=dw87.kgb.sqa.cm4,60020,1324827866085, state=RS_ZK_REGION_CLOSING) 2011-12-26 00:24:42,699 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSING, server=dw87.kgb.sqa.cm4,60020,1324827866085, region=f892dd6107b6b4130199582abc78e9c1 2011-12-26 00:24:45,348 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSED, server=dw87.kgb.sqa.cm4,60020,1324827866085, region=f892dd6107b6b4130199582abc78e9c1 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED event for f892dd6107b6b4130199582abc78e9c1 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. state=CLOSED, ts=1324830285347 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x13447f283f40e73 Creating (or updating) unassigned node for f892dd6107b6b4130199582abc78e9c1 with OFFLINE state 2011-12-26 00:24:45,354 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=dw75.kgb.sqa.cm4:6, region=f892dd6107b6b4130199582abc78e9c1 2011-12-26 00:24:45,354 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Found an existing plan for writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. destination server is +
[jira] [Updated] (HBASE-5100) Rollback of split could cause closed region to be opened again
[ https://issues.apache.org/jira/browse/HBASE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5100: -- Attachment: (was: 5100.txt) Rollback of split could cause closed region to be opened again -- Key: HBASE-5100 URL: https://issues.apache.org/jira/browse/HBASE-5100 Project: HBase Issue Type: Bug Reporter: chunhui shen Assignee: chunhui shen Fix For: 0.92.0, 0.94.0 Attachments: 5100.txt, hbase-5100.patch If master sending close region to rs and region's split transaction concurrently happen, it may cause closed region to opened. See the detailed code in SplitTransaction#createDaughters {code} ListStoreFile hstoreFilesToSplit = null; try{ hstoreFilesToSplit = this.parent.close(false); if (hstoreFilesToSplit == null) { // The region was closed by a concurrent thread. We can't continue // with the split, instead we must just abandon the split. If we // reopen or split this could cause problems because the region has // probably already been moved to a different server, or is in the // process of moving to a different server. throw new IOException(Failed to close region: already closed by + another thread); } } finally { this.journal.add(JournalEntry.CLOSED_PARENT_REGION); } {code} when rolling back, the JournalEntry.CLOSED_PARENT_REGION causes this.parent.initialize(); Although this region is not onlined in the regionserver, it may bring some potential problem. For example, in our environment, the closed parent region is rolled back sucessfully , and then starting compaction and split again. The parent region is f892dd6107b6b4130199582abc78e9c1 master log {code} 2011-12-26 00:24:42,693 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1., src=dw87.kgb.sqa.cm4,60020,1324827866085, dest=dw80.kgb.sqa.cm4,60020,1324827865780 2011-12-26 00:24:42,693 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. (offlining) 2011-12-26 00:24:42,694 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to serverName=dw87.kgb.sqa.cm4,60020,1324827866085, load=(requests=0, regions=0, usedHeap=0, maxHeap=0) for region writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. 2011-12-26 00:24:42,699 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned node: /hbase-tbfs/unassigned/f892dd6107b6b4130199582abc78e9c1 (region=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1., server=dw87.kgb.sqa.cm4,60020,1324827866085, state=RS_ZK_REGION_CLOSING) 2011-12-26 00:24:42,699 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSING, server=dw87.kgb.sqa.cm4,60020,1324827866085, region=f892dd6107b6b4130199582abc78e9c1 2011-12-26 00:24:45,348 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSED, server=dw87.kgb.sqa.cm4,60020,1324827866085, region=f892dd6107b6b4130199582abc78e9c1 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED event for f892dd6107b6b4130199582abc78e9c1 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. state=CLOSED, ts=1324830285347 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x13447f283f40e73 Creating (or updating) unassigned node for f892dd6107b6b4130199582abc78e9c1 with OFFLINE state 2011-12-26 00:24:45,354 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=dw75.kgb.sqa.cm4:6, region=f892dd6107b6b4130199582abc78e9c1 2011-12-26 00:24:45,354 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Found an existing plan for writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. destination server is +
[jira] [Commented] (HBASE-5064) use surefire tests parallelization
[ https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177260#comment-13177260 ] Zhihong Yu commented on HBASE-5064: --- Trying out patch v18 on Linux I got: {code} Tests in error: loadTest[0](org.apache.hadoop.hbase.util.TestMiniClusterLoadParallel): test timed out after 12 milliseconds {code} open files limit is 32768 and max user processes are unlimited. use surefire tests parallelization -- Key: HBASE-5064 URL: https://issues.apache.org/jira/browse/HBASE-5064 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 5064.v18.patch, 5064.v19.patch, 5064.v2.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 5064.v9.patch To be tried multiple times on hadoop-qa before committing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5064) use surefire tests parallelization
[ https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177210#comment-13177210 ] Hadoop QA commented on HBASE-5064: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12508842/5064.v18.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 12 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -151 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 76 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/623//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/623//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/623//console This message is automatically generated. use surefire tests parallelization -- Key: HBASE-5064 URL: https://issues.apache.org/jira/browse/HBASE-5064 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 5064.v2.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 5064.v9.patch To be tried multiple times on hadoop-qa before committing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5100) Rollback of split would cause closed region to opened
[ https://issues.apache.org/jira/browse/HBASE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5100: -- Attachment: 5100.txt Rollback of split would cause closed region to opened -- Key: HBASE-5100 URL: https://issues.apache.org/jira/browse/HBASE-5100 Project: HBase Issue Type: Bug Reporter: chunhui shen Assignee: chunhui shen Attachments: 5100.txt, hbase-5100.patch If master sending close region to rs and region's split transaction concurrently happen, it may cause closed region to opened. See the detailed code in SplitTransaction#createDaughters {code} ListStoreFile hstoreFilesToSplit = null; try{ hstoreFilesToSplit = this.parent.close(false); if (hstoreFilesToSplit == null) { // The region was closed by a concurrent thread. We can't continue // with the split, instead we must just abandon the split. If we // reopen or split this could cause problems because the region has // probably already been moved to a different server, or is in the // process of moving to a different server. throw new IOException(Failed to close region: already closed by + another thread); } } finally { this.journal.add(JournalEntry.CLOSED_PARENT_REGION); } {code} when rolling back, the JournalEntry.CLOSED_PARENT_REGION causes this.parent.initialize(); Although this region is not onlined in the regionserver, it may bring some potential problem. For example, in our environment, the closed parent region is rolled back sucessfully , and then starting compaction and split again. The parent region is f892dd6107b6b4130199582abc78e9c1 master log {code} 2011-12-26 00:24:42,693 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1., src=dw87.kgb.sqa.cm4,60020,1324827866085, dest=dw80.kgb.sqa.cm4,60020,1324827865780 2011-12-26 00:24:42,693 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. (offlining) 2011-12-26 00:24:42,694 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to serverName=dw87.kgb.sqa.cm4,60020,1324827866085, load=(requests=0, regions=0, usedHeap=0, maxHeap=0) for region writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. 2011-12-26 00:24:42,699 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned node: /hbase-tbfs/unassigned/f892dd6107b6b4130199582abc78e9c1 (region=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1., server=dw87.kgb.sqa.cm4,60020,1324827866085, state=RS_ZK_REGION_CLOSING) 2011-12-26 00:24:42,699 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSING, server=dw87.kgb.sqa.cm4,60020,1324827866085, region=f892dd6107b6b4130199582abc78e9c1 2011-12-26 00:24:45,348 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSED, server=dw87.kgb.sqa.cm4,60020,1324827866085, region=f892dd6107b6b4130199582abc78e9c1 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED event for f892dd6107b6b4130199582abc78e9c1 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. state=CLOSED, ts=1324830285347 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x13447f283f40e73 Creating (or updating) unassigned node for f892dd6107b6b4130199582abc78e9c1 with OFFLINE state 2011-12-26 00:24:45,354 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=dw75.kgb.sqa.cm4:6, region=f892dd6107b6b4130199582abc78e9c1 2011-12-26 00:24:45,354 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Found an existing plan for writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. destination server is + serverName=dw80.kgb.sqa.cm4,60020,1324827865780, load=(requests=0, regions=1,
[jira] [Commented] (HBASE-5103) Fix places where master znode is improperly deserialized.
[ https://issues.apache.org/jira/browse/HBASE-5103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177277#comment-13177277 ] Zhihong Yu commented on HBASE-5103: --- +1 on patch. Nice catch. Fix places where master znode is improperly deserialized. - Key: HBASE-5103 URL: https://issues.apache.org/jira/browse/HBASE-5103 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Priority: Minor Attachments: hbase-5103.patch In ActiveMasterManager#blockUntilBecomingActiveMaster the master znode is created as a versioned serialized version of ServerName {code} if (ZKUtil.createEphemeralNodeAndWatch(this.watcher, this.watcher.masterAddressZNode, sn.getVersionedBytes())) { {code} There are a few user visible places where it is used but not deserialized properly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root
[ https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177342#comment-13177342 ] Zhihong Yu commented on HBASE-5099: --- bq. We shouldn't pass 1 here since that means 1 master. The new test is different from TestMasterFailover so I guess 1 could be fine here. How about naming the new test TestZKSessionRecoveryInMaster ? ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root region is on Key: HBASE-5099 URL: https://issues.apache.org/jira/browse/HBASE-5099 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Attachments: ZK-event-thread-waiting-for-root.png, distributed-log-splitting-hangs.png, hbase-5099-v2.patch, hbase-5099-v3.patch, hbase-5099.patch A RS died. The ServerShutdownHandler kicked in and started the logspliting. SpliLogManager installed the tasks asynchronously, then started to wait for them to complete. The task znodes were not created actually. The requests were just queued. At this time, the zookeeper connection expired. HMaster tried to recover the expired ZK session. During the recovery, a new zookeeper connection was created. However, this master became the new master again. It tried to assign root and meta. Because the dead RS got the old root region, the master needs to wait for the log splitting to complete. This waiting holds the zookeeper event thread. So the async create split task is never retried since there is only one event thread, which is waiting for the root region assigned. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5100) Rollback of split could cause closed region to be opened again
[ https://issues.apache.org/jira/browse/HBASE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5100: -- Fix Version/s: 0.94.0 0.92.0 Summary: Rollback of split could cause closed region to be opened again (was: Rollback of split would cause closed region to opened ) Rollback of split could cause closed region to be opened again -- Key: HBASE-5100 URL: https://issues.apache.org/jira/browse/HBASE-5100 Project: HBase Issue Type: Bug Reporter: chunhui shen Assignee: chunhui shen Fix For: 0.92.0, 0.94.0 Attachments: 5100.txt, hbase-5100.patch If master sending close region to rs and region's split transaction concurrently happen, it may cause closed region to opened. See the detailed code in SplitTransaction#createDaughters {code} ListStoreFile hstoreFilesToSplit = null; try{ hstoreFilesToSplit = this.parent.close(false); if (hstoreFilesToSplit == null) { // The region was closed by a concurrent thread. We can't continue // with the split, instead we must just abandon the split. If we // reopen or split this could cause problems because the region has // probably already been moved to a different server, or is in the // process of moving to a different server. throw new IOException(Failed to close region: already closed by + another thread); } } finally { this.journal.add(JournalEntry.CLOSED_PARENT_REGION); } {code} when rolling back, the JournalEntry.CLOSED_PARENT_REGION causes this.parent.initialize(); Although this region is not onlined in the regionserver, it may bring some potential problem. For example, in our environment, the closed parent region is rolled back sucessfully , and then starting compaction and split again. The parent region is f892dd6107b6b4130199582abc78e9c1 master log {code} 2011-12-26 00:24:42,693 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1., src=dw87.kgb.sqa.cm4,60020,1324827866085, dest=dw80.kgb.sqa.cm4,60020,1324827865780 2011-12-26 00:24:42,693 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. (offlining) 2011-12-26 00:24:42,694 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to serverName=dw87.kgb.sqa.cm4,60020,1324827866085, load=(requests=0, regions=0, usedHeap=0, maxHeap=0) for region writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. 2011-12-26 00:24:42,699 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned node: /hbase-tbfs/unassigned/f892dd6107b6b4130199582abc78e9c1 (region=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1., server=dw87.kgb.sqa.cm4,60020,1324827866085, state=RS_ZK_REGION_CLOSING) 2011-12-26 00:24:42,699 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSING, server=dw87.kgb.sqa.cm4,60020,1324827866085, region=f892dd6107b6b4130199582abc78e9c1 2011-12-26 00:24:45,348 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSED, server=dw87.kgb.sqa.cm4,60020,1324827866085, region=f892dd6107b6b4130199582abc78e9c1 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED event for f892dd6107b6b4130199582abc78e9c1 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. state=CLOSED, ts=1324830285347 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x13447f283f40e73 Creating (or updating) unassigned node for f892dd6107b6b4130199582abc78e9c1 with OFFLINE state 2011-12-26 00:24:45,354 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=dw75.kgb.sqa.cm4:6, region=f892dd6107b6b4130199582abc78e9c1 2011-12-26 00:24:45,354 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Found an existing plan for
[jira] [Commented] (HBASE-5100) Rollback of split would cause closed region to opened
[ https://issues.apache.org/jira/browse/HBASE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177268#comment-13177268 ] Zhihong Yu commented on HBASE-5100: --- I thought I like Java until I understood what createDaughters() should be doing - with Chunhui's patch, i.e. The case is that we need to handle two types of exceptions: the one coming out of parent.close(false) call and the IOE thrown from within the try block. I got some idea when I was driving this morning. 5100.txt is my proposed form where I tried to disambiguate the two types of exceptions by removing finally block. closedByOtherException is a singleton so the overhead of my proposal vs. introducing a boolean is negligible. I ran split-related tests and they passed: {code} 806 mt -Dtest=TestCoprocessorInterface 807 mt -Dtest=TestSplitTransaction 808 mt -Dtest=TestSplitTransactionOnCluster 809 mt -Dtest=TestEndToEndSplitTransaction 810 mt -Dtest=TestHRegion {code} Please share your comments. Rollback of split would cause closed region to opened -- Key: HBASE-5100 URL: https://issues.apache.org/jira/browse/HBASE-5100 Project: HBase Issue Type: Bug Reporter: chunhui shen Assignee: chunhui shen Attachments: 5100.txt, hbase-5100.patch If master sending close region to rs and region's split transaction concurrently happen, it may cause closed region to opened. See the detailed code in SplitTransaction#createDaughters {code} ListStoreFile hstoreFilesToSplit = null; try{ hstoreFilesToSplit = this.parent.close(false); if (hstoreFilesToSplit == null) { // The region was closed by a concurrent thread. We can't continue // with the split, instead we must just abandon the split. If we // reopen or split this could cause problems because the region has // probably already been moved to a different server, or is in the // process of moving to a different server. throw new IOException(Failed to close region: already closed by + another thread); } } finally { this.journal.add(JournalEntry.CLOSED_PARENT_REGION); } {code} when rolling back, the JournalEntry.CLOSED_PARENT_REGION causes this.parent.initialize(); Although this region is not onlined in the regionserver, it may bring some potential problem. For example, in our environment, the closed parent region is rolled back sucessfully , and then starting compaction and split again. The parent region is f892dd6107b6b4130199582abc78e9c1 master log {code} 2011-12-26 00:24:42,693 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1., src=dw87.kgb.sqa.cm4,60020,1324827866085, dest=dw80.kgb.sqa.cm4,60020,1324827865780 2011-12-26 00:24:42,693 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. (offlining) 2011-12-26 00:24:42,694 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to serverName=dw87.kgb.sqa.cm4,60020,1324827866085, load=(requests=0, regions=0, usedHeap=0, maxHeap=0) for region writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. 2011-12-26 00:24:42,699 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned node: /hbase-tbfs/unassigned/f892dd6107b6b4130199582abc78e9c1 (region=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1., server=dw87.kgb.sqa.cm4,60020,1324827866085, state=RS_ZK_REGION_CLOSING) 2011-12-26 00:24:42,699 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSING, server=dw87.kgb.sqa.cm4,60020,1324827866085, region=f892dd6107b6b4130199582abc78e9c1 2011-12-26 00:24:45,348 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSED, server=dw87.kgb.sqa.cm4,60020,1324827866085, region=f892dd6107b6b4130199582abc78e9c1 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED event for f892dd6107b6b4130199582abc78e9c1 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE;
[jira] [Updated] (HBASE-5064) use surefire tests parallelization
[ https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5064: --- Status: Open (was: Patch Available) use surefire tests parallelization -- Key: HBASE-5064 URL: https://issues.apache.org/jira/browse/HBASE-5064 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v2.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 5064.v9.patch To be tried multiple times on hadoop-qa before committing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)
[ https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-4218: -- Comment: was deleted (was: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12508818/D447.15.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 68 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/621//console This message is automatically generated.) Data Block Encoding of KeyValues (aka delta encoding / prefix compression) --- Key: HBASE-4218 URL: https://issues.apache.org/jira/browse/HBASE-4218 Project: HBase Issue Type: Improvement Components: io Affects Versions: 0.94.0 Reporter: Jacek Migdal Assignee: Mikhail Bautin Labels: compression Fix For: 0.94.0 Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 0001-Delta-encoding.patch, D447.1.patch, D447.10.patch, D447.11.patch, D447.12.patch, D447.13.patch, D447.14.patch, D447.15.patch, D447.2.patch, D447.3.patch, D447.4.patch, D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, D447.9.patch, Data-block-encoding-2011-12-23.patch, Delta-encoding.patch-2011-12-22_11_52_07.patch, Delta_encoding_with_memstore_TS.patch, open-source.diff A compression for keys. Keys are sorted in HFile and they are usually very similar. Because of that, it is possible to design better compression than general purpose algorithms, It is an additional step designed to be used in memory. It aims to save memory in cache as well as speeding seeks within HFileBlocks. It should improve performance a lot, if key lengths are larger than value lengths. For example, it makes a lot of sense to use it when value is a counter. Initial tests on real data (key length = ~ 90 bytes , value length = 8 bytes) shows that I could achieve decent level of compression: key compression ratio: 92% total compression ratio: 85% LZO on the same data: 85% LZO after delta encoding: 91% While having much better performance (20-80% faster decompression ratio than LZO). Moreover, it should allow far more efficient seeking which should improve performance a bit. It seems that a simple compression algorithms are good enough. Most of the savings are due to prefix compression, int128 encoding, timestamp diffs and bitfields to avoid duplication. That way, comparisons of compressed data can be much faster than a byte comparator (thanks to prefix compression and bitfields). In order to implement it in HBase two important changes in design will be needed: -solidify interface to HFileBlock / HFileReader Scanner to provide seeking and iterating; access to uncompressed buffer in HFileBlock will have bad performance -extend comparators to support comparison assuming that N first bytes are equal (or some fields are equal) Link to a discussion about something similar: http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windowssubj=Re+prefix+compression -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-5103) Fix a places where master znode is improperly deserialized.
Fix a places where master znode is improperly deserialized. --- Key: HBASE-5103 URL: https://issues.apache.org/jira/browse/HBASE-5103 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh In ActiveMasterManager#blockUntilBecomingActiveMaster the master znode is created as a versioned serialized version of ServerName {code} if (ZKUtil.createEphemeralNodeAndWatch(this.watcher, this.watcher.masterAddressZNode, sn.getVersionedBytes())) { {code} There are a few user visible places where it is used but not deserialized properly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5064) use surefire tests parallelization
[ https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5064: --- Status: Patch Available (was: Open) use surefire tests parallelization -- Key: HBASE-5064 URL: https://issues.apache.org/jira/browse/HBASE-5064 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 5064.v18.patch, 5064.v2.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 5064.v9.patch To be tried multiple times on hadoop-qa before committing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4934) Display Master server and Regionserver start time on respective info servers.
[ https://issues.apache.org/jira/browse/HBASE-4934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177343#comment-13177343 ] Ming Ma commented on HBASE-4934: Jonathan, there is a special case when expired ZK session is recovered by the master. masterActiveTime is set in finishInitialization which won't be invoked in that scenario. Perhaps moving the setting of masterActiveTime to becomeActiveMaster method will solve the problem? Display Master server and Regionserver start time on respective info servers. - Key: HBASE-4934 URL: https://issues.apache.org/jira/browse/HBASE-4934 Project: HBase Issue Type: Improvement Affects Versions: 0.92.0 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Priority: Minor Fix For: 0.92.0 Attachments: hbase-4934.patch, hmaster.png, hregion.png With operations like rolling restart or master failovers, it is difficult to tell if a server is the old instance or the new restarted instance. Adding a start date stamp on the info web pages would be helpful for determining this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5064) use surefire tests parallelization
[ https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5064: --- Attachment: 5064.v17.patch use surefire tests parallelization -- Key: HBASE-5064 URL: https://issues.apache.org/jira/browse/HBASE-5064 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v2.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 5064.v9.patch To be tried multiple times on hadoop-qa before committing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5064) use surefire tests parallelization
[ https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5064: --- Status: Patch Available (was: Open) use surefire tests parallelization -- Key: HBASE-5064 URL: https://issues.apache.org/jira/browse/HBASE-5064 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 5064.v18.patch, 5064.v19.patch, 5064.v19.patch, 5064.v2.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 5064.v9.patch To be tried multiple times on hadoop-qa before committing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5103) Fix places where master znode is improperly deserialized.
[ https://issues.apache.org/jira/browse/HBASE-5103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177308#comment-13177308 ] Lars Hofhansl commented on HBASE-5103: -- +1 Fix places where master znode is improperly deserialized. - Key: HBASE-5103 URL: https://issues.apache.org/jira/browse/HBASE-5103 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Priority: Minor Attachments: hbase-5103.patch In ActiveMasterManager#blockUntilBecomingActiveMaster the master znode is created as a versioned serialized version of ServerName {code} if (ZKUtil.createEphemeralNodeAndWatch(this.watcher, this.watcher.masterAddressZNode, sn.getVersionedBytes())) { {code} There are a few user visible places where it is used but not deserialized properly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5101) Add a max number of regions per regionserver limit
[ https://issues.apache.org/jira/browse/HBASE-5101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177206#comment-13177206 ] Jonathan Hsieh commented on HBASE-5101: --- @Ted First cut -- simply some configured number of max regions. Maybe it would stop splitting at a point where it can handle if some number of rs's going down and its regions are reassigned. There could possibly be a limit on tables as well -- and we could put cap on the # of region servers/table and # of regions/region server. Add a max number of regions per regionserver limit -- Key: HBASE-5101 URL: https://issues.apache.org/jira/browse/HBASE-5101 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.92.0 Reporter: Jonathan Hsieh In a testing environment, a cluster got to a state with more than 1500 regions per region server, and essentially became stuck and unavailable. We could add a limit to the number of regions that a region server can serve to prevent this from happening. This looks like it could be implemented in the core or as a coprocessor. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5064) use surefire tests parallelization
[ https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177278#comment-13177278 ] Hadoop QA commented on HBASE-5064: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12508854/5064.v19.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 12 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -151 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 76 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.client.TestAdmin org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/625//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/625//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/625//console This message is automatically generated. use surefire tests parallelization -- Key: HBASE-5064 URL: https://issues.apache.org/jira/browse/HBASE-5064 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 5064.v18.patch, 5064.v19.patch, 5064.v2.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 5064.v9.patch To be tried multiple times on hadoop-qa before committing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5064) use surefire tests parallelization
[ https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5064: --- Status: Open (was: Patch Available) use surefire tests parallelization -- Key: HBASE-5064 URL: https://issues.apache.org/jira/browse/HBASE-5064 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 5064.v18.patch, 5064.v19.patch, 5064.v19.patch, 5064.v19.patch, 5064.v2.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 5064.v9.patch To be tried multiple times on hadoop-qa before committing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5103) Fix places where master znode is improperly deserialized.
[ https://issues.apache.org/jira/browse/HBASE-5103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-5103: -- Attachment: hbase-5103.patch Applies on trunk and 0.92. Fix places where master znode is improperly deserialized. - Key: HBASE-5103 URL: https://issues.apache.org/jira/browse/HBASE-5103 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Priority: Minor Attachments: hbase-5103.patch In ActiveMasterManager#blockUntilBecomingActiveMaster the master znode is created as a versioned serialized version of ServerName {code} if (ZKUtil.createEphemeralNodeAndWatch(this.watcher, this.watcher.masterAddressZNode, sn.getVersionedBytes())) { {code} There are a few user visible places where it is used but not deserialized properly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5064) use surefire tests parallelization
[ https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177287#comment-13177287 ] Zhihong Yu commented on HBASE-5064: --- TestMiniClusterLoadParallel failed again for patch v19. From its output: {code} 2011-12-29 09:24:49,249 WARN [HBaseReaderThread_7] zookeeper.RecoverableZooKeeper(159): Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase 2011-12-29 09:24:49,249 INFO [HBaseReaderThread_7] util.RetryCounter(53): The 1 times to retry after sleeping 2000 ms ... {code} There're 3 failures so far: {code} Running org.apache.hadoop.hbase.client.TestAdmin Tests run: 35, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 885.494 sec FAILURE! -- Running org.apache.hadoop.hbase.util.TestMiniClusterLoadParallel Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 192.578 sec FAILURE! -- Running org.apache.hadoop.hbase.regionserver.TestRpcMetrics Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.106 sec FAILURE! {code} use surefire tests parallelization -- Key: HBASE-5064 URL: https://issues.apache.org/jira/browse/HBASE-5064 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 5064.v18.patch, 5064.v19.patch, 5064.v19.patch, 5064.v2.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 5064.v9.patch To be tried multiple times on hadoop-qa before committing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5064) use surefire tests parallelization
[ https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177185#comment-13177185 ] Hadoop QA commented on HBASE-5064: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12508839/5064.v17.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 12 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -151 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 76 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.io.hfile.TestLruBlockCache Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/622//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/622//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/622//console This message is automatically generated. use surefire tests parallelization -- Key: HBASE-5064 URL: https://issues.apache.org/jira/browse/HBASE-5064 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v2.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 5064.v9.patch To be tried multiple times on hadoop-qa before committing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HBASE-5083) Backup HMaster should have http infoport open with link to the active master
[ https://issues.apache.org/jira/browse/HBASE-5083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh reassigned HBASE-5083: - Assignee: Jonathan Hsieh Backup HMaster should have http infoport open with link to the active master Key: HBASE-5083 URL: https://issues.apache.org/jira/browse/HBASE-5083 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.92.0 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Without ssh'ing and jps/ps'ing, it is difficult to see if a backup hmaster is up. It seems like it would be good for a backup hmaster to have a basic web page up on the info port so that users could see that it is up. Also it should probably either provide a link to the active master or automatically forward to the active master. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5064) use surefire tests parallelization
[ https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5064: --- Attachment: 5064.v19.patch use surefire tests parallelization -- Key: HBASE-5064 URL: https://issues.apache.org/jira/browse/HBASE-5064 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 5064.v18.patch, 5064.v19.patch, 5064.v19.patch, 5064.v19.patch, 5064.v2.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 5064.v9.patch To be tried multiple times on hadoop-qa before committing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5064) use surefire tests parallelization
[ https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5064: --- Attachment: 5064.v18.patch use surefire tests parallelization -- Key: HBASE-5064 URL: https://issues.apache.org/jira/browse/HBASE-5064 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 5064.v2.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 5064.v9.patch To be tried multiple times on hadoop-qa before committing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root re
[ https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-5099: --- Status: Patch Available (was: Open) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root region is on Key: HBASE-5099 URL: https://issues.apache.org/jira/browse/HBASE-5099 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Attachments: ZK-event-thread-waiting-for-root.png, distributed-log-splitting-hangs.png, hbase-5099-v2.patch, hbase-5099-v3.patch, hbase-5099.patch A RS died. The ServerShutdownHandler kicked in and started the logspliting. SpliLogManager installed the tasks asynchronously, then started to wait for them to complete. The task znodes were not created actually. The requests were just queued. At this time, the zookeeper connection expired. HMaster tried to recover the expired ZK session. During the recovery, a new zookeeper connection was created. However, this master became the new master again. It tried to assign root and meta. Because the dead RS got the old root region, the master needs to wait for the log splitting to complete. This waiting holds the zookeeper event thread. So the async create split task is never retried since there is only one event thread, which is waiting for the root region assigned. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5064) use surefire tests parallelization
[ https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5064: --- Status: Patch Available (was: Open) use surefire tests parallelization -- Key: HBASE-5064 URL: https://issues.apache.org/jira/browse/HBASE-5064 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 5064.v2.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 5064.v9.patch To be tried multiple times on hadoop-qa before committing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root
[ https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177351#comment-13177351 ] jirapos...@reviews.apache.org commented on HBASE-5099: -- bq. On 2011-12-29 19:02:17, Ted Yu wrote: bq. src/main/java/org/apache/hadoop/hbase/master/HMaster.java, line 1475 bq. https://reviews.apache.org/r/3323/diff/3/?file=65929#file65929line1475 bq. bq. Timeout value should be included in the exception message. It may not because of timeout. bq. On 2011-12-29 19:02:17, Ted Yu wrote: bq. src/test/java/org/apache/hadoop/hbase/master/TestMasterRecovery.java, line 2 bq. https://reviews.apache.org/r/3323/diff/3/?file=65930#file65930line2 bq. bq. Year is not needed. I copied it from another test case. I can remove it. bq. On 2011-12-29 19:02:17, Ted Yu wrote: bq. src/test/java/org/apache/hadoop/hbase/master/TestMasterRecovery.java, line 34 bq. https://reviews.apache.org/r/3323/diff/3/?file=65930#file65930line34 bq. bq. This shouldn't be small test since mini cluster is involved. Ok. bq. On 2011-12-29 19:02:17, Ted Yu wrote: bq. src/test/java/org/apache/hadoop/hbase/master/TestMasterRecovery.java, line 41 bq. https://reviews.apache.org/r/3323/diff/3/?file=65930#file65930line41 bq. bq. We shouldn't pass 1 here since that means 1 master. It means 1 region server. Actually, I do want 1 master so it can recover itself. Otherwise, the backup master will take over and the active master doesn't have a chance to recover in this scenario. bq. On 2011-12-29 19:02:17, Ted Yu wrote: bq. src/main/java/org/apache/hadoop/hbase/master/HMaster.java, line 1452 bq. https://reviews.apache.org/r/3323/diff/3/?file=65929#file65929line1452 bq. bq. Line is 88 chars long. bq. If ExecutorService is imported, the line should be much shorter. We can't do this since there is already an ExecutorService (from hbase) imported. I can't use the hbase ExecutorService because it doesn't fit. bq. On 2011-12-29 19:02:17, Ted Yu wrote: bq. src/test/java/org/apache/hadoop/hbase/master/TestMasterRecovery.java, line 50 bq. https://reviews.apache.org/r/3323/diff/3/?file=65930#file65930line50 bq. bq. Since the test doesn't involve standby master, I think we should use a different name. There is test case called TestMasterFailover which involves standby master. That's why I called it TestMasterRecovery. How about TestMasterZKSessionRecovery? - Jimmy --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3323/#review4148 --- On 2011-12-29 18:38:54, Jimmy Xiang wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/3323/ bq. --- bq. bq. (Updated 2011-12-29 18:38:54) bq. bq. bq. Review request for hbase, Ted Yu and Michael Stack. bq. bq. bq. Summary bq. --- bq. bq. Per discussion with Ted (on issues), I put up a patch to run tryRecoveringExpiredZKSession() in a separate thread and time it out and fail the recovery if it is stuck somewhere. bq. bq. I added a test to test the abort method. However, for the mini cluster, becomeActiveMaster() doesn't succeed so the abort method ends up always aborted. So the actually success recovery is not tested. bq. bq. bq. This addresses bug HBASE-5099. bq. https://issues.apache.org/jira/browse/HBASE-5099 bq. bq. bq. Diffs bq. - bq. bq.src/main/java/org/apache/hadoop/hbase/master/HMaster.java a5935a6 bq.src/test/java/org/apache/hadoop/hbase/master/TestMasterRecovery.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/3323/diff bq. bq. bq. Testing bq. --- bq. bq. mvn -PlocalTests -Dtest=TestMaster* clean test bq. bq. bq. Thanks, bq. bq. Jimmy bq. bq. ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root region is on Key: HBASE-5099 URL: https://issues.apache.org/jira/browse/HBASE-5099 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Attachments: ZK-event-thread-waiting-for-root.png, distributed-log-splitting-hangs.png, hbase-5099-v2.patch, hbase-5099-v3.patch, hbase-5099.patch A RS died.
[jira] [Commented] (HBASE-5055) Build against hadoop 0.22 broken
[ https://issues.apache.org/jira/browse/HBASE-5055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177349#comment-13177349 ] Zhihong Yu commented on HBASE-5055: --- Good catch Ming. Addendum integrated to 0.92 branch. Build against hadoop 0.22 broken Key: HBASE-5055 URL: https://issues.apache.org/jira/browse/HBASE-5055 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0 Reporter: Zhihong Yu Assignee: stack Priority: Blocker Fix For: 0.92.0, 0.94.0 Attachments: 5055.txt, HBASE-5055-0.92.patch I got the following when compiling TRUNK against hadoop 0.22: {code} [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.0.2:compile (default-compile) on project hbase: Compilation failure: Compilation failure: [ERROR] /Users/zhihyu/trunk-hbase/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java:[37,39] cannot find symbol [ERROR] symbol : class DFSInputStream [ERROR] location: class org.apache.hadoop.hdfs.DFSClient [ERROR] [ERROR] /Users/zhihyu/trunk-hbase/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java:[109,37] cannot find symbol [ERROR] symbol : class DFSInputStream [ERROR] location: class org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.WALReader.WALReaderFSDataInputStream {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5064) use surefire tests parallelization
[ https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5064: --- Status: Open (was: Patch Available) use surefire tests parallelization -- Key: HBASE-5064 URL: https://issues.apache.org/jira/browse/HBASE-5064 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 5064.v18.patch, 5064.v2.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 5064.v9.patch To be tried multiple times on hadoop-qa before committing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-5104) FilterList doesn't work right with filters (such as ColumPrefixFilter) which use the SEEK_NEXT_USING_HINT
FilterList doesn't work right with filters (such as ColumPrefixFilter) which use the SEEK_NEXT_USING_HINT - Key: HBASE-5104 URL: https://issues.apache.org/jira/browse/HBASE-5104 Project: HBase Issue Type: Bug Reporter: Kannan Muthukkaruppan Assignee: Madhuwanti Vaidya Thanks Jiakai Liu for reporting this issue and doing the initial investigation. Email from Jiakai below: Assuming that we have an index column family with the following entries: tag0:001:thread1 ... tag1:001:thread1 tag1:002:thread2 ... tag1:010:thread10 ... tag2:001:thread1 tag2:005:thread5 ... To get threads with tag1 in range [5, 10), I tried the following code: ColumnPrefixFilter filter1 = new ColumnPrefixFilter(Bytes.toBytes(tag1)); ColumnPaginationFilter filter2 = new ColumnPaginationFilter(5 /* limit */, 5 /* offset */); FilterList filters = new FilterList(Operator.MUST_PASS_ALL); filters.addFilter(filter1); filters.addFilter(filter2); Get get = new Get(USER); get.addFamily(COLUMN_FAMILY); get.setMaxVersions(1); get.setFilter(filters); Somehow it didn't work as expected. It returned the entries as if the filter1 were not set. Turns out the ColumnPrefixFilter returns SEEK_NEXT_USING_HINT in some cases. The FilterList filter does not handle this return code properly (treat it as INCLUDE). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5064) use surefire tests parallelization
[ https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5064: --- Status: Patch Available (was: Open) use surefire tests parallelization -- Key: HBASE-5064 URL: https://issues.apache.org/jira/browse/HBASE-5064 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v2.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 5064.v9.patch To be tried multiple times on hadoop-qa before committing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5101) Add a max number of regions per regionserver limit
[ https://issues.apache.org/jira/browse/HBASE-5101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177352#comment-13177352 ] Todd Lipcon commented on HBASE-5101: Isnt this hbase.regionserver.regionSplitLimit (HBASE-2844)? Add a max number of regions per regionserver limit -- Key: HBASE-5101 URL: https://issues.apache.org/jira/browse/HBASE-5101 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.92.0 Reporter: Jonathan Hsieh In a testing environment, a cluster got to a state with more than 1500 regions per region server, and essentially became stuck and unavailable. We could add a limit to the number of regions that a region server can serve to prevent this from happening. This looks like it could be implemented in the core or as a coprocessor. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root
[ https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177311#comment-13177311 ] jirapos...@reviews.apache.org commented on HBASE-5099: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3323/ --- (Updated 2011-12-29 18:38:54.473524) Review request for hbase, Ted Yu and Michael Stack. Changes --- Updated per Ted's comment. Also separated the test class and added a possible testcase. Summary --- Per discussion with Ted (on issues), I put up a patch to run tryRecoveringExpiredZKSession() in a separate thread and time it out and fail the recovery if it is stuck somewhere. I added a test to test the abort method. However, for the mini cluster, becomeActiveMaster() doesn't succeed so the abort method ends up always aborted. So the actually success recovery is not tested. This addresses bug HBASE-5099. https://issues.apache.org/jira/browse/HBASE-5099 Diffs (updated) - src/main/java/org/apache/hadoop/hbase/master/HMaster.java a5935a6 src/test/java/org/apache/hadoop/hbase/master/TestMasterRecovery.java PRE-CREATION Diff: https://reviews.apache.org/r/3323/diff Testing --- mvn -PlocalTests -Dtest=TestMaster* clean test Thanks, Jimmy ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root region is on Key: HBASE-5099 URL: https://issues.apache.org/jira/browse/HBASE-5099 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Attachments: ZK-event-thread-waiting-for-root.png, distributed-log-splitting-hangs.png, hbase-5099-v2.patch, hbase-5099.patch A RS died. The ServerShutdownHandler kicked in and started the logspliting. SpliLogManager installed the tasks asynchronously, then started to wait for them to complete. The task znodes were not created actually. The requests were just queued. At this time, the zookeeper connection expired. HMaster tried to recover the expired ZK session. During the recovery, a new zookeeper connection was created. However, this master became the new master again. It tried to assign root and meta. Because the dead RS got the old root region, the master needs to wait for the log splitting to complete. This waiting holds the zookeeper event thread. So the async create split task is never retried since there is only one event thread, which is waiting for the root region assigned. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5101) Add a max number of regions per regionserver limit
[ https://issues.apache.org/jira/browse/HBASE-5101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177224#comment-13177224 ] Zhihong Yu commented on HBASE-5101: --- @Jon: I think HBASE-4365 tried to tackle the same problem in a different way. Meaning, a new RegionSplitPolicy implementation should be created. Shall we continue along that route ? Add a max number of regions per regionserver limit -- Key: HBASE-5101 URL: https://issues.apache.org/jira/browse/HBASE-5101 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.92.0 Reporter: Jonathan Hsieh In a testing environment, a cluster got to a state with more than 1500 regions per region server, and essentially became stuck and unavailable. We could add a limit to the number of regions that a region server can serve to prevent this from happening. This looks like it could be implemented in the core or as a coprocessor. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root re
[ https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-5099: --- Status: Open (was: Patch Available) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root region is on Key: HBASE-5099 URL: https://issues.apache.org/jira/browse/HBASE-5099 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Attachments: ZK-event-thread-waiting-for-root.png, distributed-log-splitting-hangs.png, hbase-5099-v2.patch, hbase-5099.patch A RS died. The ServerShutdownHandler kicked in and started the logspliting. SpliLogManager installed the tasks asynchronously, then started to wait for them to complete. The task znodes were not created actually. The requests were just queued. At this time, the zookeeper connection expired. HMaster tried to recover the expired ZK session. During the recovery, a new zookeeper connection was created. However, this master became the new master again. It tried to assign root and meta. Because the dead RS got the old root region, the master needs to wait for the log splitting to complete. This waiting holds the zookeeper event thread. So the async create split task is never retried since there is only one event thread, which is waiting for the root region assigned. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5064) use surefire tests parallelization
[ https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177228#comment-13177228 ] Hadoop QA commented on HBASE-5064: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12508844/5064.v18.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 12 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -151 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 76 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/624//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/624//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/624//console This message is automatically generated. use surefire tests parallelization -- Key: HBASE-5064 URL: https://issues.apache.org/jira/browse/HBASE-5064 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 5064.v18.patch, 5064.v2.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 5064.v9.patch To be tried multiple times on hadoop-qa before committing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5103) Fix places where master znode is improperly deserialized.
[ https://issues.apache.org/jira/browse/HBASE-5103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-5103: -- Description: In ActiveMasterManager#blockUntilBecomingActiveMaster the master znode is created as a versioned serialized version of ServerName {code} if (ZKUtil.createEphemeralNodeAndWatch(this.watcher, this.watcher.masterAddressZNode, sn.getVersionedBytes())) { {code} There are a few user visible places where it is used but not deserialized properly. was: In ActiveMasterManager#blockUntilBecomingActiveMaster the master znode is created as a versioned serialized version of ServerName {code} if (ZKUtil.createEphemeralNodeAndWatch(this.watcher, this.watcher.masterAddressZNode, sn.getVersionedBytes())) { {code} There are a few user visible places where it is used but not deserialized properly. Priority: Minor (was: Major) Summary: Fix places where master znode is improperly deserialized. (was: Fix a places where master znode is improperly deserialized.) Fix places where master znode is improperly deserialized. - Key: HBASE-5103 URL: https://issues.apache.org/jira/browse/HBASE-5103 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Priority: Minor In ActiveMasterManager#blockUntilBecomingActiveMaster the master znode is created as a versioned serialized version of ServerName {code} if (ZKUtil.createEphemeralNodeAndWatch(this.watcher, this.watcher.masterAddressZNode, sn.getVersionedBytes())) { {code} There are a few user visible places where it is used but not deserialized properly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5103) Fix places where master znode is improperly deserialized.
[ https://issues.apache.org/jira/browse/HBASE-5103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177312#comment-13177312 ] Hadoop QA commented on HBASE-5103: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12508858/hbase-5103.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -151 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 76 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/626//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/626//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/626//console This message is automatically generated. Fix places where master znode is improperly deserialized. - Key: HBASE-5103 URL: https://issues.apache.org/jira/browse/HBASE-5103 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Priority: Minor Attachments: hbase-5103.patch In ActiveMasterManager#blockUntilBecomingActiveMaster the master znode is created as a versioned serialized version of ServerName {code} if (ZKUtil.createEphemeralNodeAndWatch(this.watcher, this.watcher.masterAddressZNode, sn.getVersionedBytes())) { {code} There are a few user visible places where it is used but not deserialized properly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5064) use surefire tests parallelization
[ https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5064: --- Status: Open (was: Patch Available) use surefire tests parallelization -- Key: HBASE-5064 URL: https://issues.apache.org/jira/browse/HBASE-5064 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 5064.v18.patch, 5064.v2.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 5064.v9.patch To be tried multiple times on hadoop-qa before committing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5064) use surefire tests parallelization
[ https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177285#comment-13177285 ] nkeywal commented on HBASE-5064: You want 2 process in // only for hadoop-qa? It seems that the machine never runs two builds in //, so we should maximize the number of process. That makes the build last ~35 minutes instead of 2 hours. use surefire tests parallelization -- Key: HBASE-5064 URL: https://issues.apache.org/jira/browse/HBASE-5064 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 5064.v18.patch, 5064.v19.patch, 5064.v19.patch, 5064.v2.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 5064.v9.patch To be tried multiple times on hadoop-qa before committing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5010) Filter HFiles based on TTL
[ https://issues.apache.org/jira/browse/HBASE-5010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HBASE-5010: --- Attachment: D909.6.patch mbautin updated the revision [jira] [HBASE-5010] [89-fb] Filter HFiles based on TTL. Reviewers: Kannan, Liyin, JIRA Addressing the issue that Kannan pointed out: getScanner does not use its arguments. It turned out that getScanner was called with this.scan and this.columns in all callsites, so I have removed those arguments. REVISION DETAIL https://reviews.facebook.net/D909 AFFECTED FILES src/main/java/org/apache/hadoop/hbase/io/hfile/LruBlockCache.java src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueScanner.java src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java src/main/java/org/apache/hadoop/hbase/regionserver/NonLazyKeyValueScanner.java src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java src/main/java/org/apache/hadoop/hbase/regionserver/Store.java src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java src/main/java/org/apache/hadoop/hbase/regionserver/TimeRangeTracker.java src/main/java/org/apache/hadoop/hbase/regionserver/metrics/SchemaMetrics.java src/main/java/org/apache/hadoop/hbase/util/Threads.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestScannerSelectionUsingTTL.java src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java src/test/java/org/apache/hadoop/hbase/regionserver/TestMemStore.java src/test/java/org/apache/hadoop/hbase/regionserver/TestQueryMatcher.java src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java Filter HFiles based on TTL -- Key: HBASE-5010 URL: https://issues.apache.org/jira/browse/HBASE-5010 Project: HBase Issue Type: Bug Reporter: Mikhail Bautin Assignee: Zhihong Yu Attachments: 5010.patch, D1017.1.patch, D1017.2.patch, D909.1.patch, D909.2.patch, D909.3.patch, D909.4.patch, D909.5.patch, D909.6.patch In ScanWildcardColumnTracker we have {code:java} this.oldestStamp = EnvironmentEdgeManager.currentTimeMillis() - ttl; ... private boolean isExpired(long timestamp) { return timestamp oldestStamp; } {code} but this time range filtering does not participate in HFile selection. In one real case this caused next() calls to time out because all KVs in a table got expired, but next() had to iterate over the whole table to find that out. We should be able to filter out those HFiles right away. I think a reasonable approach is to add a default timerange filter to every scan for a CF with a finite TTL and utilize existing filtering in StoreFile.Reader.passesTimerangeFilter. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5064) use surefire tests parallelization
[ https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177359#comment-13177359 ] Hadoop QA commented on HBASE-5064: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12508863/5064.v19.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 12 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -151 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 76 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.client.TestFromClientSide org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/629//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/629//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/629//console This message is automatically generated. use surefire tests parallelization -- Key: HBASE-5064 URL: https://issues.apache.org/jira/browse/HBASE-5064 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 5064.v18.patch, 5064.v19.patch, 5064.v19.patch, 5064.v19.patch, 5064.v2.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 5064.v9.patch To be tried multiple times on hadoop-qa before committing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5104) FilterList doesn't work right with filters (such as ColumPrefixFilter) which use the SEEK_NEXT_USING_HINT
[ https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kannan Muthukkaruppan updated HBASE-5104: - Attachment: testFilterList.rb A test case illustrating the problem attached. FilterList doesn't work right with filters (such as ColumPrefixFilter) which use the SEEK_NEXT_USING_HINT - Key: HBASE-5104 URL: https://issues.apache.org/jira/browse/HBASE-5104 Project: HBase Issue Type: Bug Reporter: Kannan Muthukkaruppan Assignee: Madhuwanti Vaidya Attachments: testFilterList.rb Thanks Jiakai Liu for reporting this issue and doing the initial investigation. Email from Jiakai below: Assuming that we have an index column family with the following entries: tag0:001:thread1 ... tag1:001:thread1 tag1:002:thread2 ... tag1:010:thread10 ... tag2:001:thread1 tag2:005:thread5 ... To get threads with tag1 in range [5, 10), I tried the following code: ColumnPrefixFilter filter1 = new ColumnPrefixFilter(Bytes.toBytes(tag1)); ColumnPaginationFilter filter2 = new ColumnPaginationFilter(5 /* limit */, 5 /* offset */); FilterList filters = new FilterList(Operator.MUST_PASS_ALL); filters.addFilter(filter1); filters.addFilter(filter2); Get get = new Get(USER); get.addFamily(COLUMN_FAMILY); get.setMaxVersions(1); get.setFilter(filters); Somehow it didn't work as expected. It returned the entries as if the filter1 were not set. Turns out the ColumnPrefixFilter returns SEEK_NEXT_USING_HINT in some cases. The FilterList filter does not handle this return code properly (treat it as INCLUDE). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5010) Filter HFiles based on TTL
[ https://issues.apache.org/jira/browse/HBASE-5010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177361#comment-13177361 ] Phabricator commented on HBASE-5010: Kannan has accepted the revision [jira] [HBASE-5010] [89-fb] Filter HFiles based on TTL. awesome optimization and nice work! REVISION DETAIL https://reviews.facebook.net/D909 Filter HFiles based on TTL -- Key: HBASE-5010 URL: https://issues.apache.org/jira/browse/HBASE-5010 Project: HBase Issue Type: Bug Reporter: Mikhail Bautin Assignee: Zhihong Yu Attachments: 5010.patch, D1017.1.patch, D1017.2.patch, D909.1.patch, D909.2.patch, D909.3.patch, D909.4.patch, D909.5.patch, D909.6.patch In ScanWildcardColumnTracker we have {code:java} this.oldestStamp = EnvironmentEdgeManager.currentTimeMillis() - ttl; ... private boolean isExpired(long timestamp) { return timestamp oldestStamp; } {code} but this time range filtering does not participate in HFile selection. In one real case this caused next() calls to time out because all KVs in a table got expired, but next() had to iterate over the whole table to find that out. We should be able to filter out those HFiles right away. I think a reasonable approach is to add a default timerange filter to every scan for a CF with a finite TTL and utilize existing filtering in StoreFile.Reader.passesTimerangeFilter. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root
[ https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177363#comment-13177363 ] Zhihong Yu commented on HBASE-5099: --- TestMasterZKSessionRecovery is a good name for the new test. ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root region is on Key: HBASE-5099 URL: https://issues.apache.org/jira/browse/HBASE-5099 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Attachments: ZK-event-thread-waiting-for-root.png, distributed-log-splitting-hangs.png, hbase-5099-v2.patch, hbase-5099-v3.patch, hbase-5099.patch A RS died. The ServerShutdownHandler kicked in and started the logspliting. SpliLogManager installed the tasks asynchronously, then started to wait for them to complete. The task znodes were not created actually. The requests were just queued. At this time, the zookeeper connection expired. HMaster tried to recover the expired ZK session. During the recovery, a new zookeeper connection was created. However, this master became the new master again. It tried to assign root and meta. Because the dead RS got the old root region, the master needs to wait for the log splitting to complete. This waiting holds the zookeeper event thread. So the async create split task is never retried since there is only one event thread, which is waiting for the root region assigned. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-5105) TestImportTsv failed with hadoop 0.22
TestImportTsv failed with hadoop 0.22 - Key: HBASE-5105 URL: https://issues.apache.org/jira/browse/HBASE-5105 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.92.0 Reporter: Ming Ma java.io.FileNotFoundException: File does not exist: /home/henkins/.m2/repository/org/apache/hadoop/hadoop-mapred/0.22-SNAPSHOT/hadoop-mapred-0.22-SNAPSHOT.jar at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:742) at org.apache.hadoop.mapreduce.filecache.TrackerDistributedCacheManager.getFileStatus(TrackerDistributedCacheManager.java:331) at org.apache.hadoop.mapreduce.filecache.TrackerDistributedCacheManager.determineTimestamps(TrackerDistributedCacheManager.java:711) at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:245) at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:283) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:350) at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1045) at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1042) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1153) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1042) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1062) at org.apache.hadoop.hbase.mapreduce.TestImportTsv.doMROnTableTest(TestImportTsv.java:215) at org.apache.hadoop.hbase.mapreduce.TestImportTsv.testMROnTable(TestImportTsv.java:165) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5105) TestImportTsv failed with hadoop 0.22
[ https://issues.apache.org/jira/browse/HBASE-5105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HBASE-5105: --- Attachment: HBASE-5105-0.92.patch Fix the test code to create miniMapReduceCluster and pass the right configuration to Job object. TestImportTsv failed with hadoop 0.22 - Key: HBASE-5105 URL: https://issues.apache.org/jira/browse/HBASE-5105 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.92.0 Reporter: Ming Ma Attachments: HBASE-5105-0.92.patch java.io.FileNotFoundException: File does not exist: /home/henkins/.m2/repository/org/apache/hadoop/hadoop-mapred/0.22-SNAPSHOT/hadoop-mapred-0.22-SNAPSHOT.jar at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:742) at org.apache.hadoop.mapreduce.filecache.TrackerDistributedCacheManager.getFileStatus(TrackerDistributedCacheManager.java:331) at org.apache.hadoop.mapreduce.filecache.TrackerDistributedCacheManager.determineTimestamps(TrackerDistributedCacheManager.java:711) at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:245) at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:283) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:350) at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1045) at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1042) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1153) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1042) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1062) at org.apache.hadoop.hbase.mapreduce.TestImportTsv.doMROnTableTest(TestImportTsv.java:215) at org.apache.hadoop.hbase.mapreduce.TestImportTsv.testMROnTable(TestImportTsv.java:165) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5105) TestImportTsv failed with hadoop 0.22
[ https://issues.apache.org/jira/browse/HBASE-5105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5105: -- Status: Patch Available (was: Open) TestImportTsv failed with hadoop 0.22 - Key: HBASE-5105 URL: https://issues.apache.org/jira/browse/HBASE-5105 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.92.0 Reporter: Ming Ma Attachments: HBASE-5105-0.92.patch java.io.FileNotFoundException: File does not exist: /home/henkins/.m2/repository/org/apache/hadoop/hadoop-mapred/0.22-SNAPSHOT/hadoop-mapred-0.22-SNAPSHOT.jar at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:742) at org.apache.hadoop.mapreduce.filecache.TrackerDistributedCacheManager.getFileStatus(TrackerDistributedCacheManager.java:331) at org.apache.hadoop.mapreduce.filecache.TrackerDistributedCacheManager.determineTimestamps(TrackerDistributedCacheManager.java:711) at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:245) at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:283) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:350) at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1045) at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1042) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1153) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1042) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1062) at org.apache.hadoop.hbase.mapreduce.TestImportTsv.doMROnTableTest(TestImportTsv.java:215) at org.apache.hadoop.hbase.mapreduce.TestImportTsv.testMROnTable(TestImportTsv.java:165) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root
[ https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177369#comment-13177369 ] jirapos...@reviews.apache.org commented on HBASE-5099: -- bq. On 2011-12-28 22:52:56, Ted Yu wrote: bq. src/main/java/org/apache/hadoop/hbase/master/HMaster.java, line 1463 bq. https://reviews.apache.org/r/3323/diff/2/?file=65902#file65902line1463 bq. bq. The call() at line 1429 would only throw 3 types of exceptions: IE, IOE and KE. bq. bq. So the cause should be one the three types. bq. I suggest using instanceof to check against each of them and if a match is found, cast as that exception type and rethrow. bq. bq. If there is no match, I suggest using this ctor to throw a new IOException: bq. IOException(String message, Throwable cause) How about just throw ExecutionException? It will be much simpler and the caller doesn't care about which exception it is in this case. - Jimmy --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3323/#review4138 --- On 2011-12-29 18:38:54, Jimmy Xiang wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/3323/ bq. --- bq. bq. (Updated 2011-12-29 18:38:54) bq. bq. bq. Review request for hbase, Ted Yu and Michael Stack. bq. bq. bq. Summary bq. --- bq. bq. Per discussion with Ted (on issues), I put up a patch to run tryRecoveringExpiredZKSession() in a separate thread and time it out and fail the recovery if it is stuck somewhere. bq. bq. I added a test to test the abort method. However, for the mini cluster, becomeActiveMaster() doesn't succeed so the abort method ends up always aborted. So the actually success recovery is not tested. bq. bq. bq. This addresses bug HBASE-5099. bq. https://issues.apache.org/jira/browse/HBASE-5099 bq. bq. bq. Diffs bq. - bq. bq.src/main/java/org/apache/hadoop/hbase/master/HMaster.java a5935a6 bq.src/test/java/org/apache/hadoop/hbase/master/TestMasterRecovery.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/3323/diff bq. bq. bq. Testing bq. --- bq. bq. mvn -PlocalTests -Dtest=TestMaster* clean test bq. bq. bq. Thanks, bq. bq. Jimmy bq. bq. ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root region is on Key: HBASE-5099 URL: https://issues.apache.org/jira/browse/HBASE-5099 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Attachments: ZK-event-thread-waiting-for-root.png, distributed-log-splitting-hangs.png, hbase-5099-v2.patch, hbase-5099-v3.patch, hbase-5099.patch A RS died. The ServerShutdownHandler kicked in and started the logspliting. SpliLogManager installed the tasks asynchronously, then started to wait for them to complete. The task znodes were not created actually. The requests were just queued. At this time, the zookeeper connection expired. HMaster tried to recover the expired ZK session. During the recovery, a new zookeeper connection was created. However, this master became the new master again. It tried to assign root and meta. Because the dead RS got the old root region, the master needs to wait for the log splitting to complete. This waiting holds the zookeeper event thread. So the async create split task is never retried since there is only one event thread, which is waiting for the root region assigned. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5105) TestImportTsv failed with hadoop 0.22
[ https://issues.apache.org/jira/browse/HBASE-5105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177370#comment-13177370 ] Zhihong Yu commented on HBASE-5105: --- I applied the patch. TestImportTsv passes against both hadoop 0.22 and hadoop 1.0 Will integrate if there is no objection. TestImportTsv failed with hadoop 0.22 - Key: HBASE-5105 URL: https://issues.apache.org/jira/browse/HBASE-5105 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.92.0 Reporter: Ming Ma Attachments: HBASE-5105-0.92.patch java.io.FileNotFoundException: File does not exist: /home/henkins/.m2/repository/org/apache/hadoop/hadoop-mapred/0.22-SNAPSHOT/hadoop-mapred-0.22-SNAPSHOT.jar at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:742) at org.apache.hadoop.mapreduce.filecache.TrackerDistributedCacheManager.getFileStatus(TrackerDistributedCacheManager.java:331) at org.apache.hadoop.mapreduce.filecache.TrackerDistributedCacheManager.determineTimestamps(TrackerDistributedCacheManager.java:711) at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:245) at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:283) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:350) at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1045) at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1042) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1153) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1042) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1062) at org.apache.hadoop.hbase.mapreduce.TestImportTsv.doMROnTableTest(TestImportTsv.java:215) at org.apache.hadoop.hbase.mapreduce.TestImportTsv.testMROnTable(TestImportTsv.java:165) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root
[ https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177373#comment-13177373 ] Zhihong Yu commented on HBASE-5099: --- bq. How about just throw ExecutionException? That is fine. tryRecoveringExpiredZKSession() is private. ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root region is on Key: HBASE-5099 URL: https://issues.apache.org/jira/browse/HBASE-5099 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Attachments: ZK-event-thread-waiting-for-root.png, distributed-log-splitting-hangs.png, hbase-5099-v2.patch, hbase-5099-v3.patch, hbase-5099.patch A RS died. The ServerShutdownHandler kicked in and started the logspliting. SpliLogManager installed the tasks asynchronously, then started to wait for them to complete. The task znodes were not created actually. The requests were just queued. At this time, the zookeeper connection expired. HMaster tried to recover the expired ZK session. During the recovery, a new zookeeper connection was created. However, this master became the new master again. It tried to assign root and meta. Because the dead RS got the old root region, the master needs to wait for the log splitting to complete. This waiting holds the zookeeper event thread. So the async create split task is never retried since there is only one event thread, which is waiting for the root region assigned. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root re
[ https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-5099: --- Attachment: hbase-5099-v4.patch ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root region is on Key: HBASE-5099 URL: https://issues.apache.org/jira/browse/HBASE-5099 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Attachments: ZK-event-thread-waiting-for-root.png, distributed-log-splitting-hangs.png, hbase-5099-v2.patch, hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099.patch A RS died. The ServerShutdownHandler kicked in and started the logspliting. SpliLogManager installed the tasks asynchronously, then started to wait for them to complete. The task znodes were not created actually. The requests were just queued. At this time, the zookeeper connection expired. HMaster tried to recover the expired ZK session. During the recovery, a new zookeeper connection was created. However, this master became the new master again. It tried to assign root and meta. Because the dead RS got the old root region, the master needs to wait for the log splitting to complete. This waiting holds the zookeeper event thread. So the async create split task is never retried since there is only one event thread, which is waiting for the root region assigned. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root
[ https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177375#comment-13177375 ] Hadoop QA commented on HBASE-5099: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12508868/hbase-5099-v3.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -151 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 76 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/628//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/628//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/628//console This message is automatically generated. ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root region is on Key: HBASE-5099 URL: https://issues.apache.org/jira/browse/HBASE-5099 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Attachments: ZK-event-thread-waiting-for-root.png, distributed-log-splitting-hangs.png, hbase-5099-v2.patch, hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099.patch A RS died. The ServerShutdownHandler kicked in and started the logspliting. SpliLogManager installed the tasks asynchronously, then started to wait for them to complete. The task znodes were not created actually. The requests were just queued. At this time, the zookeeper connection expired. HMaster tried to recover the expired ZK session. During the recovery, a new zookeeper connection was created. However, this master became the new master again. It tried to assign root and meta. Because the dead RS got the old root region, the master needs to wait for the log splitting to complete. This waiting holds the zookeeper event thread. So the async create split task is never retried since there is only one event thread, which is waiting for the root region assigned. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root
[ https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177376#comment-13177376 ] jirapos...@reviews.apache.org commented on HBASE-5099: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3323/ --- (Updated 2011-12-29 20:41:26.086617) Review request for hbase, Ted Yu and Michael Stack. Changes --- Renamed the test class. tryRecoveringExpiredZKSession() throws ExecutionException now. Summary --- Per discussion with Ted (on issues), I put up a patch to run tryRecoveringExpiredZKSession() in a separate thread and time it out and fail the recovery if it is stuck somewhere. I added a test to test the abort method. However, for the mini cluster, becomeActiveMaster() doesn't succeed so the abort method ends up always aborted. So the actually success recovery is not tested. This addresses bug HBASE-5099. https://issues.apache.org/jira/browse/HBASE-5099 Diffs (updated) - src/main/java/org/apache/hadoop/hbase/master/HMaster.java a5935a6 src/test/java/org/apache/hadoop/hbase/master/TestMasterZKSessionRecovery.java PRE-CREATION Diff: https://reviews.apache.org/r/3323/diff Testing --- mvn -PlocalTests -Dtest=TestMaster* clean test Thanks, Jimmy ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root region is on Key: HBASE-5099 URL: https://issues.apache.org/jira/browse/HBASE-5099 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Attachments: ZK-event-thread-waiting-for-root.png, distributed-log-splitting-hangs.png, hbase-5099-v2.patch, hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099.patch A RS died. The ServerShutdownHandler kicked in and started the logspliting. SpliLogManager installed the tasks asynchronously, then started to wait for them to complete. The task znodes were not created actually. The requests were just queued. At this time, the zookeeper connection expired. HMaster tried to recover the expired ZK session. During the recovery, a new zookeeper connection was created. However, this master became the new master again. It tried to assign root and meta. Because the dead RS got the old root region, the master needs to wait for the log splitting to complete. This waiting holds the zookeeper event thread. So the async create split task is never retried since there is only one event thread, which is waiting for the root region assigned. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5101) Add a max number of regions per regionserver limit
[ https://issues.apache.org/jira/browse/HBASE-5101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177379#comment-13177379 ] Jonathan Hsieh commented on HBASE-5101: --- @Todd Looks like it, closing as a dupe. Add a max number of regions per regionserver limit -- Key: HBASE-5101 URL: https://issues.apache.org/jira/browse/HBASE-5101 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.92.0 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh In a testing environment, a cluster got to a state with more than 1500 regions per region server, and essentially became stuck and unavailable. We could add a limit to the number of regions that a region server can serve to prevent this from happening. This looks like it could be implemented in the core or as a coprocessor. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-5101) Add a max number of regions per regionserver limit
[ https://issues.apache.org/jira/browse/HBASE-5101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh resolved HBASE-5101. --- Resolution: Duplicate Assignee: Jonathan Hsieh This is a dupe of HBASE-2844, and later discussion lead to something similar to HBASE-2956. Add a max number of regions per regionserver limit -- Key: HBASE-5101 URL: https://issues.apache.org/jira/browse/HBASE-5101 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.92.0 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh In a testing environment, a cluster got to a state with more than 1500 regions per region server, and essentially became stuck and unavailable. We could add a limit to the number of regions that a region server can serve to prevent this from happening. This looks like it could be implemented in the core or as a coprocessor. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root
[ https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177386#comment-13177386 ] jirapos...@reviews.apache.org commented on HBASE-5099: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3323/#review4152 --- I like this version. Minor comments below. src/main/java/org/apache/hadoop/hbase/master/HMaster.java https://reviews.apache.org/r/3323/#comment9377 Line should wrap after = sign. src/main/java/org/apache/hadoop/hbase/master/HMaster.java https://reviews.apache.org/r/3323/#comment9376 The catch block is redundant. src/test/java/org/apache/hadoop/hbase/master/TestMasterZKSessionRecovery.java https://reviews.apache.org/r/3323/#comment9378 Javadoc for test class. src/test/java/org/apache/hadoop/hbase/master/TestMasterZKSessionRecovery.java https://reviews.apache.org/r/3323/#comment9379 How about giving hbase.master.zksession.recover.timeout a much small value for the mini cluster so that the timeout value on line 73 can be lowered ? - Ted On 2011-12-29 20:41:26, Jimmy Xiang wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/3323/ bq. --- bq. bq. (Updated 2011-12-29 20:41:26) bq. bq. bq. Review request for hbase, Ted Yu and Michael Stack. bq. bq. bq. Summary bq. --- bq. bq. Per discussion with Ted (on issues), I put up a patch to run tryRecoveringExpiredZKSession() in a separate thread and time it out and fail the recovery if it is stuck somewhere. bq. bq. I added a test to test the abort method. However, for the mini cluster, becomeActiveMaster() doesn't succeed so the abort method ends up always aborted. So the actually success recovery is not tested. bq. bq. bq. This addresses bug HBASE-5099. bq. https://issues.apache.org/jira/browse/HBASE-5099 bq. bq. bq. Diffs bq. - bq. bq.src/main/java/org/apache/hadoop/hbase/master/HMaster.java a5935a6 bq. src/test/java/org/apache/hadoop/hbase/master/TestMasterZKSessionRecovery.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/3323/diff bq. bq. bq. Testing bq. --- bq. bq. mvn -PlocalTests -Dtest=TestMaster* clean test bq. bq. bq. Thanks, bq. bq. Jimmy bq. bq. ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root region is on Key: HBASE-5099 URL: https://issues.apache.org/jira/browse/HBASE-5099 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Attachments: ZK-event-thread-waiting-for-root.png, distributed-log-splitting-hangs.png, hbase-5099-v2.patch, hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099.patch A RS died. The ServerShutdownHandler kicked in and started the logspliting. SpliLogManager installed the tasks asynchronously, then started to wait for them to complete. The task znodes were not created actually. The requests were just queued. At this time, the zookeeper connection expired. HMaster tried to recover the expired ZK session. During the recovery, a new zookeeper connection was created. However, this master became the new master again. It tried to assign root and meta. Because the dead RS got the old root region, the master needs to wait for the log splitting to complete. This waiting holds the zookeeper event thread. So the async create split task is never retried since there is only one event thread, which is waiting for the root region assigned. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5064) use surefire tests parallelization
[ https://issues.apache.org/jira/browse/HBASE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177388#comment-13177388 ] nkeywal commented on HBASE-5064: From the last 5 execution, it seems we have NumberFormatException + some random failure that I would tend to attribute to the usual flakiness (I may be wrong :-)). The FileNotFound exception was not reproduced. @ted: from the time taken by TestAdmin (twice the time on hadoop-qa, and very close to the test timeout of 900 seconds), I think that the machine is heavily loaded, or lacks some memory and swaps. That makes me think that we should default the // degree to 2 (and keep 4 on hadoop-qa). use surefire tests parallelization -- Key: HBASE-5064 URL: https://issues.apache.org/jira/browse/HBASE-5064 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5064.patch, 5064.patch, 5064.v10.patch, 5064.v11.patch, 5064.v12.patch, 5064.v13.patch, 5064.v14.patch, 5064.v14.patch, 5064.v15.patch, 5064.v16.patch, 5064.v17.patch, 5064.v18.patch, 5064.v18.patch, 5064.v19.patch, 5064.v19.patch, 5064.v19.patch, 5064.v2.patch, 5064.v3.patch, 5064.v4.patch, 5064.v5.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v6.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v7.patch, 5064.v8.patch, 5064.v8.patch, 5064.v9.patch To be tried multiple times on hadoop-qa before committing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root
[ https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177394#comment-13177394 ] jirapos...@reviews.apache.org commented on HBASE-5099: -- bq. On 2011-12-29 20:55:31, Ted Yu wrote: bq. src/main/java/org/apache/hadoop/hbase/master/HMaster.java, line 1463 bq. https://reviews.apache.org/r/3323/diff/4/?file=65969#file65969line1463 bq. bq. The catch block is redundant. That's right. I forgot it :) bq. On 2011-12-29 20:55:31, Ted Yu wrote: bq. src/main/java/org/apache/hadoop/hbase/master/HMaster.java, line 1453 bq. https://reviews.apache.org/r/3323/diff/4/?file=65969#file65969line1453 bq. bq. Line should wrap after = sign. too long? sure. bq. On 2011-12-29 20:55:31, Ted Yu wrote: bq. src/test/java/org/apache/hadoop/hbase/master/TestMasterZKSessionRecovery.java, line 41 bq. https://reviews.apache.org/r/3323/diff/4/?file=65970#file65970line41 bq. bq. How about giving hbase.master.zksession.recover.timeout a much small value for the mini cluster so that the timeout value on line 73 can be lowered ? sure - Jimmy --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3323/#review4152 --- On 2011-12-29 20:41:26, Jimmy Xiang wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/3323/ bq. --- bq. bq. (Updated 2011-12-29 20:41:26) bq. bq. bq. Review request for hbase, Ted Yu and Michael Stack. bq. bq. bq. Summary bq. --- bq. bq. Per discussion with Ted (on issues), I put up a patch to run tryRecoveringExpiredZKSession() in a separate thread and time it out and fail the recovery if it is stuck somewhere. bq. bq. I added a test to test the abort method. However, for the mini cluster, becomeActiveMaster() doesn't succeed so the abort method ends up always aborted. So the actually success recovery is not tested. bq. bq. bq. This addresses bug HBASE-5099. bq. https://issues.apache.org/jira/browse/HBASE-5099 bq. bq. bq. Diffs bq. - bq. bq.src/main/java/org/apache/hadoop/hbase/master/HMaster.java a5935a6 bq. src/test/java/org/apache/hadoop/hbase/master/TestMasterZKSessionRecovery.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/3323/diff bq. bq. bq. Testing bq. --- bq. bq. mvn -PlocalTests -Dtest=TestMaster* clean test bq. bq. bq. Thanks, bq. bq. Jimmy bq. bq. ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root region is on Key: HBASE-5099 URL: https://issues.apache.org/jira/browse/HBASE-5099 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Attachments: ZK-event-thread-waiting-for-root.png, distributed-log-splitting-hangs.png, hbase-5099-v2.patch, hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099.patch A RS died. The ServerShutdownHandler kicked in and started the logspliting. SpliLogManager installed the tasks asynchronously, then started to wait for them to complete. The task znodes were not created actually. The requests were just queued. At this time, the zookeeper connection expired. HMaster tried to recover the expired ZK session. During the recovery, a new zookeeper connection was created. However, this master became the new master again. It tried to assign root and meta. Because the dead RS got the old root region, the master needs to wait for the log splitting to complete. This waiting holds the zookeeper event thread. So the async create split task is never retried since there is only one event thread, which is waiting for the root region assigned. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5103) Fix improper master znode deserialization
[ https://issues.apache.org/jira/browse/HBASE-5103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177396#comment-13177396 ] Jonathan Hsieh commented on HBASE-5103: --- should this get marked as 0.92.1 instead of 0.92.0? Fix improper master znode deserialization - Key: HBASE-5103 URL: https://issues.apache.org/jira/browse/HBASE-5103 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Priority: Minor Fix For: 0.92.0, 0.94.0 Attachments: hbase-5103.patch In ActiveMasterManager#blockUntilBecomingActiveMaster the master znode is created as a versioned serialized version of ServerName {code} if (ZKUtil.createEphemeralNodeAndWatch(this.watcher, this.watcher.masterAddressZNode, sn.getVersionedBytes())) { {code} There are a few user visible places where it is used but not deserialized properly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root
[ https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177397#comment-13177397 ] jirapos...@reviews.apache.org commented on HBASE-5099: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3323/ --- (Updated 2011-12-29 21:16:22.655431) Review request for hbase, Ted Yu and Michael Stack. Changes --- Minor changes per review comments. Summary --- Per discussion with Ted (on issues), I put up a patch to run tryRecoveringExpiredZKSession() in a separate thread and time it out and fail the recovery if it is stuck somewhere. I added a test to test the abort method. However, for the mini cluster, becomeActiveMaster() doesn't succeed so the abort method ends up always aborted. So the actually success recovery is not tested. This addresses bug HBASE-5099. https://issues.apache.org/jira/browse/HBASE-5099 Diffs (updated) - src/main/java/org/apache/hadoop/hbase/master/HMaster.java a5935a6 src/test/java/org/apache/hadoop/hbase/master/TestMasterZKSessionRecovery.java PRE-CREATION Diff: https://reviews.apache.org/r/3323/diff Testing --- mvn -PlocalTests -Dtest=TestMaster* clean test Thanks, Jimmy ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root region is on Key: HBASE-5099 URL: https://issues.apache.org/jira/browse/HBASE-5099 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Attachments: ZK-event-thread-waiting-for-root.png, distributed-log-splitting-hangs.png, hbase-5099-v2.patch, hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099.patch A RS died. The ServerShutdownHandler kicked in and started the logspliting. SpliLogManager installed the tasks asynchronously, then started to wait for them to complete. The task znodes were not created actually. The requests were just queued. At this time, the zookeeper connection expired. HMaster tried to recover the expired ZK session. During the recovery, a new zookeeper connection was created. However, this master became the new master again. It tried to assign root and meta. Because the dead RS got the old root region, the master needs to wait for the log splitting to complete. This waiting holds the zookeeper event thread. So the async create split task is never retried since there is only one event thread, which is waiting for the root region assigned. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root re
[ https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-5099: --- Attachment: hbase-5099-v5.patch ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root region is on Key: HBASE-5099 URL: https://issues.apache.org/jira/browse/HBASE-5099 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Attachments: ZK-event-thread-waiting-for-root.png, distributed-log-splitting-hangs.png, hbase-5099-v2.patch, hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099-v5.patch, hbase-5099.patch A RS died. The ServerShutdownHandler kicked in and started the logspliting. SpliLogManager installed the tasks asynchronously, then started to wait for them to complete. The task znodes were not created actually. The requests were just queued. At this time, the zookeeper connection expired. HMaster tried to recover the expired ZK session. During the recovery, a new zookeeper connection was created. However, this master became the new master again. It tried to assign root and meta. Because the dead RS got the old root region, the master needs to wait for the log splitting to complete. This waiting holds the zookeeper event thread. So the async create split task is never retried since there is only one event thread, which is waiting for the root region assigned. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5103) Fix improper master znode deserialization
[ https://issues.apache.org/jira/browse/HBASE-5103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177399#comment-13177399 ] Hudson commented on HBASE-5103: --- Integrated in HBase-TRUNK #2590 (See [https://builds.apache.org/job/HBase-TRUNK/2590/]) HBASE-5103 Fix improper master znode deserialization (Jonathan Hsieh) tedyu : Files : * /hbase/trunk/CHANGES.txt * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java Fix improper master znode deserialization - Key: HBASE-5103 URL: https://issues.apache.org/jira/browse/HBASE-5103 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Priority: Minor Fix For: 0.92.0, 0.94.0 Attachments: hbase-5103.patch In ActiveMasterManager#blockUntilBecomingActiveMaster the master znode is created as a versioned serialized version of ServerName {code} if (ZKUtil.createEphemeralNodeAndWatch(this.watcher, this.watcher.masterAddressZNode, sn.getVersionedBytes())) { {code} There are a few user visible places where it is used but not deserialized properly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5103) Fix improper master znode deserialization
[ https://issues.apache.org/jira/browse/HBASE-5103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177398#comment-13177398 ] Zhihong Yu commented on HBASE-5103: --- 0.92 hasn't been released yet :-) Without HBASE-5081 and HBASE-5099 fixed, I don't think 0.92 would be ready. Fix improper master znode deserialization - Key: HBASE-5103 URL: https://issues.apache.org/jira/browse/HBASE-5103 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Priority: Minor Fix For: 0.92.0, 0.94.0 Attachments: hbase-5103.patch In ActiveMasterManager#blockUntilBecomingActiveMaster the master znode is created as a versioned serialized version of ServerName {code} if (ZKUtil.createEphemeralNodeAndWatch(this.watcher, this.watcher.masterAddressZNode, sn.getVersionedBytes())) { {code} There are a few user visible places where it is used but not deserialized properly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root
[ https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177404#comment-13177404 ] Zhihong Yu commented on HBASE-5099: --- @Jimmy: Do you mind producing a patch for 0.92 ? I got the following when I tried to apply patch v5 to 0.92: {code} 1 out of 3 hunks FAILED -- saving rejects to file src/main/java/org/apache/hadoop/hbase/master/HMaster.java.rej {code} Also there is no test category in 0.92: {code} [ERROR] /Users/zhihyu/92hbase/src/test/java/org/apache/hadoop/hbase/master/TestMasterZKSessionRecovery.java:[27,30] cannot find symbol [ERROR] symbol : class MediumTests [ERROR] location: package org.apache.hadoop.hbase {code} This is a whitespace at line 41 in TestMasterZKSessionRecovery - I can remove it at time of integration. Nice work. ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root region is on Key: HBASE-5099 URL: https://issues.apache.org/jira/browse/HBASE-5099 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Attachments: ZK-event-thread-waiting-for-root.png, distributed-log-splitting-hangs.png, hbase-5099-v2.patch, hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099-v5.patch, hbase-5099.patch A RS died. The ServerShutdownHandler kicked in and started the logspliting. SpliLogManager installed the tasks asynchronously, then started to wait for them to complete. The task znodes were not created actually. The requests were just queued. At this time, the zookeeper connection expired. HMaster tried to recover the expired ZK session. During the recovery, a new zookeeper connection was created. However, this master became the new master again. It tried to assign root and meta. Because the dead RS got the old root region, the master needs to wait for the log splitting to complete. This waiting holds the zookeeper event thread. So the async create split task is never retried since there is only one event thread, which is waiting for the root region assigned. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5104) FilterList doesn't work right with filters (such as ColumPrefixFilter) which use the SEEK_NEXT_USING_HINT
[ https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177405#comment-13177405 ] Kannan Muthukkaruppan commented on HBASE-5104: -- Additional note: The use of ColumnPaginationFilter AND ColumnPrefixFilter for the intended use case (i.e. to get next set of 5 thread ids for tag2) probably will not work even after this bug is fixed. I think once the bug is fixed for the above data set, the program should return 0 kvs. Here's why: (select * from Tab where filter1 and filter2) should be same as: (select * from Tab where filter1) INTERSECT (select * from Tab where filter2) When separately applied, filter1 will return rows with tag1 prefix and filter2 will return rows with tag0 prefix (for Jiakai's example above) and the INTERSECTION will be the empty set. The real confusion here seems to be because of the use of a filter for pagination. This seems odd. In normal SQL for example, pagination is not part of the WHERE clause but a separate special clause (as if it was being applied on the results of a sub-query). (select * from Tab where column LIKE 'tag1% LIMIT 5 OFFSET 5) Possible ways of supporting this use case: 1) Don't use AND (via FilterList), but enhance ColumnPrefixFilter to support another constructor which supports limit/offset. 2) Support pagination (limit/offset) at the Scan/Get API level (rather than as a filter) [Like SQL]. Thoughts? FilterList doesn't work right with filters (such as ColumPrefixFilter) which use the SEEK_NEXT_USING_HINT - Key: HBASE-5104 URL: https://issues.apache.org/jira/browse/HBASE-5104 Project: HBase Issue Type: Bug Reporter: Kannan Muthukkaruppan Assignee: Madhuwanti Vaidya Attachments: testFilterList.rb Thanks Jiakai Liu for reporting this issue and doing the initial investigation. Email from Jiakai below: Assuming that we have an index column family with the following entries: tag0:001:thread1 ... tag1:001:thread1 tag1:002:thread2 ... tag1:010:thread10 ... tag2:001:thread1 tag2:005:thread5 ... To get threads with tag1 in range [5, 10), I tried the following code: ColumnPrefixFilter filter1 = new ColumnPrefixFilter(Bytes.toBytes(tag1)); ColumnPaginationFilter filter2 = new ColumnPaginationFilter(5 /* limit */, 5 /* offset */); FilterList filters = new FilterList(Operator.MUST_PASS_ALL); filters.addFilter(filter1); filters.addFilter(filter2); Get get = new Get(USER); get.addFamily(COLUMN_FAMILY); get.setMaxVersions(1); get.setFilter(filters); Somehow it didn't work as expected. It returned the entries as if the filter1 were not set. Turns out the ColumnPrefixFilter returns SEEK_NEXT_USING_HINT in some cases. The FilterList filter does not handle this return code properly (treat it as INCLUDE). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5104) FilterList doesn't work right with filters (such as ColumPrefixFilter) which use the SEEK_NEXT_USING_HINT
[ https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177409#comment-13177409 ] Zhihong Yu commented on HBASE-5104: --- Option #1 is easier to implement. I prefer option #2 which is really interesting feature. So we have two more JIRAs to work on :-) FilterList doesn't work right with filters (such as ColumPrefixFilter) which use the SEEK_NEXT_USING_HINT - Key: HBASE-5104 URL: https://issues.apache.org/jira/browse/HBASE-5104 Project: HBase Issue Type: Bug Reporter: Kannan Muthukkaruppan Assignee: Madhuwanti Vaidya Attachments: testFilterList.rb Thanks Jiakai Liu for reporting this issue and doing the initial investigation. Email from Jiakai below: Assuming that we have an index column family with the following entries: tag0:001:thread1 ... tag1:001:thread1 tag1:002:thread2 ... tag1:010:thread10 ... tag2:001:thread1 tag2:005:thread5 ... To get threads with tag1 in range [5, 10), I tried the following code: ColumnPrefixFilter filter1 = new ColumnPrefixFilter(Bytes.toBytes(tag1)); ColumnPaginationFilter filter2 = new ColumnPaginationFilter(5 /* limit */, 5 /* offset */); FilterList filters = new FilterList(Operator.MUST_PASS_ALL); filters.addFilter(filter1); filters.addFilter(filter2); Get get = new Get(USER); get.addFamily(COLUMN_FAMILY); get.setMaxVersions(1); get.setFilter(filters); Somehow it didn't work as expected. It returned the entries as if the filter1 were not set. Turns out the ColumnPrefixFilter returns SEEK_NEXT_USING_HINT in some cases. The FilterList filter does not handle this return code properly (treat it as INCLUDE). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root
[ https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177410#comment-13177410 ] Zhihong Yu commented on HBASE-5099: --- Will integrate tomorrow morning if I don't get objections. ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root region is on Key: HBASE-5099 URL: https://issues.apache.org/jira/browse/HBASE-5099 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Attachments: ZK-event-thread-waiting-for-root.png, distributed-log-splitting-hangs.png, hbase-5099-v2.patch, hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099-v5.patch, hbase-5099.patch A RS died. The ServerShutdownHandler kicked in and started the logspliting. SpliLogManager installed the tasks asynchronously, then started to wait for them to complete. The task znodes were not created actually. The requests were just queued. At this time, the zookeeper connection expired. HMaster tried to recover the expired ZK session. During the recovery, a new zookeeper connection was created. However, this master became the new master again. It tried to assign root and meta. Because the dead RS got the old root region, the master needs to wait for the log splitting to complete. This waiting holds the zookeeper event thread. So the async create split task is never retried since there is only one event thread, which is waiting for the root region assigned. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root re
[ https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5099: -- Attachment: 5099.92 Patch for 0.92 branch where all TestMasterXX tests passed. ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root region is on Key: HBASE-5099 URL: https://issues.apache.org/jira/browse/HBASE-5099 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Attachments: 5099.92, ZK-event-thread-waiting-for-root.png, distributed-log-splitting-hangs.png, hbase-5099-v2.patch, hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099-v5.patch, hbase-5099.patch A RS died. The ServerShutdownHandler kicked in and started the logspliting. SpliLogManager installed the tasks asynchronously, then started to wait for them to complete. The task znodes were not created actually. The requests were just queued. At this time, the zookeeper connection expired. HMaster tried to recover the expired ZK session. During the recovery, a new zookeeper connection was created. However, this master became the new master again. It tried to assign root and meta. Because the dead RS got the old root region, the master needs to wait for the log splitting to complete. This waiting holds the zookeeper event thread. So the async create split task is never retried since there is only one event thread, which is waiting for the root region assigned. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root
[ https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177417#comment-13177417 ] Jimmy Xiang commented on HBASE-5099: @Ted, thanks! ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root region is on Key: HBASE-5099 URL: https://issues.apache.org/jira/browse/HBASE-5099 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Attachments: 5099.92, ZK-event-thread-waiting-for-root.png, distributed-log-splitting-hangs.png, hbase-5099-v2.patch, hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099-v5.patch, hbase-5099.patch A RS died. The ServerShutdownHandler kicked in and started the logspliting. SpliLogManager installed the tasks asynchronously, then started to wait for them to complete. The task znodes were not created actually. The requests were just queued. At this time, the zookeeper connection expired. HMaster tried to recover the expired ZK session. During the recovery, a new zookeeper connection was created. However, this master became the new master again. It tried to assign root and meta. Because the dead RS got the old root region, the master needs to wait for the log splitting to complete. This waiting holds the zookeeper event thread. So the async create split task is never retried since there is only one event thread, which is waiting for the root region assigned. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root
[ https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177421#comment-13177421 ] Zhihong Yu commented on HBASE-5099: --- @Jimmy: Can you put the patch on a real cluster and see if the 300 second timeout is good for zk session recovery ? Thanks ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root region is on Key: HBASE-5099 URL: https://issues.apache.org/jira/browse/HBASE-5099 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.92.0, 0.94.0 Attachments: 5099.92, ZK-event-thread-waiting-for-root.png, distributed-log-splitting-hangs.png, hbase-5099-v2.patch, hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099-v5.patch, hbase-5099.patch A RS died. The ServerShutdownHandler kicked in and started the logspliting. SpliLogManager installed the tasks asynchronously, then started to wait for them to complete. The task znodes were not created actually. The requests were just queued. At this time, the zookeeper connection expired. HMaster tried to recover the expired ZK session. During the recovery, a new zookeeper connection was created. However, this master became the new master again. It tried to assign root and meta. Because the dead RS got the old root region, the master needs to wait for the log splitting to complete. This waiting holds the zookeeper event thread. So the async create split task is never retried since there is only one event thread, which is waiting for the root region assigned. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root re
[ https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5099: -- Fix Version/s: 0.94.0 0.92.0 Hadoop Flags: Reviewed ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root region is on Key: HBASE-5099 URL: https://issues.apache.org/jira/browse/HBASE-5099 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.92.0, 0.94.0 Attachments: 5099.92, ZK-event-thread-waiting-for-root.png, distributed-log-splitting-hangs.png, hbase-5099-v2.patch, hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099-v5.patch, hbase-5099.patch A RS died. The ServerShutdownHandler kicked in and started the logspliting. SpliLogManager installed the tasks asynchronously, then started to wait for them to complete. The task znodes were not created actually. The requests were just queued. At this time, the zookeeper connection expired. HMaster tried to recover the expired ZK session. During the recovery, a new zookeeper connection was created. However, this master became the new master again. It tried to assign root and meta. Because the dead RS got the old root region, the master needs to wait for the log splitting to complete. This waiting holds the zookeeper event thread. So the async create split task is never retried since there is only one event thread, which is waiting for the root region assigned. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5104) FilterList doesn't work right with filters (such as ColumPrefixFilter) which use the SEEK_NEXT_USING_HINT
[ https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177425#comment-13177425 ] Lars Hofhansl commented on HBASE-5104: -- It seems to me that ColumnPaginationFilter should wrap another filter (similar to how FilterList wraps other filters), and then apply the pagination on the result (the wrapper filter of course could be another filter list). FilterList doesn't work right with filters (such as ColumPrefixFilter) which use the SEEK_NEXT_USING_HINT - Key: HBASE-5104 URL: https://issues.apache.org/jira/browse/HBASE-5104 Project: HBase Issue Type: Bug Reporter: Kannan Muthukkaruppan Assignee: Madhuwanti Vaidya Attachments: testFilterList.rb Thanks Jiakai Liu for reporting this issue and doing the initial investigation. Email from Jiakai below: Assuming that we have an index column family with the following entries: tag0:001:thread1 ... tag1:001:thread1 tag1:002:thread2 ... tag1:010:thread10 ... tag2:001:thread1 tag2:005:thread5 ... To get threads with tag1 in range [5, 10), I tried the following code: ColumnPrefixFilter filter1 = new ColumnPrefixFilter(Bytes.toBytes(tag1)); ColumnPaginationFilter filter2 = new ColumnPaginationFilter(5 /* limit */, 5 /* offset */); FilterList filters = new FilterList(Operator.MUST_PASS_ALL); filters.addFilter(filter1); filters.addFilter(filter2); Get get = new Get(USER); get.addFamily(COLUMN_FAMILY); get.setMaxVersions(1); get.setFilter(filters); Somehow it didn't work as expected. It returned the entries as if the filter1 were not set. Turns out the ColumnPrefixFilter returns SEEK_NEXT_USING_HINT in some cases. The FilterList filter does not handle this return code properly (treat it as INCLUDE). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5104) FilterList doesn't work right with filters (such as ColumPrefixFilter) which use the SEEK_NEXT_USING_HINT
[ https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177430#comment-13177430 ] Kannan Muthukkaruppan commented on HBASE-5104: -- Lars: I think viewing it as a general filter is problematic. Note: a FilterList wraps other filters, but it is perfectly legal for it to be wrapped by outer level filters... (say if you want to build a complex boolean expression). Whereas the pagination functionality doesn't lend itself to this flexibility. It is more like a special post-processing -only filter that you apply at the end stage. FilterList doesn't work right with filters (such as ColumPrefixFilter) which use the SEEK_NEXT_USING_HINT - Key: HBASE-5104 URL: https://issues.apache.org/jira/browse/HBASE-5104 Project: HBase Issue Type: Bug Reporter: Kannan Muthukkaruppan Assignee: Madhuwanti Vaidya Attachments: testFilterList.rb Thanks Jiakai Liu for reporting this issue and doing the initial investigation. Email from Jiakai below: Assuming that we have an index column family with the following entries: tag0:001:thread1 ... tag1:001:thread1 tag1:002:thread2 ... tag1:010:thread10 ... tag2:001:thread1 tag2:005:thread5 ... To get threads with tag1 in range [5, 10), I tried the following code: ColumnPrefixFilter filter1 = new ColumnPrefixFilter(Bytes.toBytes(tag1)); ColumnPaginationFilter filter2 = new ColumnPaginationFilter(5 /* limit */, 5 /* offset */); FilterList filters = new FilterList(Operator.MUST_PASS_ALL); filters.addFilter(filter1); filters.addFilter(filter2); Get get = new Get(USER); get.addFamily(COLUMN_FAMILY); get.setMaxVersions(1); get.setFilter(filters); Somehow it didn't work as expected. It returned the entries as if the filter1 were not set. Turns out the ColumnPrefixFilter returns SEEK_NEXT_USING_HINT in some cases. The FilterList filter does not handle this return code properly (treat it as INCLUDE). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root
[ https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177431#comment-13177431 ] jirapos...@reviews.apache.org commented on HBASE-5099: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3323/#review4154 --- src/main/java/org/apache/hadoop/hbase/master/HMaster.java https://reviews.apache.org/r/3323/#comment9383 Should we issue executor.shutdownNow() here if the worker did not finish, in order to attempt to interrupt and long running worker? - Lars On 2011-12-29 21:16:22, Jimmy Xiang wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/3323/ bq. --- bq. bq. (Updated 2011-12-29 21:16:22) bq. bq. bq. Review request for hbase, Ted Yu and Michael Stack. bq. bq. bq. Summary bq. --- bq. bq. Per discussion with Ted (on issues), I put up a patch to run tryRecoveringExpiredZKSession() in a separate thread and time it out and fail the recovery if it is stuck somewhere. bq. bq. I added a test to test the abort method. However, for the mini cluster, becomeActiveMaster() doesn't succeed so the abort method ends up always aborted. So the actually success recovery is not tested. bq. bq. bq. This addresses bug HBASE-5099. bq. https://issues.apache.org/jira/browse/HBASE-5099 bq. bq. bq. Diffs bq. - bq. bq.src/main/java/org/apache/hadoop/hbase/master/HMaster.java a5935a6 bq. src/test/java/org/apache/hadoop/hbase/master/TestMasterZKSessionRecovery.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/3323/diff bq. bq. bq. Testing bq. --- bq. bq. mvn -PlocalTests -Dtest=TestMaster* clean test bq. bq. bq. Thanks, bq. bq. Jimmy bq. bq. ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root region is on Key: HBASE-5099 URL: https://issues.apache.org/jira/browse/HBASE-5099 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.92.0, 0.94.0 Attachments: 5099.92, ZK-event-thread-waiting-for-root.png, distributed-log-splitting-hangs.png, hbase-5099-v2.patch, hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099-v5.patch, hbase-5099.patch A RS died. The ServerShutdownHandler kicked in and started the logspliting. SpliLogManager installed the tasks asynchronously, then started to wait for them to complete. The task znodes were not created actually. The requests were just queued. At this time, the zookeeper connection expired. HMaster tried to recover the expired ZK session. During the recovery, a new zookeeper connection was created. However, this master became the new master again. It tried to assign root and meta. Because the dead RS got the old root region, the master needs to wait for the log splitting to complete. This waiting holds the zookeeper event thread. So the async create split task is never retried since there is only one event thread, which is waiting for the root region assigned. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5104) FilterList doesn't work right with ColumnPaginationFilter
[ https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kannan Muthukkaruppan updated HBASE-5104: - Summary: FilterList doesn't work right with ColumnPaginationFilter (was: FilterList doesn't work right with filters (such as ColumPrefixFilter) which use the SEEK_NEXT_USING_HINT) FilterList doesn't work right with ColumnPaginationFilter - Key: HBASE-5104 URL: https://issues.apache.org/jira/browse/HBASE-5104 Project: HBase Issue Type: Bug Reporter: Kannan Muthukkaruppan Assignee: Madhuwanti Vaidya Attachments: testFilterList.rb Thanks Jiakai Liu for reporting this issue and doing the initial investigation. Email from Jiakai below: Assuming that we have an index column family with the following entries: tag0:001:thread1 ... tag1:001:thread1 tag1:002:thread2 ... tag1:010:thread10 ... tag2:001:thread1 tag2:005:thread5 ... To get threads with tag1 in range [5, 10), I tried the following code: ColumnPrefixFilter filter1 = new ColumnPrefixFilter(Bytes.toBytes(tag1)); ColumnPaginationFilter filter2 = new ColumnPaginationFilter(5 /* limit */, 5 /* offset */); FilterList filters = new FilterList(Operator.MUST_PASS_ALL); filters.addFilter(filter1); filters.addFilter(filter2); Get get = new Get(USER); get.addFamily(COLUMN_FAMILY); get.setMaxVersions(1); get.setFilter(filters); Somehow it didn't work as expected. It returned the entries as if the filter1 were not set. Turns out the ColumnPrefixFilter returns SEEK_NEXT_USING_HINT in some cases. The FilterList filter does not handle this return code properly (treat it as INCLUDE). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root
[ https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177433#comment-13177433 ] Zhihong Yu commented on HBASE-5099: --- I think executor.shutdown() is appropriate here since we want the original body of tryRecoveringExpiredZKSession() to complete. There is new timeout governing the wait for shutdown to complete. ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root region is on Key: HBASE-5099 URL: https://issues.apache.org/jira/browse/HBASE-5099 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.92.0, 0.94.0 Attachments: 5099.92, ZK-event-thread-waiting-for-root.png, distributed-log-splitting-hangs.png, hbase-5099-v2.patch, hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099-v5.patch, hbase-5099.patch A RS died. The ServerShutdownHandler kicked in and started the logspliting. SpliLogManager installed the tasks asynchronously, then started to wait for them to complete. The task znodes were not created actually. The requests were just queued. At this time, the zookeeper connection expired. HMaster tried to recover the expired ZK session. During the recovery, a new zookeeper connection was created. However, this master became the new master again. It tried to assign root and meta. Because the dead RS got the old root region, the master needs to wait for the log splitting to complete. This waiting holds the zookeeper event thread. So the async create split task is never retried since there is only one event thread, which is waiting for the root region assigned. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5104) FilterList doesn't work right with ColumnPaginationFilter
[ https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177435#comment-13177435 ] Lars Hofhansl commented on HBASE-5104: -- @Kannan: Right, that's what I had in mind. You build up your filter (using whatever FilterList nesting you need) and then wrap the end result into a ColumnPaginationFilter (which would optionally take a filter). That way it can be applied at the end-stage while not requiring any API changes to Get/Scan. I see what your saying, though, it rarely makes sense to wrap a ColumnPaginationFilter into a FilterList... We could just document it as such. FilterList doesn't work right with ColumnPaginationFilter - Key: HBASE-5104 URL: https://issues.apache.org/jira/browse/HBASE-5104 Project: HBase Issue Type: Bug Reporter: Kannan Muthukkaruppan Assignee: Madhuwanti Vaidya Attachments: testFilterList.rb Thanks Jiakai Liu for reporting this issue and doing the initial investigation. Email from Jiakai below: Assuming that we have an index column family with the following entries: tag0:001:thread1 ... tag1:001:thread1 tag1:002:thread2 ... tag1:010:thread10 ... tag2:001:thread1 tag2:005:thread5 ... To get threads with tag1 in range [5, 10), I tried the following code: ColumnPrefixFilter filter1 = new ColumnPrefixFilter(Bytes.toBytes(tag1)); ColumnPaginationFilter filter2 = new ColumnPaginationFilter(5 /* limit */, 5 /* offset */); FilterList filters = new FilterList(Operator.MUST_PASS_ALL); filters.addFilter(filter1); filters.addFilter(filter2); Get get = new Get(USER); get.addFamily(COLUMN_FAMILY); get.setMaxVersions(1); get.setFilter(filters); Somehow it didn't work as expected. It returned the entries as if the filter1 were not set. Turns out the ColumnPrefixFilter returns SEEK_NEXT_USING_HINT in some cases. The FilterList filter does not handle this return code properly (treat it as INCLUDE). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root
[ https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177438#comment-13177438 ] jirapos...@reviews.apache.org commented on HBASE-5099: -- bq. On 2011-12-29 22:29:34, Lars Hofhansl wrote: bq. src/main/java/org/apache/hadoop/hbase/master/HMaster.java, line 1464 bq. https://reviews.apache.org/r/3323/diff/5/?file=65971#file65971line1464 bq. bq. Should we issue executor.shutdownNow() here if the worker did not finish, in order to attempt to interrupt and long running worker? Yes, that will be safer. - Jimmy --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3323/#review4154 --- On 2011-12-29 21:16:22, Jimmy Xiang wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/3323/ bq. --- bq. bq. (Updated 2011-12-29 21:16:22) bq. bq. bq. Review request for hbase, Ted Yu and Michael Stack. bq. bq. bq. Summary bq. --- bq. bq. Per discussion with Ted (on issues), I put up a patch to run tryRecoveringExpiredZKSession() in a separate thread and time it out and fail the recovery if it is stuck somewhere. bq. bq. I added a test to test the abort method. However, for the mini cluster, becomeActiveMaster() doesn't succeed so the abort method ends up always aborted. So the actually success recovery is not tested. bq. bq. bq. This addresses bug HBASE-5099. bq. https://issues.apache.org/jira/browse/HBASE-5099 bq. bq. bq. Diffs bq. - bq. bq.src/main/java/org/apache/hadoop/hbase/master/HMaster.java a5935a6 bq. src/test/java/org/apache/hadoop/hbase/master/TestMasterZKSessionRecovery.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/3323/diff bq. bq. bq. Testing bq. --- bq. bq. mvn -PlocalTests -Dtest=TestMaster* clean test bq. bq. bq. Thanks, bq. bq. Jimmy bq. bq. ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root region is on Key: HBASE-5099 URL: https://issues.apache.org/jira/browse/HBASE-5099 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.92.0, 0.94.0 Attachments: 5099.92, ZK-event-thread-waiting-for-root.png, distributed-log-splitting-hangs.png, hbase-5099-v2.patch, hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099-v5.patch, hbase-5099.patch A RS died. The ServerShutdownHandler kicked in and started the logspliting. SpliLogManager installed the tasks asynchronously, then started to wait for them to complete. The task znodes were not created actually. The requests were just queued. At this time, the zookeeper connection expired. HMaster tried to recover the expired ZK session. During the recovery, a new zookeeper connection was created. However, this master became the new master again. It tried to assign root and meta. Because the dead RS got the old root region, the master needs to wait for the log splitting to complete. This waiting holds the zookeeper event thread. So the async create split task is never retried since there is only one event thread, which is waiting for the root region assigned. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5100) Rollback of split could cause closed region to be opened again
[ https://issues.apache.org/jira/browse/HBASE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177437#comment-13177437 ] Hadoop QA commented on HBASE-5100: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12508876/5100.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -151 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 76 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/631//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/631//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/631//console This message is automatically generated. Rollback of split could cause closed region to be opened again -- Key: HBASE-5100 URL: https://issues.apache.org/jira/browse/HBASE-5100 Project: HBase Issue Type: Bug Reporter: chunhui shen Assignee: chunhui shen Fix For: 0.92.0, 0.94.0 Attachments: 5100.txt, hbase-5100.patch If master sending close region to rs and region's split transaction concurrently happen, it may cause closed region to opened. See the detailed code in SplitTransaction#createDaughters {code} ListStoreFile hstoreFilesToSplit = null; try{ hstoreFilesToSplit = this.parent.close(false); if (hstoreFilesToSplit == null) { // The region was closed by a concurrent thread. We can't continue // with the split, instead we must just abandon the split. If we // reopen or split this could cause problems because the region has // probably already been moved to a different server, or is in the // process of moving to a different server. throw new IOException(Failed to close region: already closed by + another thread); } } finally { this.journal.add(JournalEntry.CLOSED_PARENT_REGION); } {code} when rolling back, the JournalEntry.CLOSED_PARENT_REGION causes this.parent.initialize(); Although this region is not onlined in the regionserver, it may bring some potential problem. For example, in our environment, the closed parent region is rolled back sucessfully , and then starting compaction and split again. The parent region is f892dd6107b6b4130199582abc78e9c1 master log {code} 2011-12-26 00:24:42,693 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1., src=dw87.kgb.sqa.cm4,60020,1324827866085, dest=dw80.kgb.sqa.cm4,60020,1324827865780 2011-12-26 00:24:42,693 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. (offlining) 2011-12-26 00:24:42,694 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to serverName=dw87.kgb.sqa.cm4,60020,1324827866085, load=(requests=0, regions=0, usedHeap=0, maxHeap=0) for region writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. 2011-12-26 00:24:42,699 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned node: /hbase-tbfs/unassigned/f892dd6107b6b4130199582abc78e9c1 (region=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1., server=dw87.kgb.sqa.cm4,60020,1324827866085, state=RS_ZK_REGION_CLOSING) 2011-12-26 00:24:42,699 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSING,
[jira] [Updated] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root re
[ https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-5099: --- Attachment: hbase-5099-v6.patch Updated per Lars's comment. It will be safer. ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root region is on Key: HBASE-5099 URL: https://issues.apache.org/jira/browse/HBASE-5099 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.92.0, 0.94.0 Attachments: 5099.92, ZK-event-thread-waiting-for-root.png, distributed-log-splitting-hangs.png, hbase-5099-v2.patch, hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099-v5.patch, hbase-5099-v6.patch, hbase-5099.patch A RS died. The ServerShutdownHandler kicked in and started the logspliting. SpliLogManager installed the tasks asynchronously, then started to wait for them to complete. The task znodes were not created actually. The requests were just queued. At this time, the zookeeper connection expired. HMaster tried to recover the expired ZK session. During the recovery, a new zookeeper connection was created. However, this master became the new master again. It tried to assign root and meta. Because the dead RS got the old root region, the master needs to wait for the log splitting to complete. This waiting holds the zookeeper event thread. So the async create split task is never retried since there is only one event thread, which is waiting for the root region assigned. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira