[jira] [Commented] (HBASE-7725) Add ability to block on a compaction request for a region
[ https://issues.apache.org/jira/browse/HBASE-7725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567445#comment-13567445 ]

Lars Hofhansl commented on HBASE-7725:
--------------------------------------

Lgtm

> Add ability to block on a compaction request for a region
> ---------------------------------------------------------
>
> Key: HBASE-7725
> URL: https://issues.apache.org/jira/browse/HBASE-7725
> Project: HBase
> Issue Type: Bug
> Components: Compaction, Coprocessors, regionserver
> Reporter: Jesse Yates
> Assignee: Jesse Yates
> Fix For: 0.96.0, 0.94.5
>
> Attachments: example.java, hbase-7725_0.94-v0.patch
>
> You can request that a compaction be started, but you can't be sure when that compaction request completes. This is a simple update to the CompactionRequest interface and the compact-split thread on the RS that doesn't actually impact the RS exposed interface. This is particularly useful for CPs so they can control starting/running a compaction.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
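The idea above (letting a caller block until a requested compaction finishes) can be sketched with a latch. This is a minimal, hypothetical model: `BlockingCompactionRequest`, `markComplete`, and `waitForCompletion` are illustrative names, not the API from the attached patch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a compaction request that callers can block on.
public class BlockingCompactionRequest {
    private final CountDownLatch done = new CountDownLatch(1);

    /** Called by the compact-split thread when the compaction finishes. */
    public void markComplete() {
        done.countDown();
    }

    /** Block until the compaction completes or the timeout elapses. */
    public boolean waitForCompletion(long timeout, TimeUnit unit)
            throws InterruptedException {
        return done.await(timeout, unit);
    }

    /** Simulate a region server completing the request on another thread. */
    public static boolean demo() {
        final BlockingCompactionRequest req = new BlockingCompactionRequest();
        new Thread(new Runnable() {
            public void run() {
                req.markComplete();
            }
        }).start();
        try {
            return req.waitForCompletion(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("compaction finished: " + demo());
    }
}
```

A coprocessor could hold such a handle, trigger the compaction, and then wait on it before proceeding, which is the use case the ticket describes.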
[jira] [Commented] (HBASE-7607) Fix TestRegionServerCoprocessorExceptionWithAbort flakiness in 0.94
[ https://issues.apache.org/jira/browse/HBASE-7607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567447#comment-13567447 ]

Lars Hofhansl commented on HBASE-7607:
--------------------------------------

We could keep the RSTracker and just remove the interrupt nonsense. That way you test in a loop whether the RS's znode was removed and then end the test.

> Fix TestRegionServerCoprocessorExceptionWithAbort flakiness in 0.94
> -------------------------------------------------------------------
>
> Key: HBASE-7607
> URL: https://issues.apache.org/jira/browse/HBASE-7607
> Project: HBase
> Issue Type: Bug
> Components: Client, test
> Affects Versions: 0.94.4
> Reporter: Himanshu Vashishtha
> Assignee: Himanshu Vashishtha
> Fix For: 0.94.5
>
> Attachments: HBASE-7607-v2.patch
>
> TestRegionServerCoprocessorExceptionWithAbort fails sometimes both on trunk and 0.94.x. The codebase is different in both. In 0.94.x, the client retries to look at the root region while the cluster is down and the /hbase znode is no longer present. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master. I will file a separate jira for trunk as the code is different there.
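The polling approach suggested above (loop until the RS's znode is gone, then end the test) can be sketched as a generic helper. `pollUntil` is a hypothetical name, and the condition below merely stands in for a ZooKeeper `exists()` check on the znode.

```java
import java.util.concurrent.Callable;

// Illustrative sketch of "test in a loop": poll a condition (e.g. "has the
// RS's znode been removed?") until it holds or a deadline passes, instead of
// relying on thread interrupts. Not code from the patch.
public class PollUntil {
    /** Poll {@code condition} every {@code intervalMs} ms for up to {@code timeoutMs} ms. */
    public static boolean pollUntil(Callable<Boolean> condition,
                                    long timeoutMs, long intervalMs) throws Exception {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (condition.call()) {
                return true;
            }
            Thread.sleep(intervalMs);
        }
        return false;
    }

    /** Stand-in condition: "znode removed" becomes true after ~100 ms. */
    public static boolean demo() {
        final long start = System.currentTimeMillis();
        try {
            return pollUntil(new Callable<Boolean>() {
                public Boolean call() {
                    return System.currentTimeMillis() - start > 100;
                }
            }, 5000, 10);
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("condition met: " + demo());
    }
}
```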
[jira] [Commented] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs
[ https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567448#comment-13567448 ]

Lars Hofhansl commented on HBASE-3996:
--------------------------------------

I might be able to deploy this on a test cluster to try tomorrow.

> Support multiple tables and scanners as input to the mapper in map/reduce jobs
> ------------------------------------------------------------------------------
>
> Key: HBASE-3996
> URL: https://issues.apache.org/jira/browse/HBASE-3996
> Project: HBase
> Issue Type: Improvement
> Components: mapreduce
> Reporter: Eran Kutner
> Assignee: Bryan Baugher
> Priority: Critical
> Fix For: 0.96.0, 0.94.5
>
> Attachments: 3996-v10.txt, 3996-v11.txt, 3996-v12.txt, 3996-v13.txt, 3996-v14.txt, 3996-v2.txt, 3996-v3.txt, 3996-v4.txt, 3996-v5.txt, 3996-v6.txt, 3996-v7.txt, 3996-v8.txt, 3996-v9.txt, HBase-3996.patch
>
> It seems that in many cases feeding data from multiple tables or multiple scanners on a single table can save a lot of time when running map/reduce jobs. I propose a new MultiTableInputFormat class that would allow doing this.
[jira] [Commented] (HBASE-5664) CP hooks in Scan flow for fast forward when filter filters out a row
[ https://issues.apache.org/jira/browse/HBASE-5664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567450#comment-13567450 ]

Lars Hofhansl commented on HBASE-5664:
--------------------------------------

Hmm... Yes. Maybe something else slowed it down. Lemme try again.

> CP hooks in Scan flow for fast forward when filter filters out a row
> --------------------------------------------------------------------
>
> Key: HBASE-5664
> URL: https://issues.apache.org/jira/browse/HBASE-5664
> Project: HBase
> Issue Type: Improvement
> Components: Coprocessors, Filters
> Affects Versions: 0.92.1
> Reporter: Anoop Sam John
> Assignee: Anoop Sam John
> Fix For: 0.96.0, 0.94.5
>
> Attachments: HBASE-5664_94.patch, HBASE-5664_94_V2.patch, HBASE-5664_94_V3.patch, HBASE-5664_Trunk.patch, HBASE-5664_Trunk_V2.patch
>
> In HRegion.nextInternal(int limit, String metric) we have a while(true) loop so as to fetch the next result which satisfies the filter condition. When the Filter filters out the currently fetched row, we call nextRow(byte[] currentRow) before going on with the next row.
> {code}
> if (results.isEmpty() || filterRow()) {
>   // this seems like a redundant step - we already consumed the row
>   // there're no left overs.
>   // the reasons for calling this method are:
>   // 1. reset the filters.
>   // 2. provide a hook to fast forward the row (used by subclasses)
>   nextRow(currentRow);
> {code}
> Per "// 2. provide a hook to fast forward the row (used by subclasses)", we can provide the same fast-forward feature for the CP also.
[jira] [Commented] (HBASE-7403) Online Merge
[ https://issues.apache.org/jira/browse/HBASE-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567457#comment-13567457 ]

Hadoop QA commented on HBASE-7403:
----------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12567312/hbase-7403-trunkv13.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 9 new or modified tests.

{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:red}-1 lineLengths{color}. The patch introduces lines longer than 100

{color:red}-1 core tests{color}. The patch failed these unit tests:
  org.apache.hadoop.hbase.regionserver.wal.TestHLog

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4269//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4269//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4269//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4269//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4269//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4269//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4269//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4269//console

This message is automatically generated.
> Online Merge
> ------------
>
> Key: HBASE-7403
> URL: https://issues.apache.org/jira/browse/HBASE-7403
> Project: HBase
> Issue Type: New Feature
> Affects Versions: 0.94.3
> Reporter: chunhui shen
> Assignee: chunhui shen
> Priority: Critical
> Fix For: 0.96.0, 0.94.6
>
> Attachments: 7403-trunkv5.patch, 7403-trunkv6.patch, 7403v5.diff, 7403-v5.txt, 7403v5.txt, hbase-7403-94v1.patch, hbase-7403-trunkv10.patch, hbase-7403-trunkv11.patch, hbase-7403-trunkv12.patch, hbase-7403-trunkv13.patch, hbase-7403-trunkv1.patch, hbase-7403-trunkv5.patch, hbase-7403-trunkv6.patch, hbase-7403-trunkv7.patch, hbase-7403-trunkv8.patch, hbase-7403-trunkv9.patch, merge region.pdf
>
> The features of this online merge:
> 1. Online: no need to disable the table.
> 2. Few changes to the current code; it could be applied to trunk, 0.94, 0.92 or 0.90.
> 3. Easy to issue a merge request: no need to input a long region name, the encoded name is enough.
> 4. No limits during operation: you don't need to take care of events like Server Dead, Balance, Split, or Disabling/Enabling table, and no need to worry whether you sent a wrong merge request; it is already handled for you.
> 5. Only a little offline time for the two merging regions.
>
> Usage:
> 1. Tool: bin/hbase org.apache.hadoop.hbase.util.OnlineMerge [-force] [-async] [-show] table-name region-encodedname-1 region-encodedname-2
> 2. API: static void MergeManager#createMergeRequest
>
> We need merge in the following cases:
> 1. Region hole or region overlap that can't be fixed by hbck.
> 2. Regions that become empty because of TTL and an unreasonable rowkey design.
> 3. Regions that are always empty or very small because of presplitting at table creation.
> 4. Too many empty or small regions, which reduce system performance (e.g. mslab).
>
> The current merge tool only supports offline merging and is not able to redo if an exception is thrown in the process of merging, leaving dirty data. For an online system, we need an online merge.
> The implementation logic of this patch for Online Merge is (for example, merging regionA and regionB into regionC):
> 1. Offline the two regions A and B.
> 2. Merge the two regions in HDFS (create regionC's directory, move regionA's and regionB's files to regionC's directory, delete regionA's and regionB's directories).
> 3. Add the merged regionC to .META.
> 4. Assign the merged regionC.
> As designed in this patch, once we do the merge work in HDFS, we
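Step 2 above can be sketched on a local filesystem as a stand-in for HDFS (the real patch would use the Hadoop FileSystem API); the class, directory, and file names here are illustrative only.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Local-filesystem sketch of the HDFS merge step: create regionC's directory,
// move regionA's and regionB's files into it, then delete the old directories.
public class RegionDirMerge {
    public static void mergeDirs(Path a, Path b, Path c) throws IOException {
        Files.createDirectories(c);
        for (Path src : new Path[] { a, b }) {
            try (DirectoryStream<Path> files = Files.newDirectoryStream(src)) {
                for (Path f : files) {
                    Files.move(f, c.resolve(f.getFileName()));
                }
            }
            Files.delete(src); // directory is now empty
        }
    }

    /** Build two toy region directories, merge them, and verify the result. */
    public static boolean demo() {
        try {
            Path tmp = Files.createTempDirectory("merge-demo");
            Path a = Files.createDirectory(tmp.resolve("regionA"));
            Path b = Files.createDirectory(tmp.resolve("regionB"));
            Files.createFile(a.resolve("hfile1"));
            Files.createFile(b.resolve("hfile2"));
            Path c = tmp.resolve("regionC");
            mergeDirs(a, b, c);
            return Files.exists(c.resolve("hfile1"))
                    && Files.exists(c.resolve("hfile2"))
                    && !Files.exists(a) && !Files.exists(b);
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("merged: " + demo());
    }
}
```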
[jira] [Commented] (HBASE-7590) Add a costless notifications mechanism from master to regionservers clients
[ https://issues.apache.org/jira/browse/HBASE-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567471#comment-13567471 ]

nkeywal commented on HBASE-7590:
--------------------------------

bq. How about clients watching the region server's ephemeral nodes.

We would also need to manage the new regionservers and the client disconnect. There could be extra cases with short-lived clients that could hammer ZK. Using a separate znode allows sharing a lot of code between a multicast mode and a ZK mode. Listening directly to all znodes from the client would mean having just a ZK mode imho (but it could be fine).

> Add a costless notifications mechanism from master to regionservers clients
> ---------------------------------------------------------------------------
>
> Key: HBASE-7590
> URL: https://issues.apache.org/jira/browse/HBASE-7590
> Project: HBase
> Issue Type: Bug
> Components: Client, master, regionserver
> Affects Versions: 0.96.0
> Reporter: nkeywal
>
> It would be very useful to add a mechanism to distribute some information to the clients and regionservers. Especially it would be useful to know globally (regionservers + client apps) that some regionservers are dead. This would allow:
> - lowering the load on the system, without clients using stale information and going to dead machines
> - making recovery faster from a client point of view. It's common to use large timeouts on the client side, so the client may need a lot of time before declaring a region server dead and trying another one. If the client receives the information separately about a region server's state, it can take the right decision, and continue/stop waiting accordingly.
> We can also send more information, for example instructions like 'slow down' to instruct the client to increase the retry delays and so on.
> Technically, the master could send this information. To lower the load on the system, we should:
> - have a multicast communication (i.e. the master does not have to connect to all servers by tcp), with one packet every 10 seconds or so.
> - receivers should not depend on this: if the information is available, great. If not, it should not break anything.
> - it should be optional.
> So at the end we would have a thread in the master sending a protobuf message about the dead servers on a multicast socket. If the socket is not configured, it does not do anything. On the client side, when we receive information that a node is dead, we refresh the cache about it.
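The payload round trip for such a notification can be sketched as follows. The proposal uses a protobuf message on a multicast socket; this self-contained stand-in uses a plain delimited string instead, and `DeadServerMessage` plus the ';' encoding are assumptions for illustration (';' is chosen because HBase server names of the form host,port,startcode contain commas).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Sketch of encoding/decoding a "these servers are dead" notification.
// Receivers must tolerate the message being absent: it is an optimization,
// not a source of truth.
public class DeadServerMessage {
    /** Encode a list of dead server names into a datagram payload. */
    public static byte[] encode(List<String> deadServers) {
        return String.join(";", deadServers).getBytes(StandardCharsets.UTF_8);
    }

    /** Decode a datagram payload back into the list of server names. */
    public static List<String> decode(byte[] payload) {
        return Arrays.asList(new String(payload, StandardCharsets.UTF_8).split(";"));
    }

    /** Round-trip two illustrative server names. */
    public static boolean demo() {
        List<String> dead = Arrays.asList(
                "rs1.example.com,60020,1359612345678",
                "rs2.example.com,60020,1359612349999");
        return decode(encode(dead)).equals(dead);
    }

    public static void main(String[] args) {
        System.out.println("round trip ok: " + demo());
    }
}
```

In the proposed design the master would periodically write such a payload to a MulticastSocket if one is configured, and clients would refresh their cache for any server named in it.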
[jira] [Updated] (HBASE-7495) parallel seek in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liang Xie updated HBASE-7495:
-----------------------------
    Attachment: HBASE-7495-v6.txt

> parallel seek in StoreScanner
> -----------------------------
>
> Key: HBASE-7495
> URL: https://issues.apache.org/jira/browse/HBASE-7495
> Project: HBase
> Issue Type: Bug
> Components: Scanners
> Affects Versions: 0.94.3, 0.96.0
> Reporter: Liang Xie
> Assignee: Liang Xie
>
> Attachments: HBASE-7495.txt, HBASE-7495.txt, HBASE-7495.txt, HBASE-7495-v2.txt, HBASE-7495-v3.txt, HBASE-7495-v4.txt, HBASE-7495-v4.txt, HBASE-7495-v5.txt, HBASE-7495-v6.txt
>
> seems there's a potentially improvable spot before doing scanner.next:
> {code:title=StoreScanner.java|borderStyle=solid}
> if (explicitColumnQuery && lazySeekEnabledGlobally) {
>   for (KeyValueScanner scanner : scanners) {
>     scanner.requestSeek(matcher.getStartKey(), false, true);
>   }
> } else {
>   for (KeyValueScanner scanner : scanners) {
>     scanner.seek(matcher.getStartKey());
>   }
> }
> {code}
> we can do scanner.requestSeek or scanner.seek in parallel, instead of the current serialization, to reduce latency for this special case. Any ideas on it? I'll have a try if the comments/suggestions are positive :)
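The serial-versus-parallel idea above can be sketched with an ExecutorService: submit one seek task per scanner and wait for all of them to finish. The `Scanner` interface below is a minimal stand-in for HBase's KeyValueScanner, and the whole class is an illustration, not the patch's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: seek all scanners concurrently instead of one after another.
public class ParallelSeek {
    interface Scanner {
        void seek(long key);
    }

    public static void seekAll(List<Scanner> scanners, final long key,
                               ExecutorService pool) throws InterruptedException {
        List<Callable<Void>> tasks = new ArrayList<Callable<Void>>();
        for (final Scanner s : scanners) {
            tasks.add(new Callable<Void>() {
                public Void call() {
                    s.seek(key); // seeks run concurrently on the pool
                    return null;
                }
            });
        }
        pool.invokeAll(tasks); // blocks until every seek has finished
    }

    /** Seek four toy scanners in parallel and verify all of them ran. */
    public static boolean demo() {
        final AtomicInteger seeks = new AtomicInteger();
        List<Scanner> scanners = new ArrayList<Scanner>();
        for (int i = 0; i < 4; i++) {
            scanners.add(new Scanner() {
                public void seek(long key) {
                    seeks.incrementAndGet();
                }
            });
        }
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            seekAll(scanners, 42L, pool);
        } catch (InterruptedException e) {
            return false;
        } finally {
            pool.shutdown();
        }
        return seeks.get() == 4;
    }

    public static void main(String[] args) {
        System.out.println("all seeks done: " + demo());
    }
}
```

invokeAll gives the "wait for all seeks before returning" semantics the StoreScanner constructor needs; the win comes when each seek blocks on disk I/O, so the seeks overlap instead of adding up.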
[jira] [Commented] (HBASE-7728) deadlock occurs between hlog roller and hlog syncer
[ https://issues.apache.org/jira/browse/HBASE-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567496#comment-13567496 ]

Anoop Sam John commented on HBASE-7728:
---------------------------------------

The LogRoller thread is trying to roll over the current log file. It has captured the updateLock already.
{code}
HLog#rollWriter(boolean force)
  synchronized (updateLock) {
    // Clean up current writer.
    Path oldFile = cleanupCurrentWriter(currentFilenum);
    this.writer = nextWriter;
  }
{code}
As part of cleaning up the current writer, this thread tries to sync the pending writes:
{code}
HLog#cleanupCurrentWriter() {
  sync();
  ...
  this.writer.close();
}
{code}
At the same time the logSyncer thread was doing a deferred log sync operation:
{code}
HLog#syncer(long txid) {
  ...
  synchronized (flushLock) {
    try {
      logSyncerThread.hlogFlush(tempWriter, pending);
    } catch (IOException io) {
      synchronized (this.updateLock) {
        // HBASE-4387, HBASE-5623, retry with updateLock held
        tempWriter = this.writer;
        logSyncerThread.hlogFlush(tempWriter, pending);
      }
    }
  }
{code}
This thread is trying to grab the updateLock while holding the flushLock. At the same time the roller thread comes and, as part of the cleanup sync, tries to grab the flushLock. An IOException might have happened in the logSyncer thread (logSyncerThread.hlogFlush). At this point our assumption is that a log rollover already happened; that is why we try to write again with the updateLock held, getting the writer again. [The writer on which the IOE happened should have been closed.] In the roller thread the writer close happens after the cleanup operation, so I guess logSyncerThread.hlogFlush threw the IOE not because of a log roll. Without assuming a log roll in the catch block, can we check for tempWriter == this.writer? I am not an expert in this area; I am adding my observation from a quick code study, so if wrong, please correct me. Do you have any logs from when this happened?
> deadlock occurs between hlog roller and hlog syncer
> ---------------------------------------------------
>
> Key: HBASE-7728
> URL: https://issues.apache.org/jira/browse/HBASE-7728
> Project: HBase
> Issue Type: Bug
> Components: wal
> Affects Versions: 0.94.2
> Environment: Linux 2.6.18-164.el5 x86_64 GNU/Linux
> Reporter: Wang Qiang
> Priority: Blocker
>
> The hlog roller thread and hlog syncer thread may deadlock on the 'flushLock' and 'updateLock', and then cause all 'IPC Server handler' threads to block on hlog append. The jstack info is as follows:
> regionserver60020.logRoller:
>   at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1305)
>   - waiting to lock 0x00067bf88d58 (a java.lang.Object)
>   at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1283)
>   at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1456)
>   at org.apache.hadoop.hbase.regionserver.wal.HLog.cleanupCurrentWriter(HLog.java:876)
>   at org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:657)
>   - locked 0x00067d54ace0 (a java.lang.Object)
>   at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94)
>   at java.lang.Thread.run(Thread.java:662)
> regionserver60020.logSyncer:
>   at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1314)
>   - waiting to lock 0x00067d54ace0 (a java.lang.Object)
>   - locked 0x00067bf88d58 (a java.lang.Object)
>   at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1283)
>   at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1456)
>   at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:1235)
>   at java.lang.Thread.run(Thread.java:662)
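The cycle in the jstack above exists because one thread holds updateLock and wants flushLock while the other holds flushLock and wants updateLock. One general fix direction, shown here as a toy model (not the HLog code), is to make both paths acquire the two locks in the same order, so the cycle cannot form:

```java
// Toy model of consistent lock ordering: both the "roller" and "syncer"
// paths take updateLock before flushLock, so neither can hold one lock
// while waiting for the other in the reverse order.
public class LockOrdering {
    private static final Object updateLock = new Object();
    private static final Object flushLock = new Object();

    static void rollerPath() {
        synchronized (updateLock) {     // both paths: updateLock first...
            synchronized (flushLock) {  // ...then flushLock
                // clean up writer, sync pending writes
            }
        }
    }

    static void syncerPath() {
        synchronized (updateLock) {
            synchronized (flushLock) {
                // flush pending edits
            }
        }
    }

    /** Run both paths concurrently many times; true if no deadlock occurred. */
    public static boolean demo() {
        Thread roller = new Thread(new Runnable() {
            public void run() {
                for (int i = 0; i < 1000; i++) rollerPath();
            }
        });
        Thread syncer = new Thread(new Runnable() {
            public void run() {
                for (int i = 0; i < 1000; i++) syncerPath();
            }
        });
        roller.start();
        syncer.start();
        try {
            roller.join(10000);
            syncer.join(10000);
        } catch (InterruptedException e) {
            return false;
        }
        return !roller.isAlive() && !syncer.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("completed without deadlock: " + demo());
    }
}
```

Whether reordering is actually feasible in HLog (or whether the retry-under-updateLock path should be changed instead, as discussed in the comments) is a separate question; the model only illustrates why the cycle forms and the standard way to break it.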
[jira] [Updated] (HBASE-7495) parallel seek in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liang Xie updated HBASE-7495:
-----------------------------
    Attachment: (was: HBASE-7495-v6.txt)
[jira] [Updated] (HBASE-7495) parallel seek in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liang Xie updated HBASE-7495:
-----------------------------
    Attachment: HBASE-7495-v6.txt
[jira] [Commented] (HBASE-7495) parallel seek in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567502#comment-13567502 ]

Anoop Sam John commented on HBASE-7495:
---------------------------------------

{code}
+    List<Callable<Void>> tasks = new ArrayList<Callable<Void>>(storeFileScannerCount);
{code}
Why do we need this list?
[jira] [Commented] (HBASE-7495) parallel seek in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567508#comment-13567508 ]

Liang Xie commented on HBASE-7495:
----------------------------------

Uploaded patch v6: moved the MVCC setThreadReadPoint from the ScannerSeekWorker constructor into the call block.

For passing a config object, it seems we would need to add a new parameter to StoreScanner's constructor, and probably repair many broken test cases. In my patch it is initialized just one time in a *static* block; I think that is fine as well.

[~yuzhih...@gmail.com], if we move the ExecutorService to the HRegionServer class, we need to expose it with a *static* getter, right? In the StoreScanner class we cannot easily get the current HRegionServer instance in the current codebase, but if we add a static getter method, it will bring several FindBugs warnings.
[jira] [Commented] (HBASE-7495) parallel seek in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567511#comment-13567511 ]

Liang Xie commented on HBASE-7495:
----------------------------------

Thanks to my colleague [~fenghh] for the MVCC code improvement :)
[jira] [Commented] (HBASE-7728) deadlock occurs between hlog roller and hlog syncer
[ https://issues.apache.org/jira/browse/HBASE-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567519#comment-13567519 ]

Anoop Sam John commented on HBASE-7728:
---------------------------------------

bq. Without assuming the log roll in the catch block, can we check for tempWriter == this.writer?

Not correct. Can we know whether the IOE was because of a parallel writer close?
[jira] [Commented] (HBASE-7495) parallel seek in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567534#comment-13567534 ]

Hadoop QA commented on HBASE-7495:
----------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12567321/HBASE-7495-v6.txt
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests.

{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100

{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4270//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4270//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4270//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4270//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4270//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4270//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4270//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4270//console
[jira] [Commented] (HBASE-7495) parallel seek in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567539#comment-13567539 ] Liang Xie commented on HBASE-7495: -- [~anoopsamjohn], good point; it's leftover code from a previous version. Let me remove it in the v6 file now.
[jira] [Updated] (HBASE-7495) parallel seek in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Xie updated HBASE-7495: - Attachment: HBASE-7495-v6.txt
[jira] [Commented] (HBASE-7495) parallel seek in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567541#comment-13567541 ] Hadoop QA commented on HBASE-7495: --
{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12567324/HBASE-7495-v6.txt against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests.
{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100.
{color:green}+1 core tests{color}. The patch passed unit tests in .
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4271//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4271//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4271//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4271//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4271//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4271//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4271//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4271//console
This message is automatically generated.
[jira] [Updated] (HBASE-7495) parallel seek in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Xie updated HBASE-7495: - Attachment: (was: HBASE-7495-v6.txt)
[jira] [Commented] (HBASE-7495) parallel seek in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567542#comment-13567542 ] Hadoop QA commented on HBASE-7495: --
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12567339/HBASE-7495-v6.txt against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests.
{color:red}-1 patch{color}. The patch command could not apply the patch.
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4272//console
This message is automatically generated.
[jira] [Commented] (HBASE-7728) deadlock occurs between hlog roller and hlog syncer
[ https://issues.apache.org/jira/browse/HBASE-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567546#comment-13567546 ] ramkrishna.s.vasudevan commented on HBASE-7728: --- Yes, logs will be needed. If sync is still going on from cleanupCurrentWriter, it means this.writer is still not null. If this.writer is not null, the IOE should not have happened either. If sync has happened, syncedTillHere should have changed. So this is a nice thing to analyse and debug. The updateLock is also needed while changing the writer. deadlock occurs between hlog roller and hlog syncer --- Key: HBASE-7728 URL: https://issues.apache.org/jira/browse/HBASE-7728 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.2 Environment: Linux 2.6.18-164.el5 x86_64 GNU/Linux Reporter: Wang Qiang Priority: Blocker
The hlog roller thread and the hlog syncer thread may deadlock on the 'flushLock' and 'updateLock', causing all 'IPC Server handler' threads to block on hlog append.
The jstack info is as follows:
regionserver60020.logRoller:
    at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1305)
    - waiting to lock 0x00067bf88d58 (a java.lang.Object)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1283)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1456)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.cleanupCurrentWriter(HLog.java:876)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:657)
    - locked 0x00067d54ace0 (a java.lang.Object)
    at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94)
    at java.lang.Thread.run(Thread.java:662)
regionserver60020.logSyncer:
    at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1314)
    - waiting to lock 0x00067d54ace0 (a java.lang.Object)
    - locked 0x00067bf88d58 (a java.lang.Object)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1283)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1456)
    at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:1235)
    at java.lang.Thread.run(Thread.java:662)
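The two stacks above show the classic inverted-lock-order pattern: the roller holds one monitor and waits for the other, while the syncer holds the second and waits for the first. Below is a minimal, self-contained sketch (invented names, not the actual HBase fix) of the standard cure, acquiring both locks in the same order on every path; which monitor corresponds to updateLock vs. flushLock is inferred from the dump.

```java
// Sketch of consistent lock ordering: both the roll path and the sync
// path take updateLock before flushLock, so neither can hold the second
// lock while waiting for the first.
public class LockOrderSketch {
  private final Object updateLock = new Object();
  private final Object flushLock = new Object();

  // Roller path: flush pending edits, then swap the writer.
  public void rollAndSync() {
    synchronized (updateLock) {
      synchronized (flushLock) {
        // ... sync current writer, close it, install new writer ...
      }
    }
  }

  // Syncer path: must use the SAME order. Taking flushLock first and
  // then waiting on updateLock is what produced the deadlock above.
  public void sync() {
    synchronized (updateLock) {
      synchronized (flushLock) {
        // ... flush pending edits to the current writer ...
      }
    }
  }
}
```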
[jira] [Updated] (HBASE-7495) parallel seek in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Xie updated HBASE-7495: - Attachment: HBASE-7495-v7.txt
[jira] [Commented] (HBASE-7728) deadlock occurs between hlog roller and hlog syncer
[ https://issues.apache.org/jira/browse/HBASE-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567550#comment-13567550 ] Anoop Sam John commented on HBASE-7728: --- Yes Ram, the locks are correctly used as far as I have seen. Ideally logSyncerThread.hlogFlush should not throw an IOE, as it is clear from the thread dump that the roller is not even near the point where it closes and resets the current writer. Still, it seems the IOE happened. That is why it is waiting for the updateLock.
[jira] [Commented] (HBASE-7728) deadlock occurs between hlog roller and hlog syncer
[ https://issues.apache.org/jira/browse/HBASE-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567553#comment-13567553 ] Anoop Sam John commented on HBASE-7728: --- [~aaronwq] Any chance for logs?
[jira] [Commented] (HBASE-7728) deadlock occurs between hlog roller and hlog syncer
[ https://issues.apache.org/jira/browse/HBASE-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567551#comment-13567551 ] Anoop Sam John commented on HBASE-7728: --- Sorry.. thread dump ;)
[jira] [Commented] (HBASE-7728) deadlock occurs between hlog roller and hlog syncer
[ https://issues.apache.org/jira/browse/HBASE-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567554#comment-13567554 ] ramkrishna.s.vasudevan commented on HBASE-7728: --- I am checking with the latest 0.94 code. Maybe 0.94.2 has some changes, judging by the line numbers in the thread dump?
[jira] [Commented] (HBASE-5664) CP hooks in Scan flow for fast forward when filter filters out a row
[ https://issues.apache.org/jira/browse/HBASE-5664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567557#comment-13567557 ] Anoop Sam John commented on HBASE-5664: --- Thanks Lars. Yes, the performance degradation in your test looks strange; this patch does not add any extra lines in the normal scan path. CP hooks in Scan flow for fast forward when filter filters out a row Key: HBASE-5664 URL: https://issues.apache.org/jira/browse/HBASE-5664 Project: HBase Issue Type: Improvement Components: Coprocessors, Filters Affects Versions: 0.92.1 Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 0.96.0, 0.94.5 Attachments: HBASE-5664_94.patch, HBASE-5664_94_V2.patch, HBASE-5664_94_V3.patch, HBASE-5664_Trunk.patch, HBASE-5664_Trunk_V2.patch
In HRegion.nextInternal(int limit, String metric) we have a while(true) loop to fetch the next result which satisfies the filter condition. When the Filter filters out the currently fetched row, we call nextRow(byte[] currentRow) before going on to the next row.
{code}
if (results.isEmpty() || filterRow()) {
  // this seems like a redundant step - we already consumed the row
  // there're no left overs.
  // the reasons for calling this method are:
  // 1. reset the filters.
  // 2. provide a hook to fast forward the row (used by subclasses)
  nextRow(currentRow);
{code}
// 2. provide a hook to fast forward the row (used by subclasses) We can provide the same fast-forward feature for the CP also.
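The hook proposed here can be sketched roughly as follows. This is an illustrative stand-in, not the actual HBase coprocessor API: the simplified RegionObserver interface and the notifyFilterRow harness are invented, and only the idea (let observers react when the filter rejects a row) comes from the issue.

```java
import java.util.List;

public class FilterRowHookSketch {
  // Hypothetical, simplified observer: return false to tell the scan
  // not to fast-forward past the filtered row.
  public interface RegionObserver {
    boolean postScannerFilterRow(byte[] currentRow);
  }

  // Called from the scan loop when the filter rejects currentRow;
  // short-circuits on the first observer that votes false.
  public static boolean notifyFilterRow(List<RegionObserver> observers,
                                        byte[] currentRow) {
    for (RegionObserver o : observers) {
      if (!o.postScannerFilterRow(currentRow)) {
        return false;
      }
    }
    return true;
  }
}
```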
[jira] [Updated] (HBASE-7728) deadlock occurs between hlog roller and hlog syncer
[ https://issues.apache.org/jira/browse/HBASE-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HBASE-7728: -- Fix Version/s: 0.94.5 0.96.0
[jira] [Commented] (HBASE-7728) deadlock occurs between hlog roller and hlog syncer
[ https://issues.apache.org/jira/browse/HBASE-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567570#comment-13567570 ] Anoop Sam John commented on HBASE-7728: --- I can see an IOE wrapping an NPE when a concurrent writer close happens. This is from here:
{code:title=SequenceFileLogWriter.java}
public void append(HLog.Entry entry) throws IOException {
  entry.setCompressionContext(compressionContext);
  try {
    this.writer.append(entry.getKey(), entry.getEdit());
  } catch (NullPointerException npe) {
    // Concurrent close...
    throw new IOException(npe);
  }
}
{code}
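The NPE-to-IOE translation above exists because the writer field can be nulled by a concurrent close mid-append. One common mitigation, shown as an illustrative sketch (invented names, not the actual HBase patch), is to snapshot the field once and null-check it, so the race surfaces as a clean IOException instead of an NPE:

```java
import java.io.IOException;

public class WriterGuardSketch {
  // Hypothetical, simplified stand-in for the underlying log writer.
  public interface Writer {
    void append(String key, String edit);
  }

  private volatile Writer writer;

  public WriterGuardSketch(Writer w) { this.writer = w; }

  // Concurrent close: another thread may null the field at any time.
  public void close() { this.writer = null; }

  public void append(String key, String edit) throws IOException {
    Writer w = this.writer; // single read of the shared field
    if (w == null) {
      throw new IOException("log writer closed concurrently");
    }
    w.append(key, edit);    // safe: w cannot become null under us
  }
}
```

The key point is reading the volatile field exactly once; checking `this.writer != null` and then dereferencing `this.writer` again would reintroduce the race.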
[jira] [Commented] (HBASE-7495) parallel seek in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567575#comment-13567575 ] Hadoop QA commented on HBASE-7495: --
{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12567341/HBASE-7495-v7.txt against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests.
{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100.
{color:green}+1 core tests{color}. The patch passed unit tests in .
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4273//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4273//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4273//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4273//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4273//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4273//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4273//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4273//console
This message is automatically generated.
[jira] [Commented] (HBASE-7717) Wait until regions are assigned in TestSplitTransactionOnCluster
[ https://issues.apache.org/jira/browse/HBASE-7717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567589#comment-13567589 ] Hudson commented on HBASE-7717: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #386 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/386/]) HBASE-7717 Wait until regions are assigned in TestSplitTransactionOnCluster (Lars H and Ted Yu) (Revision 1440800) Result = FAILURE tedyu : Files : * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransactionOnCluster.java Wait until regions are assigned in TestSplitTransactionOnCluster Key: HBASE-7717 URL: https://issues.apache.org/jira/browse/HBASE-7717 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Fix For: 0.96.0, 0.94.5 Attachments: 7717-0.94-combined.txt, 7717-0.94.txt, 7717-0.94-v1.txt, 7717-0.94-v2.txt, 7717-0.94-v3.txt, 7717-0.96.txt, 7717-alternate-94.txt, 7717-alternate-trunk.txt, 7717-trunk-v2.txt, 7717-trunk-v3.txt, TEST-org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster.xml I've seen various failures where a table is created in the tests and then all regions are retrieved from the cluster, where the number of returned regions is 0, because the regions have not been assigned yet, or the AM does not know about them yet.
[jira] [Commented] (HBASE-7495) parallel seek in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567669#comment-13567669 ] Ted Yu commented on HBASE-7495: --- store.getHRegion() returns the region, which has the rsServices field. You can create a package-private getter in HRegion so that RegionServerServices can be accessed.
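Ted's suggestion amounts to a one-line accessor. A hypothetical sketch of the shape (the class bodies are illustrative stand-ins, not the actual HBase classes or the eventual patch):

```java
// Hypothetical stand-ins for the real HBase classes (illustrative only).
class RegionServerServices { }

class HRegion {
    private final RegionServerServices rsServices;

    HRegion(RegionServerServices rsServices) {
        this.rsServices = rsServices;
    }

    // Package-private getter: visible to StoreScanner and friends in the
    // same package, without widening HRegion's public interface.
    RegionServerServices getRegionServerServices() {
        return rsServices;
    }
}

class RsServicesGetterSketch {
    public static void main(String[] args) {
        RegionServerServices svc = new RegionServerServices();
        HRegion region = new HRegion(svc);
        System.out.println(region.getRegionServerServices() == svc); // true
    }
}
```

Package-private visibility keeps the accessor off the public API surface while still letting same-package scan code reach the region server services.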
[jira] [Commented] (HBASE-7495) parallel seek in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567679#comment-13567679 ] Ted Yu commented on HBASE-7495: --- Can you make TestCoprocessorScanPolicy a parameterized test, so that hbase.storescanner.parallel.seek.enable being true is also exercised?
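The parameterization Ted asks for would normally use JUnit's Parameterized runner; this dependency-free sketch just shows the shape of the idea: run the same scenario once per flag value. The scenario body is a placeholder, not the real TestCoprocessorScanPolicy.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

public class ParallelSeekFlagHarness {
    // The real change would use JUnit's @RunWith(Parameterized.class) with
    // a parameters() factory; here we iterate the parameter list by hand.
    static final String FLAG = "hbase.storescanner.parallel.seek.enable";

    // The parameter values: the test runs once with the flag off, once on.
    static List<Boolean> parameters() {
        return Arrays.asList(false, true);
    }

    // Placeholder for the scan-policy scenario; it only records which
    // configuration it ran under.
    static String runScenario(Properties conf) {
        return FLAG + "=" + conf.getProperty(FLAG);
    }

    public static void main(String[] args) {
        for (boolean parallel : parameters()) {
            Properties conf = new Properties();
            conf.setProperty(FLAG, Boolean.toString(parallel));
            System.out.println(runScenario(conf));
        }
    }
}
```

The payoff is that both code paths (serial and parallel seek) are covered by the same assertions without duplicating the test body.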
[jira] [Assigned] (HBASE-7590) Add a costless notifications mechanism from master to regionservers & clients
[ https://issues.apache.org/jira/browse/HBASE-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal reassigned HBASE-7590: -- Assignee: nkeywal Add a costless notifications mechanism from master to regionservers & clients - Key: HBASE-7590 URL: https://issues.apache.org/jira/browse/HBASE-7590 Project: HBase Issue Type: Bug Components: Client, master, regionserver Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal It would be very useful to add a mechanism to distribute some information to the clients and regionservers. In particular, it would be useful to know globally (regionservers + client apps) that some regionservers are dead. This would allow: - lowering the load on the system, without clients using stale information and going to dead machines - making recovery faster from a client point of view. It's common to use large timeouts on the client side, so the client may need a lot of time before declaring a region server dead and trying another one. If the client receives the information about a region server's state separately, it can take the right decision, and continue/stop waiting accordingly. We can also send more information, for example instructions like 'slow down' to tell the client to increase the retry delay and so on. Technically, the master could send this information. To lower the load on the system, we should: - have multicast communication (i.e. the master does not have to connect to all servers by tcp), with one packet every 10 seconds or so - not make receivers depend on this: if the information is available, great; if not, it should not break anything - make it optional. So in the end we would have a thread in the master sending a protobuf message about the dead servers on a multicast socket. If the socket is not configured, it does not do anything. On the client side, when we receive information that a node is dead, we refresh the cache about it.
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
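The master-side thread described above (periodic, optional, best-effort multicast of a dead-server message) could be sketched as follows. The wire format, group address, and port here are made-up placeholders; the proposal itself calls for a protobuf message.

```java
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.MulticastSocket;
import java.util.Arrays;
import java.util.List;

public class DeadServerNotifier {
    // Placeholder wire format: newline-joined server names
    // (the real proposal would encode a protobuf message instead).
    static byte[] encode(List<String> deadServers) {
        return String.join("\n", deadServers).getBytes();
    }

    static List<String> decode(byte[] data, int len) {
        return Arrays.asList(new String(data, 0, len).split("\n"));
    }

    // Best-effort sender loop: if no group is configured the feature is off,
    // and failures are swallowed so receivers never depend on the channel.
    static void broadcastLoop(String group, int port, List<String> deadServers,
                              int periodMillis, int iterations) {
        if (group == null) return;  // optional: not configured, do nothing
        try (MulticastSocket sock = new MulticastSocket()) {
            InetAddress addr = InetAddress.getByName(group);
            for (int i = 0; i < iterations; i++) {
                byte[] payload = encode(deadServers);
                sock.send(new DatagramPacket(payload, payload.length, addr, port));
                Thread.sleep(periodMillis);
            }
        } catch (Exception e) {
            // best effort: a lost packet must not break the master
        }
    }

    public static void main(String[] args) {
        byte[] wire = encode(Arrays.asList("rs1,60020,123", "rs2,60020,456"));
        System.out.println(decode(wire, wire.length));
    }
}
```

The design choice worth noting is the asymmetry: the master fires and forgets on a timer, while clients treat the messages purely as a cache-invalidation hint, so the system behaves identically (just slower to notice dead servers) when the channel is absent.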
[jira] [Commented] (HBASE-7590) Add a costless notifications mechanism from master to regionservers & clients
[ https://issues.apache.org/jira/browse/HBASE-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567735#comment-13567735 ] nkeywal commented on HBASE-7590: Actually, one of the issues is that in the client code we don't really manage the server name. We use the hostname and the port, but we don't use the start code directly... There is a sequence number, but I need to find out whether it matches the start code. Despite this, I have something working for the server side, and the client receives the status. The point is to put the checks properly in the client (and this is unrelated to the communication protocol :-)
[jira] [Commented] (HBASE-7711) rowlock release problem with thread interruptions in batchMutate
[ https://issues.apache.org/jira/browse/HBASE-7711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567738#comment-13567738 ] Jonathan Hsieh commented on HBASE-7711: --- I'm +1 for this patch modulo the nit below, but not convinced this solves all the problems. The other exception trace adds the row name to the exception. Please add this detail to the exception message?
{code}
try {
  if (!existingLatch.await(this.rowLockWaitDuration, TimeUnit.MILLISECONDS)) {
    throw new IOException("Timed out on getting lock for row=" + Bytes.toStringBinary(row));
  }
} catch (InterruptedException ie) {
  // Empty
}
{code}
rowlock release problem with thread interruptions in batchMutate Key: HBASE-7711 URL: https://issues.apache.org/jira/browse/HBASE-7711 Project: HBase Issue Type: Bug Reporter: Jonathan Hsieh Assignee: Ted Yu Fix For: 0.96.0 Attachments: 7711.txt, 7711-v2.txt An earlier version of snapshots would thread-interrupt operations. In longer-term testing we ran into an exception stack trace that indicated that a rowlock was taken and never released. {code} 2013-01-26 01:54:56,417 ERROR org.apache.hadoop.hbase.procedure.ProcedureMember: Propagating foreign exception to subprocedure pe-1 org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable via timer-java.util.Timer@1cea3151:org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: org.apache.hadoop.hbase.errorhandling.TimeoutException: Timeout elapsed! 
Source:Timeout caused Foreign Exception Start:1359194035004, End:1359194095004, diff:6, max:6 ms at org.apache.hadoop.hbase.errorhandling.ForeignException.deserialize(ForeignException.java:184) at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.abort(ZKProcedureMemberRpcs.java:321) at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.watchForAbortedProcedures(ZKProcedureMemberRpcs.java:150) at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.access$200(ZKProcedureMemberRpcs.java:56) at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs$1.nodeChildrenChanged(ZKProcedureMemberRpcs.java:112) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:315) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) Caused by: org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: org.apache.hadoop.hbase.errorhandling.TimeoutException: Timeout elapsed! 
Source:Timeout caused Foreign Exception Start:1359194035004, End:1359194095004, diff:6, max:6 ms at org.apache.hadoop.hbase.errorhandling.TimeoutExceptionInjector$1.run(TimeoutExceptionInjector.java:71) at java.util.TimerThread.mainLoop(Timer.java:512) at java.util.TimerThread.run(Timer.java:462) 2013-01-26 01:54:56,648 WARN org.apache.hadoop.hbase.regionserver.HRegion: Failed getting lock in batch put, row=0001558252 java.io.IOException: Timed out on getting lock for row=0001558252 at org.apache.hadoop.hbase.regionserver.HRegion.internalObtainRowLock(HRegion.java:3239) at org.apache.hadoop.hbase.regionserver.HRegion.getLock(HRegion.java:3315) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:2150) at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2021) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3511) at sun.reflect.GeneratedMethodAccessor46.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1400) .. every snapshot attempt that used this region for the next two days encountered this problem. {code} Snapshots will now bypass this problem with the fix in HBASE-7703. However, we should make sure hbase regionserver operations are safe when interrupted. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
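The core of the fix Jon is reviewing is twofold: name the row in the timeout message, and stop swallowing the interrupt in the empty catch block. A sketch of the corrected pattern, using a plain CountDownLatch in place of HBase's internal lock latch:

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class RowLockWaitSketch {
    // Wait for an existing lock holder's latch; on timeout or interrupt,
    // fail with an exception that names the row, instead of silently
    // continuing as if the lock had been acquired.
    static void waitForRowLock(CountDownLatch existingLatch, long waitMillis,
                               String row) throws IOException {
        try {
            if (!existingLatch.await(waitMillis, TimeUnit.MILLISECONDS)) {
                throw new IOException("Timed out on getting lock for row=" + row);
            }
        } catch (InterruptedException ie) {
            // Do NOT swallow: restore the flag and surface the interrupt,
            // so no caller proceeds believing it holds the lock.
            Thread.currentThread().interrupt();
            throw new InterruptedIOException("Interrupted waiting on lock for row=" + row);
        }
    }

    public static void main(String[] args) throws Exception {
        CountDownLatch held = new CountDownLatch(1);
        held.countDown();  // lock already released: wait succeeds immediately
        waitForRowLock(held, 10, "0001558252");
        System.out.println("acquired");
    }
}
```

Swallowing the InterruptedException is exactly what let an interrupted batchMutate fall through without the latch released, which matches the two days of "Timed out on getting lock for row=0001558252" in the report above.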
[jira] [Updated] (HBASE-7711) rowlock release problem with thread interruptions in batchMutate
[ https://issues.apache.org/jira/browse/HBASE-7711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-7711: -- Attachment: 7711-v3.txt Patch v3 addresses Jon's comment.
[jira] [Commented] (HBASE-7711) rowlock release problem with thread interruptions in batchMutate
[ https://issues.apache.org/jira/browse/HBASE-7711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567749#comment-13567749 ] Jonathan Hsieh commented on HBASE-7711: --- lovely. +1.
[jira] [Commented] (HBASE-7717) Wait until regions are assigned in TestSplitTransactionOnCluster
[ https://issues.apache.org/jira/browse/HBASE-7717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567782#comment-13567782 ] Lars Hofhansl commented on HBASE-7717: -- Yet another failure: https://builds.apache.org/job/HBase-0.94/810/testReport/junit/org.apache.hadoop.hbase.regionserver/TestSplitTransactionOnCluster/testTableExistsIfTheSpecifiedTableRegionIsSplitParent/. It seems we always have to wait a bit for the cluster to learn what regions it has. Sigh. Will make a quick addendum.
[jira] [Updated] (HBASE-7717) Wait until regions are assigned in TestSplitTransactionOnCluster
[ https://issues.apache.org/jira/browse/HBASE-7717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-7717: - Attachment: 7717-addendum-0.94.txt How's this?
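The addendum pattern discussed here (poll until the cluster reports the table's regions instead of asserting immediately) can be sketched like this, with the cluster accessor faked out; the interface and timings are illustrative, not the actual test code:

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class AwaitRegionsSketch {
    // Stand-in for MiniHBaseCluster's region lookup.
    interface Cluster {
        List<String> getRegions(String tableName);
    }

    // Poll until at least one region is reported or the timeout elapses.
    static boolean awaitTableRegions(Cluster cluster, String tableName,
                                     long timeoutMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (!cluster.getRegions(tableName).isEmpty()) {
                return true;
            }
            Thread.sleep(50);  // regions may not be assigned yet; retry
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger polls = new AtomicInteger();
        // Fake cluster: regions "appear" on the third poll.
        Cluster cluster = table -> polls.incrementAndGet() >= 3
            ? Collections.singletonList(table + ",,1")
            : Collections.<String>emptyList();
        System.out.println(awaitTableRegions(cluster, "t1", 5000)); // true
    }
}
```

Bounding the wait keeps a genuinely broken assignment from hanging the test forever, while the retry absorbs the benign race the issue describes.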
[jira] [Created] (HBASE-7729) TestCatalogTrackerOnCluster.testbadOriginalRootLocation fails occasionally
Lars Hofhansl created HBASE-7729: Summary: TestCatalogTrackerOnCluster.testbadOriginalRootLocation fails occasionally Key: HBASE-7729 URL: https://issues.apache.org/jira/browse/HBASE-7729 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Failure: {code} java.io.IOException: Shutting down at org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:223) at org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:86) at org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:77) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:650) at org.apache.hadoop.hbase.catalog.TestCatalogTrackerOnCluster.testBadOriginalRootLocation(TestCatalogTrackerOnCluster.java:68) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:47) at org.junit.rules.RunRules.evaluate(RunRules.java:18) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50) at 
org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222) at org.junit.runners.ParentRunner.run(ParentRunner.java:300) at org.junit.runners.Suite.runChild(Suite.java:128) at org.junit.runners.Suite.runChild(Suite.java:24) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.lang.RuntimeException: Master not initialized after 200 seconds at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:206) at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:420) at org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:216) ... 32 more {code} Likely caused by this: {code} 2013-01-31 04:52:23,064 FATAL [Master:0;hemera.apache.org,52696,1359607882775] master.HMaster(1493): Unhandled exception. Starting shutdown. 
org.apache.hadoop.hbase.ipc.HBaseClient$FailedServerException: This server is in the failed servers list: example.org/192.0.43.10:1234 at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:425) at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1124) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:974) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86) at $Proxy19.getProtocolVersion(Unknown Source) at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:138) at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1335) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1291) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1278) at org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:506) at
[jira] [Commented] (HBASE-3787) Increment is non-idempotent but client retries RPC
[ https://issues.apache.org/jira/browse/HBASE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567807#comment-13567807 ] Ted Yu commented on HBASE-3787: --- bq. but server failed before returning response. Client retries on the new server HBaseClient should generate a new nonce when the request is sent to a new server. Increment is non-idempotent but client retries RPC -- Key: HBASE-3787 URL: https://issues.apache.org/jira/browse/HBASE-3787 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.96.0, 0.94.4 Reporter: dhruba borthakur Priority: Critical Fix For: 0.96.0 The HTable.increment() operation is non-idempotent. The client retries the increment RPC a few times (as specified by configuration) before throwing an error to the application. This makes it possible for the same increment call to be applied twice at the server. For increment operations, would it be better to use HConnectionManager.getRegionServerWithoutRetries()? Another option would be to enhance the IPC module so that the RPC server can correctly identify whether the RPC is a retry attempt and handle it accordingly.
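The nonce idea under discussion makes a retried RPC safe: the client attaches a unique id, the server remembers the ids it has applied, and a retry carrying a seen id returns the earlier result instead of being applied again. A toy sketch of the server side, not HBase's eventual nonce implementation:

```java
import java.util.HashMap;
import java.util.Map;

public class NonceIncrementSketch {
    private final Map<Long, Long> applied = new HashMap<>();  // nonce -> result
    private long counter = 0;

    // Apply the increment only if this nonce has not been seen; otherwise
    // return the previously computed result, so retries are idempotent.
    synchronized long increment(long nonce, long delta) {
        Long prior = applied.get(nonce);
        if (prior != null) {
            return prior;  // duplicate RPC: do not apply twice
        }
        counter += delta;
        applied.put(nonce, counter);
        return counter;
    }

    public static void main(String[] args) {
        NonceIncrementSketch server = new NonceIncrementSketch();
        server.increment(42L, 1);          // first attempt applies
        long v = server.increment(42L, 1); // retry with same nonce is a no-op
        System.out.println(v); // 1
    }
}
```

The hard case Ted's comment touches is a retry against a *different* server after failover: the new server has no record of the nonce, so the nonce store must either move with the region or the client must accept possible double-application in that window. In production the applied map would also need expiry rather than unbounded growth.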
[jira] [Commented] (HBASE-7717) Wait until regions are assigned in TestSplitTransactionOnCluster
[ https://issues.apache.org/jira/browse/HBASE-7717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567811#comment-13567811 ] Ted Yu commented on HBASE-7717: --- lgtm.
{code}
+assertTrue("Table not online", cluster.getRegions(tableName).size() != 0);
{code}
Mind including tableName in the assert message?
[jira] [Commented] (HBASE-7711) rowlock release problem with thread interruptions in batchMutate
[ https://issues.apache.org/jira/browse/HBASE-7711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567815#comment-13567815 ] Ted Yu commented on HBASE-7711: --- Integrated to trunk. Thanks for the reviews, Matteo and Jon. rowlock release problem with thread interruptions in batchMutate Key: HBASE-7711 URL: https://issues.apache.org/jira/browse/HBASE-7711 Project: HBase Issue Type: Bug Reporter: Jonathan Hsieh Assignee: Ted Yu Fix For: 0.96.0 Attachments: 7711.txt, 7711-v2.txt, 7711-v3.txt An earlier version of snapshots would thread interrupt operations. In longer-term testing we ran into an exception stack trace that indicated that a rowlock was taken and never released. {code} 2013-01-26 01:54:56,417 ERROR org.apache.hadoop.hbase.procedure.ProcedureMember: Propagating foreign exception to subprocedure pe-1 org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable via timer-java.util.Timer@1cea3151:org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: org.apache.hadoop.hbase.errorhandling.TimeoutException: Timeout elapsed! 
Source:Timeout caused Foreign Exception Start:1359194035004, End:1359194095004, diff:6, max:6 ms at org.apache.hadoop.hbase.errorhandling.ForeignException.deserialize(ForeignException.java:184) at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.abort(ZKProcedureMemberRpcs.java:321) at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.watchForAbortedProcedures(ZKProcedureMemberRpcs.java:150) at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.access$200(ZKProcedureMemberRpcs.java:56) at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs$1.nodeChildrenChanged(ZKProcedureMemberRpcs.java:112) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:315) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) Caused by: org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: org.apache.hadoop.hbase.errorhandling.TimeoutException: Timeout elapsed! 
Source:Timeout caused Foreign Exception Start:1359194035004, End:1359194095004, diff:6, max:6 ms at org.apache.hadoop.hbase.errorhandling.TimeoutExceptionInjector$1.run(TimeoutExceptionInjector.java:71) at java.util.TimerThread.mainLoop(Timer.java:512) at java.util.TimerThread.run(Timer.java:462) 2013-01-26 01:54:56,648 WARN org.apache.hadoop.hbase.regionserver.HRegion: Failed getting lock in batch put, row=0001558252 java.io.IOException: Timed out on getting lock for row=0001558252 at org.apache.hadoop.hbase.regionserver.HRegion.internalObtainRowLock(HRegion.java:3239) at org.apache.hadoop.hbase.regionserver.HRegion.getLock(HRegion.java:3315) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:2150) at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2021) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3511) at sun.reflect.GeneratedMethodAccessor46.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1400) .. every snapshot attempt that used this region for the next two days encountered this problem. {code} Snapshots will now bypass this problem with the fix in HBASE-7703. However, we should make sure hbase regionserver operations are safe when interrupted. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
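The interrupt-safety concern in the last sentence comes down to releasing the row lock on every exit path. A generic illustration of that pattern, using plain java.util.concurrent rather than HBase's internal row locks (this is not the HBASE-7703 fix or HRegion code; all names are invented for the example):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch: a row lock that cannot be stranded by a thread
// interrupt, because release happens in a finally block. tryLock(timeout)
// throws InterruptedException *before* the lock is held, so an interrupted
// waiter never owns the lock; once the lock is held, finally releases it.
public class RowLockSketch {
    private final ReentrantLock rowLock = new ReentrantLock();

    /** Returns false on timeout, analogous to "Timed out on getting lock for row=...". */
    public boolean mutateRow(Runnable mutation, long timeoutMs) throws InterruptedException {
        if (!rowLock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
            return false; // lock not acquired, nothing to release
        }
        try {
            mutation.run();
            return true;
        } finally {
            rowLock.unlock(); // always released, even if the mutation fails
        }
    }
}
```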
[jira] [Commented] (HBASE-7711) rowlock release problem with thread interruptions in batchMutate
[ https://issues.apache.org/jira/browse/HBASE-7711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567817#comment-13567817 ] Hadoop QA commented on HBASE-7711: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12567376/7711-v3.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4274//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4274//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4274//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4274//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4274//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4274//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4274//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4274//console This message is automatically generated.
[jira] [Updated] (HBASE-7711) rowlock release problem with thread interruptions in batchMutate
[ https://issues.apache.org/jira/browse/HBASE-7711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-7711: -- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7729) TestCatalogTrackerOnCluster.testbadOriginalRootLocation fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567834#comment-13567834 ] Lars Hofhansl commented on HBASE-7729: -- Full test output: https://builds.apache.org/job/HBase-0.94/808/testReport/org.apache.hadoop.hbase.catalog/TestCatalogTrackerOnCluster/testBadOriginalRootLocation/ TestCatalogTrackerOnCluster.testbadOriginalRootLocation fails occasionally -- Key: HBASE-7729 URL: https://issues.apache.org/jira/browse/HBASE-7729 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Failure: {code} java.io.IOException: Shutting down at org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:223) at org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:86) at org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:77) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:650) at org.apache.hadoop.hbase.catalog.TestCatalogTrackerOnCluster.testBadOriginalRootLocation(TestCatalogTrackerOnCluster.java:68) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:47) at org.junit.rules.RunRules.evaluate(RunRules.java:18) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68) at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222) at org.junit.runners.ParentRunner.run(ParentRunner.java:300) at org.junit.runners.Suite.runChild(Suite.java:128) at org.junit.runners.Suite.runChild(Suite.java:24) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.lang.RuntimeException: Master not initialized after 200 seconds at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:206) at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:420) at org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:216) ... 32 more {code} Likely caused by this: {code} 2013-01-31 04:52:23,064 FATAL [Master:0;hemera.apache.org,52696,1359607882775] master.HMaster(1493): Unhandled exception. Starting shutdown. 
org.apache.hadoop.hbase.ipc.HBaseClient$FailedServerException: This server is in the failed servers list: example.org/192.0.43.10:1234 at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:425) at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1124) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:974) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86) at $Proxy19.getProtocolVersion(Unknown Source) at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:138) at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1335) at
[jira] [Commented] (HBASE-7729) TestCatalogTrackerOnCluster.testbadOriginalRootLocation fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567851#comment-13567851 ] Lars Hofhansl commented on HBASE-7729: -- [~ghelmling], is this possible related to the client refactor?
[jira] [Comment Edited] (HBASE-7729) TestCatalogTrackerOnCluster.testbadOriginalRootLocation fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567851#comment-13567851 ] Lars Hofhansl edited comment on HBASE-7729 at 1/31/13 5:32 PM: --- [~ghelmling], is this possibly related to the client refactor? was (Author: lhofhansl): [~ghelmling], is this possible related to the client refactor?
[jira] [Commented] (HBASE-7729) TestCatalogTrackerOnCluster.testbadOriginalRootLocation fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567862#comment-13567862 ] Lars Hofhansl commented on HBASE-7729: -- From the logs it looks like the old master's main thread has exited, but the ZK trackers in the Master's CatalogTracker's HConnection are still active and receiving events (including the fake root region). It seems we should stop the trackers upon HConnection.close().
[jira] [Commented] (HBASE-3787) Increment is non-idempotent but client retries RPC
[ https://issues.apache.org/jira/browse/HBASE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567864#comment-13567864 ] Andrew Purtell commented on HBASE-3787: --- I think the above comments all taken together are a reasonable thing to try: - Introduce a nonce (generated internally by the client) on non-idempotent operations to convert them into idempotent ones. - nonce = hash(client address, table, row, timestamp) - HBaseClient should generate a new nonce whenever a request is sent to new server. - Server tracks nonces by (client address, nonce, timestamp) - Add the entry when op processing starts, remove it when finished or failed, refuse to process an op twice by sending back a DoNotRetryException. Perhaps we introduce a new exception type like OperationInProgressException which inherits from DoNotRetryException so the client understands the retry operation was failed because the previous attempt is still pending server side. - We should append the nonce to the WALEdit, and recover them along with the entry data. Increment is non-idempotent but client retries RPC -- Key: HBASE-3787 URL: https://issues.apache.org/jira/browse/HBASE-3787 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.96.0, 0.94.4 Reporter: dhruba borthakur Priority: Critical Fix For: 0.96.0 The HTable.increment() operation is non-idempotent. The client retries the increment RPC a few times (as specified by configuration) before throwing an error to the application. This makes it possible that the same increment call be applied twice at the server. For increment operations, is it better to use HConnectionManager.getRegionServerWithoutRetries()? Another option would be to enhance the IPC module to make the RPC server correctly identify if the RPC is a retry attempt and handle accordingly. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
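The nonce recipe proposed above (hash over client address, table, row, and timestamp; reused on retry, fresh for a new op) can be sketched as follows. This is a minimal illustration, not HBase's actual implementation; the class name, method name, and hash mixing are all hypothetical.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical sketch of the client-side nonce described in the comment above:
// nonce = hash(client address, table, row, timestamp).
public class NonceSketch {
    static long makeNonce(String clientAddress, String table, byte[] row, long timestamp) {
        // Deterministic hash over the identifying fields of the op.
        int h = Objects.hash(clientAddress, table, Arrays.hashCode(row));
        // Mix the timestamp in: a retry of the *same* op reuses the nonce,
        // while a new op (new timestamp) gets a fresh one.
        return ((long) h << 32) ^ timestamp;
    }

    public static void main(String[] args) {
        long n1 = makeNonce("10.0.0.1:4020", "t1", "row1".getBytes(), 1000L);
        long n2 = makeNonce("10.0.0.1:4020", "t1", "row1".getBytes(), 1000L);
        long n3 = makeNonce("10.0.0.1:4020", "t1", "row1".getBytes(), 2000L);
        System.out.println(n1 == n2); // same inputs: same nonce (a retry)
        System.out.println(n1 != n3); // new timestamp: new nonce (a new op)
    }
}
```

The key property is that the nonce is a pure function of the op's identity, so a client retry presents the same nonce and the server can recognize the duplicate.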
[jira] [Created] (HBASE-7730) HBaseAdmin#synchronousBalanceSwitch is not compatible with 0.92
Jimmy Xiang created HBASE-7730: -- Summary: HBaseAdmin#synchronousBalanceSwitch is not compatible with 0.92 Key: HBASE-7730 URL: https://issues.apache.org/jira/browse/HBASE-7730 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.94.2 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.94.5 HBASE-4429 introduced synchronousBalanceSwitch to HMaster. HBaseAdmin uses this call (HBASE-5630). Therefore, hbck and hbase shell are not backward compatible with 0.92.
[jira] [Comment Edited] (HBASE-3787) Increment is non-idempotent but client retries RPC
[ https://issues.apache.org/jira/browse/HBASE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567864#comment-13567864 ] Andrew Purtell edited comment on HBASE-3787 at 1/31/13 5:50 PM: I think the above comments all taken together are a reasonable thing to try: - Introduce a nonce (generated internally by the client) on non-idempotent operations to convert them into idempotent ones. - nonce = hash(client address, table, row, timestamp) - HBaseClient should generate a new nonce whenever a new op is sent to a new server. Reuse the nonce for any retry. - Server tracks nonces by (client address, nonce, timestamp) - Add the entry when op processing starts, remove it when finished or failed, and refuse to process an op twice by sending back a DoNotRetryException. Perhaps we introduce a new exception type like OperationInProgressException, which inherits from DoNotRetryException, so the client understands the retry failed because the previous attempt is still pending server side. - We should append the nonce to the WALEdit and recover it along with the entry data.
[jira] [Commented] (HBASE-3787) Increment is non-idempotent but client retries RPC
[ https://issues.apache.org/jira/browse/HBASE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567872#comment-13567872 ] Ted Yu commented on HBASE-3787: --- bq. We should append the nonce to the WALEdit, and recover them along with the entry data. Is the above needed?
[jira] [Commented] (HBASE-7729) TestCatalogTrackerOnCluster.testbadOriginalRootLocation fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567873#comment-13567873 ] Gary Helmling commented on HBASE-7729: -- [~lhofhansl] Certainly possible that the behavior here changed as a result of the client refactor, though I don't recall seeing the ZK trackers involved in the changed code paths. But maybe the refactor subtly changed some of the previous flow. I'll take a look. TestCatalogTrackerOnCluster.testbadOriginalRootLocation fails occasionally -- Key: HBASE-7729 URL: https://issues.apache.org/jira/browse/HBASE-7729 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Failure: {code} java.io.IOException: Shutting down at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:223) at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:86) at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:77) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:650) at org.apache.hadoop.hbase.catalog.TestCatalogTrackerOnCluster.testBadOriginalRootLocation(TestCatalogTrackerOnCluster.java:68) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:47) at org.junit.rules.RunRules.evaluate(RunRules.java:18) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263) at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222) at org.junit.runners.ParentRunner.run(ParentRunner.java:300) at org.junit.runners.Suite.runChild(Suite.java:128) at org.junit.runners.Suite.runChild(Suite.java:24) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.lang.RuntimeException: Master not initialized after 200 seconds at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:206) at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:420) at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:216) ... 32 more {code} Likely caused by this: {code} 2013-01-31 04:52:23,064 FATAL [Master:0;hemera.apache.org,52696,1359607882775] master.HMaster(1493): Unhandled exception. Starting shutdown. 
org.apache.hadoop.hbase.ipc.HBaseClient$FailedServerException: This server is in the failed servers list: example.org/192.0.43.10:1234 at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:425) at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1124) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:974) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86) at $Proxy19.getProtocolVersion(Unknown Source) at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:138) at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208) at
[jira] [Comment Edited] (HBASE-3787) Increment is non-idempotent but client retries RPC
[ https://issues.apache.org/jira/browse/HBASE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567864#comment-13567864 ] Andrew Purtell edited comment on HBASE-3787 at 1/31/13 5:52 PM: I think the above comments all taken together are a reasonable thing to try: - Introduce a nonce (generated internally by the client) on non-idempotent operations to convert them into idempotent ones. - nonce = hash(client address, table, row, timestamp) - HBaseClient should generate a new nonce whenever a new op is sent to a new server. Reuse the nonce for any retry. - Server tracks nonces by (client address, nonce, timestamp). Expire entries after some grace period. Restart the expiration timer whenever the nonce is checked as part of op processing. Lazily clean up expired entries either as part of add/remove or via a chore. - Add the entry when op processing starts, remove it when finished or failed, and refuse to process an op twice by sending back a DoNotRetryException. Perhaps we introduce a new exception type like OperationInProgressException, which inherits from DoNotRetryException, so the client understands the retry failed because the previous attempt is still pending server side. - We should append the nonce to the WALEdit and recover it along with the entry data.
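The server-side bookkeeping in the edited comment above (track nonces per client, restart the expiration timer on every check, lazily clean up expired entries) might look roughly like this. All names are hypothetical; this is a sketch of the proposal, not HBase's actual nonce manager.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical server-side nonce tracker: entries keyed by (client, nonce),
// expired after a grace period, with lazy cleanup piggybacked on add.
public class NonceTracker {
    private final long gracePeriodMs;
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();

    NonceTracker(long gracePeriodMs) { this.gracePeriodMs = gracePeriodMs; }

    /** Returns false if this op is already in progress (caller refuses the retry). */
    boolean startOperation(String client, long nonce, long nowMs) {
        cleanup(nowMs); // lazy cleanup as part of add, as suggested above
        // put() always refreshes the timestamp, restarting the expiration timer
        // whenever the nonce is checked as part of op processing.
        Long prev = lastSeen.put(client + "/" + nonce, nowMs);
        return prev == null;
    }

    /** Remove the entry when the op finishes or fails, per the proposal. */
    void endOperation(String client, long nonce) {
        lastSeen.remove(client + "/" + nonce);
    }

    private void cleanup(long nowMs) {
        for (Iterator<Map.Entry<String, Long>> it = lastSeen.entrySet().iterator(); it.hasNext();) {
            if (nowMs - it.next().getValue() > gracePeriodMs) it.remove();
        }
    }

    public static void main(String[] args) {
        NonceTracker tracker = new NonceTracker(100);
        System.out.println(tracker.startOperation("client-a", 42L, 0L));  // first attempt accepted
        System.out.println(tracker.startOperation("client-a", 42L, 50L)); // duplicate refused
        tracker.endOperation("client-a", 42L);
        System.out.println(tracker.startOperation("client-a", 42L, 60L)); // accepted again after completion
    }
}
```

On the refused path a real server would reply with the proposed OperationInProgressException (a DoNotRetryException) rather than just returning false.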
[jira] [Commented] (HBASE-3787) Increment is non-idempotent but client retries RPC
[ https://issues.apache.org/jira/browse/HBASE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567874#comment-13567874 ] Andrew Purtell commented on HBASE-3787: --- bq. Is the above needed? I think Enis is right. A server accepts an op and goes down mid-flight; another server takes over and is processing WAL entries; the client retries and is relocated to the new server. Without a nonce, the increment would be accepted twice.
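The failure scenario above is why the nonce would have to travel with the WAL entry: the recovering server replays edits, then the relocated client's retry arrives, and only a nonce recovered from the WAL lets the server refuse it. A toy illustration, with entirely hypothetical names (not HBase classes):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: a region that records applied nonces, recovers them
// from replayed WAL entries, and refuses a retried increment it already saw.
public class WalNonceReplay {
    final Map<String, Long> counters = new HashMap<>();
    final Set<Long> appliedNonces = new HashSet<>();

    /** Replay recovered WAL entries, each carrying a {nonce, delta} pair. */
    void replay(List<long[]> walEntries, String row) {
        for (long[] e : walEntries) {
            applyIncrement(row, e[1], e[0]);
        }
    }

    /** Returns true if applied, false if refused as a duplicate nonce. */
    boolean applyIncrement(String row, long delta, long nonce) {
        if (!appliedNonces.add(nonce)) {
            return false; // nonce seen before (recovered from WAL): refuse the retry
        }
        counters.merge(row, delta, Long::sum);
        return true;
    }

    public static void main(String[] args) {
        WalNonceReplay region = new WalNonceReplay();
        // The increment was applied before the crash; its nonce is in the WAL.
        region.replay(Arrays.asList(new long[]{7L, 1L}), "row1");
        // The relocated client retries the same increment with the same nonce.
        System.out.println(region.applyIncrement("row1", 1L, 7L)); // false: refused
        System.out.println(region.counters.get("row1"));           // 1, not 2
    }
}
```

Without `appliedNonces` being rebuilt from the WAL, the retry would be indistinguishable from a fresh op and the counter would reach 2.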
[jira] [Commented] (HBASE-3787) Increment is non-idempotent but client retries RPC
[ https://issues.apache.org/jira/browse/HBASE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567877#comment-13567877 ] Ted Yu commented on HBASE-3787: --- bq. without having a nonce the increment would be accepted twice But there is this assumption: bq. HBaseClient should generate a new nonce whenever a new op is sent to new server
[jira] [Commented] (HBASE-7701) Opening regions on dead server are not reassigned quickly
[ https://issues.apache.org/jira/browse/HBASE-7701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567882#comment-13567882 ] Sergey Shelukhin commented on HBASE-7701: - The server B crashed after putting the info into meta, before updating ZK. In fact we are in SSH for server B... so it should not be expected Opening regions on dead server are not reassigned quickly - Key: HBASE-7701 URL: https://issues.apache.org/jira/browse/HBASE-7701 Project: HBase Issue Type: Bug Affects Versions: 0.96.0 Reporter: Sergey Shelukhin Assignee: Jimmy Xiang Attachments: TEST-org.apache.hadoop.hbase.IntegrationTestRebalanceAndKillServersTargeted.xml Closed regions are not removed from assignments. I am not sure if it's a general state problem, or just a small bug; for now, one manifestation is that a moved region is ignored by the SSH of the target server if the target server dies before updating ZK. {code} 2013-01-22 17:59:00,524 DEBUG [IPC Server handler 3 on 50658] master.AssignmentManager(1475): Sent CLOSE to 10.11.2.92,51231,1358906285048 for region IntegrationTestRebalanceAndKillServersTargeted,6660,1358906196709.0200b366bc37c5afd1185f7d487c7dfb. 2013-01-22 17:59:00,997 DEBUG [RS_CLOSE_REGION-10.11.2.92,51231,1358906285048-1] handler.CloseRegionHandler(167): set region closed state in zk successfully for region IntegrationTestRebalanceAndKillServersTargeted,6660,1358906196709.0200b366bc37c5afd1185f7d487c7dfb. sn name: 10.11.2.92,51231,1358906285048 2013-01-22 17:59:01,088 INFO [MASTER_CLOSE_REGION-10.11.2.92,50658,1358906192673-0] master.RegionStates(242): Region {NAME => 'IntegrationTestRebalanceAndKillServersTargeted,6660,1358906196709.0200b366bc37c5afd1185f7d487c7dfb.', STARTKEY => '6660', ENDKEY => '732c', ENCODED => 0200b366bc37c5afd1185f7d487c7dfb,} transitioned from {IntegrationTestRebalanceAndKillServersTargeted,6660,1358906196709.0200b366bc37c5afd1185f7d487c7dfb. state=CLOSED, ts=1358906341087, server=null} to {IntegrationTestRebalanceAndKillServersTargeted,6660,1358906196709.0200b366bc37c5afd1185f7d487c7dfb. state=OFFLINE, ts=1358906341088, server=null} 2013-01-22 17:59:01,128 INFO [MASTER_CLOSE_REGION-10.11.2.92,50658,1358906192673-0] master.AssignmentManager(1596): Assigning region IntegrationTestRebalanceAndKillServersTargeted,6660,1358906196709.0200b366bc37c5afd1185f7d487c7dfb. to 10.11.2.92,50661,1358906192942 ... (50661 didn't update ZK to OPEN, only OPENING) 2013-01-22 17:59:06,605 INFO [MASTER_SERVER_OPERATIONS-10.11.2.92,50658,1358906192673-2] handler.ServerShutdownHandler(202): Reassigning 7 region(s) that 10.11.2.92,50661,1358906192942 was carrying (skipping 0 regions(s) that are already in transition) 2013-01-22 17:59:06,605 DEBUG [MASTER_SERVER_OPERATIONS-10.11.2.92,50658,1358906192673-2] handler.ServerShutdownHandler(219): Skip assigning region IntegrationTestRebalanceAndKillServersTargeted,6660,1358906196709.0200b366bc37c5afd1185f7d487c7dfb. because it has been opened in 10.11.2.92,51231,1358906285048 {code} Note the server in the last line - the one that has long closed the region.
[jira] [Commented] (HBASE-7701) Opening regions on dead server are not reassigned quickly
[ https://issues.apache.org/jira/browse/HBASE-7701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567883#comment-13567883 ] Sergey Shelukhin commented on HBASE-7701: - Not assigning region because it's on A is clearly not correct... why would master think it's on A when A already sent CloseRegionHandler long ago?
[jira] [Commented] (HBASE-3787) Increment is non-idempotent but client retries RPC
[ https://issues.apache.org/jira/browse/HBASE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567890#comment-13567890 ] Andrew Purtell commented on HBASE-3787: --- Good point Ted. So then the client should not retry an increment or append (or other non-idempotent op) if it has been relocated. See LarsH's comment at the top of this issue; sorry I missed it. It follows that the rest of your comment is valid too.
[jira] [Created] (HBASE-7731) Append/Increment methods in HRegion don't check whether the table is readonly or not
Devaraj Das created HBASE-7731: -- Summary: Append/Increment methods in HRegion don't check whether the table is readonly or not Key: HBASE-7731 URL: https://issues.apache.org/jira/browse/HBASE-7731 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das I bumped into this one - all the mutation calls like Put and Delete check whether the region in question is readonly. The append and increment calls don't.
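The missing guard described above amounts to performing, on the increment/append paths, the same read-only check the Put/Delete paths already do. A minimal sketch with hypothetical names (not the actual HRegion code):

```java
// Hypothetical sketch of a region-level read-only guard applied to increment,
// mirroring the check the Put/Delete paths already perform.
public class ReadOnlyGuard {
    private final boolean readOnly;

    ReadOnlyGuard(boolean readOnly) { this.readOnly = readOnly; }

    /** The check that, per the report, is missing from append/increment. */
    void checkReadOnly() {
        if (readOnly) {
            throw new UnsupportedOperationException("region is read only");
        }
    }

    long increment(long current, long amount) {
        checkReadOnly(); // reject the mutation before touching any state
        return current + amount;
    }

    public static void main(String[] args) {
        ReadOnlyGuard rw = new ReadOnlyGuard(false);
        System.out.println(rw.increment(41, 1)); // 42

        ReadOnlyGuard ro = new ReadOnlyGuard(true);
        try {
            ro.increment(41, 1);
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```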
[jira] [Commented] (HBASE-3787) Increment is non-idempotent but client retries RPC
[ https://issues.apache.org/jira/browse/HBASE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567897#comment-13567897 ] Andrew Purtell commented on HBASE-3787: --- What we really need is a different model of interaction: a bidirectional event stream between clients and servers. Clients issue requests; servers (any server) acknowledge completion. This implies an async client. In the absence of that, we can at least give the client an indication the op has been processed even through a retry, as long as the region doesn't move. (Add to my OperationInProgressException also OperationAlreadyCompletedException.) If the region relocates, then we expose some uncertainty to the application by failing any additional retries. This will be less surprising than the current behavior because we won't have silent application of the same op more than once, but it punts to the app, which isn't great either.
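The retry policy discussed above - keep retrying a non-idempotent op against the same server, but surface uncertainty to the caller as soon as the region relocates - can be sketched like this. The exception and method names are hypothetical, not HBase's client API.

```java
import java.util.concurrent.Callable;

// Hypothetical sketch: retry transient failures on the same server, but fail
// fast with an "uncertain outcome" as soon as the region moves, since a retry
// on a new server could apply a non-idempotent op twice.
public class NonIdempotentRetry {
    static class RegionMovedException extends RuntimeException {}
    static class UncertainOutcomeException extends RuntimeException {
        UncertainOutcomeException(String m) { super(m); }
    }

    static <T> T callWithRetries(Callable<T> op, int maxRetries) throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return op.call();
            } catch (RegionMovedException e) {
                // Relocation: punt the uncertainty to the application.
                throw new UncertainOutcomeException("op may or may not have been applied");
            } catch (Exception e) {
                last = e; // transient failure on the same server: safe to retry
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Transient failure on the first attempt, then success on the same server.
        long result = callWithRetries(() -> {
            if (calls[0]++ == 0) throw new java.io.IOException("transient");
            return 42L;
        }, 3);
        System.out.println(result); // 42

        try {
            callWithRetries(() -> { throw new RegionMovedException(); }, 3);
        } catch (UncertainOutcomeException e) {
            System.out.println("uncertain: " + e.getMessage());
        }
    }
}
```

The trade-off matches the comment: the application may see an uncertain outcome, but never a silent double application.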
[jira] [Commented] (HBASE-3787) Increment is non-idempotent but client retries RPC
[ https://issues.apache.org/jira/browse/HBASE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567896#comment-13567896 ] Ted Yu commented on HBASE-3787: --- If client retries on region move, that would allow skipping the append of nonce to the WALEdit. I think that would reduce the complexity of the implementation.
[jira] [Commented] (HBASE-7730) HBaseAdmin#synchronousBalanceSwitch is not compatible with 0.92
[ https://issues.apache.org/jira/browse/HBASE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567905#comment-13567905 ] Jonathan Hsieh commented on HBASE-7730: --- I'd think just changing the calls in hbck in 0.94 to use the now-deprecated method that is in 0.92 (and commenting why we didn't change it) would be sufficient. Would we expect cross-version shell accesses?
[jira] [Updated] (HBASE-7701) Opening regions on dead server are not reassigned quickly
[ https://issues.apache.org/jira/browse/HBASE-7701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-7701: --- Attachment: trunk-7701_v1.patch First version: https://reviews.apache.org/r/9200/
[jira] [Updated] (HBASE-7701) Opening regions on dead server are not reassigned quickly
[ https://issues.apache.org/jira/browse/HBASE-7701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-7701: --- Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-7730) HBaseAdmin#synchronousBalanceSwitch is not compatible with 0.92
[ https://issues.apache.org/jira/browse/HBASE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567927#comment-13567927 ] Jimmy Xiang commented on HBASE-7730: Good point. I was wondering if it will be used somewhere else in the future. I already have a simple fix, just need to make sure it works. HBaseAdmin#synchronousBalanceSwitch is not compatible with 0.92 --- Key: HBASE-7730 URL: https://issues.apache.org/jira/browse/HBASE-7730 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.94.2 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.94.5 HBASE-4429 introduced synchronousBalanceSwitch to HMaster. HBaseAdmin uses this call (HBASE-5630). Therefore, hbck and hbase shell are not backward compatible with 0.92.
[jira] [Commented] (HBASE-3787) Increment is non-idempotent but client retries RPC
[ https://issues.apache.org/jira/browse/HBASE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567930#comment-13567930 ] Andrew Purtell commented on HBASE-3787: --- [~ted_yu] From the client's point of view, it is still retrying the op even though the server handling the region has changed. So - HBaseClient should generate a new nonce for each new op. Reuse the nonce for any retry. Therefore if nonces are persisted to the WAL and recovered from it, the server will still do the right thing. Your concern is implementation complexity on the server. I think it is valid, but do you think this outweighs the application level uncertainty that would happen if a request fails because of a region relocation? Would the app know if the op applied or not? Increment is non-idempotent but client retries RPC -- Key: HBASE-3787 URL: https://issues.apache.org/jira/browse/HBASE-3787 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.96.0, 0.94.4 Reporter: dhruba borthakur Priority: Critical Fix For: 0.96.0 The HTable.increment() operation is non-idempotent. The client retries the increment RPC a few times (as specified by configuration) before throwing an error to the application. This makes it possible that the same increment call be applied twice at the server. For increment operations, is it better to use HConnectionManager.getRegionServerWithoutRetries()? Another option would be to enhance the IPC module to make the RPC server correctly identify if the RPC is a retry attempt and handle accordingly.
[jira] [Comment Edited] (HBASE-3787) Increment is non-idempotent but client retries RPC
[ https://issues.apache.org/jira/browse/HBASE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567930#comment-13567930 ] Andrew Purtell edited comment on HBASE-3787 at 1/31/13 6:48 PM: [~ted_yu] From the client's point of view, it is still retrying the op even though the server handling the region has changed. So - HBaseClient should generate a new nonce for each op. Reuse the nonce for any retry. Therefore if nonces are persisted to the WAL and recovered from it, the server will still do the right thing. Your concern is implementation complexity on the server. I think it is valid, but do you think this outweighs the application level uncertainty that would happen if a request fails because of a region relocation? Would the app know if the op applied or not? was (Author: apurtell): [~ted_yu] From the client's point of view, it is still retrying the op even though the server handling the region has changed. So - HBaseClient should generate a new nonce for each a new op. Reuse the nonce for any retry. Therefore if nonces are persisted to the WAL and recovered from it, the server will still do the right thing. Your concern is implementation complexity on the server. I think it is valid, but do you think this outweighs the application level uncertainty that would happen if a request fails because of a region relocation? Would the app know if the op applied or not?
[jira] [Commented] (HBASE-3787) Increment is non-idempotent but client retries RPC
[ https://issues.apache.org/jira/browse/HBASE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567944#comment-13567944 ] Ted Yu commented on HBASE-3787: --- For statement #1: bq. HBaseClient should generate a new nonce for each op. Reuse the nonce for any retry. Agreed. bq. this outweighs the application level uncertainty that would happen if a request fails because of a region relocation? I think statement #1 already achieves what persistence to WAL would achieve. bq. Would the app know if the op applied or not? The app would know when the response for the operation is not an IOException. Thanks
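The protocol the commenters converge on — one fresh nonce per logical operation, reused verbatim on every retry, deduplicated server-side — can be sketched as a toy model. `NonceSketch`, its inner classes, and the in-memory `seen` set are hypothetical; a real design would bound that set and recover it from the WAL, as Andrew notes:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ThreadLocalRandom;

// Toy sketch of nonce-based idempotent increments: the client draws one
// nonce per operation and resends the same nonce on retries, so a server
// that remembers seen nonces can refuse to apply a duplicate.
class NonceSketch {

    /** Client side: one nonce per operation, stable across retries. */
    static final class Operation {
        private final long nonce = ThreadLocalRandom.current().nextLong();

        /** A retry of this same operation resends this same nonce. */
        long nonceForRetry() {
            return nonce;
        }
    }

    /** Server side: remembers seen nonces (unbounded here; WAL-backed in reality). */
    static final class Server {
        private final Set<Long> seen = new HashSet<>();
        private long counter = 0;

        /** Applies the increment only if this nonce was never seen before. */
        long increment(long nonce, long delta) {
            if (!seen.add(nonce)) {
                return counter; // duplicate retry: do not re-apply
            }
            counter += delta;
            return counter;
        }

        long value() {
            return counter;
        }
    }
}
```

The key property is the one Ted agrees with: because the retry carries the original nonce, a region move between the first attempt and the retry cannot cause a double apply, provided the new hosting server recovers the seen-nonce state.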
[jira] [Commented] (HBASE-7729) TestCatalogTrackerOnCluster.testbadOriginalRootLocation fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567953#comment-13567953 ] Lars Hofhansl commented on HBASE-7729: -- Looking more I doubt it. I seem to recall having seen this before, too. I ran the test in a loop for an hour, didn't fail, so that could just be a weird issue on the jenkins machines. TestCatalogTrackerOnCluster.testbadOriginalRootLocation fails occasionally -- Key: HBASE-7729 URL: https://issues.apache.org/jira/browse/HBASE-7729 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Failure: {code} java.io.IOException: Shutting down at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:223) at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:86) at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:77) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:650) at org.apache.hadoop.hbase.catalog.TestCatalogTrackerOnCluster.testBadOriginalRootLocation(TestCatalogTrackerOnCluster.java:68) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:47) at org.junit.rules.RunRules.evaluate(RunRules.java:18) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68) at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222) at org.junit.runners.ParentRunner.run(ParentRunner.java:300) at org.junit.runners.Suite.runChild(Suite.java:128) at org.junit.runners.Suite.runChild(Suite.java:24) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.lang.RuntimeException: Master not initialized after 200 seconds at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:206) at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:420) at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:216) ... 32 more {code} Likely caused by this: {code} 2013-01-31 04:52:23,064 FATAL [Master:0;hemera.apache.org,52696,1359607882775] master.HMaster(1493): Unhandled exception. Starting shutdown. 
org.apache.hadoop.hbase.ipc.HBaseClient$FailedServerException: This server is in the failed servers list: example.org/192.0.43.10:1234 at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:425) at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1124) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:974) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86) at $Proxy19.getProtocolVersion(Unknown Source) at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:138) at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1335) at
[jira] [Commented] (HBASE-7717) Wait until regions are assigned in TestSplitTransactionOnCluster
[ https://issues.apache.org/jira/browse/HBASE-7717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567967#comment-13567967 ] Lars Hofhansl commented on HBASE-7717: -- Should be clear from the call stack, though. Anyway, I'll add the table name to the assertion message. Wait until regions are assigned in TestSplitTransactionOnCluster Key: HBASE-7717 URL: https://issues.apache.org/jira/browse/HBASE-7717 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Fix For: 0.96.0, 0.94.5 Attachments: 7717-0.94-combined.txt, 7717-0.94.txt, 7717-0.94-v1.txt, 7717-0.94-v2.txt, 7717-0.94-v3.txt, 7717-0.96.txt, 7717-addendum-0.94.txt, 7717-alternate-94.txt, 7717-alternate-trunk.txt, 7717-trunk-v2.txt, 7717-trunk-v3.txt, TEST-org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster.xml I've seen various failures where a table is created in the tests and then all regions are retrieved from the cluster, where the number of returned regions is 0, because the regions have not been assigned yet, or the AM does not know about them yet.
[jira] [Commented] (HBASE-7717) Wait until regions are assigned in TestSplitTransactionOnCluster
[ https://issues.apache.org/jira/browse/HBASE-7717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567974#comment-13567974 ] Ted Yu commented on HBASE-7717: --- +1
[jira] [Commented] (HBASE-7729) TestCatalogTrackerOnCluster.testbadOriginalRootLocation fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567990#comment-13567990 ] Lars Hofhansl commented on HBASE-7729: -- Failed here too: https://builds.apache.org/job/HBase-0.94/778/testReport/junit/org.apache.hadoop.hbase.catalog/TestCatalogTrackerOnCluster/testBadOriginalRootLocation/ So not related to the client refactor.
[jira] [Commented] (HBASE-7717) Wait until regions are assigned in TestSplitTransactionOnCluster
[ https://issues.apache.org/jira/browse/HBASE-7717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13567997#comment-13567997 ] Lars Hofhansl commented on HBASE-7717: -- Committed. I very sincerely hope this is the last I'll ever see of this test. Wait until regions are assigned in TestSplitTransactionOnCluster Key: HBASE-7717 URL: https://issues.apache.org/jira/browse/HBASE-7717 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Fix For: 0.96.0, 0.94.5 Attachments: 7717-0.94-combined.txt, 7717-0.94.txt, 7717-0.94-v1.txt, 7717-0.94-v2.txt, 7717-0.94-v3.txt, 7717-0.96.txt, 7717-addendum-0.94.txt, 7717-addendum-0.94-v2.txt, 7717-addendum-0.96.txt, 7717-alternate-94.txt, 7717-alternate-trunk.txt, 7717-trunk-v2.txt, 7717-trunk-v3.txt, TEST-org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster.xml I've seen various failures where a table is created in the tests and then all regions are retrieved from the cluster, where the number of returned regions is 0, because the regions have not been assigned yet, or the AM does not know about them yet.
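The remedy the HBASE-7717 patches apply — polling until the cluster actually reports the expected regions instead of asserting on the first read — amounts to a bounded wait loop. This sketch is illustrative only: the `regionCount` supplier, method names, and timeouts are made up, not the committed patch:

```java
import java.util.function.LongSupplier;

// Illustrative retry loop for the race described above: after creating a
// table, poll until the master reports the expected number of regions or
// a deadline passes, instead of asserting on the first (possibly empty) read.
class WaitForRegions {

    static boolean waitUntilAssigned(LongSupplier regionCount,
                                     long expected,
                                     long timeoutMs,
                                     long pollMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (regionCount.getAsLong() >= expected) {
                return true; // all regions visible to the master
            }
            try {
                Thread.sleep(pollMs); // assignment still in flight
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return false; // interrupted: give up
            }
        }
        return false; // caller fails the test with a descriptive message
    }
}
```

The caller turns a `false` return into an assertion failure that names the table, matching Lars's note about adding the table name to the assertion message.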
[jira] [Commented] (HBASE-7701) Opening regions on dead server are not reassigned quickly
[ https://issues.apache.org/jira/browse/HBASE-7701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13568008#comment-13568008 ] ramkrishna.s.vasudevan commented on HBASE-7701: --- @Jimmy Went through the logs and also the patch. This is again HBASE-6060 or HBASE-7521. The region is still opening in an RS but that RS goes down before completing the transition. Jimmy your fix seems fine to me. Just a small comment on review board.
[jira] [Commented] (HBASE-7730) HBaseAdmin#synchronousBalanceSwitch is not compatible with 0.92
[ https://issues.apache.org/jira/browse/HBASE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13568011#comment-13568011 ] Lars Hofhansl commented on HBASE-7730: -- We've said in the past that 0.92 and 0.94 should be fully backward and forward compatible. That would imply supporting cross version shell access.
[jira] [Commented] (HBASE-7701) Opening regions on dead server are not reassigned quickly
[ https://issues.apache.org/jira/browse/HBASE-7701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13568010#comment-13568010 ] Hadoop QA commented on HBASE-7701: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12567403/trunk-7701_v1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.wal.TestHLog org.apache.hadoop.hbase.client.TestFromClientSideWithCoprocessor org.apache.hadoop.hbase.master.TestAssignmentManager Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4275//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4275//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4275//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4275//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4275//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4275//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4275//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4275//console This message is automatically generated.
[jira] [Commented] (HBASE-7729) TestCatalogTrackerOnCluster.testbadOriginalRootLocation fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13568015#comment-13568015 ] Lars Hofhansl commented on HBASE-7729: -- The trackers are not threads unto themselves. So another theory is that the reference count of the connection used in the CatalogTracker did not go to 0 and hence the Connection's ZKW is never stopped. TestCatalogTrackerOnCluster.testbadOriginalRootLocation fails occasionally -- Key: HBASE-7729 URL: https://issues.apache.org/jira/browse/HBASE-7729 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Failure: {code} java.io.IOException: Shutting down at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:223) at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:86) at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:77) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:650) at org.apache.hadoop.hbase.catalog.TestCatalogTrackerOnCluster.testBadOriginalRootLocation(TestCatalogTrackerOnCluster.java:68) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:47) at org.junit.rules.RunRules.evaluate(RunRules.java:18) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263) at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222) at org.junit.runners.ParentRunner.run(ParentRunner.java:300) at org.junit.runners.Suite.runChild(Suite.java:128) at org.junit.runners.Suite.runChild(Suite.java:24) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.lang.RuntimeException: Master not initialized after 200 seconds at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:206) at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:420) at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:216) ... 32 more {code} Likely caused by this: {code} 2013-01-31 04:52:23,064 FATAL [Master:0;hemera.apache.org,52696,1359607882775] master.HMaster(1493): Unhandled exception. Starting shutdown. 
org.apache.hadoop.hbase.ipc.HBaseClient$FailedServerException: This server is in the failed servers list: example.org/192.0.43.10:1234 at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:425) at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1124) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:974) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86) at $Proxy19.getProtocolVersion(Unknown Source) at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:138) at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1335) at
[jira] [Commented] (HBASE-7717) Wait until regions are assigned in TestSplitTransactionOnCluster
[ https://issues.apache.org/jira/browse/HBASE-7717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13568017#comment-13568017 ] ramkrishna.s.vasudevan commented on HBASE-7717: --- Yes, Lars. Thanks a lot for your patience on this. I try to spend time on these failures but don't find the time. Maybe next time you can just ping me to take a look at it, and I will do my part if it fails next time :( Wait until regions are assigned in TestSplitTransactionOnCluster Key: HBASE-7717 URL: https://issues.apache.org/jira/browse/HBASE-7717 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Fix For: 0.96.0, 0.94.5 Attachments: 7717-0.94-combined.txt, 7717-0.94.txt, 7717-0.94-v1.txt, 7717-0.94-v2.txt, 7717-0.94-v3.txt, 7717-0.96.txt, 7717-addendum-0.94.txt, 7717-addendum-0.94-v2.txt, 7717-addendum-0.96.txt, 7717-alternate-94.txt, 7717-alternate-trunk.txt, 7717-trunk-v2.txt, 7717-trunk-v3.txt, TEST-org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster.xml I've seen various failures where a table is created in the tests and then all regions are retrieved from the cluster, where the number of returned regions is 0, because the regions have not been assigned yet, or the AM does not know about them yet. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7404) Bucket Cache:A solution about CMS,Heap Fragment and Big Cache on HBASE
[ https://issues.apache.org/jira/browse/HBASE-7404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13568019#comment-13568019 ] Rishit Shroff commented on HBASE-7404: -- Thanks! Any particular reason that it was used in combination with the LRU block cache and not as a replacement for it in the first use case? Bucket Cache:A solution about CMS,Heap Fragment and Big Cache on HBASE -- Key: HBASE-7404 URL: https://issues.apache.org/jira/browse/HBASE-7404 Project: HBase Issue Type: New Feature Affects Versions: 0.94.3 Reporter: chunhui shen Assignee: chunhui shen Fix For: 0.96.0 Attachments: 7404-trunk-v10.patch, 7404-trunk-v11.patch, 7404-trunk-v12.patch, 7404-trunk-v13.patch, 7404-trunk-v13.txt, 7404-trunk-v14.patch, BucketCache.pdf, hbase-7404-94v2.patch, hbase-7404-trunkv2.patch, hbase-7404-trunkv9.patch, Introduction of Bucket Cache.pdf First, thanks @neil from Fusion-IO for sharing the source code.

Usage:
1. Use bucket cache as the main memory cache, configured as follows:
   hbase.bucketcache.ioengine = heap
   hbase.bucketcache.size = 0.4 (size of the bucket cache; 0.4 is a percentage of max heap size)
2. Use bucket cache as a secondary cache, configured as follows:
   hbase.bucketcache.ioengine = file:/disk1/hbase/cache.data (the file path where the block data is stored)
   hbase.bucketcache.size = 1024 (size of the bucket cache; the unit is MB, so 1024 means 1 GB)
   hbase.bucketcache.combinedcache.enabled = false (default value is true)

See more configurations in org.apache.hadoop.hbase.io.hfile.CacheConfig and org.apache.hadoop.hbase.io.hfile.bucket.BucketCache. What's Bucket Cache?
It can greatly decrease CMS pauses and heap fragmentation caused by GC, and it supports a large cache space for high read performance by using a high-speed disk like Fusion-io.
1. An implementation of block cache, like LruBlockCache
2. Manages the blocks' storage positions itself, through the Bucket Allocator
3. The cached blocks can be stored in memory or on the file system
4. Bucket Cache can be used as the main block cache (see CombinedBlockCache), combined with LruBlockCache, to decrease CMS pauses and fragmentation caused by GC
5. BucketCache can also be used as a secondary cache (e.g. using Fusion-io to store blocks) to enlarge the cache space

How about SlabCache? We studied and tested SlabCache first, but the results were bad, because:
1. SlabCache uses SingleSizeCache, so its memory use ratio is low because of the many kinds of block sizes, especially when using DataBlockEncoding
2. SlabCache is used in DoubleBlockCache: a block is cached both in SlabCache and LruBlockCache, and on a SlabCache hit the block is put into LruBlockCache again, so CMS pauses and heap fragmentation don't get any better
3. Direct (off-heap) performance is not as good as heap, and it may cause OOM, so we recommend using the heap engine

See more in the attachment and in the patch.
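For reference, the two usage modes described above correspond to hbase-site.xml entries along these lines (a sketch only; property names are as given in the description, and the values are the illustrative ones from the example):

```xml
<!-- Mode 1 (sketch): bucket cache as the main memory cache, on heap. -->
<property>
  <name>hbase.bucketcache.ioengine</name>
  <value>heap</value>
</property>
<property>
  <name>hbase.bucketcache.size</name>
  <value>0.4</value> <!-- fraction of max heap size -->
</property>

<!-- Mode 2 (sketch): bucket cache as a secondary, file-backed cache. -->
<property>
  <name>hbase.bucketcache.ioengine</name>
  <value>file:/disk1/hbase/cache.data</value>
</property>
<property>
  <name>hbase.bucketcache.size</name>
  <value>1024</value> <!-- MB -->
</property>
<property>
  <name>hbase.bucketcache.combinedcache.enabled</name>
  <value>false</value>
</property>
```

As noted in the description, CacheConfig and BucketCache document the remaining knobs.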
[jira] [Commented] (HBASE-3787) Increment is non-idempotent but client retries RPC
[ https://issues.apache.org/jira/browse/HBASE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13568022#comment-13568022 ] Andrew Purtell commented on HBASE-3787: --- How does the client know if the op failed before or after it was persisted to the WAL without a way to check? Increment is non-idempotent but client retries RPC -- Key: HBASE-3787 URL: https://issues.apache.org/jira/browse/HBASE-3787 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.96.0, 0.94.4 Reporter: dhruba borthakur Priority: Critical Fix For: 0.96.0 The HTable.increment() operation is non-idempotent. The client retries the increment RPC a few times (as specified by configuration) before throwing an error to the application. This makes it possible that the same increment call be applied twice at the server. For increment operations, is it better to use HConnectionManager.getRegionServerWithoutRetries()? Another option would be to enhance the IPC module to make the RPC server correctly identify if the RPC is a retry attempt and handle accordingly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7698) race between RS shutdown thread and openregionhandler causes region to get stuck
[ https://issues.apache.org/jira/browse/HBASE-7698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13568028#comment-13568028 ] ramkrishna.s.vasudevan commented on HBASE-7698: --- Here we can set a boolean saying whether the transition to FAILED_OPEN happened. If openSuccess == false and the new flag is also false, then we can try doing the update to FAILED_OPEN once in the finally block. Anyway, any ZK exception while doing this FAILED_OPEN update has to be thrown out, I feel. race between RS shutdown thread and openregionhandler causes region to get stuck Key: HBASE-7698 URL: https://issues.apache.org/jira/browse/HBASE-7698 Project: HBase Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin 2013-01-22 17:59:03,237 INFO [Shutdown of org.apache.hadoop.hbase.fs.HFileSystem@5984cf08] hbase.MiniHBaseCluster$SingleFileSystemShutdownThread(186): Hook closing fs=org.apache.hadoop.hbase.fs.HFileSystem@5984cf08 ... 2013-01-22 17:59:03,411 DEBUG [RS_OPEN_REGION-10.11.2.92,50661,1358906192942-0] regionserver.HRegion(1001): Closing IntegrationTestRebalanceAndKillServersTargeted,6660,1358906196709.0200b366bc37c5afd1185f7d487c7dfb.: disabling compactions & flushes 2013-01-22 17:59:03,411 DEBUG [RS_OPEN_REGION-10.11.2.92,50661,1358906192942-0] regionserver.HRegion(1023): Updates disabled for region IntegrationTestRebalanceAndKillServersTargeted,6660,1358906196709.0200b366bc37c5afd1185f7d487c7dfb.
2013-01-22 17:59:03,415 ERROR [RS_OPEN_REGION-10.11.2.92,50661,1358906192942-0] executor.EventHandler(205): Caught throwable while processing event M_RS_OPEN_REGION java.io.IOException: java.io.IOException: java.io.IOException: Filesystem closed at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1058) at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:974) at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:945) at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.cleanupFailedOpen(OpenRegionHandler.java:459) at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:143) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:202) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:680) tryTransitionFromOpeningToFailedOpen or transitionToOpened below is never called and region can get stuck. As an added benefit, the meta is already written by that time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
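The finally-block guard suggested in the comment above can be sketched in plain Java. All class, field, and method names here are hypothetical illustrations, not HBase's actual OpenRegionHandler API; the point is only that a throwable escaping before either transition still triggers exactly one FAILED_OPEN update:

```java
// Sketch of the proposed flag: remember whether the FAILED_OPEN transition
// already happened, and fall back to it once in the finally block when the
// open neither succeeded nor was marked failed (e.g. a throwable such as
// "java.io.IOException: Filesystem closed" escaped before either transition).
class OpenRegionSketch {
    enum Outcome { SUCCESS, FAILURE, CRASH }

    boolean openSuccess = false;
    boolean failedOpenSet = false;   // the proposed new flag
    int failedOpenAttempts = 0;      // counts ZK updates to FAILED_OPEN

    void transitionToFailedOpen() {
        failedOpenSet = true;
        failedOpenAttempts++;
    }

    void process(Outcome outcome) {
        try {
            if (outcome == Outcome.CRASH) {
                // Cleanup blew up: neither transition below runs.
                return;
            }
            if (outcome == Outcome.FAILURE) {
                transitionToFailedOpen(); // normal failure path sets the flag
                return;
            }
            openSuccess = true;           // stands in for transitionToOpened
        } finally {
            // Attempt FAILED_OPEN exactly once if neither path ran.
            if (!openSuccess && !failedOpenSet) {
                transitionToFailedOpen();
            }
        }
    }
}
```

With this guard the crash path performs the transition once, and the normal failure path does not repeat it.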
[jira] [Commented] (HBASE-3787) Increment is non-idempotent but client retries RPC
[ https://issues.apache.org/jira/browse/HBASE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13568043#comment-13568043 ] Ted Yu commented on HBASE-3787: --- clarification: bq. HBaseClient should generate a new nonce for each op. Reuse the nonce for any retry. Here the nonce is reused when retrying against a new region server, right? If so, we're on the same page - WALEdit needs to accommodate the nonce.
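The nonce scheme under discussion can be sketched minimally as follows. All names here are invented for illustration (this is not HBase's client or RPC API): the client draws a nonce once per operation and reuses it on every retry, while the server remembers nonces it has already applied and returns the original result instead of re-applying; as noted above, the real design would persist the nonce via the WALEdit so it survives failover.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch of nonce-based deduplication for a non-idempotent op.
class NonceSketch {
    // Server-side memory: nonce -> result of the first application.
    static final Map<Long, Long> applied = new HashMap<>();
    static long counter = 0; // the value being incremented

    // Client side: one nonce per logical operation, reused across retries.
    static long newNonce() {
        return ThreadLocalRandom.current().nextLong();
    }

    // Server side: apply the increment only if this nonce is unseen.
    static long increment(long nonce, long delta) {
        Long prev = applied.get(nonce);
        if (prev != null) {
            return prev; // retry of an op that was already persisted
        }
        counter += delta;
        applied.put(nonce, counter);
        return counter;
    }
}
```

A retried RPC carrying the same nonce then returns the first result instead of double-counting, which is exactly the failure mode the issue describes.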
[jira] [Commented] (HBASE-7701) Opening regions on dead server are not reassigned quickly
[ https://issues.apache.org/jira/browse/HBASE-7701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13568046#comment-13568046 ] Jimmy Xiang commented on HBASE-7701: @Ram, you are right. The original fix is to let timeout monitor handle it. But that's not fast enough. Opening regions on dead server are not reassigned quickly - Key: HBASE-7701 URL: https://issues.apache.org/jira/browse/HBASE-7701 Project: HBase Issue Type: Bug Affects Versions: 0.96.0 Reporter: Sergey Shelukhin Assignee: Jimmy Xiang Attachments: TEST-org.apache.hadoop.hbase.IntegrationTestRebalanceAndKillServersTargeted.xml, trunk-7701_v1.patch Closed regions are not removed from assignments. I am not sure if it's a general state problem, or just a small bug; for now, one manifestation is that moved region is ignored by SSH of the target server if target server dies before updating ZK. {code} 2013-01-22 17:59:00,524 DEBUG [IPC Server handler 3 on 50658] master.AssignmentManager(1475): Sent CLOSE to 10.11.2.92,51231,1358906285048 for region IntegrationTestRebalanceAndKillServersTargeted,6660,1358906196709.0200b366bc37c5afd1185f7d487c7dfb. 2013-01-22 17:59:00,997 DEBUG [RS_CLOSE_REGION-10.11.2.92,51231,1358906285048-1] handler.CloseRegionHandler(167): set region closed state in zk successfully for region IntegrationTestRebalanceAndKillServersTargeted,6660,1358906196709.0200b366bc37c5afd1185f7d487c7dfb. sn name: 10.11.2.92,51231,1358906285048 2013-01-22 17:59:01,088 INFO [MASTER_CLOSE_REGION-10.11.2.92,50658,1358906192673-0] master.RegionStates(242): Region {NAME => 'IntegrationTestRebalanceAndKillServersTargeted,6660,1358906196709.0200b366bc37c5afd1185f7d487c7dfb.', STARTKEY => '6660', ENDKEY => '732c', ENCODED => 0200b366bc37c5afd1185f7d487c7dfb,} transitioned from {IntegrationTestRebalanceAndKillServersTargeted,6660,1358906196709.0200b366bc37c5afd1185f7d487c7dfb.
state=CLOSED, ts=1358906341087, server=null} to {IntegrationTestRebalanceAndKillServersTargeted,6660,1358906196709.0200b366bc37c5afd1185f7d487c7dfb. state=OFFLINE, ts=1358906341088, server=null} 2013-01-22 17:59:01,128 INFO [MASTER_CLOSE_REGION-10.11.2.92,50658,1358906192673-0] master.AssignmentManager(1596): Assigning region IntegrationTestRebalanceAndKillServersTargeted,6660,1358906196709.0200b366bc37c5afd1185f7d487c7dfb. to 10.11.2.92,50661,1358906192942 ... (50661 didn't update ZK to OPEN, only OPENING) 2013-01-22 17:59:06,605 INFO [MASTER_SERVER_OPERATIONS-10.11.2.92,50658,1358906192673-2] handler.ServerShutdownHandler(202): Reassigning 7 region(s) that 10.11.2.92,50661,1358906192942 was carrying (skipping 0 regions(s) that are already in transition) 2013-01-22 17:59:06,605 DEBUG [MASTER_SERVER_OPERATIONS-10.11.2.92,50658,1358906192673-2] handler.ServerShutdownHandler(219): Skip assigning region IntegrationTestRebalanceAndKillServersTargeted,6660,1358906196709.0200b366bc37c5afd1185f7d487c7dfb. because it has been opened in 10.11.2.92,51231,1358906285048 {code} Note the server in the last line - the one that has long closed the region. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7701) Opening regions on dead server are not reassigned quickly
[ https://issues.apache.org/jira/browse/HBASE-7701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-7701: --- Status: Open (was: Patch Available)
[jira] [Commented] (HBASE-7711) rowlock release problem with thread interruptions in batchMutate
[ https://issues.apache.org/jira/browse/HBASE-7711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13568063#comment-13568063 ] Hudson commented on HBASE-7711: --- Integrated in HBase-TRUNK #3833 (See [https://builds.apache.org/job/HBase-TRUNK/3833/]) HBASE-7711 rowlock release problem with thread interruptions in batchMutate (Ted Yu) (Revision 1441066) Result = FAILURE tedyu : Files : * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java rowlock release problem with thread interruptions in batchMutate Key: HBASE-7711 URL: https://issues.apache.org/jira/browse/HBASE-7711 Project: HBase Issue Type: Bug Reporter: Jonathan Hsieh Assignee: Ted Yu Fix For: 0.96.0 Attachments: 7711.txt, 7711-v2.txt, 7711-v3.txt An earlier version of snapshots would thread-interrupt operations. In longer-term testing we ran into an exception stack trace that indicated that a rowlock was taken and never released. {code} 2013-01-26 01:54:56,417 ERROR org.apache.hadoop.hbase.procedure.ProcedureMember: Propagating foreign exception to subprocedure pe-1 org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable via timer-java.util.Timer@1cea3151:org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: org.apache.hadoop.hbase.errorhandling.TimeoutException: Timeout elapsed!
Source:Timeout caused Foreign E xception Start:1359194035004, End:1359194095004, diff:6, max:6 ms at org.apache.hadoop.hbase.errorhandling.ForeignException.deserialize(ForeignException.java:184) at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.abort(ZKProcedureMemberRpcs.java:321) at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.watchForAbortedProcedures(ZKProcedureMemberRpcs.java:150) at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.access$200(ZKProcedureMemberRpcs.java:56) at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs$1.nodeChildrenChanged(ZKProcedureMemberRpcs.java:112) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:315) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) Caused by: org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: org.apache.hadoop.hbase.errorhandling.TimeoutException: Timeout elapsed! 
Source:Timeout caused Foreign Exception Start:1359194035004, End:1359194095004, diff:6, max:6 ms at org.apache.hadoop.hbase.errorhandling.TimeoutExceptionInjector$1.run(TimeoutExceptionInjector.java:71) at java.util.TimerThread.mainLoop(Timer.java:512) at java.util.TimerThread.run(Timer.java:462) 2013-01-26 01:54:56,648 WARN org.apache.hadoop.hbase.regionserver.HRegion: Failed getting lock in batch put, row=0001558252 java.io.IOException: Timed out on getting lock for row=0001558252 at org.apache.hadoop.hbase.regionserver.HRegion.internalObtainRowLock(HRegion.java:3239) at org.apache.hadoop.hbase.regionserver.HRegion.getLock(HRegion.java:3315) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:2150) at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2021) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3511) at sun.reflect.GeneratedMethodAccessor46.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1400) .. every snapshot attempt that used this region for the next two days encountered this problem. {code} Snapshots will now bypass this problem with the fix in HBASE-7703. However, we should make sure hbase regionserver operations are safe when interrupted. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
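The invariant at stake here can be illustrated with java.util.concurrent primitives. This is a sketch only, not HRegion's actual row-lock code: acquire the lock with a timeout, run the mutation, and release in a finally block so that an interrupt or exception during the mutation cannot leak the lock for later callers.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch of interrupt-safe row locking.
class RowLockSketch {
    final ReentrantLock rowLock = new ReentrantLock();

    // Returns true if the mutation ran; false on lock timeout or interrupt.
    boolean mutateRow(Runnable mutation, long timeoutMs) {
        try {
            if (!rowLock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                return false; // cf. "Timed out on getting lock for row=..."
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore interrupt status
            return false; // interrupted before the lock was held: nothing to release
        }
        try {
            mutation.run();
            return true;
        } finally {
            rowLock.unlock(); // runs even if the mutation throws or is interrupted
        }
    }
}
```

If the release is not in a finally block, a single interrupted mutation leaves the lock held forever, and every later operation on that row times out, which matches the two-day failure pattern described above.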
[jira] [Commented] (HBASE-7729) TestCatalogTrackerOnCluster.testbadOriginalRootLocation fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13568066#comment-13568066 ] Lars Hofhansl commented on HBASE-7729: -- OK... Since I cannot reproduce locally at all... We can close, or I could opportunistically add a 1s wait between cluster shutdown and the subsequent restart of the cluster to give the ZKWs a chance to settle their affairs. (It looks like this might be a unique scenario, with an HMaster stopping and restarting in the same JVM.)
[jira] [Assigned] (HBASE-7723) Remove NN URI from ZK splitlogs.
[ https://issues.apache.org/jira/browse/HBASE-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Himanshu Vashishtha reassigned HBASE-7723: -- Assignee: Himanshu Vashishtha Remove NN URI from ZK splitlogs. Key: HBASE-7723 URL: https://issues.apache.org/jira/browse/HBASE-7723 Project: HBase Issue Type: Bug Components: hadoop2, master Affects Versions: 0.92.0 Reporter: Kevin Odell Assignee: Himanshu Vashishtha When moving to HDFS HA or removing HA, we end up changing the NN namespace. This can cause the HMaster not to start up fully due to trying to split phantom HLogs pointing to the wrong FS - java.lang.IllegalArgumentException: Wrong FS: error messages. The HLogs in question might not even be on HDFS anymore. You have to go in and manually clear out the ZK splitlogs directory to get HBase to properly boot up.
[jira] [Commented] (HBASE-7729) TestCatalogTrackerOnCluster.testbadOriginalRootLocation fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13568090#comment-13568090 ] Lars Hofhansl commented on HBASE-7729: -- Looks like it just happened again in the latest build. TestCatalogTrackerOnCluster.testbadOriginalRootLocation fails occasionally -- Key: HBASE-7729 URL: https://issues.apache.org/jira/browse/HBASE-7729 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Failure:
{code}
java.io.IOException: Shutting down
	at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:223)
	at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:86)
	at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:77)
	at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:650)
	at org.apache.hadoop.hbase.catalog.TestCatalogTrackerOnCluster.testBadOriginalRootLocation(TestCatalogTrackerOnCluster.java:68)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:47)
	at org.junit.rules.RunRules.evaluate(RunRules.java:18)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
	at org.junit.runners.Suite.runChild(Suite.java:128)
	at org.junit.runners.Suite.runChild(Suite.java:24)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.RuntimeException: Master not initialized after 200 seconds
	at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:206)
	at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:420)
	at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:216)
	... 32 more
{code}
Likely caused by this:
{code}
2013-01-31 04:52:23,064 FATAL [Master:0;hemera.apache.org,52696,1359607882775] master.HMaster(1493): Unhandled exception. Starting shutdown.
org.apache.hadoop.hbase.ipc.HBaseClient$FailedServerException: This server is in the failed servers list: example.org/192.0.43.10:1234
	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:425)
	at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1124)
	at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:974)
	at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86)
	at $Proxy19.getProtocolVersion(Unknown Source)
	at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:138)
	at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1335)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1291)
	at
{code}
[jira] [Commented] (HBASE-7723) Remove NN URI from ZK splitlogs.
[ https://issues.apache.org/jira/browse/HBASE-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13568091#comment-13568091 ] Himanshu Vashishtha commented on HBASE-7723: A patch which removes storing the NN URI in split log znodes. It keeps the path from .logs onward when creating the znode in the SplitLogManager, and the SplitLogWorker re-creates the full path to the actual log using hbase.rootdir. Remove NN URI from ZK splitlogs. Key: HBASE-7723 URL: https://issues.apache.org/jira/browse/HBASE-7723 Project: HBase Issue Type: Bug Components: hadoop2, master Affects Versions: 0.92.0 Reporter: Kevin Odell Assignee: Himanshu Vashishtha When moving to HDFS HA or removing HA we end up changing the NN namespace. This can cause the HMaster not to start up fully due to trying to split phantom HLogs pointing to the wrong FS - java.lang.IllegalArgumentException: Wrong FS: error messages. The HLogs in question might not even be on HDFS anymore. You have to go in and manually clear out the ZK splitlogs directory to get HBase to properly boot up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
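The approach described in the comment can be sketched as plain path manipulation: the master stores only the suffix starting at the .logs directory, and the worker re-qualifies it against its own hbase.rootdir. The class and method names below are hypothetical illustrations, not the actual HBASE-7723 patch code:

```java
// Sketch of the idea behind HBASE-7723 (hypothetical names, not the real patch):
// store only the NN-independent suffix of a WAL path in the splitlog znode,
// and rebuild the absolute path on the worker from the locally configured root dir.
public class SplitLogPathSketch {
    static final String LOGS_DIR = ".logs";

    // Master side: keep everything from "/.logs/" onward, dropping the NN URI.
    static String toZnodePath(String fullLogPath) {
        int idx = fullLogPath.indexOf("/" + LOGS_DIR + "/");
        if (idx < 0) {
            throw new IllegalArgumentException("no " + LOGS_DIR + " component in: " + fullLogPath);
        }
        return fullLogPath.substring(idx); // e.g. "/.logs/rs1,60020,1/wal.1"
    }

    // Worker side: resolve the stored suffix against hbase.rootdir as seen locally.
    static String toFullPath(String rootDir, String znodeSuffix) {
        return rootDir + znodeSuffix;
    }

    public static void main(String[] args) {
        String old = "hdfs://old-nn:8020/hbase/.logs/rs1,60020,1/wal.1";
        String stored = toZnodePath(old);
        // After an HA migration, rootdir points at a new nameservice, but the
        // stored suffix still resolves to the right log under the new FS.
        System.out.println(toFullPath("hdfs://new-nameservice/hbase", stored));
    }
}
```

With this scheme a namespace change only affects the rootdir half of the path, which is read from local configuration rather than from ZooKeeper.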
[jira] [Updated] (HBASE-7723) Remove NN URI from ZK splitlogs.
[ https://issues.apache.org/jira/browse/HBASE-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Himanshu Vashishtha updated HBASE-7723: --- Attachment: HBASE-7723-94.patch Remove NN URI from ZK splitlogs. Key: HBASE-7723 URL: https://issues.apache.org/jira/browse/HBASE-7723 Project: HBase Issue Type: Bug Components: hadoop2, master Affects Versions: 0.92.0 Reporter: Kevin Odell Assignee: Himanshu Vashishtha Attachments: HBASE-7723-94.patch When moving to HDFS HA or removing HA we end up changing the NN namespace. This can cause the HMaster not to start up fully due to trying to split phantom HLogs pointing to the wrong FS - java.lang.IllegalArgumentException: Wrong FS: error messages. The HLogs in question might not even be on HDFS anymore. You have to go in and manually clear out the ZK splitlogs directory to get HBase to properly boot up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7717) Wait until regions are assigned in TestSplitTransactionOnCluster
[ https://issues.apache.org/jira/browse/HBASE-7717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13568096#comment-13568096 ] Hudson commented on HBASE-7717: --- Integrated in HBase-0.94 #812 (See [https://builds.apache.org/job/HBase-0.94/812/]) HBASE-7717 addendum, really wait for all tables in TestSplitTransactionOnCluster. (Revision 1441151) Result = FAILURE larsh : Files : * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransactionOnCluster.java Wait until regions are assigned in TestSplitTransactionOnCluster Key: HBASE-7717 URL: https://issues.apache.org/jira/browse/HBASE-7717 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Fix For: 0.96.0, 0.94.5 Attachments: 7717-0.94-combined.txt, 7717-0.94.txt, 7717-0.94-v1.txt, 7717-0.94-v2.txt, 7717-0.94-v3.txt, 7717-0.96.txt, 7717-addendum-0.94.txt, 7717-addendum-0.94-v2.txt, 7717-addendum-0.96.txt, 7717-alternate-94.txt, 7717-alternate-trunk.txt, 7717-trunk-v2.txt, 7717-trunk-v3.txt, TEST-org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster.xml I've seen various failures where a table is created in the tests and then all regions are retrieved from the cluster, where the number of returned regions is 0, because the regions have not been assigned yet, or the AM does not know about them yet. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
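The "wait until regions are assigned" fix amounts to replacing an immediate assertion with a bounded poll loop. A minimal sketch of that pattern, with hypothetical names (this is not the actual TestSplitTransactionOnCluster code):

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Illustrative sketch of an HBASE-7717-style fix: instead of reading the
// region list right after createTable and asserting on it, poll until the
// expected number of regions is visible or a deadline passes.
public class WaitForAssignmentSketch {
    static void waitForRegionCount(Supplier<Integer> regionCount, int expected, long timeoutMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (regionCount.get() < expected) {
            if (System.currentTimeMillis() > deadline) {
                throw new AssertionError("regions not assigned within " + timeoutMs + " ms");
            }
            Thread.sleep(50); // back off briefly before re-checking
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulate slow assignment: each poll observes one more region online.
        AtomicInteger assigned = new AtomicInteger(0);
        waitForRegionCount(() -> assigned.incrementAndGet(), 3, 5000);
        System.out.println("all regions assigned");
    }
}
```

The deadline keeps a genuinely broken assignment from hanging the test forever, while the loop absorbs the benign race between table creation and the AM learning about the new regions.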
[jira] [Commented] (HBASE-7723) Remove NN URI from ZK splitlogs.
[ https://issues.apache.org/jira/browse/HBASE-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13568097#comment-13568097 ] Himanshu Vashishtha commented on HBASE-7723: I tested this with a clean ZK slate. That is required because the patch removes the NN URI; otherwise, old znodes would point to non-existent log files. If this approach sounds good, I will make a similar change for replication znode handling too. Remove NN URI from ZK splitlogs. Key: HBASE-7723 URL: https://issues.apache.org/jira/browse/HBASE-7723 Project: HBase Issue Type: Bug Components: hadoop2, master Affects Versions: 0.92.0 Reporter: Kevin Odell Assignee: Himanshu Vashishtha Attachments: HBASE-7723-94.patch When moving to HDFS HA or removing HA we end up changing the NN namespace. This can cause the HMaster not to start up fully due to trying to split phantom HLogs pointing to the wrong FS - java.lang.IllegalArgumentException: Wrong FS: error messages. The HLogs in question might not even be on HDFS anymore. You have to go in and manually clear out the ZK splitlogs directory to get HBase to properly boot up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7723) Remove NN URI from ZK splitlogs.
[ https://issues.apache.org/jira/browse/HBASE-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13568114#comment-13568114 ] Jimmy Xiang commented on HBASE-7723: How do you handle the compatibility/migration issue? Remove NN URI from ZK splitlogs. Key: HBASE-7723 URL: https://issues.apache.org/jira/browse/HBASE-7723 Project: HBase Issue Type: Bug Components: hadoop2, master Affects Versions: 0.92.0 Reporter: Kevin Odell Assignee: Himanshu Vashishtha Attachments: HBASE-7723-94.patch When moving to HDFS HA or removing HA we end up changing the NN namespace. This can cause the HMaster not to start up fully due to trying to split phantom HLogs pointing to the wrong FS - java.lang.IllegalArgumentException: Wrong FS: error messages. The HLogs in question might not even be on HDFS anymore. You have to go in and manually clear out the ZK splitlogs directory to get HBase to properly boot up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7723) Remove NN URI from ZK splitlogs.
[ https://issues.apache.org/jira/browse/HBASE-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13568121#comment-13568121 ] Ted Yu commented on HBASE-7723: --- I got the following test failure:
{code}
testDelayedDeleteOnFailure(org.apache.hadoop.hbase.master.TestDistributedLogSplitting) Time elapsed: 25.702 sec ERROR!
java.util.concurrent.ExecutionException: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
	at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:232)
	at java.util.concurrent.FutureTask.get(FutureTask.java:91)
	at org.apache.hadoop.hbase.master.TestDistributedLogSplitting.testDelayedDeleteOnFailure(TestDistributedLogSplitting.java:316)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:62)
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
	at java.lang.String.substring(String.java:1931)
	at java.lang.String.substring(String.java:1904)
	at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:260)
	at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:228)
	at org.apache.hadoop.hbase.master.TestDistributedLogSplitting$2.run(TestDistributedLogSplitting.java:297)
{code}
Remove NN URI from ZK splitlogs. Key: HBASE-7723 URL: https://issues.apache.org/jira/browse/HBASE-7723 Project: HBase Issue Type: Bug Components: hadoop2, master Affects Versions: 0.92.0 Reporter: Kevin Odell Assignee: Himanshu Vashishtha Attachments: HBASE-7723-94.patch When moving to HDFS HA or removing HA we end up changing the NN namespace. This can cause the HMaster not to start up fully due to trying to split phantom HLogs pointing to the wrong FS - java.lang.IllegalArgumentException: Wrong FS: error messages. The HLogs in question might not even be on HDFS anymore. You have to go in and manually clear out the ZK splitlogs directory to get HBase to properly boot up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
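A StringIndexOutOfBoundsException with index -1 in SplitLogManager.splitLogDistributed is the classic symptom of feeding a failed indexOf straight into substring, e.g. when a path does not contain the component the new code expects. A minimal, hypothetical reproduction of that failure mode (not the actual SplitLogManager code):

```java
// Hypothetical reproduction of the "String index out of range: -1" failure:
// code searches a log path for a marker that some inputs do not contain and
// passes indexOf's -1 directly to substring. Not the real SplitLogManager code.
public class SubstringPitfallSketch {
    static String tailAfterLogsDir(String logPath) {
        int idx = logPath.indexOf("/.logs/");
        // Guarding the -1 case turns the opaque StringIndexOutOfBoundsException
        // into a clear error about the unexpected input.
        if (idx < 0) {
            throw new IllegalArgumentException("path lacks /.logs/ component: " + logPath);
        }
        return logPath.substring(idx);
    }

    public static void main(String[] args) {
        // Unguarded: substring(-1) throws StringIndexOutOfBoundsException.
        try {
            String s = "old-format-task-name";
            System.out.println(s.substring(s.indexOf("/.logs/")));
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("unguarded call threw, as in the test failure");
        }
        // Guarded version works on well-formed paths and fails fast otherwise.
        System.out.println(tailAfterLogsDir("hdfs://nn/hbase/.logs/rs1,60020,1/wal.1"));
    }
}
```

This is also why Jimmy's compatibility question matters: znodes written in the old full-URI format could reach new parsing code that assumes the new layout.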
[jira] [Commented] (HBASE-7717) Wait until regions are assigned in TestSplitTransactionOnCluster
[ https://issues.apache.org/jira/browse/HBASE-7717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13568126#comment-13568126 ] Hudson commented on HBASE-7717: --- Integrated in HBase-TRUNK #3834 (See [https://builds.apache.org/job/HBase-TRUNK/3834/]) HBASE-7717 addendum, really wait for all tables in TestSplitTransactionOnCluster. (Revision 1441150) Result = FAILURE larsh : Files : * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransactionOnCluster.java Wait until regions are assigned in TestSplitTransactionOnCluster Key: HBASE-7717 URL: https://issues.apache.org/jira/browse/HBASE-7717 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Fix For: 0.96.0, 0.94.5 Attachments: 7717-0.94-combined.txt, 7717-0.94.txt, 7717-0.94-v1.txt, 7717-0.94-v2.txt, 7717-0.94-v3.txt, 7717-0.96.txt, 7717-addendum-0.94.txt, 7717-addendum-0.94-v2.txt, 7717-addendum-0.96.txt, 7717-alternate-94.txt, 7717-alternate-trunk.txt, 7717-trunk-v2.txt, 7717-trunk-v3.txt, TEST-org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster.xml I've seen various failures where a table is created in the tests and then all regions are retrieved from the cluster, where the number of returned regions is 0, because the regions have not been assigned yet, or the AM does not know about them yet. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-7733) Fix flaky TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare
Jonathan Hsieh created HBASE-7733: - Summary: Fix flaky TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare Key: HBASE-7733 URL: https://issues.apache.org/jira/browse/HBASE-7733 Project: HBase Issue Type: Sub-task Reporter: Jonathan Hsieh Sometimes this test fails with this error message:
{code}
Wanted but not invoked:
procedure.sendGlobalBarrierComplete();
- at org.apache.hadoop.hbase.procedure.TestZKProcedure.waitAndVerifyProc(TestZKProcedure.java:344)
However, there were other interactions with this mock:
- at org.apache.hadoop.hbase.procedure.TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare(TestZKProcedure.java:306)
- at org.apache.hadoop.hbase.procedure.TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare(TestZKProcedure.java:311)
- at org.apache.hadoop.hbase.procedure.Procedure.call(Procedure.java:205)
- at org.apache.hadoop.hbase.procedure.Procedure.call(Procedure.java:205)
- at org.apache.hadoop.hbase.procedure.ProcedureCoordinator.memberAcquiredBarrier(ProcedureCoordinator.java:228)
- at org.apache.hadoop.hbase.procedure.ProcedureCoordinator.memberAcquiredBarrier(ProcedureCoordinator.java:228)
- at org.apache.hadoop.hbase.procedure.ProcedureCoordinator.abortProcedure(ProcedureCoordinator.java:183)
- at org.apache.hadoop.hbase.procedure.TestZKProcedure.waitAndVerifyProc(TestZKProcedure.java:337)
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HBASE-7733) Fix flaky TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare
[ https://issues.apache.org/jira/browse/HBASE-7733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh reassigned HBASE-7733: - Assignee: Jonathan Hsieh Fix flaky TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare --- Key: HBASE-7733 URL: https://issues.apache.org/jira/browse/HBASE-7733 Project: HBase Issue Type: Sub-task Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Sometimes this test fails with this error message:
{code}
Wanted but not invoked:
procedure.sendGlobalBarrierComplete();
- at org.apache.hadoop.hbase.procedure.TestZKProcedure.waitAndVerifyProc(TestZKProcedure.java:344)
However, there were other interactions with this mock:
- at org.apache.hadoop.hbase.procedure.TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare(TestZKProcedure.java:306)
- at org.apache.hadoop.hbase.procedure.TestZKProcedure.testMultiCohortWithMemberTimeoutDuringPrepare(TestZKProcedure.java:311)
- at org.apache.hadoop.hbase.procedure.Procedure.call(Procedure.java:205)
- at org.apache.hadoop.hbase.procedure.Procedure.call(Procedure.java:205)
- at org.apache.hadoop.hbase.procedure.ProcedureCoordinator.memberAcquiredBarrier(ProcedureCoordinator.java:228)
- at org.apache.hadoop.hbase.procedure.ProcedureCoordinator.memberAcquiredBarrier(ProcedureCoordinator.java:228)
- at org.apache.hadoop.hbase.procedure.ProcedureCoordinator.abortProcedure(ProcedureCoordinator.java:183)
- at org.apache.hadoop.hbase.procedure.TestZKProcedure.waitAndVerifyProc(TestZKProcedure.java:337)
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira