[jira] [Commented] (HBASE-7958) Statistics per-column family per-region
[ https://issues.apache.org/jira/browse/HBASE-7958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13624358#comment-13624358 ] Elliott Clark commented on HBASE-7958:
--------------------------------------

bq. Yeah, they are certainly pretty, but IMO pretty useless for anyone but the most novice user.

I think we've been pretty remiss to ignore the novice user. If those graphs could be the thing that gets users to use HBase, then they did their job.

Statistics per-column family per-region
---------------------------------------

Key: HBASE-7958
URL: https://issues.apache.org/jira/browse/HBASE-7958
Project: HBase
Issue Type: New Feature
Affects Versions: 0.95.2
Reporter: Jesse Yates
Assignee: Jesse Yates
Fix For: 0.95.1
Attachments: hbase-7958_rough-cut-v0.patch, hbase-7958-v0-parent.patch, hbase-7958-v0.patch

Originating from this discussion on the dev list: http://search-hadoop.com/m/coDKU1urovS/Simple+stastics+per+region/v=plain

Essentially, we should have built-in statistics gathering for HBase tables. This allows clients to have a better understanding of the distribution of keys within a table and a given region. We could also surface this information via the UI.

There are a couple of different proposals from the email; the overview is this: we add something on compactions that gathers stats about the keys that are written, and then we surface them to a table. The possible proposals include:

*How to implement it?*
# Coprocessors
** advantage - it easily plugs in, and people could pretty easily add their own statistics.
** disadvantage - UI elements would also require this; we get into dependent loading, which leads down the OSGi path. Also, these CPs need to be installed _after_ all the other CPs on compaction to ensure they see exactly what gets written (doable, but a pain).
# Built into HBase as a custom scanner
** advantage - always goes in the right place, and there is no need to muck about with loading CPs etc.
** disadvantage - less pluggable, at least for the initial cut

*Where do we store data?*
# .META.
** advantage - it's an existing table, so we can jam it into another CF there
** disadvantage - this would make META much larger, possibly leading to splits, AND will make it much harder for other processes to read the info
# A new stats table
** advantage - cleanly separates out the information from META
** disadvantage - should use a 'system table' idea to prevent accidental deletion or manipulation by arbitrary clients, but still allow clients to read it

Once we have this framework, we can then move to an actual implementation of various statistics.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
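The kind of statistic a compaction-time collector could gather can be sketched with a toy class. This is only an illustration of the idea discussed above; none of these names (`KeyStatsSketch`, `observe`) exist in HBase, and the real collector would hook into the compaction scanner rather than be fed keys directly.

```java
// Hypothetical sketch: per-column-family statistics (key count, min/max row
// key) that a compaction-time collector could accumulate while keys stream
// past it. Not HBase code; the names here are made up for illustration.
public class KeyStatsSketch {
    long keyCount;          // number of keys seen during the compaction
    byte[] minKey, maxKey;  // lexicographic min/max row key seen so far

    // Called once per key the compaction scanner emits.
    public void observe(byte[] key) {
        keyCount++;
        if (minKey == null || compare(key, minKey) < 0) minKey = key;
        if (maxKey == null || compare(key, maxKey) > 0) maxKey = key;
    }

    // Unsigned lexicographic byte comparison, as HBase orders row keys.
    static int compare(byte[] a, byte[] b) {
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }
}
```

Whichever implementation route is chosen (coprocessor or built-in scanner), the stats rows accumulated this way would then be written to the chosen storage table at the end of the compaction.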
[jira] [Commented] (HBASE-8273) HColumnDescriptor setters should return void
[ https://issues.apache.org/jira/browse/HBASE-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13624412#comment-13624412 ] Ted Yu commented on HBASE-8273:
---

I logged this JIRA with an attempt to solve the binary compatibility issue. After further investigation, it turns out binary compatibility can be achieved between 0.92.3 and 0.94.7 in the following way:

1. introduce an interface, called X e.g., in 0.92(.3), with the setters for HColumnDescriptor
2. introduce an interface, called Y e.g., in 0.94(.7), with the setters for HColumnDescriptor where the builder pattern is supported
3. write Java assembly code in 0.94(.7) for HColumnDescriptor supporting both interfaces

Client code does not need reflection to use those methods.

HColumnDescriptor setters should return void
--------------------------------------------

Key: HBASE-8273
URL: https://issues.apache.org/jira/browse/HBASE-8273
Project: HBase
Issue Type: Bug
Reporter: Ted Yu

See discussion on the dev@ mailing list, entitled 'Does compatibility between versions also mean binary compatibility?' Synopsis from that thread: HBASE-5357 (Use builder pattern in HColumnDescriptor) changed the method signatures by changing void to HColumnDescriptor, so they are not the same methods anymore. If you invoke setters on HColumnDescriptor you'll get:

java.lang.NoSuchMethodError: org.apache.hadoop.hbase.HColumnDescriptor.setMaxVersions(I)V
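The `(I)V` in the error message above is the JVM method descriptor the old client was compiled against: parameter `int`, return `void`. The JVM resolves call sites by the full descriptor including the return type, which is why changing `void` to a builder-style return breaks already-compiled callers even though the source still compiles. A small self-contained illustration (a toy `Builder` class, not HBase's `HColumnDescriptor`) renders a method's descriptor via reflection:

```java
import java.lang.reflect.Method;

// Illustration only: the JVM links call sites by full method descriptor,
// including the return type. Changing `void setMaxVersions(int)` to
// `Builder setMaxVersions(int)` changes (I)V to (I)L...Builder;, so old
// callers fail with NoSuchMethodError at runtime. Toy class, not HBase code.
public class DescriptorDemo {
    public static class Builder {
        int maxVersions;
        // Builder-style setter, as introduced by the HBASE-5357 change.
        public Builder setMaxVersions(int n) { this.maxVersions = n; return this; }
    }

    // Render a JVM-style method descriptor via reflection.
    public static String descriptor(Method m) {
        StringBuilder sb = new StringBuilder("(");
        for (Class<?> p : m.getParameterTypes()) sb.append(jvmName(p));
        return sb.append(')').append(jvmName(m.getReturnType())).toString();
    }

    static String jvmName(Class<?> c) {
        if (c == void.class) return "V";
        if (c == int.class) return "I";
        return "L" + c.getName().replace('.', '/') + ";";
    }
}
```

For the builder-style `setMaxVersions` above, `descriptor(...)` yields `(I)L...Builder;` rather than the `(I)V` a pre-builder client expects, which is exactly the mismatch behind the NoSuchMethodError.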
[jira] [Closed] (HBASE-7870) Delete(byte [] row, long timestamp) constructor is deprecated and should not be.
[ https://issues.apache.org/jira/browse/HBASE-7870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl closed HBASE-7870.

Delete(byte [] row, long timestamp) constructor is deprecated and should not be.
--------------------------------------------------------------------------------

Key: HBASE-7870
URL: https://issues.apache.org/jira/browse/HBASE-7870
Project: HBase
Issue Type: Bug
Affects Versions: 0.94.5
Reporter: Jean-Marc Spaggiari
Assignee: Jean-Marc Spaggiari
Priority: Critical
Fix For: 0.94.6
Attachments: HBASE-7870-v0-0.94.patch

Most probably a cut-and-paste issue. Patch is coming.
[jira] [Commented] (HBASE-8240) CompoundConfiguration should implement Iterable
[ https://issues.apache.org/jira/browse/HBASE-8240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13624477#comment-13624477 ] Ted Yu commented on HBASE-8240:
---

[~lhofhansl]: Are you okay with this going into 0.94? Thanks

CompoundConfiguration should implement Iterable
-----------------------------------------------

Key: HBASE-8240
URL: https://issues.apache.org/jira/browse/HBASE-8240
Project: HBase
Issue Type: Improvement
Reporter: Ted Yu
Assignee: Ted Yu
Fix For: 0.95.1, 0.98.0
Attachments: 8240-v1.txt, 8240-v2.txt, 8240-v3.txt, 8240-v4.txt, 8240-v6.txt, 8240-v7.txt

Here is from the hadoop Configuration class:
{code}
public class Configuration implements Iterable<Map.Entry<String,String>>, Writable {
{code}
There are 3 addXX() methods for CompoundConfiguration:
{code}
public CompoundConfiguration add(final Configuration conf) {
public CompoundConfiguration addWritableMap(
    final Map<ImmutableBytesWritable, ImmutableBytesWritable> map) {
public CompoundConfiguration addStringMap(final Map<String, String> map) {
{code}
Parameters to these methods all support iteration. We can enhance ImmutableConfigMap with the following new method:
{code}
public abstract java.util.Iterator<Map.Entry<String, String>> iterator();
{code}
Then the following method of CompoundConfiguration can be implemented:
{code}
public Iterator<Map.Entry<String, String>> iterator() {
{code}
This enhancement would be useful in scenarios where a mutable Configuration is required.
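The overlay-of-maps idea behind CompoundConfiguration can be sketched in a few lines. This is a minimal stand-in (not the real CompoundConfiguration, and it omits the Writable-map variant): several string maps are stacked, later additions win on key conflicts, and the whole thing exposes `Iterable<Map.Entry<String,String>>` as the JIRA proposes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Minimal sketch of a "compound" configuration: several string maps,
// later-added maps taking precedence, iterable as key/value entries.
public class CompoundConfigSketch implements Iterable<Map.Entry<String, String>> {
    private final List<Map<String, String>> sources = new ArrayList<>();

    // Later additions win on key conflicts, mirroring overlay semantics.
    public CompoundConfigSketch addStringMap(Map<String, String> map) {
        sources.add(map);
        return this;
    }

    @Override
    public Iterator<Map.Entry<String, String>> iterator() {
        // Flatten to one effective view so each key appears exactly once,
        // with the last-added source's value winning.
        Map<String, String> merged = new TreeMap<>();
        for (Map<String, String> m : sources) merged.putAll(m);
        return merged.entrySet().iterator();
    }
}
```

With this in place, code that needs a plain mutable Configuration can walk the compound view with an ordinary for-each loop and copy entries out, which is the scenario the last sentence of the description has in mind.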
[jira] [Created] (HBASE-8285) HBaseClient never recovers for single HTable.get() calls when regions move
Varun Sharma created HBASE-8285:
---

Summary: HBaseClient never recovers for single HTable.get() calls when regions move
Key: HBASE-8285
URL: https://issues.apache.org/jira/browse/HBASE-8285
Project: HBase
Issue Type: Bug
Components: Client
Affects Versions: 0.94.6.1
Reporter: Varun Sharma
Assignee: Varun Sharma
Priority: Critical
Fix For: 0.94.7

Steps to reproduce this bug:
1) Gracefully restart a region server, causing regions to get redistributed.
2) Client calls to this region keep failing, since the META cache is never purged on the client for the region that moved.

Reason behind the bug:
1) The client continues to hit the old region server.
2) The old region server throws NotServingRegionException, which is not handled correctly, and the META cache entries are never purged for that server, causing the client to keep hitting the old server. The reason lies in the ServerCallable code, since we only purge META cache entries when there is a RetriesExhaustedException, SocketTimeoutException or ConnectException. However, there is no case check for NotServingRegionException(s).

Why is this not a problem for Scan(s) and Put(s)?
a) If a region server is not hosting a region/scanner, then an UnknownScannerException is thrown, which causes a relocateRegion() call, causing a refresh of the META cache for that particular region.
b) For put(s), the processBatchCallback() interface in HConnectionManager is used, which clears out META cache entries for all kinds of exceptions except DoNotRetryException.
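The retry-with-eviction behavior the report says is missing for get() can be sketched with a toy client. This is not real HBase client code (the cache here is a plain map keyed by region name, and `callServer` is a stand-in for the RPC); it only shows why the stale-cache loop never ends unless the failed attempt purges its cached location before retrying.

```java
import java.util.HashMap;
import java.util.Map;

// Toy sketch of the fix this JIRA describes: on a "region moved" failure
// the client must evict its cached location before retrying, or it will
// keep asking the old server forever. Names are made up for illustration.
public class RegionCacheSketch {
    public static class NotServingRegionException extends RuntimeException {}

    // region name -> cached server address (the toy "META cache")
    public final Map<String, String> metaCache = new HashMap<>();
    // region name -> where the region actually lives (the toy meta table)
    public final Map<String, String> actualLocations = new HashMap<>();

    public String get(String region, int maxRetries) {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            // Cache miss falls through to a (toy) meta lookup.
            String cached = metaCache.computeIfAbsent(region, actualLocations::get);
            try {
                return callServer(region, cached);
            } catch (NotServingRegionException e) {
                // The step missing in the bug report: purge the stale entry
                // so the next attempt re-reads the meta table.
                metaCache.remove(region);
            }
        }
        throw new RuntimeException("retries exhausted for " + region);
    }

    // Stand-in for the RPC: the "server" only answers for regions it hosts.
    String callServer(String region, String server) {
        if (!server.equals(actualLocations.get(region))) {
            throw new NotServingRegionException();
        }
        return "value@" + server;
    }
}
```

Deleting the `metaCache.remove(region)` line reproduces the reported behavior: every retry reuses the stale entry and the call only ends in retries-exhausted.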
[jira] [Updated] (HBASE-8285) HBaseClient never recovers for single HTable.get() calls when regions move
[ https://issues.apache.org/jira/browse/HBASE-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Sharma updated HBASE-8285: Attachment: 8285-0.94.txt
[jira] [Updated] (HBASE-8285) HBaseClient never recovers for single HTable.get() calls when regions move
[ https://issues.apache.org/jira/browse/HBASE-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Sharma updated HBASE-8285: Attachment: (was: 8285-0.94.txt)
[jira] [Updated] (HBASE-8285) HBaseClient never recovers for single HTable.get() calls when regions move
[ https://issues.apache.org/jira/browse/HBASE-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Sharma updated HBASE-8285: Attachment: 8285-trunk.txt
[jira] [Updated] (HBASE-8285) HBaseClient never recovers for single HTable.get() calls when regions move
[ https://issues.apache.org/jira/browse/HBASE-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Sharma updated HBASE-8285: Fix Version/s: 0.98.0; Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-8285) HBaseClient never recovers for single HTable.get() calls when regions move
[ https://issues.apache.org/jira/browse/HBASE-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Sharma updated HBASE-8285: Attachment: 8285-0.94.txt
[jira] [Commented] (HBASE-8285) HBaseClient never recovers for single HTable.get() calls when regions move
[ https://issues.apache.org/jira/browse/HBASE-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13624494#comment-13624494 ] Hadoop QA commented on HBASE-8285:
--

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12577392/8285-0.94.txt against trunk revision.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/5162//console

This message is automatically generated.
[jira] [Commented] (HBASE-8285) HBaseClient never recovers for single HTable.get() calls when regions move
[ https://issues.apache.org/jira/browse/HBASE-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13624495#comment-13624495 ] Ted Yu commented on HBASE-8285:
---

Nice analysis, Varun.
{code}
  public void clearCaches(final ServerName serverName) {
    ...
        if (serverName.equals(e.getValue().getServerName())) {
          tableLocations.remove(e.getKey());
{code}
We clear all entries on the server (serverName) regardless of region, yet the location contains region information. When NotServingRegionException happens, should we just clear the entries for that region?

BTW the 0.94 patch was attached more recently than the trunk patch. Hadoop QA is down; when it comes back up, the 0.94 patch will be picked up.
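The two eviction scopes under discussion (everything on a server vs. just the region that moved) can be contrasted with a toy cache. This is not HBase code; `tableLocations` here is a plain map from region name to server name, standing in for the real location cache.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy contrast of the two eviction scopes discussed above: clearCaches(server)
// drops every region cached on a server, while a region-scoped variant drops
// only the one region that actually moved. Names are illustrative only.
public class CacheEvictionSketch {
    // region name -> server name (stand-in for the real location cache)
    public final Map<String, String> tableLocations = new ConcurrentHashMap<>();

    // Server-scoped eviction: evict everything that points at this server.
    public void clearCaches(String serverName) {
        tableLocations.entrySet().removeIf(e -> serverName.equals(e.getValue()));
    }

    // Region-scoped variant: evict just the region that threw
    // NotServingRegionException, leaving the other regions' entries warm.
    public void clearRegionCache(String regionName) {
        tableLocations.remove(regionName);
    }
}
```

The trade-off is warm-cache preservation: after a graceful restart only some regions move, so region-scoped eviction avoids re-looking-up locations that are still valid.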
[jira] [Updated] (HBASE-8285) HBaseClient never recovers for single HTable.get() calls when regions move
[ https://issues.apache.org/jira/browse/HBASE-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-8285: Fix Version/s: 0.95.1
[jira] [Commented] (HBASE-8285) HBaseClient never recovers for single HTable.get() calls when regions move
[ https://issues.apache.org/jira/browse/HBASE-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13624499#comment-13624499 ] Varun Sharma commented on HBASE-8285:
-

I don't see an API to clear a specific region here. We might consider adding one... http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HConnection.html

I make this mistake of attaching in the wrong order all the time. Should I remove the attachments and reattach?

Varun
[jira] [Commented] (HBASE-8285) HBaseClient never recovers for single HTable.get() calls when regions move
[ https://issues.apache.org/jira/browse/HBASE-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13624501#comment-13624501 ] Ted Yu commented on HBASE-8285:
---

I think we should add another API - this should be feasible for 0.95 / trunk. For 0.94, please run the test suite based on your 0.94 patch yourself.

bq. Should I remove attachments and reattach?

Please attach a new version of the trunk patch.
[jira] [Commented] (HBASE-8285) HBaseClient never recovers for single HTable.get() calls when regions move
[ https://issues.apache.org/jira/browse/HBASE-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13624502#comment-13624502 ] Varun Sharma commented on HBASE-8285:
-

Actually, I think we should use the relocateRegion() API; lemme update this...
[jira] [Commented] (HBASE-7801) Allow a deferred sync option per Mutation.
[ https://issues.apache.org/jira/browse/HBASE-7801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624505#comment-13624505 ] Hadoop QA commented on HBASE-7801: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12577359/7801-0.96-v6.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 156 new or modified tests. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. The patch introduces lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.protobuf.TestProtobufUtil Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/5163//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5163//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5163//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5163//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5163//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5163//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5163//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5163//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5163//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/5163//console This message is automatically generated. Allow a deferred sync option per Mutation. -- Key: HBASE-7801 URL: https://issues.apache.org/jira/browse/HBASE-7801 Project: HBase Issue Type: Sub-task Affects Versions: 0.95.0, 0.94.6 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.95.1, 0.98.0, 0.94.7 Attachments: 7801-0.94-v1.txt, 7801-0.94-v2.txt, 7801-0.94-v3.txt, 7801-0.96-full-v2.txt, 7801-0.96-full-v3.txt, 7801-0.96-full-v4.txt, 7801-0.96-full-v5.txt, 7801-0.96-v1.txt, 7801-0.96-v6.txt Won't have time for parent. 
But a deferred sync option on a per-operation basis comes up quite frequently. In 0.96 this can be handled cleanly via protobufs, and in 0.94 we can have a special mutation attribute. For batch operations we'd take the safest sync option of any of the mutations. I.e. if there is at least one that wants to be flushed, we'd sync the batch; if there are none of those but at least one that wants deferred flush, we defer-flush the batch, etc. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
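The batch rule described above — take the safest sync option of any mutation in the batch — amounts to a max over an ordered set of durability levels. A minimal sketch; the enum names and their weakest-to-strongest ordering are assumptions for illustration, not HBase's actual API:

```java
// Sketch of "take the safest sync option of any of the mutations":
// the batch uses the strongest guarantee requested by any member.
// Ordering from weakest to strongest is an assumption for illustration.
enum SyncOption { SKIP_WAL, DEFERRED_SYNC, SYNC_WAL }

class BatchDurability {
  static SyncOption safest(SyncOption... mutations) {
    SyncOption result = SyncOption.SKIP_WAL;
    for (SyncOption o : mutations) {
      if (o.ordinal() > result.ordinal()) {
        result = o;  // keep the strongest guarantee seen so far
      }
    }
    return result;
  }
}
```

So a batch containing one SYNC_WAL mutation is synced as a whole, and a batch of only deferred-flush mutations is defer-flushed, matching the description above.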
[jira] [Commented] (HBASE-8240) CompoundConfiguration should implement Iterable
[ https://issues.apache.org/jira/browse/HBASE-8240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624528#comment-13624528 ] stack commented on HBASE-8240: -- -1 for 0.94. It is not even needed in 0.95 it seems. CompoundConfiguration should implement Iterable --- Key: HBASE-8240 URL: https://issues.apache.org/jira/browse/HBASE-8240 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.95.1, 0.98.0 Attachments: 8240-v1.txt, 8240-v2.txt, 8240-v3.txt, 8240-v4.txt, 8240-v6.txt, 8240-v7.txt Here is from the hadoop Configuration class: {code} public class Configuration implements Iterable<Map.Entry<String,String>>, {code} There're 3 addXX() methods for CompoundConfiguration: {code} public CompoundConfiguration add(final Configuration conf) { public CompoundConfiguration addWritableMap( final Map<ImmutableBytesWritable, ImmutableBytesWritable> map) { public CompoundConfiguration addStringMap(final Map<String, String> map) { {code} Parameters to these methods all support iteration. We can enhance ImmutableConfigMap with the following new method: {code} public abstract java.util.Iterator iterator(); {code} Then the following method of CompoundConfiguration can be implemented: {code} public Iterator<Map.Entry<String, String>> iterator() { {code} This enhancement would be useful in scenarios where a mutable Configuration is required. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
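A minimal sketch of the proposal above, using a simplified stand-in class: the real CompoundConfiguration would delegate to its ImmutableConfigMap members, while here the members are just Iterables of entries, chained in insertion order:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Simplified stand-in for CompoundConfiguration: each add*() method
// contributes an Iterable of entries, and iterator() walks the members,
// mirroring "enhance ImmutableConfigMap with iterator()".
class ChainedConfig implements Iterable<Map.Entry<String, String>> {
  private final List<Iterable<Map.Entry<String, String>>> members = new ArrayList<>();

  ChainedConfig addStringMap(Map<String, String> map) {
    members.add(map.entrySet());
    return this;
  }

  @Override
  public Iterator<Map.Entry<String, String>> iterator() {
    // Flatten the member iterators into a single view.
    List<Map.Entry<String, String>> all = new ArrayList<>();
    for (Iterable<Map.Entry<String, String>> m : members) {
      for (Map.Entry<String, String> e : m) {
        all.add(e);
      }
    }
    return all.iterator();
  }
}
```

A real implementation would also have to decide how duplicate keys across members are surfaced (precedence of later add() calls); the sketch simply concatenates.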
[jira] [Created] (HBASE-8286) Move site back up out of hbase-assembly; bad idea
stack created HBASE-8286: Summary: Move site back up out of hbase-assembly; bad idea Key: HBASE-8286 URL: https://issues.apache.org/jira/browse/HBASE-8286 Project: HBase Issue Type: Task Reporter: stack Assignee: stack Redoing the packaging, I thought it smart putting it all under the new hbase-assembly module. A few weeks later, it is not looking like a good idea (Enis suggested moving the site was maybe 'odd'). Here are a few issues: + The generated reports will have mention of hbase-assembly because that is where they are being generated (Elliott pointed out that the scm links mention hbase-assembly instead of trunk). + Doug Meil wants to build and deploy the site, but currently deploy is a bit awkward to explain because site presumes it is top level, so staging goes to the wrong place and you have to come along and do fixup afterward; it shouldn't be so hard. A possible downside is that we will break Nick's site check added to hadoopqa because site is an aggregating build plugin... but let's see. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-8187) trunk/0.95 tarball packaging
[ https://issues.apache.org/jira/browse/HBASE-8187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-8187: - Release Note: To build a tgz that has source only: $ mvn clean install -DskipTests -Dassembly.file=src/main/assembly/src.xml assembly:single The product can be found in hbase-assembly/target To build a binary tgz: $ mvn clean install -DskipTests javadoc:aggregate site assembly:single For hadoop2.0: $ mvn clean install -DskipTests -Dhadoop.profile=2.0 javadoc:aggregate site assembly:single See HBASE-8224 for how to add hadoop1 and hadoop2 labels to the jars. Again the product will be in hbase-assembly/target was: To build a tgz that has source only: $ mvn clean install -DskipTests -Dassembly.file=src/assembly/src.xml assembly:single The product can be found in hbase-assembly/target To build a binary tgz: $ mvn clean install -DskipTests javadoc:aggregate site assembly:single For hadoop2.0: $ mvn clean install -DskipTests -Dhadoop.profile=2.0 javadoc:aggregate site assembly:single See HBASE-8224 for how to add hadoop1 and hadoop2 labels to the jars. Again the product will be in hbase-assembly/target trunk/0.95 tarball packaging Key: HBASE-8187 URL: https://issues.apache.org/jira/browse/HBASE-8187 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Priority: Blocker Fix For: 0.95.0, 0.98.0 Attachments: 095tarballs.sh, 8187.txt, 8187v2.txt, 8187v3.txt, 8187v5.095.addendum.txt, 8187v5.095.txt, 8187v5.txt Packaging needs work now we have maven multi-moduled. Sounds like folks want a package for hadoop1 and hadoop2. Also want source package and maybe even a package that includes test jars so can run integration tests. Let this be umbrella issue for package fixes. Will hang smaller issues off this one as we figure them. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8208) In some situations data is not replicated to slaves when deferredLogSync is enabled
[ https://issues.apache.org/jira/browse/HBASE-8208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624543#comment-13624543 ] Hudson commented on HBASE-8208: --- Integrated in hbase-0.95 #130 (See [https://builds.apache.org/job/hbase-0.95/130/]) HBASE-8208 In some situations data is not replicated to slaves when deferredLogSync is enabled (Jeffrey Zhong) (Revision 1465175) Result = SUCCESS larsh : Files : * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java In some situations data is not replicated to slaves when deferredLogSync is enabled --- Key: HBASE-8208 URL: https://issues.apache.org/jira/browse/HBASE-8208 Project: HBase Issue Type: Bug Affects Versions: 0.95.0, 0.98.0, 0.94.6 Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 0.95.1, 0.98.0, 0.94.7 Attachments: hbase-8208-0.94.patch, hbase-8208.patch, hbase-8208-v1.patch, hbase-8208_v2.patch This is a subtle issue. When deferredLogSync is enabled, there are chances we could flush data before syncing all HLog entries. Assuming we just flush the internal cache and the server dies with some unsynced hlog entries. Data is not lost at the source cluster while replication is based on WAL files and some changes we flushed at the source won't be replicated the slave clusters. Although enabling deferredLogSync with tolerances of data loss, it breaks the replication assumption that whatever persisted in the source should be replicated to its slave clusters. In short, the slave cluster could end up with double losses: the data loss in the source and some data stored in source cluster may not be replicated to slaves either. The fix of the issue isn't hard. Basically we can invoke sync during each flush when replication is enabled for a region server. 
Since sync returns immediately when there is nothing to sync, there should be no performance impact. Please let me know what you think! Thanks, -Jeffrey -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
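The proposed fix — forcing a WAL sync as part of every flush whenever replication is enabled, so nothing unsynced can be flushed ahead of the log — can be sketched with simplified stand-ins (these are not the real HRegion/FSHLog interfaces):

```java
// Simplified stand-in for the WAL: tracks how much has been written vs synced.
class Wal {
  long written = 0;
  long syncedUpTo = 0;

  void append() {
    written++;
  }

  void sync() {
    syncedUpTo = written;  // effectively a no-op when nothing is pending
  }
}

// Simplified stand-in for a region: the fix is the sync() call inside flush().
class Region {
  final Wal wal;
  final boolean replicationEnabled;
  long flushedUpTo = 0;

  Region(Wal wal, boolean replicationEnabled) {
    this.wal = wal;
    this.replicationEnabled = replicationEnabled;
  }

  void flush() {
    if (replicationEnabled) {
      // Cheap when already synced; preserves the invariant that anything
      // flushed (and hence durable at the source) is also in the WAL for
      // replication to ship.
      wal.sync();
    }
    flushedUpTo = wal.written;
  }
}
```

This restores the replication assumption from the description: with deferredLogSync on, a flush can no longer persist edits at the source that never made it into the (replicated) WAL.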
[jira] [Commented] (HBASE-8208) In some situations data is not replicated to slaves when deferredLogSync is enabled
[ https://issues.apache.org/jira/browse/HBASE-8208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624549#comment-13624549 ] Hudson commented on HBASE-8208: --- Integrated in HBase-TRUNK #4022 (See [https://builds.apache.org/job/HBase-TRUNK/4022/]) HBASE-8208 In some situations data is not replicated to slaves when deferredLogSync is enabled (Jeffrey Zhong) (Revision 1465176) Result = FAILURE larsh : Files : * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java In some situations data is not replicated to slaves when deferredLogSync is enabled --- Key: HBASE-8208 URL: https://issues.apache.org/jira/browse/HBASE-8208 Project: HBase Issue Type: Bug Affects Versions: 0.95.0, 0.98.0, 0.94.6 Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 0.95.1, 0.98.0, 0.94.7 Attachments: hbase-8208-0.94.patch, hbase-8208.patch, hbase-8208-v1.patch, hbase-8208_v2.patch This is a subtle issue. When deferredLogSync is enabled, there are chances we could flush data before syncing all HLog entries. Assuming we just flush the internal cache and the server dies with some unsynced hlog entries. Data is not lost at the source cluster while replication is based on WAL files and some changes we flushed at the source won't be replicated the slave clusters. Although enabling deferredLogSync with tolerances of data loss, it breaks the replication assumption that whatever persisted in the source should be replicated to its slave clusters. In short, the slave cluster could end up with double losses: the data loss in the source and some data stored in source cluster may not be replicated to slaves either. The fix of the issue isn't hard. Basically we can invoke sync during each flush when replication is enabled for a region server. 
Since sync returns immediately when there is nothing to sync, there should be no performance impact. Please let me know what you think! Thanks, -Jeffrey -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8208) In some situations data is not replicated to slaves when deferredLogSync is enabled
[ https://issues.apache.org/jira/browse/HBASE-8208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624562#comment-13624562 ] Hudson commented on HBASE-8208: --- Integrated in HBase-0.94 #949 (See [https://builds.apache.org/job/HBase-0.94/949/]) HBASE-8208 In some situations data is not replicated to slaves when deferredLogSync is enabled (Jeffrey Zhong) (Revision 1465174) Result = FAILURE larsh : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java In some situations data is not replicated to slaves when deferredLogSync is enabled --- Key: HBASE-8208 URL: https://issues.apache.org/jira/browse/HBASE-8208 Project: HBase Issue Type: Bug Affects Versions: 0.95.0, 0.98.0, 0.94.6 Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 0.95.1, 0.98.0, 0.94.7 Attachments: hbase-8208-0.94.patch, hbase-8208.patch, hbase-8208-v1.patch, hbase-8208_v2.patch This is a subtle issue. When deferredLogSync is enabled, there are chances we could flush data before syncing all HLog entries. Assuming we just flush the internal cache and the server dies with some unsynced hlog entries. Data is not lost at the source cluster while replication is based on WAL files and some changes we flushed at the source won't be replicated the slave clusters. Although enabling deferredLogSync with tolerances of data loss, it breaks the replication assumption that whatever persisted in the source should be replicated to its slave clusters. In short, the slave cluster could end up with double losses: the data loss in the source and some data stored in source cluster may not be replicated to slaves either. The fix of the issue isn't hard. Basically we can invoke sync during each flush when replication is enabled for a region server. Since sync returns immediately when nothing to sync so there should be no performance impact. 
Please let me know what you think! Thanks, -Jeffrey -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8276) Backport hbase-6738 to 0.94 Too aggressive task resubmission from the distributed log manager
[ https://issues.apache.org/jira/browse/HBASE-8276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624563#comment-13624563 ] Hudson commented on HBASE-8276: --- Integrated in HBase-0.94 #949 (See [https://builds.apache.org/job/HBase-0.94/949/]) HBASE-8276 Backport hbase-6738 to 0.94 Too aggressive task resubmission from the distributed log manager (Jeffrey) (Revision 1465161) Result = FAILURE tedyu : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKSplitLog.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/master/TestSplitLogManager.java Backport hbase-6738 to 0.94 Too aggressive task resubmission from the distributed log manager --- Key: HBASE-8276 URL: https://issues.apache.org/jira/browse/HBASE-8276 Project: HBase Issue Type: Bug Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 0.94.7 Attachments: hbase-8276.patch, hbase-8276-v1.patch In recent tests, we found situations where, when some data nodes are down, file operations are slow depending on the underlying HDFS timeouts (normally 30 secs, with socket connection timeout maybe around 1 min). Since the split log task heartbeat timeout is only 25 secs, a split log task will be preempted by SplitLogManager and assigned to someone else after those 25 secs. On a small cluster, you'll see the same task keep bouncing back and forth for a while. I pasted a snippet of related logs below. You can search for "preempted from" to see a task being preempted. 
{code} 2013-04-01 21:22:08,599 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog: hdfs://ip-10-137-16-140.ec2.internal:8020/apps/hbase/data/.logs/ip-10-137-20-188.ec2.internal,60020,1364849530779-splitting/ip-10-137-20-188.ec2.internal%2C60020%2C1364849530779.1364865506159, length=127639653 2013-04-01 21:22:08,599 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: Recovering file hdfs://ip-10-137-16-140.ec2.internal:8020/apps/hbase/data/.logs/ip-10-137-20-188.ec2.internal,60020,1364849530779-splitting/ip-10-137-20-188.ec2.internal%2C60020%2C1364849530779.1364865506159 2013-04-01 21:22:09,603 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: Finished lease recover attempt for hdfs://ip-10-137-16-140.ec2.internal:8020/apps/hbase/data/.logs/ip-10-137-20-188.ec2.internal,60020,1364849530779-splitting/ip-10-137-20-188.ec2.internal%2C60020%2C1364849530779.1364865506159 2013-04-01 21:22:09,629 WARN org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Found existing old edits file. It could be the result of a previous failed split attempt. Deleting hdfs://ip-10-137-16-140.ec2.internal:8020/apps/hbase/data/IntegrationTestLoadAndVerify/73387f8d327a45bacf069bd631d70b3b/recovered.edits/3703447.temp, length=0 2013-04-01 21:22:09,629 WARN org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Found existing old edits file. It could be the result of a previous failed split attempt. Deleting hdfs://ip-10-137-16-140.ec2.internal:8020/apps/hbase/data/IntegrationTestLoadAndVerify/b749cbceaaf037c97f70cc2a6f48f2b8/recovered.edits/3703446.temp, length=0 2013-04-01 21:22:09,630 WARN org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Found existing old edits file. It could be the result of a previous failed split attempt. 
Deleting hdfs://ip-10-137-16-140.ec2.internal:8020/apps/hbase/data/IntegrationTestLoadAndVerify/c26b9d4a042d42c1194a8c2f389d33c8/recovered.edits/3703448.temp, length=0 2013-04-01 21:22:09,666 WARN org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Found existing old edits file. It could be the result of a previous failed split attempt. Deleting hdfs://ip-10-137-16-140.ec2.internal:8020/apps/hbase/data/IntegrationTestLoadAndVerify/adabdb40ccd52140f09f953ff41fd829/recovered.edits/3703451.temp, length=0 2013-04-01 21:22:09,722 WARN org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Found existing old edits file. It could be the result of a previous failed split attempt. Deleting hdfs://ip-10-137-16-140.ec2.internal:8020/apps/hbase/data/IntegrationTestLoadAndVerify/19f463fe74f4365e7df3e5fdb13aecad/recovered.edits/3703468.temp, length=0 2013-04-01 21:22:09,734 WARN org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Found existing old edits file. It could be the result of a previous failed split attempt. Deleting
[jira] [Commented] (HBASE-7961) truncate on disabled table should throw TableNotEnabledException.
[ https://issues.apache.org/jira/browse/HBASE-7961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624564#comment-13624564 ] Hudson commented on HBASE-7961: --- Integrated in HBase-0.94 #949 (See [https://builds.apache.org/job/HBase-0.94/949/]) HBASE-7961 truncate on disabled table should throw TableNotEnabledException. (rajeshbabu) (Revision 1465186) Result = FAILURE larsh : Files : * /hbase/branches/0.94/src/main/ruby/hbase/admin.rb truncate on disabled table should throw TableNotEnabledException. - Key: HBASE-7961 URL: https://issues.apache.org/jira/browse/HBASE-7961 Project: HBase Issue Type: Bug Components: Admin Reporter: rajeshbabu Assignee: rajeshbabu Fix For: 0.95.0, 0.98.0, 0.94.7 Attachments: HBASE-7961_94.patch, HBASE-7961.patch Presently, truncate on a disabled table deletes the existing table and recreates it (ENABLED). The disable(table_name) call in truncate returns if the table is disabled, without notifying the user. {code} def disable(table_name) tableExists(table_name) return if disabled?(table_name) @admin.disableTable(table_name) end {code} One more thing: we are calling tableExists in disable(table_name) as well as drop(table_name), which is unnecessary. Anyway, the HTable object creation below will check whether the table exists or not. {code} h_table = org.apache.hadoop.hbase.client.HTable.new(conf, table_name) {code} We can change it to {code} h_table = org.apache.hadoop.hbase.client.HTable.new(conf, table_name) table_description = h_table.getTableDescriptor() yield 'Disabling table...' if block_given? @admin.disableTable(table_name) yield 'Dropping table...' if block_given? @admin.deleteTable(table_name) yield 'Creating table...' if block_given? @admin.createTable(table_description) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6738) Too aggressive task resubmission from the distributed log manager
[ https://issues.apache.org/jira/browse/HBASE-6738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624565#comment-13624565 ] Hudson commented on HBASE-6738: --- Integrated in HBase-0.94 #949 (See [https://builds.apache.org/job/HBase-0.94/949/]) HBASE-8276 Backport hbase-6738 to 0.94 Too aggressive task resubmission from the distributed log manager (Jeffrey) (Revision 1465161) Result = FAILURE Too aggressive task resubmission from the distributed log manager - Key: HBASE-6738 URL: https://issues.apache.org/jira/browse/HBASE-6738 Project: HBase Issue Type: Bug Components: master, regionserver Affects Versions: 0.94.1, 0.95.2 Environment: 3 nodes cluster test, but can occur as well on a much bigger one. It's all luck! Reporter: Nicolas Liochon Assignee: Nicolas Liochon Priority: Critical Fix For: 0.95.0 Attachments: 6738.v1.patch With default settings for hbase.splitlog.manager.timeout = 25s and hbase.splitlog.max.resubmit = 3. On tests mentionned on HBASE-5843, I have variations around this scenario, 0.94 + HDFS 1.0.3: The regionserver in charge of the split does not answer in less than 25s, so it gets interrupted but actually continues. Sometimes, we go out of the number of retry, sometimes not, sometimes we're out of retry, but the as the interrupts were ignored we finish nicely. In the mean time, the same single task is executed in parallel by multiple nodes, increasing the probability to get into race conditions. Details: t0: unplug a box with DN+RS t + x: other boxes are already connected, to their connection starts to dies. Nevertheless, they don't consider this node as suspect. t + 180s: zookeeper - master detects the node as dead. recovery start. It can be less than 180s sometimes it around 150s. t + 180s: distributed split starts. There is only 1 task, it's immediately acquired by a one RS. t + 205s: the RS has multiple errors when splitting, because a datanode is missing as well. 
The master decides to give the task to someone else. But often the task continues in the first RS. Interrupts are often ignored, as it's well stated in the code (// TODO interrupt often gets swallowed, do what else?) {code} 2012-09-04 18:27:30,404 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: Sending interrupt to stop the worker thread {code} t + 211s: two regionsservers are processing the same task. They fight for the leases: {code} 2012-09-04 18:27:32,004 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: Lease mismatch on /hbase/TABLE/4d1c1a4695b1df8c58d13382b834332e/recovered.edits/037.temp owned by DFSClient_hb_rs_BOX2,60020,1346775882980 but is accessed by DFSClient_hb_rs_BOX1,60020,1346775719125 {code} They can fight like this for many files, until the tasks finally get interrupted or finished. The taks on the second box can be cancelled as well. In this case, the task is created again for a new box. The master seems to stop after 3 attemps. It can as well renounce to split the files. Sometimes the tasks were not cancelled on the RS side, so the split is finished despites what the master thinks and logs. In this case, the assignement starts. In the other, it's we've got a problem). {code} 2012-09-04 18:43:52,724 INFO org.apache.hadoop.hbase.master.SplitLogManager: Skipping resubmissions of task /hbase/splitlog/hdfs%3A%2F%2FBOX1%3A9000%2Fhbase%2F.logs%2FBOX0%2C60020%2C1346776587640-splitting%2FBOX0%252C60020%252C1346776587640.1346776587832 because threshold 3 reached {code} t + 300s: split is finished. Assignement starts t + 330s: assignement is finished, regions are available again. There are a lot of subcases possible depending on the number of logs files, of region server and so on. The issues are: 1) it's difficult, especially in HBase but not only, to interrupt a task. 
The pattern is often {code} void f() throws IOException { try { // whatever throws InterruptedException } catch (InterruptedException e) { throw new InterruptedIOException(); } } boolean g() { int nbRetry = 0; for (;;) { try { f(); return true; } catch (IOException e) { nbRetry++; if (nbRetry > maxRetry) return false; } } } {code} This typically swallows the interrupt. There are other variations, but this one seems to be the standard. Even if we fix this in HBase, we need the other layers to be interruptible as well. That's not proven. 2) 25s is very aggressive, considering that
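For contrast, a sketch of the same retry wrapper rewritten so the interrupt is not swallowed: an InterruptedIOException aborts the retry loop immediately, and the thread's interrupt status is restored for callers further up the stack. The f()/g() simulation below is illustrative only:

```java
import java.io.IOException;
import java.io.InterruptedIOException;

// Retry wrapper that does NOT swallow the interrupt: InterruptedIOException
// stops retrying at once and the thread's interrupt status is restored.
class InterruptAwareRetry {
  int attempts = 0;
  private final boolean interruptRequested;

  InterruptAwareRetry(boolean interruptRequested) {
    this.interruptRequested = interruptRequested;
  }

  // Simulates f(): interruption is converted to InterruptedIOException
  // (as in the pattern quoted above); otherwise fails with a plain IOException.
  void f() throws IOException {
    attempts++;
    if (interruptRequested) {
      throw new InterruptedIOException("interrupted");
    }
    throw new IOException("transient failure");
  }

  boolean g(int maxRetry) {
    for (int nbRetry = 0; nbRetry <= maxRetry; nbRetry++) {
      try {
        f();
        return true;
      } catch (InterruptedIOException e) {
        Thread.currentThread().interrupt();  // restore status, stop retrying
        return false;
      } catch (IOException e) {
        // plain transient failure: fall through and retry
      }
    }
    return false;
  }
}
```

The key differences from the swallow-prone version: InterruptedIOException is caught before the generic IOException (it is a subclass, so the order matters), it is not counted as an ordinary retry, and Thread.currentThread().interrupt() re-arms the flag for the layers above.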
[jira] [Updated] (HBASE-7824) Improve master start up time when there is log splitting work
[ https://issues.apache.org/jira/browse/HBASE-7824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-7824: - Release Note: Master now won't wait for existing or previously left log splitting work (except for the ROOT and META region servers) to finish before it starts up. Sometimes log splitting work can take many minutes. In the normal case, the master node starts much faster than before when there is region server recovery work going on. The feature can be disabled by setting the configuration value hbase.master.wait.for.log.splitting to true (the default value is false). Improve master start up time when there is log splitting work - Key: HBASE-7824 URL: https://issues.apache.org/jira/browse/HBASE-7824 Project: HBase Issue Type: Bug Components: master Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 0.94.8 Attachments: hbase-7824.patch, hbase-7824_v2.patch, hbase-7824_v3.patch, hbase-7824-v7.patch, hbase-7824-v8.patch When there is log split work going on, master start up waits till all log split work completes, even though the log split has nothing to do with meta region servers. That's bad behavior, considering a master node can run while log splitting is happening, yet its start up is blocked by log split work. Since the master is effectively a single point of failure, we should start it ASAP. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
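Per the release note above, restoring the old blocking behavior would be a one-property change in hbase-site.xml (the property name comes from the release note; the XML shape is the standard HBase configuration format):

```xml
<!-- Make the master wait for all log splitting work to finish at start up
     (restores the pre-HBASE-7824 behavior; default is false). -->
<property>
  <name>hbase.master.wait.for.log.splitting</name>
  <value>true</value>
</property>
```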
[jira] [Commented] (HBASE-8208) In some situations data is not replicated to slaves when deferredLogSync is enabled
[ https://issues.apache.org/jira/browse/HBASE-8208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624566#comment-13624566 ] Hudson commented on HBASE-8208: --- Integrated in hbase-0.95-on-hadoop2 #58 (See [https://builds.apache.org/job/hbase-0.95-on-hadoop2/58/]) HBASE-8208 In some situations data is not replicated to slaves when deferredLogSync is enabled (Jeffrey Zhong) (Revision 1465175) Result = FAILURE larsh : Files : * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java In some situations data is not replicated to slaves when deferredLogSync is enabled --- Key: HBASE-8208 URL: https://issues.apache.org/jira/browse/HBASE-8208 Project: HBase Issue Type: Bug Affects Versions: 0.95.0, 0.98.0, 0.94.6 Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 0.95.1, 0.98.0, 0.94.7 Attachments: hbase-8208-0.94.patch, hbase-8208.patch, hbase-8208-v1.patch, hbase-8208_v2.patch This is a subtle issue. When deferredLogSync is enabled, there are chances we could flush data before syncing all HLog entries. Assuming we just flush the internal cache and the server dies with some unsynced hlog entries. Data is not lost at the source cluster while replication is based on WAL files and some changes we flushed at the source won't be replicated the slave clusters. Although enabling deferredLogSync with tolerances of data loss, it breaks the replication assumption that whatever persisted in the source should be replicated to its slave clusters. In short, the slave cluster could end up with double losses: the data loss in the source and some data stored in source cluster may not be replicated to slaves either. The fix of the issue isn't hard. Basically we can invoke sync during each flush when replication is enabled for a region server. 
Since sync returns immediately when there is nothing to sync, there should be no performance impact. Please let me know what you think! Thanks, -Jeffrey -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7824) Improve master start up time when there is log splitting work
[ https://issues.apache.org/jira/browse/HBASE-7824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624568#comment-13624568 ] Jeffrey Zhong commented on HBASE-7824: -- I have run the whole test suite with the patch 4 times in a row. Three runs were clean, as follows, and one had a single failure that has happened in recent builds as well. {code} Results : Tests run: 1339, Failures: 0, Errors: 0, Skipped: 13 Integration Test: IntegrationTestDataIngestWithChaosMonkey Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 905.014 sec {code} I also provided a release note in the JIRA. Improve master start up time when there is log splitting work - Key: HBASE-7824 URL: https://issues.apache.org/jira/browse/HBASE-7824 Project: HBase Issue Type: Bug Components: master Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 0.94.8 Attachments: hbase-7824.patch, hbase-7824_v2.patch, hbase-7824_v3.patch, hbase-7824-v7.patch, hbase-7824-v8.patch When there is log split work going on, master start up waits till all log split work completes, even though the log split has nothing to do with meta region servers. That's bad behavior, considering a master node can run while log splitting is happening, yet its start up is blocked by log split work. Since the master is effectively a single point of failure, we should start it ASAP. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7824) Improve master start up time when there is log splitting work
[ https://issues.apache.org/jira/browse/HBASE-7824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-7824: - Release Note: Master now won't wait for existing or previously left log splitting work(except for ROOT and META region servers) finish before it can start up. Sometimes log splitting work could take more than minutes. In normal case, master node starts much faster than before when there is region server recovery work going on. The feature can be disabled by set configuration value hbase.master.wait.for.log.splitting to true.(default value is false) (was: Master now won't wait for existing or previously left log splitting work(except for ROOT and META region servers) finish before it can start up. Sometimes log splitting work could take more than minuets. In normal case, master node starts much faster than before when there is region server recovery work going on. The feature can be disabled by set configuration value hbase.master.wait.for.log.splitting to true.(default value is false)) Improve master start up time when there is log splitting work - Key: HBASE-7824 URL: https://issues.apache.org/jira/browse/HBASE-7824 Project: HBase Issue Type: Bug Components: master Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 0.94.8 Attachments: hbase-7824.patch, hbase-7824_v2.patch, hbase-7824_v3.patch, hbase-7824-v7.patch, hbase-7824-v8.patch When there is log split work going on, master start up waits till all log split work completes even though the log split has nothing to do with meta region servers. It's a bad behavior considering a master node can run when log split is happening while its start up is blocking by log split work. Since master is kind of single point of failure, we should start it ASAP. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7824) Improve master start up time when there is log splitting work
[ https://issues.apache.org/jira/browse/HBASE-7824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-7824: - Release Note: Master now won't wait for existing or previously left log splitting work(except for ROOT and META region servers) finish before it can start up. Sometimes log splitting work could take more than minutes. In normal case, master node starts much faster than before when there is region server recovery work going on. The feature can be disabled by setting configuration value hbase.master.wait.for.log.splitting to true.(default value is false) (was: Master now won't wait for existing or previously left log splitting work(except for ROOT and META region servers) finish before it can start up. Sometimes log splitting work could take more than minutes. In normal case, master node starts much faster than before when there is region server recovery work going on. The feature can be disabled by set configuration value hbase.master.wait.for.log.splitting to true.(default value is false)) Improve master start up time when there is log splitting work - Key: HBASE-7824 URL: https://issues.apache.org/jira/browse/HBASE-7824 Project: HBase Issue Type: Bug Components: master Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 0.94.8 Attachments: hbase-7824.patch, hbase-7824_v2.patch, hbase-7824_v3.patch, hbase-7824-v7.patch, hbase-7824-v8.patch When there is log split work going on, master start up waits till all log split work completes even though the log split has nothing to do with meta region servers. It's a bad behavior considering a master node can run when log split is happening while its start up is blocking by log split work. Since master is kind of single point of failure, we should start it ASAP. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7824) Improve master start up time when there is log splitting work
[ https://issues.apache.org/jira/browse/HBASE-7824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624572#comment-13624572 ] chunhui shen commented on HBASE-7824: - [~jeffreyz] IMO, it can still cause META data loss, as I mentioned in HBASE-8251: 1. Assign ROOT to the RS where META is on 2. Enable SSH for ROOT 3. Assign META If the META RS (which is also the ROOT RS) dies between step 2 and step 3, MetaSSH starts splitting its hlog. However, step 3 will assign META directly (because HMaster#splitLogAndExpireIfOnline will return null), which means META will lose the data from the hlog. Correct me if wrong, thanks Improve master start up time when there is log splitting work - Key: HBASE-7824 URL: https://issues.apache.org/jira/browse/HBASE-7824 Project: HBase Issue Type: Bug Components: master Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 0.94.8 Attachments: hbase-7824.patch, hbase-7824_v2.patch, hbase-7824_v3.patch, hbase-7824-v7.patch, hbase-7824-v8.patch When there is log split work going on, master start up waits till all log split work completes even though the log split has nothing to do with meta region servers. It's a bad behavior considering a master node can run when log split is happening while its start up is blocking by log split work. Since master is kind of single point of failure, we should start it ASAP. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8208) In some situations data is not replicated to slaves when deferredLogSync is enabled
[ https://issues.apache.org/jira/browse/HBASE-8208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624576#comment-13624576 ] Hudson commented on HBASE-8208: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #480 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/480/]) HBASE-8208 In some situations data is not replicated to slaves when deferredLogSync is enabled (Jeffrey Zhong) (Revision 1465176) Result = FAILURE larsh : Files : * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java In some situations data is not replicated to slaves when deferredLogSync is enabled --- Key: HBASE-8208 URL: https://issues.apache.org/jira/browse/HBASE-8208 Project: HBase Issue Type: Bug Affects Versions: 0.95.0, 0.98.0, 0.94.6 Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 0.95.1, 0.98.0, 0.94.7 Attachments: hbase-8208-0.94.patch, hbase-8208.patch, hbase-8208-v1.patch, hbase-8208_v2.patch This is a subtle issue. When deferredLogSync is enabled, there are chances we could flush data before syncing all HLog entries. Suppose we flush the internal cache and the server dies with some unsynced hlog entries. Data is not lost at the source cluster, but since replication is based on WAL files, some changes we flushed at the source won't be replicated to the slave clusters. Although enabling deferredLogSync implies tolerating some data loss, this breaks the replication assumption that whatever is persisted in the source should be replicated to its slave clusters. In short, the slave cluster could end up with double losses: the data lost in the source, plus some data stored in the source cluster that may not be replicated to slaves either. The fix of the issue isn't hard: basically, we can invoke sync during each flush when replication is enabled for a region server. 
Since sync returns immediately when there is nothing to sync, there should be no performance impact. Please let me know what you think! Thanks, -Jeffrey -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
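The "sync is free when there is nothing to sync" argument above can be illustrated with a toy sketch. This is a hypothetical model, not HBase's actual FSHLog: it just tracks a written vs. synced sequence number and short-circuits when they match, which is why adding an extra sync() per flush costs nothing in the common case.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical WAL-style writer: sync() is a no-op whenever every
// appended edit has already been synced (the deferredLogSync fast path).
public class DeferredSyncSketch {
    private final AtomicLong written = new AtomicLong(); // highest appended seq
    private final AtomicLong synced = new AtomicLong();  // highest synced seq
    private long diskSyncs = 0;                          // counts real (expensive) syncs

    long append() { return written.incrementAndGet(); }

    void sync() {
        long target = written.get();
        if (synced.get() >= target) return; // fast path: nothing to sync
        diskSyncs++;                        // stand-in for the actual fsync
        synced.set(target);
    }

    long diskSyncs() { return diskSyncs; }

    public static void main(String[] args) {
        DeferredSyncSketch wal = new DeferredSyncSketch();
        wal.append();
        wal.sync();  // one real sync
        wal.sync();  // no new edits since: returns immediately
        wal.sync();
        System.out.println("disk syncs: " + wal.diskSyncs());
    }
}
```

Only the first sync() after new edits pays the cost; the redundant per-flush calls return on the fast path.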
[jira] [Commented] (HBASE-8219) Align Offline Merge with Online Merge
[ https://issues.apache.org/jira/browse/HBASE-8219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624579#comment-13624579 ] chunhui shen commented on HBASE-8219: - [~mbertozzi] What do you think about the latest patch ? Align Offline Merge with Online Merge - Key: HBASE-8219 URL: https://issues.apache.org/jira/browse/HBASE-8219 Project: HBase Issue Type: Task Components: regionserver Affects Versions: 0.95.0 Reporter: Matteo Bertozzi Assignee: chunhui shen Attachments: hbase-8219v1.patch, hbase-8219v2.patch, hbase-8219v3.patch After HBASE-7403 we now have two different tools for online and offline merge, and the result produced by the two are different. (the online one works with snapshots, the offline not) We should remove the offline one, or align it to the online code. Most of the offline code in HRegion.merge() can be replaced with the one in RegionMergeTransaction, used by the online version. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7824) Improve master start up time when there is log splitting work
[ https://issues.apache.org/jira/browse/HBASE-7824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624583#comment-13624583 ] Jeffrey Zhong commented on HBASE-7824: -- [~zjushch] The patch already covers the case you mentioned. You can check both the v7 and v8 patches. The reason I didn't mention the scenario in the above suggestion is to make the idea easier to accept. Yesterday I replied to you on HBASE-8251 about the scenario where ROOT and META are collocated on one RS. You can check the details at MetaSSH in the patch. Basically, we only recover the ROOT portion and leave the META part until the master's meta assignment completes. Below is the related pseudo code snippet; please let me know if you have more questions. Thanks. {code} ... re-assign root ... if(!this.services.isServerShutdownHandlerEnabled()) { // resubmit in case we're in master initialization and SSH hasn't been enabled yet. this.services.getExecutorService().submit(this); this.deadServers.add(serverName); return; } ... re-assign meta ... {code} Improve master start up time when there is log splitting work - Key: HBASE-7824 URL: https://issues.apache.org/jira/browse/HBASE-7824 Project: HBase Issue Type: Bug Components: master Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 0.94.8 Attachments: hbase-7824.patch, hbase-7824_v2.patch, hbase-7824_v3.patch, hbase-7824-v7.patch, hbase-7824-v8.patch When there is log split work going on, master start up waits till all log split work completes even though the log split has nothing to do with meta region servers. It's a bad behavior considering a master node can run when log split is happening while its start up is blocking by log split work. Since master is kind of single point of failure, we should start it ASAP. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
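The resubmit pattern in the pseudo code above can be sketched as a self-contained toy. All names here are hypothetical (this is not the actual HBase ServerShutdownHandler): a task recovers the ROOT portion, and while the shutdown handler is not yet enabled it re-queues itself, deferring the META portion until master initialization has progressed.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of "recover ROOT now, resubmit, handle META later".
public class MetaSshResubmitSketch {
    private final Deque<Runnable> executor = new ArrayDeque<>(); // stand-in for ExecutorService
    boolean sshEnabled = false; // flipped once master initialization enables SSH
    int rootRecoveries = 0;
    int metaRecoveries = 0;

    void submit(Runnable r) { executor.add(r); }
    void drainOnce() { Runnable r = executor.poll(); if (r != null) r.run(); }

    class ShutdownTask implements Runnable {
        private boolean rootDone = false;
        public void run() {
            if (!rootDone) { rootRecoveries++; rootDone = true; } // "re-assign root"
            if (!sshEnabled) {
                submit(this); // resubmit: SSH not enabled yet, META part must wait
                return;
            }
            metaRecoveries++; // "re-assign meta" only after SSH is enabled
        }
    }

    public static void main(String[] args) {
        MetaSshResubmitSketch m = new MetaSshResubmitSketch();
        m.submit(m.new ShutdownTask());
        m.drainOnce(); // SSH disabled: ROOT recovered, task re-queued, META untouched
        m.sshEnabled = true;
        m.drainOnce(); // now the META portion runs
        System.out.println("root=" + m.rootRecoveries + " meta=" + m.metaRecoveries);
    }
}
```

The key property is that the META step can never run before the enable flag flips, no matter how many times the task is drained.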
[jira] [Commented] (HBASE-7824) Improve master start up time when there is log splitting work
[ https://issues.apache.org/jira/browse/HBASE-7824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624587#comment-13624587 ] chunhui shen commented on HBASE-7824: - I think your patch couldn't fix the problem. For the case mentioned above, I have two questions: 1. What will the master initialization thread do when assigning META? 2. What is the value of isCarryingMeta in ServerManager#expireServer? I think it's false rather than true Improve master start up time when there is log splitting work - Key: HBASE-7824 URL: https://issues.apache.org/jira/browse/HBASE-7824 Project: HBase Issue Type: Bug Components: master Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 0.94.8 Attachments: hbase-7824.patch, hbase-7824_v2.patch, hbase-7824_v3.patch, hbase-7824-v7.patch, hbase-7824-v8.patch When there is log split work going on, master start up waits till all log split work completes even though the log split has nothing to do with meta region servers. It's a bad behavior considering a master node can run when log split is happening while its start up is blocking by log split work. Since master is kind of single point of failure, we should start it ASAP. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8240) CompoundConfiguration should implement Iterable
[ https://issues.apache.org/jira/browse/HBASE-8240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624588#comment-13624588 ] Lars Hofhansl commented on HBASE-8240: -- Patch is simple enough, I am fine on that front. But I, too, do not understand what this is needed for. CompoundConfiguration should implement Iterable --- Key: HBASE-8240 URL: https://issues.apache.org/jira/browse/HBASE-8240 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.95.1, 0.98.0 Attachments: 8240-v1.txt, 8240-v2.txt, 8240-v3.txt, 8240-v4.txt, 8240-v6.txt, 8240-v7.txt Here is from hadoop Configuration class: {code} public class Configuration implements Iterable<Map.Entry<String,String>>, {code} There're 3 addXX() methods for CompoundConfiguration: {code} public CompoundConfiguration add(final Configuration conf) { public CompoundConfiguration addWritableMap( final Map<ImmutableBytesWritable, ImmutableBytesWritable> map) { public CompoundConfiguration addStringMap(final Map<String, String> map) { {code} Parameters to these methods all support iteration. We can enhance ImmutableConfigMap with the following new method: {code} public abstract java.util.Iterator iterator(); {code} Then the following method of CompoundConfiguration can be implemented: {code} public Iterator<Map.Entry<String, String>> iterator() { {code} This enhancement would be useful in scenarios where a mutable Configuration is required. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
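The iterator the issue asks for can be sketched in a few lines. This is a hypothetical, simplified stand-in for CompoundConfiguration (real HBase code differs): it keeps added maps in priority order and exposes one merged Iterable view in which the most recently added map wins, which is the assumption made here about precedence.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical compound config: chains added string maps into one
// Iterable<Map.Entry<String,String>>, most recently added map winning.
public class CompoundConfigSketch implements Iterable<Map.Entry<String, String>> {
    private final List<Map<String, String>> maps = new ArrayList<>();

    public CompoundConfigSketch addStringMap(Map<String, String> map) {
        maps.add(0, map); // newest map goes first, i.e. highest priority
        return this;
    }

    @Override
    public Iterator<Map.Entry<String, String>> iterator() {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> m : maps) {
            // earlier (higher-priority) entries are kept; later ones are shadowed
            for (Map.Entry<String, String> e : m.entrySet()) {
                merged.putIfAbsent(e.getKey(), e.getValue());
            }
        }
        return merged.entrySet().iterator();
    }

    public static void main(String[] args) {
        CompoundConfigSketch c = new CompoundConfigSketch();
        Map<String, String> base = new HashMap<>();
        base.put("a", "1");
        Map<String, String> override = new HashMap<>();
        override.put("a", "2");
        override.put("b", "3");
        c.addStringMap(base).addStringMap(override);
        for (Map.Entry<String, String> e : c) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }
    }
}
```

With such an iterator, copying a compound view into a plain mutable Configuration is a simple for-each loop, which matches the "mutable Configuration is required" use case in the description.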
[jira] [Commented] (HBASE-8285) HBaseClient never recovers for single HTable.get() calls when regions move
[ https://issues.apache.org/jira/browse/HBASE-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624590#comment-13624590 ] Lars Hofhansl commented on HBASE-8285: -- Wow, really?! That seems so core to HBase that it's almost unbelievable this would have gone undetected for so long! HBaseClient never recovers for single HTable.get() calls when regions move -- Key: HBASE-8285 URL: https://issues.apache.org/jira/browse/HBASE-8285 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.94.6.1 Reporter: Varun Sharma Assignee: Varun Sharma Priority: Critical Fix For: 0.95.1, 0.98.0, 0.94.7 Attachments: 8285-0.94.txt, 8285-trunk.txt Steps to reproduce this bug: 1) Gracefully restart a region server, causing regions to get redistributed. 2) Client calls to this region keep failing, since the META cache is never purged on the client for the region that moved. Reason behind the bug: 1) The client continues to hit the old region server. 2) The old region server throws NotServingRegionException, which is not handled correctly, and the META cache entries are never purged for that server, causing the client to keep hitting the old server. The reason lies in the ServerCallable code, since we only purge META cache entries when there is a RetriesExhaustedException, SocketTimeoutException or ConnectException. However, there is no case check for NotServingRegionException(s). Why is this not a problem for Scan(s) and Put(s)? a) If a region server is not hosting a region/scanner, then an UnknownScannerException is thrown, which causes a relocateRegion() call, refreshing the META cache for that particular region. b) For put(s), the processBatchCallback() interface in HConnectionManager is used, which clears out META cache entries for all kinds of exceptions except DoNotRetryException. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
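The recovery behavior the bug calls for can be shown with a toy client. Everything here is hypothetical (plain maps stand in for .META. and the client's location cache; this is not the real HBaseClient API): on a NotServingRegionException the client purges its cached location and re-looks the region up, instead of retrying the stale server forever.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical client sketch: purge-and-relocate on NotServingRegionException.
public class RegionLocatorSketch {
    static class NotServingRegionException extends Exception {}

    private final Map<String, String> locationCache = new HashMap<>();
    private final Map<String, String> meta; // stands in for a .META. lookup

    RegionLocatorSketch(Map<String, String> meta) { this.meta = meta; }

    String locate(String rowKey) {
        // serve from cache; fall back to the .META. stand-in on a miss
        return locationCache.computeIfAbsent(rowKey, meta::get);
    }

    // get() with the corrected handling: serverCall returning null models NSRE
    String get(String rowKey, Function<String, String> serverCall)
            throws NotServingRegionException {
        for (int attempt = 0; attempt < 2; attempt++) {
            String server = locate(rowKey);
            try {
                String v = serverCall.apply(server);
                if (v != null) return v;
                throw new NotServingRegionException();
            } catch (NotServingRegionException e) {
                locationCache.remove(rowKey); // the purge step the bug was missing
            }
        }
        throw new NotServingRegionException();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> meta = new HashMap<>();
        meta.put("row1", "rs-old");
        RegionLocatorSketch loc = new RegionLocatorSketch(meta);
        loc.locate("row1");         // cache now points at rs-old
        meta.put("row1", "rs-new"); // region moved; .META. updated
        // only the new server answers; the client recovers after one purge
        String v = loc.get("row1", server -> server.equals("rs-new") ? "value" : null);
        System.out.println("got " + v + " from " + loc.locate("row1"));
    }
}
```

Without the `locationCache.remove(rowKey)` line, the second attempt would hit the same stale server, which is exactly the never-recovers symptom described above.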
[jira] [Commented] (HBASE-7824) Improve master start up time when there is log splitting work
[ https://issues.apache.org/jira/browse/HBASE-7824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624596#comment-13624596 ] Jeffrey Zhong commented on HBASE-7824: -- Let me add more clarification to your first comment, to see if you agree first: {quote} If the META RS(it is also the ROOT RS) is dead between step2 and step3, MetaSSH start splitting its hlog. {quote} The ROOT recovery portion in Meta SSH will complete log splitting for both the ROOT and META regions. Therefore, all recovered-edits files are created before the META region can be assigned, because the META region can only be assigned when ROOT is online. By then, all recovered-edits files exist and will be replayed when the META region is opened during assignment. {quote} However step3 will assign META directly(Because HMaster#splitLogAndExpireIfOnline will return null), it means META will loss the data from hlog. {quote} This is fine, because the log splitting work has already been done and the edits will be replayed during the META region open phase. Thanks for your feedback. Improve master start up time when there is log splitting work - Key: HBASE-7824 URL: https://issues.apache.org/jira/browse/HBASE-7824 Project: HBase Issue Type: Bug Components: master Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 0.94.8 Attachments: hbase-7824.patch, hbase-7824_v2.patch, hbase-7824_v3.patch, hbase-7824-v7.patch, hbase-7824-v8.patch When there is log split work going on, master start up waits till all log split work completes even though the log split has nothing to do with meta region servers. It's a bad behavior considering a master node can run when log split is happening while its start up is blocking by log split work. Since master is kind of single point of failure, we should start it ASAP. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7937) Retry log rolling to support HA NN scenario
[ https://issues.apache.org/jira/browse/HBASE-7937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624597#comment-13624597 ] Liang Xie commented on HBASE-7937: -- Hi [~himan...@cloudera.com], I can make a new patch per the comments if you are busy. I am working on log-rolling retry in our internal codebase, so it's handy for me. Retry log rolling to support HA NN scenario --- Key: HBASE-7937 URL: https://issues.apache.org/jira/browse/HBASE-7937 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.5 Reporter: Himanshu Vashishtha Assignee: Himanshu Vashishtha Fix For: 0.95.1 Attachments: HBASE-7937-trunk.patch, HBASE-7937-v1.patch A failure in log rolling causes a regionserver abort. In the case of an HA NN, it would be good to have a retry mechanism to roll the logs. A corresponding jira for MemStore retries is HBASE-7507. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6870) HTable#coprocessorExec always scan the whole table
[ https://issues.apache.org/jira/browse/HBASE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624599#comment-13624599 ] chunhui shen commented on HBASE-6870: - bq.How come useCache being true gets translated into includeEndKey being true ? The useCache flag is not related to includeEndKey; includeEndKey is always true in HTable#coprocessorService. bq.I wonder if we should provide a mechanism for coprocessors to specify whether fresh .META. scan is wanted. The useCache flag was initially added for performance comparison. I don't think we need to provide a choice for a fresh .META. scan, since even a fresh scan can still return an invalid region location from .META. HTable#coprocessorExec always scan the whole table --- Key: HBASE-6870 URL: https://issues.apache.org/jira/browse/HBASE-6870 Project: HBase Issue Type: Improvement Components: Coprocessors Affects Versions: 0.94.1, 0.95.0, 0.95.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.95.1, 0.98.0 Attachments: 6870-v4.txt, HBASE-6870.patch, HBASE-6870-testPerformance.patch, HBASE-6870v2.patch, HBASE-6870v3.patch, hbase-6870v5.patch In the current logic, HTable#coprocessorExec always scans the whole table. Its efficiency is low, and it will affect the regionserver carrying .META. under a large volume of coprocessorExec requests -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6870) HTable#coprocessorExec always scan the whole table
[ https://issues.apache.org/jira/browse/HBASE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chunhui shen updated HBASE-6870: Attachment: hbase-6870v5.patch Path v5 fixes the warnings HTable#coprocessorExec always scan the whole table --- Key: HBASE-6870 URL: https://issues.apache.org/jira/browse/HBASE-6870 Project: HBase Issue Type: Improvement Components: Coprocessors Affects Versions: 0.94.1, 0.95.0, 0.95.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.95.1, 0.98.0 Attachments: 6870-v4.txt, HBASE-6870.patch, HBASE-6870-testPerformance.patch, HBASE-6870v2.patch, HBASE-6870v3.patch, hbase-6870v5.patch In current logic, HTable#coprocessorExec always scan the whole table, its efficiency is low and will affect the Regionserver carrying .META. under large coprocessorExec requests -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6870) HTable#coprocessorExec always scan the whole table
[ https://issues.apache.org/jira/browse/HBASE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624601#comment-13624601 ] Ted Yu commented on HBASE-6870: --- {code} +if (useCache) { + return new ArrayList<byte[]>(getKeysToRegionsInRange(start, end, true) + .keySet()); {code} When useCache is true, what caching mechanism would getKeysToRegionsInRange() use ? HTable#coprocessorExec always scan the whole table --- Key: HBASE-6870 URL: https://issues.apache.org/jira/browse/HBASE-6870 Project: HBase Issue Type: Improvement Components: Coprocessors Affects Versions: 0.94.1, 0.95.0, 0.95.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.95.1, 0.98.0 Attachments: 6870-v4.txt, HBASE-6870.patch, HBASE-6870-testPerformance.patch, HBASE-6870v2.patch, HBASE-6870v3.patch, hbase-6870v5.patch In current logic, HTable#coprocessorExec always scan the whole table, its efficiency is low and will affect the Regionserver carrying .META. under large coprocessorExec requests -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7824) Improve master start up time when there is log splitting work
[ https://issues.apache.org/jira/browse/HBASE-7824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624602#comment-13624602 ] chunhui shen commented on HBASE-7824: - bq.Therefore, all recovered edits files are created before Meta region can be assigned because meta region can only be assigned only when ROOT is online. We can open the META region on RS when ROOT is offline. How about if ROOT RS is killed between getMetaLocationOrReadLocationFromRoot and assignMeta? Improve master start up time when there is log splitting work - Key: HBASE-7824 URL: https://issues.apache.org/jira/browse/HBASE-7824 Project: HBase Issue Type: Bug Components: master Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 0.94.8 Attachments: hbase-7824.patch, hbase-7824_v2.patch, hbase-7824_v3.patch, hbase-7824-v7.patch, hbase-7824-v8.patch When there is log split work going on, master start up waits till all log split work completes even though the log split has nothing to do with meta region servers. It's a bad behavior considering a master node can run when log split is happening while its start up is blocking by log split work. Since master is kind of single point of failure, we should start it ASAP. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7824) Improve master start up time when there is log splitting work
[ https://issues.apache.org/jira/browse/HBASE-7824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624604#comment-13624604 ] chunhui shen commented on HBASE-7824: - It means META region is opening on the RS before SSH completed log-split Improve master start up time when there is log splitting work - Key: HBASE-7824 URL: https://issues.apache.org/jira/browse/HBASE-7824 Project: HBase Issue Type: Bug Components: master Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 0.94.8 Attachments: hbase-7824.patch, hbase-7824_v2.patch, hbase-7824_v3.patch, hbase-7824-v7.patch, hbase-7824-v8.patch When there is log split work going on, master start up waits till all log split work completes even though the log split has nothing to do with meta region servers. It's a bad behavior considering a master node can run when log split is happening while its start up is blocking by log split work. Since master is kind of single point of failure, we should start it ASAP. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6870) HTable#coprocessorExec always scan the whole table
[ https://issues.apache.org/jira/browse/HBASE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624605#comment-13624605 ] chunhui shen commented on HBASE-6870: - bq.When useCache is true, what caching mechanism would getKeysToRegionsInRange() use ? We will use the cached region locations in HConnectionImplementation HTable#coprocessorExec always scan the whole table --- Key: HBASE-6870 URL: https://issues.apache.org/jira/browse/HBASE-6870 Project: HBase Issue Type: Improvement Components: Coprocessors Affects Versions: 0.94.1, 0.95.0, 0.95.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.95.1, 0.98.0 Attachments: 6870-v4.txt, HBASE-6870.patch, HBASE-6870-testPerformance.patch, HBASE-6870v2.patch, HBASE-6870v3.patch, hbase-6870v5.patch In current logic, HTable#coprocessorExec always scan the whole table, its efficiency is low and will affect the Regionserver carrying .META. under large coprocessorExec requests -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
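The useCache behavior discussed above can be sketched with a toy lookup. This is hypothetical (a TreeMap stands in for .META. and a plain map for HConnectionImplementation's location cache; names are not the real HBase API): with useCache true, previously seen region start keys are served from the cache and no .META. read happens.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical key-range-to-region lookup with a useCache flag.
public class KeyRangeLookupSketch {
    private final NavigableMap<String, String> metaTable = new TreeMap<>(); // startKey -> server
    private final Map<String, String> cache = new HashMap<>();              // client-side cache
    int metaReads = 0; // counts simulated .META. round trips

    void addRegion(String startKey, String server) { metaTable.put(startKey, server); }

    Map<String, String> getKeysToRegionsInRange(String start, String end, boolean useCache) {
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<String, String> e :
                metaTable.subMap(start, true, end, true).entrySet()) {
            String key = e.getKey();
            if (useCache && cache.containsKey(key)) {
                result.put(key, cache.get(key)); // cached location: no .META. round trip
            } else {
                metaReads++;                     // stand-in for a .META. read
                cache.put(key, e.getValue());
                result.put(key, e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        KeyRangeLookupSketch t = new KeyRangeLookupSketch();
        t.addRegion("a", "rs1");
        t.addRegion("m", "rs2");
        t.getKeysToRegionsInRange("a", "z", false); // populates the cache: 2 meta reads
        t.getKeysToRegionsInRange("a", "z", true);  // fully served from the cache
        System.out.println("meta reads: " + t.metaReads);
    }
}
```

This also shows the trade-off raised in the thread: the cached path is cheap but can hand back a stale location, which the client must recover from on its own.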
[jira] [Created] (HBASE-8287) TestRegionMergeTransactionOnCluster failed in trunk build #4010
chunhui shen created HBASE-8287: --- Summary: TestRegionMergeTransactionOnCluster failed in trunk build #4010 Key: HBASE-8287 URL: https://issues.apache.org/jira/browse/HBASE-8287 Project: HBase Issue Type: Bug Components: master Reporter: chunhui shen Assignee: chunhui shen Fix For: 0.95.2, 0.98.0 From the log of trunk build #4010: {code} 2013-04-04 05:45:59,396 INFO [MASTER_TABLE_OPERATIONS-quirinus.apache.org,53514,1365054344859-0] handler.DispatchMergingRegionHandler(157): Cancel merging regions testCleanMergeReference,,1365054353296.bf3d60360122d6c83a246f5f96c2cdd1., testCleanMergeReference,testRow0020,1365054353302.72fbc04566e78aa6732531296256a5aa., because can't move them together after 842ms 2013-04-04 05:45:59,396 INFO [hbase-am-zkevent-worker-pool-2-thread-1] master.AssignmentManager$4(1164): The master has opened the region testCleanMergeReference,testRow0020,1365054353302.72fbc04566e78aa6732531296256a5aa. that was online on quirinus.apache.org,45718,1365054345790 {code} There's a small probability of failing to move the merging regions together to the same regionserver -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-8287) TestRegionMergeTransactionOnCluster failed in trunk build #4010
[ https://issues.apache.org/jira/browse/HBASE-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chunhui shen updated HBASE-8287: Priority: Minor (was: Major) TestRegionMergeTransactionOnCluster failed in trunk build #4010 --- Key: HBASE-8287 URL: https://issues.apache.org/jira/browse/HBASE-8287 Project: HBase Issue Type: Bug Components: master Reporter: chunhui shen Assignee: chunhui shen Priority: Minor Fix For: 0.95.2, 0.98.0 From the log of trunk build #4010: {code} 2013-04-04 05:45:59,396 INFO [MASTER_TABLE_OPERATIONS-quirinus.apache.org,53514,1365054344859-0] handler.DispatchMergingRegionHandler(157): Cancel merging regions testCleanMergeReference,,1365054353296.bf3d60360122d6c83a246f5f96c2cdd1., testCleanMergeReference,testRow0020,1365054353302.72fbc04566e78aa6732531296256a5aa., because can't move them together after 842ms 2013-04-04 05:45:59,396 INFO [hbase-am-zkevent-worker-pool-2-thread-1] master.AssignmentManager$4(1164): The master has opened the region testCleanMergeReference,testRow0020,1365054353302.72fbc04566e78aa6732531296256a5aa. that was onlin e on quirinus.apache.org,45718,1365054345790 {code} There's a small probability that fail to move merging regions together to same regionserver -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8285) HBaseClient never recovers for single HTable.get() calls when regions move
[ https://issues.apache.org/jira/browse/HBASE-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624608#comment-13624608 ] Ted Yu commented on HBASE-8285: --- @Varun: If you can add a test case to show this scenario, that would be great. HBaseClient never recovers for single HTable.get() calls when regions move -- Key: HBASE-8285 URL: https://issues.apache.org/jira/browse/HBASE-8285 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.94.6.1 Reporter: Varun Sharma Assignee: Varun Sharma Priority: Critical Fix For: 0.95.1, 0.98.0, 0.94.7 Attachments: 8285-0.94.txt, 8285-trunk.txt Steps to reproduce this bug: 1) Gracefull restart a region server causing regions to get redistributed. 2) Client call to this region keeps failing since Meta Cache is never purged on the client for the region that moved. Reason behind the bug: 1) Client continues to hit the old region server. 2) The old region server throws NotServingRegionException which is not handled correctly and the META cache entries are never purged for that server causing the client to keep hitting the old server. The reason lies in ServerCallable code since we only purge META cache entries when there is a RetriesExhaustedException, SocketTimeoutException or ConnectException. However, there is no case check for NotServingRegionException(s). Why is this not a problem for Scan(s) and Put(s) ? a) If a region server is not hosting a region/scanner, then an UnknownScannerException is thrown which causes a relocateRegion() call causing a refresh of the META cache for that particular region. b) For put(s), the processBatchCallback() interface in HConnectionManager is used which clears out META cache entries for all kinds of exceptions except DoNotRetryException. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7824) Improve master start up time when there is log splitting work
[ https://issues.apache.org/jira/browse/HBASE-7824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624612#comment-13624612 ] Jeffrey Zhong commented on HBASE-7824: -- {quote} We can open the META region on RS when ROOT is offline. {quote} We need to update the META location in the ROOT RS when the META region is opened on a newly assigned RS. Is that true? If so, a ROOT RS has to be online for a successful META assignment. So the open effectively fails: even if the META region can be opened, no one can access it, because ROOT is offline and the old location isn't updated to the newly assigned one. The META region will later be re-assigned by MetaSSH. Improve master start up time when there is log splitting work - Key: HBASE-7824 URL: https://issues.apache.org/jira/browse/HBASE-7824 Project: HBase Issue Type: Bug Components: master Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 0.94.8 Attachments: hbase-7824.patch, hbase-7824_v2.patch, hbase-7824_v3.patch, hbase-7824-v7.patch, hbase-7824-v8.patch When there is log split work going on, master start up waits till all log split work completes, even though the log split has nothing to do with the meta region servers. That is bad behavior: a master node could be running while log splitting happens, yet its start up is blocked by the log split work. Since the master is kind of a single point of failure, we should start it ASAP. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-8287) TestRegionMergeTransactionOnCluster failed in trunk build #4010
[ https://issues.apache.org/jira/browse/HBASE-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chunhui shen updated HBASE-8287: Status: Patch Available (was: Open) TestRegionMergeTransactionOnCluster failed in trunk build #4010 --- Key: HBASE-8287 URL: https://issues.apache.org/jira/browse/HBASE-8287 Project: HBase Issue Type: Bug Components: master Reporter: chunhui shen Assignee: chunhui shen Priority: Minor Fix For: 0.95.2, 0.98.0 Attachments: hbase-trunk-8287.patch From the log of trunk build #4010: {code} 2013-04-04 05:45:59,396 INFO [MASTER_TABLE_OPERATIONS-quirinus.apache.org,53514,1365054344859-0] handler.DispatchMergingRegionHandler(157): Cancel merging regions testCleanMergeReference,,1365054353296.bf3d60360122d6c83a246f5f96c2cdd1., testCleanMergeReference,testRow0020,1365054353302.72fbc04566e78aa6732531296256a5aa., because can't move them together after 842ms 2013-04-04 05:45:59,396 INFO [hbase-am-zkevent-worker-pool-2-thread-1] master.AssignmentManager$4(1164): The master has opened the region testCleanMergeReference,testRow0020,1365054353302.72fbc04566e78aa6732531296256a5aa. that was online on quirinus.apache.org,45718,1365054345790 {code} There's a small probability that the merging regions fail to be moved together onto the same regionserver. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-8287) TestRegionMergeTransactionOnCluster failed in trunk build #4010
[ https://issues.apache.org/jira/browse/HBASE-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chunhui shen updated HBASE-8287: Attachment: hbase-trunk-8287.patch The root cause of the bug, in DispatchMergingRegionHandler#process: {code} region_b_location = masterServices.getAssignmentManager() .getRegionStates().getRegionServerOfRegion(region_b); onSameRS = region_a_location.equals(region_b_location); if (onSameRS || !regionStates.isRegionInTransition(region_b)) { // Regions are on the same RS, or region_b is not in // RegionInTransition any more break; } {code} The value of onSameRS can be false even when the regions are already on the same regionserver, under the following interleaving: 1. getRegionServerOfRegion(region_b) 2. region_b comes online 3. isRegionInTransition(region_b) Each step is synchronized by RegionStates, but the sequence as a whole is not atomic. Step 1 returns the old server of region_b, so onSameRS is computed as false; step 3 returns false since region_b came online in step 2. The while loop therefore breaks with onSameRS false even though the merging regions are in fact on the same regionserver.
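The interleaving above can be reproduced deterministically with a small stand-alone sketch. The class and the test hook below are hypothetical, not the real RegionStates API; the point is that re-reading region_b's location after the in-transition check closes the race:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Single-threaded sketch of the HBASE-8287 race (hypothetical names,
// not the real RegionStates API). The location read and the
// in-transition check are individually synchronized but not atomic as
// a pair, so region_b can come online in between; the fix is to
// re-read the location after the transition check before concluding
// the regions are on different servers.
class MergeCheckSketch {
    final Map<String, String> locations = new HashMap<>();
    final Set<String> inTransition = new HashSet<>();
    Runnable betweenStepsHook = () -> {};  // lets a test inject the race

    synchronized String serverOf(String region) { return locations.get(region); }
    synchronized boolean isInTransition(String region) { return inTransition.contains(region); }

    boolean onSameServer(String regionA, String regionB) {
        String locA = serverOf(regionA);
        String locB = serverOf(regionB);       // step 1: may return a stale server
        betweenStepsHook.run();                // step 2: region_b opens concurrently
        if (locA != null && locA.equals(locB)) return true;
        if (!isInTransition(regionB)) {        // step 3: region_b no longer moving
            locB = serverOf(regionB);          // the fix: re-read before deciding
            return locA != null && locA.equals(locB);
        }
        return false;                          // still moving; the real code retries
    }
}
```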
[jira] [Commented] (HBASE-7824) Improve master start up time when there is log splitting work
[ https://issues.apache.org/jira/browse/HBASE-7824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624622#comment-13624622 ] chunhui shen commented on HBASE-7824: - bq.So the open will fail even if META region can be opened but no one can access it because root is offline We will retry if we fail to update the META location in the ROOT RS. ROOT will eventually be online and the retry will succeed, so the META region won't need to be re-assigned.
[jira] [Updated] (HBASE-7801) Allow a deferred sync option per Mutation.
[ https://issues.apache.org/jira/browse/HBASE-7801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-7801: - Attachment: 7801-0.96-v7.txt Fixed TestProtobufUtil Allow a deferred sync option per Mutation. -- Key: HBASE-7801 URL: https://issues.apache.org/jira/browse/HBASE-7801 Project: HBase Issue Type: Sub-task Affects Versions: 0.95.0, 0.94.6 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.95.1, 0.98.0, 0.94.7 Attachments: 7801-0.94-v1.txt, 7801-0.94-v2.txt, 7801-0.94-v3.txt, 7801-0.96-full-v2.txt, 7801-0.96-full-v3.txt, 7801-0.96-full-v4.txt, 7801-0.96-full-v5.txt, 7801-0.96-v1.txt, 7801-0.96-v6.txt, 7801-0.96-v7.txt Won't have time for the parent. But a deferred sync option on a per operation basis comes up quite frequently. In 0.96 this can be handled cleanly via protobufs, and in 0.94 we can use a special mutation attribute. For batch operations we'd take the safest sync option of any of the mutations. I.e. if there is at least one that wants to be flushed we sync the batch; if there is none of those but at least one that wants deferred flush, we defer flush the batch; etc. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
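The "safest sync option of any of the mutations" rule for a batch is easy to sketch. The enum below is illustrative, not the actual API the patch adds:

```java
import java.util.List;

// Illustrative sketch of the "safest option wins" rule for a batch
// (enum names are made up for the example, not the patch's actual API).
// Options are ordered weakest to strongest; a batch is written with the
// strongest durability any of its mutations asked for.
class BatchDurabilitySketch {
    enum SyncOption { SKIP_WAL, DEFERRED_FLUSH, SYNC_WAL }

    static SyncOption forBatch(List<SyncOption> perMutation) {
        SyncOption result = SyncOption.SKIP_WAL;  // weakest possible
        for (SyncOption o : perMutation) {
            if (o.ordinal() > result.ordinal()) {
                result = o;  // keep the safest option seen so far
            }
        }
        return result;
    }
}
```

So a batch containing one SYNC_WAL mutation is synced as a whole, and a batch of only SKIP_WAL and DEFERRED_FLUSH mutations is deferred-flushed, matching the description above.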
[jira] [Commented] (HBASE-8285) HBaseClient never recovers for single HTable.get() calls when regions move
[ https://issues.apache.org/jira/browse/HBASE-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624625#comment-13624625 ] Varun Sharma commented on HBASE-8285: - Are there mocks for the region server or HRegion interface - wondering if there is a way to write an end to end test for this ? Thanks !
[jira] [Commented] (HBASE-8285) HBaseClient never recovers for single HTable.get() calls when regions move
[ https://issues.apache.org/jira/browse/HBASE-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624627#comment-13624627 ] Ted Yu commented on HBASE-8285: --- Take a look at TestRegionSplitPolicy#setupMocks() for how mockRegion is set up. Also refer to MockRegionServerServices.java
[jira] [Commented] (HBASE-6870) HTable#coprocessorExec always scan the whole table
[ https://issues.apache.org/jira/browse/HBASE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624631#comment-13624631 ] Hadoop QA commented on HBASE-6870: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12577422/hbase-6870v5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.zookeeper.lock.TestZKInterProcessReadWriteLock Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/5164//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5164//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5164//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5164//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5164//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5164//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5164//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5164//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5164//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/5164//console This message is automatically generated. 
HTable#coprocessorExec always scan the whole table --- Key: HBASE-6870 URL: https://issues.apache.org/jira/browse/HBASE-6870 Project: HBase Issue Type: Improvement Components: Coprocessors Affects Versions: 0.94.1, 0.95.0, 0.95.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.95.1, 0.98.0 Attachments: 6870-v4.txt, HBASE-6870.patch, HBASE-6870-testPerformance.patch, HBASE-6870v2.patch, HBASE-6870v3.patch, hbase-6870v5.patch In the current logic, HTable#coprocessorExec always scans the whole table; this is inefficient and, under heavy coprocessorExec request load, will affect the regionserver carrying .META. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-8286) Move site back up out of hbase-assembly; bad idea
[ https://issues.apache.org/jira/browse/HBASE-8286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-8286: - Status: Patch Available (was: Open) Move site back up out of hbase-assembly; bad idea - Key: HBASE-8286 URL: https://issues.apache.org/jira/browse/HBASE-8286 Project: HBase Issue Type: Task Reporter: stack Assignee: stack Attachments: 8286.txt Redoing the packaging, I thought it smart putting it all under the new hbase-assembly module. A few weeks later, it is not looking like a good idea (Enis suggested moving the site was maybe 'odd'). Here are a few issues: + The generated reports will have mention of hbase-assembly because that is where they are being generated (Elliott pointed out that the scm links mention hbase-assembly instead of trunk). + Doug Meil wants to build and deploy the site, but currently deploy is a bit awkward to explain: because the site presumes it is top level, staging goes to the wrong place and you have to come along and do fixup afterward; it shouldn't be so hard. A possible downside is that we will break Nick's site check added to hadoopqa because site is an aggregating build plugin... but lets see. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-8286) Move site back up out of hbase-assembly; bad idea
[ https://issues.apache.org/jira/browse/HBASE-8286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-8286: - Attachment: 8286.txt Just moves stuff around: + moves book, site, and xslt back up to src/main/ in top level module. + adjusts paths I played w/ assemblies. Still work.
[jira] [Updated] (HBASE-6496) Example ZK based scan policy
[ https://issues.apache.org/jira/browse/HBASE-6496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6496: - Fix Version/s: (was: 0.95.0) 0.94.2 Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by Lars Hofhansl) Example ZK based scan policy Key: HBASE-6496 URL: https://issues.apache.org/jira/browse/HBASE-6496 Project: HBase Issue Type: Sub-task Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.94.2 Attachments: 6496-0.94.txt, 6496-0.96.txt, 6496-0.96-v2.txt, 6496.txt, 6496-v2.txt Provide an example of a RegionServer that listens to a ZK node to learn about what set of KVs can safely be deleted during a compaction. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5292) getsize per-CF metric incorrectly counts compaction related reads as well
[ https://issues.apache.org/jira/browse/HBASE-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5292: - Fix Version/s: (was: 0.95.0) 0.94.2 Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by Lars Hofhansl) getsize per-CF metric incorrectly counts compaction related reads as well -- Key: HBASE-5292 URL: https://issues.apache.org/jira/browse/HBASE-5292 Project: HBase Issue Type: Bug Affects Versions: 0.89.20100924 Reporter: Kannan Muthukkaruppan Fix For: 0.94.2 Attachments: 0001-jira-HBASE-5292-Prevent-counting-getSize-on-compacti.patch, 5292-0.94.txt, ASF.LICENSE.NOT.GRANTED--D1527.1.patch, ASF.LICENSE.NOT.GRANTED--D1527.2.patch, ASF.LICENSE.NOT.GRANTED--D1527.3.patch, ASF.LICENSE.NOT.GRANTED--D1527.4.patch, ASF.LICENSE.NOT.GRANTED--D1617.1.patch, jira-HBASE-5292-Prevent-counting-getSize-on-compacti-2012-03-09_13_26_52.patch The per-CF getsize metric's intent was to track bytes returned (to HBase clients) per-CF. [Note: We already have metrics to track # of HFileBlock's read for compaction vs. non-compaction cases -- e.g., compactionblockreadcnt vs. fsblockreadcnt.] Currently, the getsize metric gets updated both for client initiated Get/Scan operations as well as for compaction related reads. The metric is updated in StoreScanner.java:next() when the Scan query matcher returns an INCLUDE* code via: HRegion.incrNumericMetric(this.metricNameGetsize, copyKv.getLength()); We should not do the above in the case of compactions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
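The fix described in the issue (skip the getsize update when the scanner is serving a compaction) amounts to a one-flag guard. The sketch below uses hypothetical stand-in names, not the real StoreScanner/HRegion code:

```java
// Illustrative sketch of guarding the per-CF "getsize" metric so only
// client-serving scans count toward it (hypothetical stand-ins for
// StoreScanner/HRegion; the real patch wires the flag differently).
class GetSizeMetricSketch {
    long getSizeBytes = 0;          // stands in for the per-CF getsize metric
    final boolean isCompactionScan; // true when this scanner feeds a compaction

    GetSizeMetricSketch(boolean isCompactionScan) {
        this.isCompactionScan = isCompactionScan;
    }

    // Called when the query matcher returns an INCLUDE* code for a KV.
    void onIncludedKeyValue(int kvLength) {
        if (!isCompactionScan) {       // the guard the bug report asks for
            getSizeBytes += kvLength;  // count client-visible bytes only
        }
    }
}
```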
[jira] [Updated] (HBASE-5549) Master can fail if ZooKeeper session expires
[ https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5549: - Fix Version/s: (was: 0.95.0) 0.94.2 Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by Lars Hofhansl) Master can fail if ZooKeeper session expires Key: HBASE-5549 URL: https://issues.apache.org/jira/browse/HBASE-5549 Project: HBase Issue Type: Bug Components: master, Zookeeper Affects Versions: 0.95.2 Environment: all Reporter: Nicolas Liochon Assignee: Nicolas Liochon Priority: Minor Fix For: 0.94.2 Attachments: 5549_092.txt, 5549_094.txt, 5549.v10.patch, 5549.v11.patch, 5549.v6.patch, 5549.v7.patch, 5549.v8.patch, 5549.v9.patch, nochange.patch There is a retry mechanism in RecoverableZooKeeper, but when the session expires, the whole ZooKeeperWatcher is recreated, hence the retry mechanism does not work in this case. This is why a sleep is needed in TestZooKeeper#testMasterSessionExpired: we need to wait for the ZooKeeperWatcher to be recreated before using the connection. This can happen in real life, for example when: - the master's zookeeper starts - the zookeeper connection is cut - the master enters the retry loop - in the meantime the session expires - the network comes back and the session is recreated - the retries continue, but on the wrong object, and hence fail. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5997) Fix concerns raised in HBASE-5922 related to HalfStoreFileReader
[ https://issues.apache.org/jira/browse/HBASE-5997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5997: - Fix Version/s: (was: 0.95.0) 0.94.2 Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by Lars Hofhansl) Fix concerns raised in HBASE-5922 related to HalfStoreFileReader Key: HBASE-5997 URL: https://issues.apache.org/jira/browse/HBASE-5997 Project: HBase Issue Type: Bug Affects Versions: 0.90.6, 0.92.1, 0.94.0, 0.95.2 Reporter: ramkrishna.s.vasudevan Assignee: Anoop Sam John Fix For: 0.94.2 Attachments: 5997v3_trunk.txt, 5997v3_trunk.txt, 5997v3_trunk.txt, 5997v3_trunk.txt, HBASE-5997_0.94.patch, HBASE-5997_94 V2.patch, HBASE-5997_94 V3.patch, Testcase.patch.txt Pls refer to the comment https://issues.apache.org/jira/browse/HBASE-5922?focusedCommentId=13269346page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13269346. Raised this issue to solve that comment. Just incase we don't forget it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6165) Replication can overrun .META. scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6165: - Fix Version/s: (was: 0.95.0) 0.94.2 Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by Lars Hofhansl) Replication can overrun .META. scans on cluster re-start Key: HBASE-6165 URL: https://issues.apache.org/jira/browse/HBASE-6165 Project: HBase Issue Type: Bug Reporter: Elliott Clark Assignee: Himanshu Vashishtha Fix For: 0.94.2 Attachments: 6165-v6.txt, HBase-6165-94-v1.patch, HBase-6165-94-v2.patch, HBase-6165-v1.patch, HBase-6165-v2.patch, HBase-6165-v3.patch, HBase-6165-v4.patch, HBase-6165-v5.patch When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6211) Put latencies in jmx
[ https://issues.apache.org/jira/browse/HBASE-6211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6211: - Fix Version/s: (was: 0.95.0) 0.94.2 Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by Lars Hofhansl) Put latencies in jmx Key: HBASE-6211 URL: https://issues.apache.org/jira/browse/HBASE-6211 Project: HBase Issue Type: Bug Components: metrics, monitoring Environment: RegionServerMetrics pushes latency histograms to hadoop metrics, but they are not getting into jmx. Reporter: Elliott Clark Assignee: Elliott Clark Fix For: 0.94.2 Attachments: 6211_092.txt, HBASE-6211-0.patch, HBASE-6211-1.patch, HBASE-6211-2.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6263) Use default mode for HBase Thrift gateway if not specified
[ https://issues.apache.org/jira/browse/HBASE-6263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6263: - Fix Version/s: (was: 0.95.0) 0.94.2 Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by Lars Hofhansl) Use default mode for HBase Thrift gateway if not specified -- Key: HBASE-6263 URL: https://issues.apache.org/jira/browse/HBASE-6263 Project: HBase Issue Type: Bug Components: Thrift Affects Versions: 0.94.0, 0.95.2 Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Minor Labels: noob Fix For: 0.94.2 Attachments: HBASE-6263-0.94.patch, HBASE-6263.patch The Thrift gateway should start with a default mode if one is not selected. Currently, we instead see: {noformat} Exception in thread main java.lang.AssertionError: Exactly one option out of [-hsha, -nonblocking, -threadpool, -threadedselector] has to be specified at org.apache.hadoop.hbase.thrift.ThriftServerRunner$ImplType.setServerImpl(ThriftServerRunner.java:201) at org.apache.hadoop.hbase.thrift.ThriftServer.processOptions(ThriftServer.java:169) at org.apache.hadoop.hbase.thrift.ThriftServer.doMain(ThriftServer.java:85) at org.apache.hadoop.hbase.thrift.ThriftServer.main(ThriftServer.java:192) {noformat} See also BIGTOP-648. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
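Falling back to a default instead of asserting could look roughly like the sketch below. The flag names come from the error message above; the selection logic and the choice of "-threadpool" as the default are illustrative, not the actual ThriftServerRunner code:

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch of defaulting the Thrift server mode when none of
// the mutually exclusive flags is given. Flag names are taken from the
// AssertionError text above; the default chosen here is an assumption.
class ThriftModeSketch {
    static final List<String> MODES =
        Arrays.asList("-hsha", "-nonblocking", "-threadpool", "-threadedselector");

    static String chooseMode(String[] args) {
        String chosen = null;
        for (String a : args) {
            if (MODES.contains(a)) {
                if (chosen != null) {
                    // Passing two mode flags is still an error.
                    throw new IllegalArgumentException("exactly one mode flag allowed");
                }
                chosen = a;
            }
        }
        // Instead of asserting that exactly one flag was passed,
        // fall back to a default when none was.
        return chosen != null ? chosen : "-threadpool";
    }
}
```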
[jira] [Updated] (HBASE-6299) RS starting region open while failing ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems
[ https://issues.apache.org/jira/browse/HBASE-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6299: - Fix Version/s: (was: 0.95.0) 0.94.2 Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by Lars Hofhansl) RS starting region open while failing ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems Key: HBASE-6299 URL: https://issues.apache.org/jira/browse/HBASE-6299 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6, 0.94.0 Reporter: Maryann Xue Assignee: Maryann Xue Priority: Critical Fix For: 0.94.2 Attachments: 6299v4.txt, 6299v4.txt, 6299v4.txt, HBASE-6299.patch, HBASE-6299-v2.patch, HBASE-6299-v3.patch 1. HMaster tries to assign a region to an RS. 2. HMaster creates a RegionState for this region and puts it into regionsInTransition. 3. In the first assign attempt, HMaster calls RS.openRegion(). The RS receives the open region request and starts to proceed, eventually with success. However, due to network problems, HMaster fails to receive the response for the openRegion() call, and the call times out. 4. HMaster attempts to assign for a second time, choosing another RS. 5. But since the HMaster's OpenedRegionHandler has been triggered by the region open of the previous RS, and the RegionState has already been removed from regionsInTransition, HMaster considers the unassigned ZK node RS_ZK_REGION_OPENING updated by the second attempt invalid and ignores it. 6. The unassigned ZK node stays, and a later unassign fails because RS_ZK_REGION_CLOSING cannot be created.
{code}
2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.; plan=hri=CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568., src=swbss-hadoop-004,60020,1340890123243, dest=swbss-hadoop-006,60020,1340890678078
2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. to swbss-hadoop-006,60020,1340890678078
2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=swbss-hadoop-002:6, region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:06:28,882 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:06:32,291 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=swbss-hadoop-006,60020,1340890678078, region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. from serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=518945, regions=575, usedHeap=15282, maxHeap=31301); deleting unassigned node
2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x2377fee2ae80007 Deleting existing unassigned node for b713fd655fa02395496c5a6e39ddf568 that is in expected state RS_ZK_REGION_OPENED
2012-06-29 07:06:32,301 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x2377fee2ae80007 Successfully deleted unassigned node for region b713fd655fa02395496c5a6e39ddf568 in expected state RS_ZK_REGION_OPENED
2012-06-29 07:06:32,301 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: The master has opened the region CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. that was online on serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=518945, regions=575, usedHeap=15282, maxHeap=31301)
2012-06-29 07:07:41,140 WARN org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. to serverName=swbss-hadoop-006,60020,1340890678078,
[jira] [Updated] (HBASE-6321) ReplicationSource dies reading the peer's id
[ https://issues.apache.org/jira/browse/HBASE-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6321: - Fix Version/s: (was: 0.95.0) 0.94.2 Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by Lars Hofhansl) ReplicationSource dies reading the peer's id Key: HBASE-6321 URL: https://issues.apache.org/jira/browse/HBASE-6321 Project: HBase Issue Type: Bug Affects Versions: 0.92.1, 0.94.0 Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Fix For: 0.94.2 Attachments: HBASE-6321-0.92.patch, HBASE-6321-0.94.patch, HBASE-6321.patch This is what I saw:
{noformat}
2012-07-01 05:04:01,638 ERROR org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Closing source 8 because an error occurred: Could not read peer's cluster id
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /va1-backup/hbaseid
  at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
  at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
  at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1021)
  at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:154)
  at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:259)
  at org.apache.hadoop.hbase.zookeeper.ClusterId.readClusterIdZNode(ClusterId.java:61)
  at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:253)
{noformat}
The session should just be reopened. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6359) KeyValue may return incorrect values after readFields()
[ https://issues.apache.org/jira/browse/HBASE-6359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6359: - Fix Version/s: (was: 0.95.0) 0.94.2 Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by Lars Hofhansl) KeyValue may return incorrect values after readFields() --- Key: HBASE-6359 URL: https://issues.apache.org/jira/browse/HBASE-6359 Project: HBase Issue Type: Bug Reporter: Dave Revell Assignee: Dave Revell Labels: noob Fix For: 0.94.2 Attachments: HBASE-6359-trunk-v1.diff When the same KeyValue object is used multiple times for deserialization using readFields, some methods may return incorrect values. Here is a sequence of operations that will reproduce the problem:
# A KeyValue is created whose key has length 10. The private field keyLength is initialized to 0.
# KeyValue.getKeyLength() is called. This reads the key length 10 from the backing array and caches it in keyLength.
# KeyValue.readFields() is called to deserialize a new value. The keyLength field is not cleared and keeps its value of 10, even though this value is probably incorrect.
# If getKeyLength() is called, the value 10 will be returned.
For example, in a reducer with Iterable<KeyValue>, all values after the first one from the iterable are likely to return incorrect values from getKeyLength(). The solution is to clear all memoized values in KeyValue.readFields(). I'll write a patch for this soon. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
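The memoization hazard described above can be sketched in a few lines. This is a hypothetical stand-in class, not HBase's actual KeyValue; it only shows why a deserialization path that swaps the backing array must also reset any cached values:

```java
import java.nio.ByteBuffer;

// Hypothetical KeyValue-like class that caches the key length it reads
// from its backing array. Names are illustrative only.
class CachedCell {
    private byte[] bytes;
    private int keyLength; // 0 means "not read yet"

    CachedCell(byte[] bytes) { this.bytes = bytes; }

    int getKeyLength() {
        if (keyLength == 0) {
            keyLength = ByteBuffer.wrap(bytes).getInt(); // first 4 bytes
        }
        return keyLength;
    }

    // Buggy re-use: swaps in a new backing array without clearing the cache.
    void readFieldsBuggy(byte[] newBytes) { this.bytes = newBytes; }

    // Fixed re-use: clears the memoized value, as the issue proposes.
    void readFieldsFixed(byte[] newBytes) {
        this.bytes = newBytes;
        this.keyLength = 0;
    }
}

public class Memoization {
    static byte[] withKeyLength(int len) {
        return ByteBuffer.allocate(4).putInt(len).array();
    }

    public static void main(String[] args) {
        CachedCell cell = new CachedCell(withKeyLength(10));
        System.out.println(cell.getKeyLength()); // 10, now cached

        cell.readFieldsBuggy(withKeyLength(25));
        System.out.println(cell.getKeyLength()); // still 10 -- stale cache

        cell.readFieldsFixed(withKeyLength(25));
        System.out.println(cell.getKeyLength()); // 25 -- cache was cleared
    }
}
```

The same pattern applies to every memoized field, which is why the fix clears all of them in readFields() rather than just one.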
[jira] [Updated] (HBASE-6364) Powering down the server host holding the .META. table causes HBase Client to take excessively long to recover and connect to reassigned .META. table
[ https://issues.apache.org/jira/browse/HBASE-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6364: - Fix Version/s: (was: 0.95.0) 0.94.2 Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by Lars Hofhansl) Powering down the server host holding the .META. table causes HBase Client to take excessively long to recover and connect to reassigned .META. table - Key: HBASE-6364 URL: https://issues.apache.org/jira/browse/HBASE-6364 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.90.6, 0.92.1, 0.94.0 Reporter: Suraj Varma Assignee: Nicolas Liochon Labels: client Fix For: 0.94.2 Attachments: 6364.94.v2.nolargetest.patch, 6364.94.v2.nolargetest.security-addendum.patch, 6364-host-serving-META.v1.patch, 6364.v11.nolargetest.patch, 6364.v1.patch, 6364.v1.patch, 6364.v2.patch, 6364.v3.patch, 6364.v3.patch, 6364.v5.patch, 6364.v5.withtests.patch, 6364.v6.patch, 6364.v6.withtests.patch, 6364.v7.withtests.patch, 6364.v8.withtests.patch, 6364.v9.patch, stacktrace.txt When a server host with a Region Server holding the .META. table is powered down on a live cluster, while the HBase cluster itself detects and reassigns the .META. table, connected HBase Clients take an excessively long time to detect this and re-discover the reassigned .META. Workaround: Decrease ipc.socket.timeout on the HBase Client side to a low value (the default is 20s, leading to a 35 minute recovery time; we were able to get acceptable results with 100ms, getting a 3 minute recovery). This was found during some hardware failure testing scenarios. Test Case:
1) Apply load via a client app on the HBase cluster for several minutes
2) Power down the region server holding the .META. server (i.e. power off ... and keep it off)
3) Measure how long it takes for the cluster to reassign the META table and for client threads to re-lookup and re-orient to the lesser cluster (minus the RS and DN on that host).
Observation:
1) Client threads spike up to maxThreads size ... and take over 35 mins to recover (i.e. for the thread count to go back to normal) - no client calls are serviced - they just back up on a synchronized method (see #2 below)
2) All the client app threads queue up behind the oahh.ipc.HBaseClient#setupIOStreams method http://tinyurl.com/7js53dj
After taking several thread dumps we found that the thread within this synchronized method was blocked on NetUtils.connect(this.socket, remoteId.getAddress(), getSocketTimeout(conf)); The client thread that gets the synchronized lock would try to connect to the dead RS (till the socket times out after 20s), retry, and then the next thread gets in, and so forth in a serial manner. Workaround: --- The default ipc.socket.timeout is set to 20s. We dropped this to a low number (1000 ms, 100 ms, etc) on the client side hbase-site.xml. With this setting, the client threads recovered in a couple of minutes by failing fast and re-discovering the .META. table on a reassigned RS. Assumption: This ipc.socket.timeout is only ever used during the initial HConnection setup via NetUtils.connect and should only ever be used when connectivity to a region server is lost and needs to be re-established, i.e. it does not affect normal RPC activity as this is just the connect timeout. During RS GC periods, any _new_ clients trying to connect will fail and will require .META. table re-lookups. The above timeout workaround is only for the HBase client side. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
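The client-side workaround described above amounts to a single property in the client's hbase-site.xml. This is a sketch of what such an entry might look like; the 100 ms value is the one the reporter tested, and an appropriate value depends on the deployment:

```xml
<configuration>
  <!-- Connect timeout used when (re)establishing RPC connections.
       Default is 20000 ms; a low value makes clients fail fast and
       re-discover a reassigned .META. sooner. -->
  <property>
    <name>ipc.socket.timeout</name>
    <value>100</value>
  </property>
</configuration>
```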
[jira] [Updated] (HBASE-6432) HRegionServer doesn't properly set clusterId in conf
[ https://issues.apache.org/jira/browse/HBASE-6432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6432: - Fix Version/s: (was: 0.95.0) 0.94.2 Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by Lars Hofhansl) HRegionServer doesn't properly set clusterId in conf Key: HBASE-6432 URL: https://issues.apache.org/jira/browse/HBASE-6432 Project: HBase Issue Type: Bug Affects Versions: 0.94.0, 0.95.2 Reporter: Francis Liu Assignee: Francis Liu Fix For: 0.94.2 Attachments: HBASE-6432_94.patch, HBASE-6432.patch ClusterId is normally set into the passed conf during instantiation of an HTable class. In the case of an HRegionServer this is bypassed and the clusterId is left as default, since getMaster() uses HBaseRPC to create the proxy directly and bypasses the class which retrieves and sets the correct clusterId. This becomes a problem for clients (i.e. within a coprocessor) using delegation tokens for authentication: the token's service will be the correct clusterId, while the TokenSelector is looking for one with the service default. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6437) Avoid admin.balance during master initialize
[ https://issues.apache.org/jira/browse/HBASE-6437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6437: - Fix Version/s: (was: 0.95.0) 0.94.2 Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by Lars Hofhansl) Avoid admin.balance during master initialize Key: HBASE-6437 URL: https://issues.apache.org/jira/browse/HBASE-6437 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Assignee: rajeshbabu Fix For: 0.94.2 Attachments: HBASE-6437_92.patch, HBASE-6437_92.patch, HBASE-6437_94.patch, HBASE-6437_94.patch, HBASE-6437_trunk_2.patch, HBASE-6437_trunk.patch, HBASE-6437_trunk.patch In HBASE-5850 many of the admin operations have been blocked until the master initializes, but the balancer has not. So this JIRA is to extend the PleaseHoldException to the case of an admin.balance() call made before the master is initialized. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6438) RegionAlreadyInTransitionException needs to give more info to avoid assignment inconsistencies
[ https://issues.apache.org/jira/browse/HBASE-6438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6438: - Fix Version/s: (was: 0.95.0) 0.94.2 Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by Lars Hofhansl) RegionAlreadyInTransitionException needs to give more info to avoid assignment inconsistencies -- Key: HBASE-6438 URL: https://issues.apache.org/jira/browse/HBASE-6438 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Assignee: rajeshbabu Fix For: 0.94.2 Attachments: 6438-0.92.txt, 6438.addendum, 6438-addendum.94, 6438-trunk_2.patch, HBASE-6438_2.patch, HBASE-6438_94_3.patch, HBASE-6438_94_4.patch, HBASE-6438_94.patch, HBASE-6438-trunk_2.patch, HBASE-6438_trunk.patch Seeing some of the recent issues in region assignment, RegionAlreadyInTransitionException (RAITE) is one reason after which the region assignment may or may not happen (in the sense that we need to wait for the TM to assign). In HBASE-6317 we hit one problem due to RegionAlreadyInTransitionException on master restart. Consider the following case: due to some reason, like a master restart or an external assign call, we try to assign a region that is already being opened by an RS. The next call to assign has already changed the state of the znode, so the open currently running on the RS is affected and fails. The second assignment that started also fails, getting a RAITE. Finally, neither assignment carries on. The idea is to find out whether any such RAITE can be retried or not. Here we have the following cases:
- The znode is yet to be transitioned from OFFLINE to OPENING by the RS.
- The RS may be in the step of openRegion().
- The RS may be trying to transition OPENING to OPENED.
- The region is yet to be added to the online regions on the RS side.
In openRegion() and updateMeta(), on any failure we move the znode to FAILED_OPEN, so in those cases getting a RAITE should be OK. But in the other cases the assignment is stopped. The idea is to add the current state of the region assignment to the RIT map on the RS side and, using that info, determine whether the assignment can be retried on getting a RAITE. Considering the current work going on in the AM, please do share whether this is needed, at least in the 0.92/0.94 versions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6460) hbck -repairHoles usage inconsistent with -fixHdfsOrphans
[ https://issues.apache.org/jira/browse/HBASE-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6460: - Fix Version/s: (was: 0.95.0) 0.94.2 Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by Lars Hofhansl) hbck -repairHoles usage inconsistent with -fixHdfsOrphans - Key: HBASE-6460 URL: https://issues.apache.org/jira/browse/HBASE-6460 Project: HBase Issue Type: Bug Components: hbck Affects Versions: 0.94.0, 0.95.2 Reporter: Jie Huang Assignee: Jie Huang Priority: Minor Fix For: 0.94.2 Attachments: hbase-6460.patch According to hbck's help info, the shortcut -repairHoles should enable -fixHdfsOrphans, as below: {noformat} -repairHoles Shortcut for -fixAssignments -fixMeta -fixHdfsHoles -fixHdfsOrphans {noformat} However, in the implementation, fsck.setFixHdfsOrphans(false) is called for -repairHoles. This is not consistent with the usage information. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6471) Performance regression caused by HBASE-4054
[ https://issues.apache.org/jira/browse/HBASE-6471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6471: - Fix Version/s: (was: 0.95.0) 0.94.2 Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by Lars Hofhansl) Performance regression caused by HBASE-4054 --- Key: HBASE-6471 URL: https://issues.apache.org/jira/browse/HBASE-6471 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.92.0 Reporter: Lars George Assignee: Jimmy Xiang Priority: Critical Fix For: 0.94.2 Attachments: trunk-6471.patch, trunk-6471.patch, trunk-6471_v2.patch The patch in HBASE-4054 switches PooledHTable to extend HTable as opposed to implementing HTableInterface. Since HTable does not have an empty constructor, the patch added a call to the super() constructor, which triggers the ZooKeeper and META scan, causing a considerable delay. With multiple threads using the pool in parallel, the first thread holds up all the subsequent ones, which in effect negates the whole reason we have an HTable pool. We should complete HBASE-5728, or alternatively add a protected, empty constructor to HTable. I am +1 for the former. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
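The constructor problem above can be illustrated in miniature. These are made-up classes, not HBase code: a subclass is always forced through its superclass's constructor, so the only way a pooled wrapper can skip expensive setup is if the superclass offers a cheap (e.g. protected) constructor:

```java
// Made-up classes illustrating the point of HBASE-6471; not HBase code.
class Table {
    static int connects = 0;

    Table() { connect(); }          // public ctor: full setup (ZK, META scan)

    protected Table(boolean skip) { // cheap ctor a pooled subclass could use
        if (!skip) connect();
    }

    private void connect() { connects++; } // stands in for the slow work
}

class PooledTable extends Table {
    PooledTable() {
        super(true); // skip the expensive setup; the pool wires things later
    }
}

public class CtorCost {
    public static void main(String[] args) {
        new Table();                        // pays the connection cost
        new PooledTable();                  // does not
        System.out.println(Table.connects); // 1
    }
}
```

Without the protected constructor, `new PooledTable()` would have no choice but to invoke the expensive `Table()`, which is exactly the regression the issue describes.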
[jira] [Updated] (HBASE-6478) TestClassLoading.testClassLoadingFromLibDirInJar occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-6478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6478: - Fix Version/s: (was: 0.95.0) 0.94.2 Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by Lars Hofhansl) TestClassLoading.testClassLoadingFromLibDirInJar occasionally fails --- Key: HBASE-6478 URL: https://issues.apache.org/jira/browse/HBASE-6478 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.94.0 Reporter: zhou wenjian Fix For: 0.94.2 Attachments: HBASE-6478-trunk.patch, HBASE-6478-trunk-v2.patch, HBASE-6478-trunk-v3.patch, HBASE-6478-trunk-v4.patch When Hudson ran for HBASE-6459, it encountered a failed test case in org.apache.hadoop.hbase.coprocessor.TestClassLoading.testClassLoadingFromLibDirInJar. The link is https://builds.apache.org/job/PreCommit-HBASE-Build/2455/testReport/org.apache.hadoop.hbase.coprocessor/TestClassLoading/testClassLoadingFromLibDirInJar/ I checked the log and found that the function waitTableAvailable only checks the meta table; when the RS opens the region and updates the meta location in meta, the region may not yet have been added to the online regions on the RS. for (HRegion region: hbase.getRegionServer(0).getOnlineRegionsLocalContext()) { This loop will then be skipped, and found1 will remain false. That is why the test case failed. So maybe we should have a stricter check when the table is created. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
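A "stricter check" here means polling the region server's own online-region list instead of only META. A generic sketch of such a bounded wait; the condition hook is an assumption for illustration, not HBase API:

```java
import java.util.function.BooleanSupplier;

public class WaitUntil {
    // Polls cond until it holds or timeoutMs elapses; returns the final state.
    static boolean waitFor(long timeoutMs, BooleanSupplier cond)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!cond.getAsBoolean()) {
            if (System.currentTimeMillis() >= deadline) return false;
            Thread.sleep(10);
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        // In the real test the condition would be "the RS reports the region
        // in getOnlineRegionsLocalContext()"; here a time-based stand-in.
        boolean ok = waitFor(1000, () -> System.currentTimeMillis() - start > 50);
        System.out.println(ok);
    }
}
```

Waiting on the condition the test actually depends on, rather than a proxy for it (the META row), is what removes the flakiness.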
[jira] [Updated] (HBASE-6488) HBase won't run on IPv6 on OSes that use zone-indexes
[ https://issues.apache.org/jira/browse/HBASE-6488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6488: - Fix Version/s: (was: 0.95.0) 0.94.2 Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by Lars Hofhansl) HBase won't run on IPv6 on OSes that use zone-indexes Key: HBASE-6488 URL: https://issues.apache.org/jira/browse/HBASE-6488 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: ryan rawson Assignee: ryan rawson Fix For: 0.94.2 Attachments: HBASE-6488-trunk.txt, HBASE-6488.txt In IPv6, an address may have a zone-index, which is specified with a percent, eg: ...%0. This looks like a format string, and thus in a part of the code which uses the hostname as a prefix to another string which is interpreted with String.format, you end up with an exception:
2012-07-31 18:21:39,848 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
java.util.UnknownFormatConversionException: Conversion = '0'
  at java.util.Formatter.checkText(Formatter.java:2503)
  at java.util.Formatter.parse(Formatter.java:2467)
  at java.util.Formatter.format(Formatter.java:2414)
  at java.util.Formatter.format(Formatter.java:2367)
  at java.lang.String.format(String.java:2769)
  at com.google.common.util.concurrent.ThreadFactoryBuilder.setNameFormat(ThreadFactoryBuilder.java:68)
  at org.apache.hadoop.hbase.executor.ExecutorService$Executor.init(ExecutorService.java:299)
  at org.apache.hadoop.hbase.executor.ExecutorService.startExecutorService(ExecutorService.java:185)
  at org.apache.hadoop.hbase.executor.ExecutorService.startExecutorService(ExecutorService.java:227)
  at org.apache.hadoop.hbase.master.HMaster.startServiceThreads(HMaster.java:821)
  at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:507)
  at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:344)
  at org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster.run(HMasterCommandLine.java:220)
  at java.lang.Thread.run(Thread.java:680)
2012-07-31 18:21:39,908 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
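The failure mode is easy to reproduce with plain String.format: a stray '%' in the host string is parsed as the start of a conversion. Escaping it as '%%' makes the prefix safe. The hostnames below are made up, and the exact exception message can vary by JDK version, so only the exception type is relied on:

```java
import java.util.UnknownFormatConversionException;

public class ZoneIndex {
    public static void main(String[] args) {
        String host = "fe80::1%0"; // IPv6 address with a zone index (made up)
        try {
            // The '%0' inside the prefix is parsed as a format specifier.
            String.format(host + "/handler-%d", 1);
            System.out.println("no exception");
        } catch (UnknownFormatConversionException e) {
            // e.g. "Conversion = '0'" on the JDK in the report
            System.out.println("threw UnknownFormatConversionException");
        }
        // Escaping the percent sign makes the prefix safe to embed in a format.
        String safe = host.replace("%", "%%");
        System.out.println(String.format(safe + "/handler-%d", 1));
    }
}
```

This is why code that builds thread-name formats (as Guava's ThreadFactoryBuilder.setNameFormat does) must escape any '%' coming from untrusted input such as hostnames.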
[jira] [Updated] (HBASE-6504) Adding GC details prevents HBase from starting in non-distributed mode
[ https://issues.apache.org/jira/browse/HBASE-6504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6504: - Fix Version/s: (was: 0.95.0) 0.94.2 Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by Lars Hofhansl) Adding GC details prevents HBase from starting in non-distributed mode -- Key: HBASE-6504 URL: https://issues.apache.org/jira/browse/HBASE-6504 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: Benoit Sigoure Assignee: Michael Drzal Priority: Trivial Labels: noob Fix For: 0.94.2 Attachments: HBASE-6504-output.txt, HBASE-6504.patch, HBASE-6504-v2.patch The {{conf/hbase-env.sh}} that ships with HBase contains a few commented out examples of variables that could be useful, such as adding {{-XX:+PrintGCDetails -XX:+PrintGCDateStamps}} to {{HBASE_OPTS}}. This has the annoying side effect that the JVM prints a summary of memory usage when it exits, and it does so on stdout:
{code}
$ ./bin/hbase org.apache.hadoop.hbase.util.HBaseConfTool hbase.cluster.distributed
false
Heap
 par new generation total 19136K, used 4908K [0x00073a20, 0x00073b6c, 0x00075186)
  eden space 17024K, 28% used [0x00073a20, 0x00073a6cb0a8, 0x00073b2a)
  from space 2112K, 0% used [0x00073b2a, 0x00073b2a, 0x00073b4b)
  to space 2112K, 0% used [0x00073b4b, 0x00073b4b, 0x00073b6c)
 concurrent mark-sweep generation total 63872K, used 0K [0x00075186, 0x0007556c, 0x0007f5a0)
 concurrent-mark-sweep perm gen total 21248K, used 6994K [0x0007f5a0, 0x0007f6ec, 0x0008)
$ ./bin/hbase org.apache.hadoop.hbase.util.HBaseConfTool hbase.cluster.distributed /dev/null
(nothing printed)
{code}
And this confuses {{bin/start-hbase.sh}} when it does {{distMode=`$bin/hbase --config $HBASE_CONF_DIR org.apache.hadoop.hbase.util.HBaseConfTool hbase.cluster.distributed`}}, because then the {{distMode}} variable is not just set to {{false}}, it also contains all this JVM spam.
If you don't pay enough attention and realize that 3 processes are getting started (ZK, HM, RS) instead of just one (HM), then you end up with this confusing error message: {{Could not start ZK at requested port of 2181. ZK was started at port: 2182. Aborting as clients (e.g. shell) will not be able to find this ZK quorum.}}, which is even more puzzling because when you run {{netstat}} to see who owns that port, then you won't find any rogue process other than the one you just started. I'm wondering if the fix is not to just change the {{if [ $distMode == 'false' ]}} to a {{switch $distMode case (false*)}} type of test, to work around this annoying JVM misfeature that pollutes stdout. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6512) Incorrect OfflineMetaRepair log class name
[ https://issues.apache.org/jira/browse/HBASE-6512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6512: - Fix Version/s: (was: 0.95.0) 0.94.2 Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by Lars Hofhansl) Incorrect OfflineMetaRepair log class name -- Key: HBASE-6512 URL: https://issues.apache.org/jira/browse/HBASE-6512 Project: HBase Issue Type: Bug Components: hbck Affects Versions: 0.94.0, 0.94.1, 0.94.2, 0.95.2 Reporter: Liang Xie Assignee: Liang Xie Fix For: 0.94.2 Attachments: HBASE-6512.diff At the beginning of OfflineMetaRepair.java, we can observe:
private static final Log LOG = LogFactory.getLog(HBaseFsck.class.getName());
It would be better to change it to:
private static final Log LOG = LogFactory.getLog(OfflineMetaRepair.class.getName());
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6514) unknown metrics type: org.apache.hadoop.hbase.metrics.histogram.MetricsHistogram
[ https://issues.apache.org/jira/browse/HBASE-6514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6514: - Fix Version/s: (was: 0.95.0) 0.94.2 Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by Lars Hofhansl) unknown metrics type: org.apache.hadoop.hbase.metrics.histogram.MetricsHistogram Key: HBASE-6514 URL: https://issues.apache.org/jira/browse/HBASE-6514 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.92.2, 0.94.0 Environment: MacOS 10.8 Oracle JDK 1.7 Reporter: Archimedes Trajano Assignee: Elliott Clark Fix For: 0.94.2 Attachments: FrameworkTest.java, FrameworkTest.java, HBASE-6514-94-0.patch, HBASE-6514-trunk-0.patch, out.txt When trying to run a unit test that just starts up and shuts down the server, the following errors occur in System.out:
01:10:59,874 ERROR MetricsUtil:116 - unknown metrics type: org.apache.hadoop.hbase.metrics.histogram.MetricsHistogram
01:10:59,874 ERROR MetricsUtil:116 - unknown metrics type: org.apache.hadoop.hbase.metrics.histogram.MetricsHistogram
01:10:59,875 ERROR MetricsUtil:116 - unknown metrics type: org.apache.hadoop.hbase.metrics.histogram.MetricsHistogram
01:10:59,875 ERROR MetricsUtil:116 - unknown metrics type: org.apache.hadoop.hbase.metrics.histogram.MetricsHistogram
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6516) hbck cannot detect any IOException while .tableinfo file is missing
[ https://issues.apache.org/jira/browse/HBASE-6516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6516: - Fix Version/s: (was: 0.95.0) 0.94.2 Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by Lars Hofhansl) hbck cannot detect any IOException while .tableinfo file is missing - Key: HBASE-6516 URL: https://issues.apache.org/jira/browse/HBASE-6516 Project: HBase Issue Type: Bug Components: hbck Affects Versions: 0.94.0, 0.95.2 Reporter: Jie Huang Assignee: Jie Huang Fix For: 0.94.2 Attachments: hbase-6516-94-v5.patch, hbase-6516.patch, hbase-6516-v2.patch, hbase-6516-v3.patch, hbase-6516-v4.patch, hbase-6516-v5a.patch, hbase-6516-v5.patch HBaseFsck checks for missing .tableinfo files in the loadHdfsRegionInfos() function. However, no IOException will be caught when .tableinfo is missing, since FSTableDescriptors.getTableDescriptor doesn't throw any IOException. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6525) bin/replication/copy_tables_desc.rb references non-existent class
[ https://issues.apache.org/jira/browse/HBASE-6525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6525: - Fix Version/s: (was: 0.95.0) 0.94.2 Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by Lars Hofhansl) bin/replication/copy_tables_desc.rb references non-existent class - Key: HBASE-6525 URL: https://issues.apache.org/jira/browse/HBASE-6525 Project: HBase Issue Type: Bug Components: scripts Affects Versions: 0.94.0, 0.95.2 Reporter: David S. Wang Assignee: David S. Wang Priority: Trivial Labels: noob Fix For: 0.94.2 Attachments: HBASE-6525.patch $ hbase org.jruby.Main copy_tables_desc.rb NameError: cannot load Java class org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper get_proxy_or_package_under_package at org/jruby/javasupport/JavaUtilities.java:54 method_missing at file:/mnt/data/hbase/lib/jruby-complete-1.6.5.jar!/builtin/javasupport/java.rb:51 (root) at copy_tables_desc.rb:35 Removing the line that references the non-existent class seems to make the script work without any visible side-effects. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6520) MSLab May cause the Bytes.toLong not work correctly for increment
[ https://issues.apache.org/jira/browse/HBASE-6520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6520: - Fix Version/s: (was: 0.95.0) 0.94.2 Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by Lars Hofhansl) MSLab May cause the Bytes.toLong not work correctly for increment - Key: HBASE-6520 URL: https://issues.apache.org/jira/browse/HBASE-6520 Project: HBase Issue Type: Bug Reporter: ShiXing Assignee: ShiXing Fix For: 0.94.2 Attachments: HBASE-6520-0.94-v1.patch, HBASE-6520-trunk-v1.patch When MemStoreLAB is used, KeyValues share the byte array allocated by the MemStoreLAB; all the KeyValues' bytes attributes refer to the same byte array. Consider functions such as Bytes.toLong(byte[] bytes, int offset):
{code}
public static long toLong(byte[] bytes, int offset) {
  return toLong(bytes, offset, SIZEOF_LONG);
}

public static long toLong(byte[] bytes, int offset, final int length) {
  if (length != SIZEOF_LONG || offset + length > bytes.length) {
    throw explainWrongLengthOrOffset(bytes, offset, length, SIZEOF_LONG);
  }
  long l = 0;
  for (int i = offset; i < offset + length; i++) {
    l <<= 8;
    l ^= bytes[i] & 0xFF;
  }
  return l;
}
{code}
If we do not put a long value into the KeyValue but read it as a long value in HRegion.increment(), the check
{code}
offset + length > bytes.length
{code}
takes no effect, because bytes.length is not equal to keyLength + valueLength; it is the MemStoreLAB chunk size, which defaults to 2048 * 1024. I will paste the patch later. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
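A self-contained sketch of why that bytes.length guard is toothless when values live in a large shared chunk. The toLong here copies the shape of the Bytes.toLong in the report (with a plain exception instead of HBase's helper); the chunk layout and offsets are invented for illustration:

```java
public class SharedChunk {
    static final int SIZEOF_LONG = 8;

    // Same shape as HBase's Bytes.toLong(byte[], int, int).
    static long toLong(byte[] bytes, int offset, int length) {
        if (length != SIZEOF_LONG || offset + length > bytes.length) {
            throw new IllegalArgumentException("wrong length or offset");
        }
        long l = 0;
        for (int i = offset; i < offset + length; i++) {
            l <<= 8;
            l ^= bytes[i] & 0xFF;
        }
        return l;
    }

    public static void main(String[] args) {
        // A 4-byte value in its own array: the guard catches the misread.
        byte[] own = {0, 0, 0, 42};
        try {
            toLong(own, 0, SIZEOF_LONG);
        } catch (IllegalArgumentException e) {
            System.out.println("caught: " + e.getMessage());
        }

        // The same 4 bytes inside a large shared chunk (as with MemStoreLAB):
        // offset + 8 is well within the chunk, so the guard passes and the
        // read silently swallows 4 bytes belonging to the neighboring cell.
        byte[] chunk = new byte[2048 * 1024];
        System.arraycopy(own, 0, chunk, 100, 4);
        chunk[104] = 1; // first byte of the next cell's data
        System.out.println(toLong(chunk, 100, SIZEOF_LONG)); // not 42
    }
}
```

With a per-value array the bad read fails loudly; with the shared chunk it returns a garbage long, which is exactly the increment corruption the issue describes.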
[jira] [Updated] (HBASE-6529) With HFile v2, the region server will always perform an extra copy of source files
[ https://issues.apache.org/jira/browse/HBASE-6529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6529: - Fix Version/s: (was: 0.95.0) 0.94.2 Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by Lars Hofhansl) With HFile v2, the region server will always perform an extra copy of source files -- Key: HBASE-6529 URL: https://issues.apache.org/jira/browse/HBASE-6529 Project: HBase Issue Type: Bug Components: Performance, regionserver Affects Versions: 0.94.0, 0.95.2 Reporter: Jason Dai Assignee: Jie Huang Fix For: 0.94.2 Attachments: hbase-6529.094.txt, hbase-6529.diff With the HFile v2 implementation in HBase 0.94/0.96, the region server will use HFileSystem as its {color:blue}fs{color}. When it performs a bulk load in Store.bulkLoadHFile(), it checks whether its {color:blue}fs{color} is the same as {color:blue}srcFs{color}, which however will be a DistributedFileSystem. Consequently, it will always perform an extra copy of the source files. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6552) TestAcidGuarantees system test should flush more aggressively
[ https://issues.apache.org/jira/browse/HBASE-6552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6552: - Fix Version/s: (was: 0.95.0) 0.94.2 Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by Lars Hofhansl) TestAcidGuarantees system test should flush more aggressively - Key: HBASE-6552 URL: https://issues.apache.org/jira/browse/HBASE-6552 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.92.2, 0.94.1, 0.95.2 Reporter: Gregory Chanan Assignee: Gregory Chanan Fix For: 0.94.2 Attachments: HBASE-6552-94-92.patch, HBASE-6552-trunk.patch HBASE-5887 allowed TestAcidGuarantees to be run as a system test by avoiding the call to util.flush(). It would be better to go through the HBaseAdmin interface to force flushes. This would unify the code path between the unit test and the system test, as well as forcing more frequent flushes, which have previously been the source of ACID guarantee problems, e.g. HBASE-2856. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6561) Gets/Puts with many columns send the RegionServer into an endless loop
[ https://issues.apache.org/jira/browse/HBASE-6561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6561: - Fix Version/s: (was: 0.95.0) 0.94.2 Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by Lars Hofhansl) Gets/Puts with many columns send the RegionServer into an endless loop Key: HBASE-6561 URL: https://issues.apache.org/jira/browse/HBASE-6561 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.94.2 Attachments: 6561-0.94.txt, 6561-0.96.txt, 6561-0.96-v2.txt, 6561-0.96-v3.txt, 6561-0.96-v4.txt, 6561-0.96-v4.txt, 6561-0.96-v5.txt This came from the mailing list: We were able to replicate this behavior in a pseudo-distributed hbase (hbase-0.94.1) environment. We wrote a test program that creates a test table MyTestTable and populates it with random rows, then it creates a row with 60,000 columns and repeatedly updates it. Each column has an 18 byte qualifier and a 50 byte value. In our tests, when we ran the program, we usually never got beyond 15 updates before it would flush for a really long time. The rows that are being updated are about 4MB each (minus any hbase metadata). It doesn't seem like it's caused by GC. I turned on gc logging, and didn't see any long pauses. This is the gc log during the flush. http://pastebin.com/vJKKXDx5 This is the regionserver log with debug on during the same flush http://pastebin.com/Fh5213mg This is the test program we wrote. http://pastebin.com/aZ0k5tx2 You should be able to just compile it, and run it against a running HBase cluster. $ java TestTable Carlos -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6565) Coprocessor exec result Map is not thread safe
[ https://issues.apache.org/jira/browse/HBASE-6565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6565: - Fix Version/s: (was: 0.95.0) 0.94.2 Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by Lars Hofhansl) Coprocessor exec result Map is not thread safe -- Key: HBASE-6565 URL: https://issues.apache.org/jira/browse/HBASE-6565 Project: HBase Issue Type: Bug Components: Client, Coprocessors Affects Versions: 0.92.2, 0.94.0, 0.95.2 Environment: hadoop1.0.2,hbase0.94,jdk1.6 Reporter: Yuan Kang Assignee: Yuan Kang Labels: coprocessors, patch Fix For: 0.94.2 Attachments: Coprocessor-result-thread unsafe-bug-fix.patch Original Estimate: 168h Remaining Estimate: 168h I developed a coprocessor program but found differing results in repeated tests: for example, normally the result's size is 10, but sometimes it comes back as 9. Reading the HTable.java code, I found a TreeMap (which is not thread-safe) being used in a multithreaded environment. That is what causes the bug. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6576) HBaseAdmin.createTable should wait until the table is enabled
[ https://issues.apache.org/jira/browse/HBASE-6576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6576: - Fix Version/s: (was: 0.95.0) 0.94.2 Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by Lars Hofhansl) HBaseAdmin.createTable should wait until the table is enabled - Key: HBASE-6576 URL: https://issues.apache.org/jira/browse/HBASE-6576 Project: HBase Issue Type: Bug Components: Client, test Reporter: Gregory Chanan Assignee: Gregory Chanan Fix For: 0.94.2 Attachments: HBASE-6576-92.patch, HBASE-6576-94.patch, HBASE-6576-trunk.patch The function: {code} public void createTable(final HTableDescriptor desc, byte [][] splitKeys) {code} in HBaseAdmin is synchronous and returns once all the regions of the table are online, but does not wait for the table to be enabled, which is the last step of table creation (see CreateTableHandler). This is confusing and leads to racy code because users do not realize that this is the case. 
For example, I saw the following test failure in 0.92 when I ran: mvn test -Dtest=org.apache.hadoop.hbase.client.TestAdmin#testEnableDisableAddColumnDeleteColumn {code} Error Message org.apache.hadoop.hbase.TableNotEnabledException: testMasterAdmin at org.apache.hadoop.hbase.master.handler.DisableTableHandler.init(DisableTableHandler.java:75) at org.apache.hadoop.hbase.master.HMaster.disableTable(HMaster.java:1154) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1336) Stacktrace org.apache.hadoop.hbase.TableNotEnabledException: org.apache.hadoop.hbase.TableNotEnabledException: testMasterAdmin at org.apache.hadoop.hbase.master.handler.DisableTableHandler.init(DisableTableHandler.java:75) at org.apache.hadoop.hbase.master.HMaster.disableTable(HMaster.java:1154) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1336) {code} The issue is that code will create a table and immediately disable it in order to do some testing, for example, to test an operation that only works when the table is disabled. If the table has not been enabled yet, it will get back a TableNotEnabledException. 
The specific test above was fixed in HBASE-5206, but other examples exist in the code, for example the following: {code} hbase org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat newtable asdf14 {code} The code in question is: {code} byte[] tname = args[1].getBytes(); HTable table = util.createTable(tname, FAMILIES); HBaseAdmin admin = new HBaseAdmin(conf); admin.disableTable(tname); {code} It would be better if createTable just waited until the table was enabled, or threw a TableNotEnabledException if it exhausted the configured number of retries. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
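The proposed "wait until enabled, or throw after exhausting retries" behavior boils down to a bounded polling loop. A minimal, hypothetical sketch follows; the `waitUntil` helper and its parameters are invented, and a plain `BooleanSupplier` stands in for a call such as HBaseAdmin.isTableEnabled:

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper: poll a condition (e.g. "table is enabled") with
// bounded retries, mirroring the proposal that createTable either waits
// for the enabled state or gives up after the configured retry count.
public class WaitForEnabled {
    public static boolean waitUntil(BooleanSupplier enabled, int maxRetries, long sleepMs) {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            if (enabled.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(sleepMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false; // caller would throw TableNotEnabledException here
    }

    public static void main(String[] args) {
        // Simulated table that becomes enabled on the third poll.
        final int[] polls = {0};
        boolean ok = waitUntil(() -> ++polls[0] >= 3, 10, 1L);
        System.out.println(ok + " after " + polls[0] + " polls");
    }
}
```

With such a loop inside createTable, the create-then-disable pattern in the tests above would no longer race against the enable step.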
[jira] [Updated] (HBASE-6579) Unnecessary KV order check in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-6579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6579: - Fix Version/s: (was: 0.95.0) 0.94.2 Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by Lars Hofhansl) Unnecessary KV order check in StoreScanner -- Key: HBASE-6579 URL: https://issues.apache.org/jira/browse/HBASE-6579 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor Fix For: 0.94.2 Attachments: 6579.txt In StoreScanner.next(List<KeyValue>, int, String) I find this code:
{code}
// Check that the heap gives us KVs in an increasing order.
if (prevKV != null && comparator != null && comparator.compare(prevKV, kv) > 0) {
  throw new IOException("Key " + prevKV + " followed by a " + "smaller key " + kv + " in cf " + store);
}
prevKV = kv;
{code}
So this checks for bugs in the HFiles or the scanner code, and it needs to compare each KV with its predecessor. This seems unnecessary now; I propose that we remove it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
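The check under discussion can be restated outside StoreScanner as an ordinary comparator-based sanity check. A hedged sketch (class and method names invented; the real code throws IOException, here replaced with an unchecked exception so the sketch is self-contained):

```java
import java.util.Comparator;
import java.util.List;

// Stand-alone version of the quoted sanity check: reject any element
// that is strictly smaller than its predecessor under the comparator.
// Equal neighbors are allowed, matching the "> 0" test in the original.
public class OrderCheck {
    public static <T> void assertIncreasing(List<T> kvs, Comparator<T> comparator) {
        T prev = null;
        for (T kv : kvs) {
            if (prev != null && comparator != null && comparator.compare(prev, kv) > 0) {
                // The real StoreScanner check throws IOException here.
                throw new IllegalStateException("Key " + prev + " followed by a smaller key " + kv);
            }
            prev = kv;
        }
    }

    public static void main(String[] args) {
        assertIncreasing(java.util.Arrays.asList(1, 2, 2, 3), Comparator.<Integer>naturalOrder());
        System.out.println("sorted input passes");
    }
}
```

The per-KV cost of this comparison on every next() call is what motivates removing it from the hot path.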
[jira] [Updated] (HBASE-6587) Region would be assigned twice in the case of all RS offline
[ https://issues.apache.org/jira/browse/HBASE-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6587: - Fix Version/s: (was: 0.95.0) 0.94.2 Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by Lars Hofhansl) Region would be assigned twice in the case of all RS offline Key: HBASE-6587 URL: https://issues.apache.org/jira/browse/HBASE-6587 Project: HBase Issue Type: Bug Affects Versions: 0.94.1 Reporter: chunhui shen Assignee: chunhui shen Fix For: 0.94.2 Attachments: 6587-0.94.patch, 6587.patch, HBASE-6587.patch In the TimeoutMonitor, we would act on timeout for the regions if (this.allRegionServersOffline && !noRSAvailable). The code is as follows:
{code}
if (regionState.getStamp() + timeout <= now || (this.allRegionServersOffline && !noRSAvailable)) {
  // decide on action upon timeout or, if some RSs just came back online, we can start the assignment
  actOnTimeOut(regionState);
}
{code}
But we found a bug: it would also act on timeout for a region that was assigned just now, causing the region to be assigned twice. Master log for the region 277b9b6df6de2b9be1353b4fa25f4222:
{code}
2012-08-14 20:42:54,367 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Unable to determine a plan to assign .META.,,1.1028785192 state=OFFLINE, ts=1344948174367, server=null
2012-08-14 20:44:31,640 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for writetest,VHXYHJN0BL48HMR4DI1L,1344925649429.277b9b6df6de2b9be1353b4fa25f4222. 
so generated a random one; hri=writetest,VHXYHJN0BL48HMR4DI1L,1344925649429.277b9b6df6de2b9be1353b4fa25f4222., src=, dest=dw92.kgb.sqa.cm4,60020,1344948267642; 1 (online=1, available=1) available servers
2012-08-14 20:44:31,640 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x438f53bbf9b0acd Creating (or updating) unassigned node for 277b9b6df6de2b9be1353b4fa25f4222 with OFFLINE state
2012-08-14 20:44:31,643 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region writetest,VHXYHJN0BL48HMR4DI1L,1344925649429.277b9b6df6de2b9be1353b4fa25f4222. to dw92.kgb.sqa.cm4,60020,1344948267642
2012-08-14 20:44:32,291 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=dw92.kgb.sqa.cm4,60020,1344948267642, region=277b9b6df6de2b9be1353b4fa25f4222
// abnormal timeout below
2012-08-14 20:44:32,518 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: writetest,VHXYHJN0BL48HMR4DI1L,1344925649429.277b9b6df6de2b9be1353b4fa25f4222. state=OPENING, ts=1344948272279, server=dw92.kgb.sqa.cm4,60020,1344948267642
2012-08-14 20:44:32,518 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPENING for too long, reassigning region=writetest,VHXYHJN0BL48HMR4DI1L,1344925649429.277b9b6df6de2b9be1353b4fa25f4222.
{code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
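Reduced to a pure function (class and method names invented), the quoted guard shows why a freshly assigned region is swept up the moment servers come back: the right-hand disjunct never looks at the region's transition timestamp.

```java
// Pure-function restatement of the quoted TimeoutMonitor condition.
// Names are hypothetical; the logic is copied from the report verbatim.
public class TimeoutGuard {
    public static boolean actOnTimeout(long stamp, long timeout, long now,
                                       boolean allRegionServersOffline, boolean noRSAvailable) {
        return stamp + timeout <= now || (allRegionServersOffline && !noRSAvailable);
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // Region assigned just now (stamp == now), servers just came back:
        // the guard still fires, which is the double-assignment bug above.
        System.out.println(actOnTimeout(now, 30000L, now, true, false));
    }
}
```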
[jira] [Updated] (HBASE-6603) RegionMetricsStorage.incrNumericMetric is called too often
[ https://issues.apache.org/jira/browse/HBASE-6603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6603: - Fix Version/s: (was: 0.95.0) 0.94.2 Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by Lars Hofhansl) RegionMetricsStorage.incrNumericMetric is called too often -- Key: HBASE-6603 URL: https://issues.apache.org/jira/browse/HBASE-6603 Project: HBase Issue Type: Bug Components: Performance Reporter: Lars Hofhansl Assignee: M. Chen Fix For: 0.94.2 Attachments: 6503-0.96.txt, 6603-0.94.txt Running an HBase scan load through the profiler revealed that RegionMetricsStorage.incrNumericMetric is called way too often. It turns out that we make this call for *each* KV in StoreScanner.next(...). Incrementing AtomicLong requires expensive memory barriers. The observation here is that StoreScanner.next(...) can maintain a simple long in its internal loop and only update the metric upon exit. Thus the AtomicLong is not updated nearly as often. That cuts about 10% runtime from scan only load (I'll quantify this better soon). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
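The described optimization can be sketched independently of StoreScanner: accumulate in a plain local long inside the loop and publish to the shared AtomicLong once on exit. Names such as `scanBatch` and `readOps` are invented for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the batching described above: avoid one AtomicLong update
// (and its memory barriers) per KeyValue by counting locally and doing
// a single atomic add when the batch finishes.
public class BatchedMetric {
    public static final AtomicLong readOps = new AtomicLong();

    public static void scanBatch(int kvCount) {
        long local = 0;
        for (int i = 0; i < kvCount; i++) {
            // ... process one KV ...
            local++;                 // plain increment, no barrier
        }
        readOps.addAndGet(local);    // one atomic update on exit
    }

    public static void main(String[] args) {
        scanBatch(1000);
        System.out.println(readOps.get());
    }
}
```

The metric stays accurate at batch granularity while the per-KV cost drops to an ordinary register increment.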
[jira] [Updated] (HBASE-6608) Fix for HBASE-6160, META entries from daughters can be deleted before parent entries, shouldn't compare HRegionInfo's
[ https://issues.apache.org/jira/browse/HBASE-6608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6608: - Fix Version/s: (was: 0.95.0) 0.94.2 Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by Lars Hofhansl) Fix for HBASE-6160, META entries from daughters can be deleted before parent entries, shouldn't compare HRegionInfo's - Key: HBASE-6608 URL: https://issues.apache.org/jira/browse/HBASE-6608 Project: HBase Issue Type: Bug Components: Client, regionserver Affects Versions: 0.92.1, 0.94.2, 0.95.2 Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.94.2 Attachments: 6608-v2.patch, hbase-6608_v1-0.92+0.94.patch, hbase-6608_v1.patch Our nightlies discovered that the patch for HBASE-6160 did not actually fix the issue of META entries from daughters being deleted before parent entries. Instead of reopening HBASE-6160, it is cleaner to track it here. The original issue is: {quote} HBASE-5986 fixed an issue where the client sees the META entry for the parent, but not the children. However, after the fix, we have seen the following issue in tests: Region A is split to -> B, C. Region B is split to -> D, E. After some time, the META entry for B is deleted since it is not needed anymore, but the META entry for Region A stays in META (C still refers to it). In this case, the client throws RegionOfflineException for B. {quote} The problem with the fix seems to be that we keep and compare HRegionInfo's in the HashSet at CatalogJanitor.java#scan(), but the HRIs that are compared are not equal.
{code}
HashSet<HRegionInfo> parentNotCleaned = new HashSet<HRegionInfo>(); // regions whose parents are still around
for (Map.Entry<HRegionInfo, Result> e : splitParents.entrySet()) {
  if (!parentNotCleaned.contains(e.getKey()) && cleanParent(e.getKey(), e.getValue())) {
    cleaned++;
  } else {
  ...
{code}
In the above case, the META row for region A will contain a serialized version of B that is not offline. 
However, the META row for region B will contain a serialized version of B that is offline (MetaEditor.offlineParentInMeta() does that). So the deserialized version we put into the HashSet and the deserialized version we query with contains() from the HashSet differ in the offline field, thus HRI.equals() fails. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
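The equals mismatch is easy to demonstrate with a toy stand-in for HRegionInfo. In this hypothetical sketch (the `Region` class and its fields are invented), equals() includes the offline flag, so the online and offline deserializations of the same region are different keys to a HashSet:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Minimal stand-in for HRegionInfo: equals() covers the offline flag,
// so a set populated with the "online" form misses the "offline" form.
public class RegionKeyDemo {
    public static final class Region {
        final String name;
        final boolean offline;
        Region(String name, boolean offline) { this.name = name; this.offline = offline; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Region)) return false;
            Region r = (Region) o;
            return name.equals(r.name) && offline == r.offline;
        }
        @Override public int hashCode() { return Objects.hash(name, offline); }
    }

    public static boolean containsParent(String name) {
        Set<Region> parentNotCleaned = new HashSet<>();
        parentNotCleaned.add(new Region(name, false));          // version from row A: online
        return parentNotCleaned.contains(new Region(name, true)); // version from row B: offline
    }

    public static void main(String[] args) {
        System.out.println(containsParent("B")); // false: the lookup misses
    }
}
```

This is exactly why the fix should compare regions by identity (e.g. region name) rather than full HRegionInfo equality.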
[jira] [Updated] (HBASE-6621) Reduce calls to Bytes.toInt
[ https://issues.apache.org/jira/browse/HBASE-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6621: - Fix Version/s: (was: 0.95.0) 0.94.2 Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by Lars Hofhansl) Reduce calls to Bytes.toInt --- Key: HBASE-6621 URL: https://issues.apache.org/jira/browse/HBASE-6621 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor Fix For: 0.94.2 Attachments: 6621-0.96.txt, 6621-0.96-v2.txt, 6621-0.96-v3.txt, 6621-0.96-v4.txt Bytes.toInt shows up quite often in a profiler run. It turns out that one source is HFileReaderV2$ScannerV2.getKeyValue(). Notice that we call the KeyValue(byte[], int) constructor, which forces the constructor to determine its size by reading some of the header information and calculating the size. In this case, however, we already know the size (from the call to readKeyValueLen), so we could just use that. In the extreme case of 1000's of columns this noticeably reduces CPU. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6647) [performance regression] appendNoSync/HBASE-4528 doesn't take deferred log flush into account
[ https://issues.apache.org/jira/browse/HBASE-6647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6647: - Fix Version/s: (was: 0.95.0) 0.94.2 Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by Lars Hofhansl) [performance regression] appendNoSync/HBASE-4528 doesn't take deferred log flush into account - Key: HBASE-6647 URL: https://issues.apache.org/jira/browse/HBASE-6647 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Fix For: 0.94.2 Attachments: HBASE-6647-0.94.patch, HBASE-6647-0.94-v2.patch, HBASE-6647-0.94-v3.patch, HBASE-6647.pach Since we upgraded to 0.94.1 from 0.92 I saw that our ICVs are about twice as slow as they were. jstack'ing I saw that most of the time we are waiting on sync()... but those tables have deferred log flush turned on so they shouldn't even be calling it. HTD.isDeferredLogFlush is currently only called in the append() methods which are pretty much not in use anymore. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6649: - Fix Version/s: (was: 0.95.0) 0.94.2 Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by Lars Hofhansl) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1] --- Key: HBASE-6649 URL: https://issues.apache.org/jira/browse/HBASE-6649 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Priority: Blocker Fix For: 0.94.2 Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 6649-fix-io-exception-handling-1.patch, 6649-fix-io-exception-handling-1-trunk.patch, 6649-fix-io-exception-handling.patch, 6649-trunk.patch, 6649-trunk.patch, 6649.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6671) Kerberos authenticated super user should be able to retrieve proxied delegation tokens
[ https://issues.apache.org/jira/browse/HBASE-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6671: - Fix Version/s: (was: 0.95.0) 0.94.2 Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by Lars Hofhansl) Kerberos authenticated super user should be able to retrieve proxied delegation tokens -- Key: HBASE-6671 URL: https://issues.apache.org/jira/browse/HBASE-6671 Project: HBase Issue Type: Bug Affects Versions: 0.94.1 Reporter: Francis Liu Assignee: Francis Liu Fix For: 0.94.2 Attachments: 6671-trunk-v2.txt, proxy_fix_94.patch, proxy_fix_94.patch, proxy_fix_trunk.patch There are services, such as Oozie, which perform actions on behalf of the user using proxy authentication. Retrieving delegation tokens should support this behavior. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6679) RegionServer aborts due to race between compaction and split
[ https://issues.apache.org/jira/browse/HBASE-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6679: - Fix Version/s: (was: 0.95.0) 0.94.2 Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by Lars Hofhansl) RegionServer aborts due to race between compaction and split Key: HBASE-6679 URL: https://issues.apache.org/jira/browse/HBASE-6679 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.94.2 Attachments: 6679-1.094.patch, 6679-1.patch, rs-crash-parallel-compact-split.log In our nightlies, we have seen RS aborts due to compaction and split racing. Original parent file gets deleted after the compaction, and hence, the daughters don't find the parent data file. The RS kills itself when this happens. Will attach a snippet of the relevant RS logs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6685) Thrift DemoClient.pl got NullPointerException
[ https://issues.apache.org/jira/browse/HBASE-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6685: - Fix Version/s: (was: 0.95.0) 0.94.2 Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by Lars Hofhansl) Thrift DemoClient.pl got NullPointerException - Key: HBASE-6685 URL: https://issues.apache.org/jira/browse/HBASE-6685 Project: HBase Issue Type: Bug Components: Client, Thrift Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Fix For: 0.94.2 Attachments: trunk-6685.patch expected error: TSocket: Could not read 4 bytes from localhost:9090 12/06/25 13:48:21 ERROR server.TThreadPoolServer: Error occurred during processing of message. java.lang.NullPointerException at org.apache.hadoop.hbase.util.Bytes.getBytes(Bytes.java:765) at org.apache.hadoop.hbase.thrift.ThriftServer$HBaseHandler.mutateRowTs(ThriftServer.java:591) at org.apache.hadoop.hbase.thrift.ThriftServer$HBaseHandler.mutateRow(ThriftServer.java:576) at org.apache.hadoop.hbase.thrift.generated.Hbase$Processor$mutateRow.getResult(Hbase.java:3630) at org.apache.hadoop.hbase.thrift.generated.Hbase$Processor$mutateRow.getResult(Hbase.java:3618) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:176) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6686) HFile Quarantine fails with missing dirs in hadoop 2.0
[ https://issues.apache.org/jira/browse/HBASE-6686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6686: - Fix Version/s: (was: 0.95.0) 0.94.2 Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by Lars Hofhansl) HFile Quarantine fails with missing dirs in hadoop 2.0 --- Key: HBASE-6686 URL: https://issues.apache.org/jira/browse/HBASE-6686 Project: HBase Issue Type: Bug Components: hbck Affects Versions: 0.92.2, 0.94.2, 0.95.2 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Fix For: 0.94.2 Attachments: hbase-6686-94-92.patch Two unit tests fail because listStatus's semantics change between hadoop 1.0 and hadoop 2.0. (specifically -- hadoop 1.0 returns empty array if used on dir that does not exist, but hadoop 2.0 throws FileNotFoundException). here's the exception: {code} 2012-08-28 16:01:19,789 WARN [Thread-3155] hbck.HFileCorruptionChecker(230): Failed to quaratine an HFile in regiondir hdfs://localhost:38096/user/jenkins/hbase/testQuarantineMissingFamdir/34b2e072b33052bf4875f85513e9c669 java.io.FileNotFoundException: File hdfs://localhost:38096/user/jenkins/hbase/testQuarantineMissingFamdir/34b2e072b33052bf4875f85513e9c669/fam does not exist. 
at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:406) at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1341) at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1381) at org.apache.hadoop.hbase.util.hbck.HFileCorruptionChecker.checkColFamDir(HFileCorruptionChecker.java:152) at org.apache.hadoop.hbase.util.TestHBaseFsck$2$1.checkColFamDir(TestHBaseFsck.java:1401) at org.apache.hadoop.hbase.util.hbck.HFileCorruptionChecker.checkRegionDir(HFileCorruptionChecker.java:185) at org.apache.hadoop.hbase.util.hbck.HFileCorruptionChecker$RegionDirChecker.call(HFileCorruptionChecker.java:267) at org.apache.hadoop.hbase.util.hbck.HFileCorruptionChecker$RegionDirChecker.call(HFileCorruptionChecker.java:258) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
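A generic version of the shim the checker needs is straightforward: treat a missing directory as an empty listing instead of propagating the not-found error. The sketch below uses java.nio as a stand-in for the Hadoop FileSystem API (the actual fix would wrap FileSystem.listStatus and catch FileNotFoundException); the `listStatusSafe` name is invented.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Generic analogue of the hadoop 1.0 behavior the checker relied on:
// listing a directory that does not exist yields an empty result
// instead of throwing, matching listStatus's pre-2.0 semantics.
public class SafeListing {
    public static List<Path> listStatusSafe(Path dir) throws IOException {
        List<Path> entries = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                entries.add(p);
            }
        } catch (NoSuchFileException e) {
            // hadoop 2.0-style FileNotFoundException maps here; swallow it
            // so callers see an empty listing, as hadoop 1.0 returned.
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(listStatusSafe(Paths.get("definitely-missing-dir")).size());
    }
}
```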
[jira] [Updated] (HBASE-6688) folder referred by thrift demo app instructions is outdated
[ https://issues.apache.org/jira/browse/HBASE-6688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6688: - Fix Version/s: (was: 0.95.0) 0.94.2 Fix up after bulk move overwrote some 0.94.2 fix versions w/ 0.95.0 (Noticed by Lars Hofhansl) folder referred by thrift demo app instructions is outdated --- Key: HBASE-6688 URL: https://issues.apache.org/jira/browse/HBASE-6688 Project: HBase Issue Type: Bug Affects Versions: 0.95.2 Reporter: Jimmy Xiang Assignee: stack Priority: Minor Fix For: 0.94.2 Attachments: thrift094.txt, thrift.txt Due to the source tree module change for 0.96, the instructions in the thrift demo example don't match the folder structure any more. In the instruction, it is referring to: ../../../src/main/resources/org/apache/hadoop/hbase/thrift/Hbase.thrift it should be ../../hbase-server/src/main/resources/org/apache/hadoop/hbase/thrift/Hbase.thrift -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira