[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap
[ https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181900#comment-13181900 ]

Jieshan Bean commented on HBASE-5088:
-------------------------------------

It seems my reply is too late. Thank you all :)
I suggest backporting this patch to 0.90.6.

A concurrency issue on SoftValueSortedMap
-----------------------------------------

Key: HBASE-5088
URL: https://issues.apache.org/jira/browse/HBASE-5088
Project: HBase
Issue Type: Bug
Components: client
Affects Versions: 0.90.4, 0.94.0
Reporter: Jieshan Bean
Assignee: Lars Hofhansl
Priority: Critical
Fix For: 0.92.0, 0.94.0
Attachments: 5088-final.txt, 5088-final2.txt, 5088-final3.txt, 5088-syncObj.txt, 5088-useMapInterfaces.txt, 5088.generics.txt, HBase-5088-90.patch, HBase-5088-trunk.patch, HBase5088-90-replaceSoftValueSortedMap.patch, HBase5088-90-replaceTreeMap.patch, HBase5088-trunk-replaceTreeMap.patch, HBase5088Reproduce.java, PerformanceTestResults.png

SoftValueSortedMap is backed by a TreeMap. All the methods in this class are synchronized. If we use these methods to add/delete elements, it's OK.
But HConnectionManager#getCachedLocation uses headMap to get a view of SoftValueSortedMap#internalMap. Once we operate on this view map (like add/delete) in other threads, a concurrency issue may occur.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
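A minimal sketch of the problem described in HBASE-5088, assuming nothing beyond java.util: headMap returns a live view of the backing TreeMap, so a view that escapes the synchronized methods can observe concurrent changes. GuardedSortedMap and its lock object are hypothetical names for illustration; this is not the committed patch, only one way to keep the view inside the same lock as the writers.

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class HeadMapViewSketch {
    /** Hypothetical wrapper: every access, including reads through a view, takes the same lock. */
    static class GuardedSortedMap<K, V> {
        private final Object sync = new Object();
        private final SortedMap<K, V> internalMap = new TreeMap<K, V>();

        V put(K key, V value) {
            synchronized (sync) {
                return internalMap.put(key, value);
            }
        }

        V remove(K key) {
            synchronized (sync) {
                return internalMap.remove(key);
            }
        }

        /** Largest key strictly below toKey, or null. The headMap view never leaves the lock. */
        K lowerKey(K toKey) {
            synchronized (sync) {
                SortedMap<K, V> head = internalMap.headMap(toKey);
                return head.isEmpty() ? null : head.lastKey();
            }
        }
    }

    /** Shows that headMap is a live view: a later put on the backing map is visible through it. */
    static int viewSizeAfterBackingPut() {
        TreeMap<String, Integer> backing = new TreeMap<String, Integer>();
        backing.put("b", 2);
        SortedMap<String, Integer> head = backing.headMap("z");
        backing.put("a", 1); // changes the view as well
        return head.size();
    }

    public static void main(String[] args) {
        System.out.println(viewSizeAfterBackingPut());
        GuardedSortedMap<String, Integer> m = new GuardedSortedMap<String, Integer>();
        m.put("row1", 1);
        m.put("row5", 5);
        System.out.println(m.lowerKey("row3"));
    }
}
```

Because the view shares state with the TreeMap, synchronizing only the wrapper's own methods does not protect a caller that holds the view; the sketch avoids handing the view out at all.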
[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap
[ https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181911#comment-13181911 ]

Hudson commented on HBASE-5088:
-------------------------------

Integrated in HBase-0.92-security #65 (See [https://builds.apache.org/job/HBase-0.92-security/65/])
HBASE-5088 A concurrency issue on SoftValueSortedMap (Jieshan Bean and Lars H)

larsh :
Files :
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/util/SoftValueSortedMap.java
[jira] [Commented] (HBASE-4357) Region stayed in transition - in closing state
[ https://issues.apache.org/jira/browse/HBASE-4357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181910#comment-13181910 ]

Hudson commented on HBASE-4357:
-------------------------------

Integrated in HBase-0.92-security #65 (See [https://builds.apache.org/job/HBase-0.92-security/65/])
HBASE-4357 Region stayed in transition - in closing state (Ming Ma)

tedyu :
Files :
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/handler/CloseMetaHandler.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/handler/CloseRegionHandler.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/handler/CloseRootHandler.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestCloseRegionHandler.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/util/MockRegionServerServices.java

Region stayed in transition - in closing state
----------------------------------------------

Key: HBASE-4357
URL: https://issues.apache.org/jira/browse/HBASE-4357
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Ming Ma
Assignee: Ming Ma
Fix For: 0.92.0, 0.94.0
Attachments: 4357.txt, HBASE-4357-0.92.patch

Got the following during testing:
1. On a given machine, kill the RS process id. Then kill the HMaster process id.
2. Start the RS first via bin/hbase-daemon.sh --config ./conf start regionserver. Then start HMaster via bin/hbase-daemon.sh --config ./conf start master.

One region of a table stayed in closing state.

According to zookeeper,
794a6ff17a4de0dd0a19b984ba18eea9 miweng_500region,H\xB49X\x10bM\xB1,1315338786464.794a6ff17a4de0dd0a19b984ba18eea9. state=CLOSING, ts=Wed Sep 07 17:21:44 PDT 2011 (75701s ago), server=sea-esxi-0,6,1315428682281

According to the .META. table, the region has been reassigned from sea-esxi-0 to sea-esxi-4.
miweng_500region,H\xB49X\x10bM\xB1,1315338786464.794a6ff17a4de0dd0a19b984ba18eea9. sea-esxi-4:60030 H\xB49X\x10bM\xB1 I7K\xC6\xA7\xEF\x9D\x90 0
[jira] [Commented] (HBASE-5041) Major compaction on non existing table does not throw error
[ https://issues.apache.org/jira/browse/HBASE-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181909#comment-13181909 ]

Hudson commented on HBASE-5041:
-------------------------------

Integrated in HBase-0.92-security #65 (See [https://builds.apache.org/job/HBase-0.92-security/65/])
HBASE-5041 Major compaction on non existing table does not throw error (Shrijeet)

tedyu :
Files :
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java

Major compaction on non existing table does not throw error
-----------------------------------------------------------

Key: HBASE-5041
URL: https://issues.apache.org/jira/browse/HBASE-5041
Project: HBase
Issue Type: Bug
Components: regionserver, shell
Affects Versions: 0.90.3
Reporter: Shrijeet Paliwal
Assignee: Shrijeet Paliwal
Fix For: 0.92.0, 0.94.0, 0.90.6
Attachments: 0002-HBASE-5041-Throw-error-if-table-does-not-exist.patch, 0002-HBASE-5041-Throw-error-if-table-does-not-exist.patch, 0003-HBASE-5041-Throw-error-if-table-does-not-exist.0.90.patch

The following will not complain even if fubar does not exist:
{code}
echo "major_compact 'fubar'" | $HBASE_HOME/bin/hbase shell
{code}
The downside of this defect is that a major compaction may be silently skipped because of a typo by Ops.
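The behaviour HBASE-5041 asks for can be sketched as a check that fails loudly when the table name is unknown. This is a hedged illustration only: knownTables stands in for a catalog lookup, and UnknownTableException and requestMajorCompaction are hypothetical names, not the HBaseAdmin API or the attached patch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class MajorCompactCheckSketch {
    /** Hypothetical stand-in for a "table not found" error. */
    static class UnknownTableException extends Exception {
        UnknownTableException(String tableName) {
            super("Table does not exist: " + tableName);
        }
    }

    /**
     * Validates the table name before accepting the request.
     * knownTables is an assumed stand-in for a lookup against the catalog.
     */
    static String requestMajorCompaction(Set<String> knownTables, String tableName)
            throws UnknownTableException {
        if (!knownTables.contains(tableName)) {
            // Fail instead of silently doing nothing, so a typo is noticed.
            throw new UnknownTableException(tableName);
        }
        return "major compaction requested for " + tableName;
    }

    public static void main(String[] args) {
        Set<String> tables = new HashSet<String>(Arrays.asList("usertable"));
        try {
            requestMajorCompaction(tables, "fubar");
        } catch (UnknownTableException e) {
            System.out.println(e.getMessage());
        }
    }
}
```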
[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap
[ https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181915#comment-13181915 ]

ramkrishna.s.vasudevan commented on HBASE-5088:
-----------------------------------------------

@Lars
I too feel we can backport to 0.90?
[jira] [Updated] (HBASE-5137) MasterFileSystem.splitLog() should abort even if waitOnSafeMode() throws IOException
[ https://issues.apache.org/jira/browse/HBASE-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-5137:
------------------------------------------

Attachment: HBASE-5137.patch

MasterFileSystem.splitLog() should abort even if waitOnSafeMode() throws IOException
------------------------------------------------------------------------------------

Key: HBASE-5137
URL: https://issues.apache.org/jira/browse/HBASE-5137
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.4
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Attachments: HBASE-5137.patch

I am not sure if this bug was already raised in JIRA.
In our test cluster we had a scenario where the RS had gone down and ServerShutDownHandler started with splitLog.
But as HDFS was down, the waitOnSafeMode check throws IOException.
{code}
try {
  // If FS is in safe mode, just wait till out of it.
  FSUtils.waitOnSafeMode(conf,
    conf.getInt(HConstants.THREAD_WAKE_FREQUENCY, 1000));
  splitter.splitLog();
} catch (OrphanHLogAfterSplitException e) {
{code}
We catch the exception
{code}
} catch (IOException e) {
  checkFileSystem();
  LOG.error("Failed splitting " + logDir.toString(), e);
}
{code}
So the HLog split itself did not happen. We found that about 4 regions that had recently been split on the crashed RS were lost.
Can we abort the Master in such scenarios? Please suggest.
[jira] [Commented] (HBASE-5137) MasterFileSystem.splitLog() should abort even if waitOnSafeMode() throws IOException
[ https://issues.apache.org/jira/browse/HBASE-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181963#comment-13181963 ]

ramkrishna.s.vasudevan commented on HBASE-5137:
-----------------------------------------------

Patch for 0.90.
[jira] [Commented] (HBASE-5137) MasterFileSystem.splitLog() should abort even if waitOnSafeMode() throws IOException
[ https://issues.apache.org/jira/browse/HBASE-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13182014#comment-13182014 ]

Zhihong Yu commented on HBASE-5137:
-----------------------------------

Minor comment:
{code}
+if (checkFileSystem() && retrySplitting)
+  LOG.info("Retrying failed log splitting " + logDir.toString());
+else {
{code}
Please add braces around the log statement.

I think the above check should go into TRUNK as well (aborting in the case of not retrying).

Should we also handle InterruptedException, as TRUNK does?
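The retry-or-abort decision discussed for HBASE-5137 can be sketched as follows. This is an illustration under stated assumptions, not the attached patch: LogSplitter, FileSystemCheck, Outcome and maxAttempts are hypothetical, and "abort" is modelled as a return value rather than stopping a real master.

```java
import java.io.IOException;

public class SplitLogRetrySketch {
    /** Hypothetical stand-in for the component that splits the logs of a dead region server. */
    interface LogSplitter {
        void splitLog() throws IOException;
    }

    /** Hypothetical stand-in for a file system health check. */
    interface FileSystemCheck {
        boolean checkFileSystem();
    }

    enum Outcome { SPLIT_DONE, ABORTED }

    /**
     * Retries only while the file system is healthy and retrying is enabled;
     * otherwise reports ABORTED instead of logging the failure and carrying on.
     */
    static Outcome splitWithRetry(LogSplitter splitter, FileSystemCheck fs,
                                  boolean retrySplitting, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                splitter.splitLog();
                return Outcome.SPLIT_DONE;
            } catch (IOException e) {
                boolean canRetry = fs.checkFileSystem() && retrySplitting && attempt < maxAttempts;
                if (!canRetry) {
                    // Skipping the split would lose edits, so the caller should abort.
                    return Outcome.ABORTED;
                }
            }
        }
        return Outcome.ABORTED;
    }

    public static void main(String[] args) {
        Outcome outcome = splitWithRetry(
            () -> { throw new IOException("HDFS is down"); },
            () -> false,
            true,
            3);
        System.out.println(outcome);
    }
}
```

The point of the design is that a failed split is never treated as success: either a later attempt completes, or the caller is told to abort.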
[jira] [Commented] (HBASE-4224) Need a flush by regionserver rather than by table option
[ https://issues.apache.org/jira/browse/HBASE-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13182038#comment-13182038 ]

jirapos...@reviews.apache.org commented on HBASE-4224:
------------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3308/#review4236
-----------------------------------------------------------

/src/main/java/org/apache/hadoop/hbase/ServerName.java
https://reviews.apache.org/r/3308/#comment9579
Please enclose the assignment on line 292 in curly braces.

/src/main/java/org/apache/hadoop/hbase/ServerName.java
https://reviews.apache.org/r/3308/#comment9580
Since IPv4 support is built in, I suggest naming this method isValidHost.

/src/main/java/org/apache/hadoop/hbase/ServerName.java
https://reviews.apache.org/r/3308/#comment9581
Why the extra line here?

/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
https://reviews.apache.org/r/3308/#comment9587
I think threadPool is a better name for this field.

/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
https://reviews.apache.org/r/3308/#comment9582
I think Async is unnecessary here - that's what threads provide.

/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
https://reviews.apache.org/r/3308/#comment9584
Why not call tableExists() directly?

/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
https://reviews.apache.org/r/3308/#comment9583
Should include the actual name passed in the exception message.

/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
https://reviews.apache.org/r/3308/#comment9585
Should read 'Exception parsing server name'

/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
https://reviews.apache.org/r/3308/#comment9586
Since serverToRegionsMap has been created, you can return serverToRegionsMap here.

/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
https://reviews.apache.org/r/3308/#comment9588
regions could be empty, right?

/src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java
https://reviews.apache.org/r/3308/#comment9589
This and flushAllRegions() are similar. Can we introduce just one new method which checks whether the list is empty to decide what to do?
i.e. move the check @ line 1403 of HBaseAdmin to the implementation of the new method.

/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
https://reviews.apache.org/r/3308/#comment9590
Curly braces, please.

/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
https://reviews.apache.org/r/3308/#comment9591
Do we need to place a try/catch block around line 2795?
Currently the first failure would stop subsequent flushes.

- Ted

On 2012-01-06 18:48:11, Akash Ashok wrote:
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/3308/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2012-01-06 18:48:11)
bq.
bq. Review request for hbase.
bq.
bq. Summary
bq. -------
bq.
bq. Flush by RegionServer
bq.
bq. This addresses bug HBase-4224.
bq. https://issues.apache.org/jira/browse/HBase-4224
bq.
bq. Diffs
bq. -----
bq.
bq. /src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 1226330
bq. /src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 1226330
bq. /src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java 1226330
bq. /src/main/java/org/apache/hadoop/hbase/ServerName.java 1226330
bq.
bq. Diff: https://reviews.apache.org/r/3308/diff
bq.
bq. Testing
bq. -------
bq.
bq. Thanks,
bq.
bq. Akash

Need a flush by regionserver rather than by table option
--------------------------------------------------------

Key: HBASE-4224
URL: https://issues.apache.org/jira/browse/HBASE-4224
Project: HBase
Issue Type: Bug
Components: shell
Reporter: stack
Assignee: Akash Ashok
Attachments: HBase-4224-v2.patch, HBase-4224.patch

This evening needed to clean out logs on the cluster. Logs are by regionserver. To let go of logs, we need to have all edits emptied from memory. The only flush is by table or region. We need to be able to flush the regionserver. Need to add this.
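The last review comment (one failed flush should not stop the remaining ones) can be sketched like this. Region and FakeRegion are hypothetical types used only for illustration; this is not HRegionServer code, and returning the names of failed regions is just one assumed way of reporting the failures.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FlushAllRegionsSketch {
    /** Hypothetical minimal view of a region that can be flushed. */
    interface Region {
        String name();
        void flush() throws IOException;
    }

    /** Test double: records whether flush was attempted and can be told to fail. */
    static class FakeRegion implements Region {
        final String name;
        final boolean fails;
        boolean flushAttempted = false;

        FakeRegion(String name, boolean fails) {
            this.name = name;
            this.fails = fails;
        }

        public String name() {
            return name;
        }

        public void flush() throws IOException {
            flushAttempted = true;
            if (fails) {
                throw new IOException("flush failed for " + name);
            }
        }
    }

    /** Flushes every region; a failure is recorded and the loop continues. */
    static List<String> flushAll(List<? extends Region> regions) {
        List<String> failed = new ArrayList<String>();
        for (Region region : regions) {
            try {
                region.flush();
            } catch (IOException e) {
                failed.add(region.name());
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        List<FakeRegion> regions = Arrays.asList(
            new FakeRegion("r1", false),
            new FakeRegion("r2", true),
            new FakeRegion("r3", false));
        System.out.println(flushAll(regions));
    }
}
```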
[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap
[ https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13182046#comment-13182046 ]

Lars Hofhansl commented on HBASE-5088:
--------------------------------------

OK. Sorry for presuming. I'll commit to 0.90 later today.
[jira] [Updated] (HBASE-5141) Memory leak in MonitoredRPCHandlerImpl
[ https://issues.apache.org/jira/browse/HBASE-5141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-5141:
--------------------------------------

Assignee: Jean-Daniel Cryans
Status: Patch Available (was: Open)

Memory leak in MonitoredRPCHandlerImpl
--------------------------------------

Key: HBASE-5141
URL: https://issues.apache.org/jira/browse/HBASE-5141
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Blocker
Fix For: 0.92.0, 0.94.0
Attachments: HBASE-5141-v2.patch, HBASE-5141.patch, Screen Shot 2012-01-06 at 3.03.09 PM.png

I got a pretty reliable way of OOME'ing my region servers. Using a big payload (64MB in my case), a default heap and default number of handlers, it's not too long that all the MonitoredRPCHandlerImpl hold on a 64MB reference and once a compaction kicks in it kills everything.
The issue is that even after the RPC call is done, the packet still lives in MonitoredRPCHandlerImpl.
Will attach a screen shot of jprofiler's analysis in a moment.
This is a blocker for 0.92.0, anyone using a high number of handlers and bigish values will kill themselves.
[jira] [Updated] (HBASE-5141) Memory leak in MonitoredRPCHandlerImpl
[ https://issues.apache.org/jira/browse/HBASE-5141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-5141:
--------------------------------------

Attachment: HBASE-5141-v2.patch

This second patch survives my little test. What I was missing was that the packet also contains a reference to the params, so I have to clear out both (that was a bit confusing).
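The idea behind the second patch (drop both the packet and the params once the call is done) can be sketched as below. MonitoredCallSketch and its methods are hypothetical and only mirror the description in this thread; they are not the real MonitoredRPCHandlerImpl.

```java
public class RpcHandlerLeakSketch {
    /** Hypothetical per-handler status object that tracks the call currently being served. */
    static class MonitoredCallSketch {
        private Object[] params;   // call arguments, possibly very large
        private byte[] packet;     // raw request; the params may point at the same payload
        private String state = "WAITING";

        synchronized void start(byte[] packet, Object[] params) {
            this.packet = packet;
            this.params = params;
            this.state = "RUNNING";
        }

        synchronized void markComplete() {
            // Clear both references so the payload can be garbage collected
            // as soon as the call ends, not when the next call overwrites them.
            this.packet = null;
            this.params = null;
            this.state = "COMPLETE";
        }

        synchronized boolean holdsPayload() {
            return packet != null || params != null;
        }

        synchronized String getState() {
            return state;
        }
    }

    public static void main(String[] args) {
        MonitoredCallSketch call = new MonitoredCallSketch();
        byte[] payload = new byte[1024];
        call.start(payload, new Object[] { payload });
        System.out.println(call.holdsPayload());
        call.markComplete();
        System.out.println(call.holdsPayload());
    }
}
```

Clearing only one of the two fields would still keep the payload reachable, which matches the confusion mentioned in the comment above.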
[jira] [Commented] (HBASE-5137) MasterFileSystem.splitLog() should abort even if waitOnSafeMode() throws IOException
[ https://issues.apache.org/jira/browse/HBASE-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13182060#comment-13182060 ]

ramkrishna.s.vasudevan commented on HBASE-5137:
-----------------------------------------------

@Ted
In trunk we sleep for a configured time and hence we handle InterruptedException. But I think that is also not needed, as in trunk once we know the file system is not available we do Runtime.halt().
If the file system is available, why do we need to sleep for some time and then retry?
[jira] [Commented] (HBASE-5141) Memory leak in MonitoredRPCHandlerImpl
[ https://issues.apache.org/jira/browse/HBASE-5141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13182070#comment-13182070 ]

Hadoop QA commented on HBASE-5141:
----------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12509797/HBASE-5141-v2.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

-1 javadoc. The javadoc tool appears to have generated -151 warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

-1 findbugs. The patch appears to introduce 79 new Findbugs (version 1.3.9) warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.mapreduce.TestImportTsv
org.apache.hadoop.hbase.mapred.TestTableMapReduce
org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/696//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/696//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/696//console

This message is automatically generated.
[jira] [Commented] (HBASE-5141) Memory leak in MonitoredRPCHandlerImpl
[ https://issues.apache.org/jira/browse/HBASE-5141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13182103#comment-13182103 ]

stack commented on HBASE-5141:
------------------------------

+1 on the patch. Small.
[jira] [Commented] (HBASE-5137) MasterFileSystem.splitLog() should abort even if waitOnSafeMode() throws IOException
[ https://issues.apache.org/jira/browse/HBASE-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13182111#comment-13182111 ]

Zhihong Yu commented on HBASE-5137:
-----------------------------------

Nicolas might know the reason for introducing hbase.hlog.split.failure.retry.interval parameter

Please provide a patch for 0.92 and TRUNK which adds check for retrySplitting in the following if statement (line 220):
{code}
if (!checkFileSystem()) {
{code}
[jira] [Commented] (HBASE-4357) Region stayed in transition - in closing state
[ https://issues.apache.org/jira/browse/HBASE-4357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13182114#comment-13182114 ] stack commented on HBASE-4357: -- +1 Nice patch Ming. Regards conversation above on what if HRS can't close a region, I'd say lets go basic for now and crash out the HRS and let ServerShutdownHandler make sense of it all. Not the TimeoutMonitor IMO. TM at the moment is way too heavy-handed. Needs to be made more of a butterfly than bulldozer before we let it do closing fixups. Good stuff. Region stayed in transition - in closing state -- Key: HBASE-4357 URL: https://issues.apache.org/jira/browse/HBASE-4357 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Ming Ma Assignee: Ming Ma Fix For: 0.92.0, 0.94.0 Attachments: 4357.txt, HBASE-4357-0.92.patch Got the following during testing, 1. On a given machine, kill RS process id. Then kill HMaster process id. 2. Start RS first via bin/hbase-daemon.sh --config ./conf start regionserver.. Start HMaster via bin/hbase-daemon.sh --config ./conf start master. One region of a table stayed in closing state. According to zookeeper, 794a6ff17a4de0dd0a19b984ba18eea9 miweng_500region,H\xB49X\x10bM\xB1,1315338786464.794a6ff17a4de0dd0a19b984ba18eea9. state=CLOSING, ts=Wed Sep 07 17:21:44 PDT 2011 (75701s ago), server=sea-esxi-0,6,1315428682281 According to .META. table, the region has been assigned to from sea-esxi-0 to sea-esxi-4. miweng_500region,H\xB49X\x10bM\xB1,1315338786464.794a6ff17a4de0dd0a19b984ba18eea9. sea-esxi-4:60030 H\xB49X\x10bM\xB1 I7K\xC6\xA7\xEF\x9D\x90 0 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (HBASE-5137) MasterFileSystem.splitLog() should abort even if waitOnSafeMode() throws IOException
[ https://issues.apache.org/jira/browse/HBASE-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13182111#comment-13182111 ] Zhihong Yu edited comment on HBASE-5137 at 1/7/12 10:04 PM: Nicolas might know the reason for introducing hbase.hlog.split.failure.retry.interval parameter was (Author: zhi...@ebaysf.com): Nicolas might know the reason for introducing hbase.hlog.split.failure.retry.interval parameter Please provide a patch for 0.92 and TRUNK which adds check for retrySplitting in the following if statement (line 220): {code} if (!checkFileSystem()) { {code}
[jira] [Updated] (HBASE-5137) MasterFileSystem.splitLog() should abort even if waitOnSafeMode() throws IOException
[ https://issues.apache.org/jira/browse/HBASE-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5137: -- Attachment: 5137-trunk.txt Suggested patch for TRUNK.
[jira] [Commented] (HBASE-5138) [ref manual] Add a discussion on the number of regions
[ https://issues.apache.org/jira/browse/HBASE-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13182117#comment-13182117 ]

stack commented on HBASE-5138:
------------------------------

+1 on nice doc.

[ref manual] Add a discussion on the number of regions
------------------------------------------------------

Key: HBASE-5138
URL: https://issues.apache.org/jira/browse/HBASE-5138
Project: HBase
Issue Type: Task
Reporter: Jean-Daniel Cryans

ntelford on IRC made the good point that we say people shouldn't have too many regions, but we don't say why. His problem currently is:
{quote}
09:21 ntelford problem is, if you're running MR jobs on a subset of that data, you need the regions to be as small as possible otherwise tasks don't get allocated in parallel much
09:22 ntelford so we've found we have to strike a balance between keeping them small for MR and keeping them large for HBase to behave well
09:22 ntelford we erred on the side of smaller regions because our MR issues were more immediate - we couldn't find any documentation or anecdotal evidence as to why HBase doesn't like lots of regions
{quote}
The three main issues I can think of when having too many regions are:
- MSLAB requires 2MB per memstore (that's 2MB per family per region). 1000 regions that have 2 families each is 3.9GB of heap used, and it's not even storing data yet. NB: the 2MB value is configurable.
- If you fill all the regions at roughly the same rate, the global memstore limit forces tiny flushes when you have too many regions, which in turn generates compactions. Rewriting the same data tens of times is the last thing you want. As an example, fill 1000 regions (with one family) equally and consider a lower bound for global memstore usage of 5GB (the region server would have a big heap). Once usage reaches 5GB it will force flush the biggest region; at that point almost all regions should hold about 5MB of data, so it would flush that amount. 5MB of inserts later, it would flush another region that now holds a bit over 5MB of data, and so on.
- The new master is allergic to tons of regions, and will take a lot of time assigning them and moving them around in batches. The reason is that it's heavy on ZK usage, and it's not very async at the moment (could really be improved).
Another issue is the effect of the number of regions on MapReduce jobs. Keeping 5 regions per RS would be too low for a job, whereas 1000 will generate too many maps. This comes back to ntelford's problem of needing to scan portions of tables. To solve his problem, we discussed using a custom input format that generates many splits per region.
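The two memory figures quoted in HBASE-5138 can be reproduced with a few lines of arithmetic. The sketch below only restates the numbers from the description (2MB MSLAB chunk per memstore, 1000 regions, a 5GB global memstore bound); the class and method names are made up for illustration.

```java
// Illustrative arithmetic only; names are hypothetical.
final class RegionMemorySketch {
    static final long MB = 1024L * 1024L;

    // MSLAB overhead: one chunk per memstore, i.e. per family per region.
    static long mslabOverheadBytes(int regions, int familiesPerRegion, long chunkBytes) {
        return (long) regions * familiesPerRegion * chunkBytes;
    }

    // Data held per region when the global memstore bound is reached,
    // assuming every region fills at the same rate.
    static long averageFlushBytes(long globalMemstoreBytes, int regions) {
        return globalMemstoreBytes / regions;
    }

    public static void main(String[] args) {
        long overhead = mslabOverheadBytes(1000, 2, 2 * MB);
        System.out.printf("MSLAB overhead: %.1f GB%n", overhead / (1024.0 * MB));
        long flush = averageFlushBytes(5 * 1024 * MB, 1000);
        System.out.printf("Average flush size: %.1f MB%n", flush / (double) MB);
    }
}
```

1000 regions x 2 families x 2MB is 4000MB, which is the roughly 3.9GB quoted above, and 5GB spread over 1000 regions is about 5MB per flush, far below a typical flush size, which is why the compaction load grows with the region count.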
[jira] [Updated] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)
[ https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mikhail Bautin updated HBASE-4218:
----------------------------------

Attachment: Delta-encoding.patch-2012-01-07_14_12_48.patch

Attaching a patch rebased on trunk changes.

Data Block Encoding of KeyValues (aka delta encoding / prefix compression)
---------------------------------------------------------------------------

Key: HBASE-4218
URL: https://issues.apache.org/jira/browse/HBASE-4218
Project: HBase
Issue Type: Improvement
Components: io
Affects Versions: 0.94.0
Reporter: Jacek Migdal
Assignee: Mikhail Bautin
Labels: compression
Fix For: 0.94.0
Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 0001-Delta-encoding.patch, 4218-v16.txt, 4218.txt, D447.1.patch, D447.10.patch, D447.11.patch, D447.12.patch, D447.13.patch, D447.14.patch, D447.15.patch, D447.16.patch, D447.17.patch, D447.18.patch, D447.19.patch, D447.2.patch, D447.20.patch, D447.3.patch, D447.4.patch, D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, D447.9.patch, Data-block-encoding-2011-12-23.patch, Delta-encoding.patch-2011-12-22_11_52_07.patch, Delta-encoding.patch-2012-01-05_15_16_43.patch, Delta-encoding.patch-2012-01-05_16_31_44.patch, Delta-encoding.patch-2012-01-05_16_31_44_copy.patch, Delta-encoding.patch-2012-01-05_18_50_47.patch, Delta-encoding.patch-2012-01-07_14_12_48.patch, Delta_encoding_with_memstore_TS.patch, open-source.diff

A compression for keys. Keys are sorted in an HFile and they are usually very similar. Because of that, it is possible to design better compression than general-purpose algorithms.
It is an additional step designed to be used in memory. It aims to save memory in cache as well as to speed up seeks within HFileBlocks. It should improve performance a lot if key lengths are larger than value lengths. For example, it makes a lot of sense to use it when the value is a counter.
Initial tests on real data (key length = ~90 bytes, value length = 8 bytes) show that I could achieve a decent level of compression:
key compression ratio: 92%
total compression ratio: 85%
LZO on the same data: 85%
LZO after delta encoding: 91%
while having much better performance (decompression 20-80% faster than LZO). Moreover, it should allow far more efficient seeking, which should improve performance a bit.
It seems that simple compression algorithms are good enough. Most of the savings are due to prefix compression, int128 encoding, timestamp diffs and bitfields to avoid duplication. That way, comparisons of compressed data can be much faster than a byte comparator (thanks to prefix compression and bitfields).
In order to implement it in HBase, two important design changes will be needed:
- solidify the interface to HFileBlock / HFileReader Scanner to provide seeking and iterating; access to the uncompressed buffer in HFileBlock will have bad performance
- extend comparators to support comparison assuming that the first N bytes are equal (or some fields are equal)
Link to a discussion about something similar:
http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windowssubj=Re+prefix+compression
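Prefix compression, the largest source of savings named in the HBASE-4218 description, is easy to illustrate in isolation. The following is a minimal sketch of the general technique on sorted byte[] keys, not the encoder from the attached patches: each key is stored as the number of leading bytes it shares with the previous key plus the remaining suffix. PrefixEncodingSketch and Entry are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative prefix (front) coding of sorted keys; not the HBase encoder.
final class PrefixEncodingSketch {
    // One encoded key: count of bytes shared with the previous key, plus the rest.
    static final class Entry {
        final int shared;
        final byte[] suffix;

        Entry(int shared, byte[] suffix) {
            this.shared = shared;
            this.suffix = suffix;
        }
    }

    static List<Entry> encode(List<byte[]> sortedKeys) {
        List<Entry> out = new ArrayList<>();
        byte[] prev = new byte[0];
        for (byte[] key : sortedKeys) {
            int max = Math.min(prev.length, key.length);
            int shared = 0;
            while (shared < max && prev[shared] == key[shared]) {
                shared++;
            }
            out.add(new Entry(shared, Arrays.copyOfRange(key, shared, key.length)));
            prev = key;
        }
        return out;
    }

    static List<byte[]> decode(List<Entry> entries) {
        List<byte[]> out = new ArrayList<>();
        byte[] prev = new byte[0];
        for (Entry e : entries) {
            // Rebuild the key from the shared prefix of the previous key and the stored suffix.
            byte[] key = new byte[e.shared + e.suffix.length];
            System.arraycopy(prev, 0, key, 0, e.shared);
            System.arraycopy(e.suffix, 0, key, e.shared, e.suffix.length);
            out.add(key);
            prev = key;
        }
        return out;
    }
}
```

Because each entry depends on the one before it, decoding is sequential within a block. The shared-byte count also tells a comparator how many leading bytes it can skip, which is the point made above about comparing keys under the assumption that the first N bytes are equal.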
[jira] [Commented] (HBASE-5141) Memory leak in MonitoredRPCHandlerImpl
[ https://issues.apache.org/jira/browse/HBASE-5141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13182120#comment-13182120 ] stack commented on HBASE-5141: -- Let me commit
[jira] [Updated] (HBASE-5141) Memory leak in MonitoredRPCHandlerImpl
[ https://issues.apache.org/jira/browse/HBASE-5141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5141: - Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available)
[jira] [Updated] (HBASE-5112) TestReplication#queueFailover flaky due to potentially uninitialized Scan
[ https://issues.apache.org/jira/browse/HBASE-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5112: - Resolution: Fixed Fix Version/s: (was: 0.94.0) Status: Resolved (was: Patch Available) This was committed a while back. TestReplication#queueFailover flaky due to potentially uninitialized Scan - Key: HBASE-5112 URL: https://issues.apache.org/jira/browse/HBASE-5112 Project: HBase Issue Type: Test Affects Versions: 0.92.0, 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.92.0 Attachments: 5112-v2.txt, hbase-5112.patch, org.apache.hadoop.hbase.replication.TestReplication-output.txt In TestReplication#queueFailover, the second scan is not reset for each new scan. Followed scan may not be able to scan the whole table. So it cannot get all the data and the test fails. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5141) Memory leak in MonitoredRPCHandlerImpl
[ https://issues.apache.org/jira/browse/HBASE-5141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13182122#comment-13182122 ] stack commented on HBASE-5141: -- Committed 0.92 and trunk.
[jira] [Updated] (HBASE-4357) Region stayed in transition - in closing state
[ https://issues.apache.org/jira/browse/HBASE-4357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4357: - Resolution: Fixed Fix Version/s: (was: 0.94.0) Status: Resolved (was: Patch Available) Was committed by Ted a day or so ago. Resolving.
[jira] [Updated] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)
[ https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HBASE-4218: --- Attachment: D447.21.patch mbautin updated the revision [jira] [HBASE-4218] HFile data block encoding framework and delta encoding implementation. Reviewers: JIRA, tedyu, stack, nspiegelberg, Kannan Fixing the -encode_in_cache_only option of LoadTestTool (it is still encode_in_cache_only, even though we use ENCODE_ON_DISK in the column family), and rebasing on most recent trunk changes. Unit tests still pass. REVISION DETAIL https://reviews.facebook.net/D447 AFFECTED FILES src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java src/main/java/org/apache/hadoop/hbase/HConstants.java src/main/java/org/apache/hadoop/hbase/KeyValue.java src/main/java/org/apache/hadoop/hbase/io/HalfStoreFileReader.java src/main/java/org/apache/hadoop/hbase/io/encoding/BufferedDataBlockEncoder.java src/main/java/org/apache/hadoop/hbase/io/encoding/CompressionState.java src/main/java/org/apache/hadoop/hbase/io/encoding/CopyKeyDataBlockEncoder.java src/main/java/org/apache/hadoop/hbase/io/encoding/DataBlockEncoder.java src/main/java/org/apache/hadoop/hbase/io/encoding/DataBlockEncoding.java src/main/java/org/apache/hadoop/hbase/io/encoding/DiffKeyDeltaEncoder.java src/main/java/org/apache/hadoop/hbase/io/encoding/EncodedDataBlock.java src/main/java/org/apache/hadoop/hbase/io/encoding/EncoderBufferTooSmallException.java src/main/java/org/apache/hadoop/hbase/io/encoding/FastDiffDeltaEncoder.java src/main/java/org/apache/hadoop/hbase/io/encoding/PrefixKeyDeltaEncoder.java src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileWriter.java src/main/java/org/apache/hadoop/hbase/io/hfile/BlockCacheKey.java src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoder.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoderImpl.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java src/main/java/org/apache/hadoop/hbase/io/hfile/NoOpDataBlockEncoder.java src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java src/main/java/org/apache/hadoop/hbase/regionserver/CompactSplitThread.java src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java src/main/java/org/apache/hadoop/hbase/regionserver/Store.java src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java src/main/java/org/apache/hadoop/hbase/regionserver/metrics/SchemaConfigured.java src/main/java/org/apache/hadoop/hbase/util/ByteBufferUtils.java src/main/ruby/hbase/admin.rb src/test/java/org/apache/hadoop/hbase/BROKE_TODO_FIX_TestAcidGuarantees.java src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java src/test/java/org/apache/hadoop/hbase/HFilePerformanceEvaluation.java src/test/java/org/apache/hadoop/hbase/TestAcidGuarantees.java src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java src/test/java/org/apache/hadoop/hbase/io/TestHalfStoreFileReader.java src/test/java/org/apache/hadoop/hbase/io/TestHeapSize.java 
src/test/java/org/apache/hadoop/hbase/io/encoding/RedundantKVGenerator.java src/test/java/org/apache/hadoop/hbase/io/encoding/TestBufferedDataBlockEncoder.java src/test/java/org/apache/hadoop/hbase/io/encoding/TestChangingEncoding.java src/test/java/org/apache/hadoop/hbase/io/encoding/TestDataBlockEncoders.java src/test/java/org/apache/hadoop/hbase/io/encoding/TestEncodedSeekers.java src/test/java/org/apache/hadoop/hbase/io/encoding/TestUpgradeFromHFileV1ToEncoding.java src/test/java/org/apache/hadoop/hbase/io/hfile/CacheTestUtils.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFile.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileDataBlockEncoder.java
[jira] [Commented] (HBASE-5141) Memory leak in MonitoredRPCHandlerImpl
[ https://issues.apache.org/jira/browse/HBASE-5141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13182127#comment-13182127 ] Mikhail Bautin commented on HBASE-5141: --- FYI: The build is broken in trunk because of this patch.
[jira] [Commented] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)
[ https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13182129#comment-13182129 ] Hadoop QA commented on HBASE-4218: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12509806/D447.21.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 133 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/697//console This message is automatically generated.
[jira] [Updated] (HBASE-5052) The path where a dynamically loaded coprocessor jar is copied on the local file system depends on the region name (and implicitly, the start key)
[ https://issues.apache.org/jira/browse/HBASE-5052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5052: - Resolution: Fixed Fix Version/s: (was: 0.94.0) Status: Resolved (was: Patch Available) Committed branch and trunk. Thanks for the patch Andrei. The path where a dynamically loaded coprocessor jar is copied on the local file system depends on the region name (and implicitly, the start key) - Key: HBASE-5052 URL: https://issues.apache.org/jira/browse/HBASE-5052 Project: HBase Issue Type: Bug Components: coprocessors Affects Versions: 0.92.0 Reporter: Andrei Dragomir Assignee: Andrei Dragomir Fix For: 0.92.0 Attachments: HBASE-5052.patch When loading a coprocessor from hdfs, the jar file gets copied to a path on the local filesystem, which depends on the region name, and the region start key. The name is cleaned, but not enough, so when you have filesystem unfriendly characters (/?:, etc), the coprocessor is not loaded, and an error is thrown -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
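The failure mode in HBASE-5052 is a local file name derived from text that may contain characters the filesystem rejects. One defensive approach, shown here purely as a sketch and not as the committed patch (LocalJarNameSketch and sanitize are hypothetical names), is to whitelist a small set of safe characters and replace everything else.

```java
// Illustrative only: turns an arbitrary name into a single safe path component.
final class LocalJarNameSketch {
    // Keep ASCII letters, digits, '.', '_' and '-'; replace every other character
    // (including '/', '?', ':' and ',') with '_'.
    static String sanitize(String name) {
        return name.replaceAll("[^A-Za-z0-9._-]", "_");
    }
}
```

Whitelisting is safer than enumerating bad characters, since region start keys are arbitrary bytes. Note that distinct inputs can collapse to the same output, so a scheme that avoids the region name altogether (or appends a unique suffix) would sidestep both problems.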
[jira] [Updated] (HBASE-5103) Fix improper master znode deserialization
[ https://issues.apache.org/jira/browse/HBASE-5103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5103: - Resolution: Fixed Status: Resolved (was: Patch Available) Committed a while back. Resolving. Fix improper master znode deserialization - Key: HBASE-5103 URL: https://issues.apache.org/jira/browse/HBASE-5103 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Priority: Minor Fix For: 0.92.0, 0.94.0 Attachments: hbase-5103.patch In ActiveMasterManager#blockUntilBecomingActiveMaster the master znode is created as a versioned serialized version of ServerName {code} if (ZKUtil.createEphemeralNodeAndWatch(this.watcher, this.watcher.masterAddressZNode, sn.getVersionedBytes())) { {code} There are a few user visible places where it is used but not deserialized properly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
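The class of bug in HBASE-5103 (data written with a version prefix but read back as if it were a plain string) can be shown with a self-contained sketch. The layout used here, a 2-byte version followed by the UTF-8 server name, is an assumption chosen for illustration, and VersionedNameSketch with its methods is hypothetical rather than the ServerName API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative versioned encoding; the 2-byte prefix layout is assumed, not taken from HBase.
final class VersionedNameSketch {
    static final short VERSION = 0;

    // Writer side: version prefix followed by the name bytes.
    static byte[] toVersionedBytes(String serverName) {
        byte[] name = serverName.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(2 + name.length).putShort(VERSION).put(name).array();
    }

    // Reader side: check the version, then decode only the bytes after the prefix.
    static String parseVersionedBytes(byte[] data) {
        short version = ByteBuffer.wrap(data).getShort();
        if (version != VERSION) {
            throw new IllegalArgumentException("unexpected version " + version);
        }
        return new String(data, 2, data.length - 2, StandardCharsets.UTF_8);
    }
}
```

A reader that skips the parse step and calls new String(data) directly gets the name with two stray leading characters, which is the kind of user-visible garbage the issue describes.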
[jira] [Commented] (HBASE-5141) Memory leak in MonitoredRPCHandlerImpl
[ https://issues.apache.org/jira/browse/HBASE-5141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13182132#comment-13182132 ] Mikhail Bautin commented on HBASE-5141: --- I get the error shown at http://pastebin.com/AdAp0M35 when trying to build the following commit: Author: stack stack@13f79535-47bb-0310-9956-ffa450edef68 Date: Sat Jan 7 14:16:11 2012 HBASE-5141 Memory leak in MonitoredRPCHandlerImpl git-svn-id: http://svn.apache.org/repos/asf/hbase/trunk@1228740 Memory leak in MonitoredRPCHandlerImpl -- Key: HBASE-5141 URL: https://issues.apache.org/jira/browse/HBASE-5141 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Priority: Blocker Fix For: 0.92.0, 0.94.0 Attachments: HBASE-5141-v2.patch, HBASE-5141.patch, Screen Shot 2012-01-06 at 3.03.09 PM.png I got a pretty reliable way of OOME'ing my region servers. Using a big payload (64MB in my case), a default heap and default number of handlers, it's not too long that all the MonitoredRPCHandlerImpl hold on a 64MB reference and once a compaction kicks in it kills everything. The issue is that even after the RPC call is done, the packet still lives in MonitoredRPCHandlerImpl. Will attach a screen shot of jprofiler's analysis in a moment. This is a blocker for 0.92.0, anyone using a high number of handlers and bigish values will kill themselves. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)
[ https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13182134#comment-13182134 ] Mikhail Bautin commented on HBASE-4218: --- @Ted: I was running a load test with LZO compression and PREFIX encoding and everything was fine, but then I switched to encoding in cache only and compactions started failing. I need to look into this. Data Block Encoding of KeyValues (aka delta encoding / prefix compression) --- Key: HBASE-4218 URL: https://issues.apache.org/jira/browse/HBASE-4218 Project: HBase Issue Type: Improvement Components: io Affects Versions: 0.94.0 Reporter: Jacek Migdal Assignee: Mikhail Bautin Labels: compression Fix For: 0.94.0 Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 0001-Delta-encoding.patch, 4218-v16.txt, 4218.txt, D447.1.patch, D447.10.patch, D447.11.patch, D447.12.patch, D447.13.patch, D447.14.patch, D447.15.patch, D447.16.patch, D447.17.patch, D447.18.patch, D447.19.patch, D447.2.patch, D447.20.patch, D447.21.patch, D447.3.patch, D447.4.patch, D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, D447.9.patch, Data-block-encoding-2011-12-23.patch, Delta-encoding.patch-2011-12-22_11_52_07.patch, Delta-encoding.patch-2012-01-05_15_16_43.patch, Delta-encoding.patch-2012-01-05_16_31_44.patch, Delta-encoding.patch-2012-01-05_16_31_44_copy.patch, Delta-encoding.patch-2012-01-05_18_50_47.patch, Delta-encoding.patch-2012-01-07_14_12_48.patch, Delta_encoding_with_memstore_TS.patch, open-source.diff A compression for keys. Keys are sorted in HFile and they are usually very similar. Because of that, it is possible to design better compression than general purpose algorithms, It is an additional step designed to be used in memory. It aims to save memory in cache as well as speeding seeks within HFileBlocks. It should improve performance a lot, if key lengths are larger than value lengths. For example, it makes a lot of sense to use it when value is a counter. 
Initial tests on real data (key length ~90 bytes, value length 8 bytes) show that I could achieve a decent level of compression:
key compression ratio: 92%
total compression ratio: 85%
LZO on the same data: 85%
LZO after delta encoding: 91%
Performance is also much better (decompression 20-80% faster than LZO). Moreover, it should allow far more efficient seeking, which should improve performance a bit. It seems that simple compression algorithms are good enough. Most of the savings are due to prefix compression, int128 encoding, timestamp diffs and bitfields to avoid duplication. That way, comparisons of compressed data can be much faster than a byte comparator (thanks to prefix compression and bitfields).
In order to implement it in HBase, two important changes in design will be needed:
- solidify the interface to HFileBlock / HFileReader Scanner to provide seeking and iterating; access to the uncompressed buffer in HFileBlock will have bad performance
- extend comparators to support comparison assuming that the first N bytes are equal (or some fields are equal)
Link to a discussion about something similar: http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windowssubj=Re+prefix+compression
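The prefix-compression idea in the description can be sketched as follows. This is only an illustration under simplifying assumptions (keys are plain byte arrays, already sorted, with no timestamps or bitfields); it is not the encoder from the attached patches, and the names PrefixKeySketch, Entry, encode and decode are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: hypothetical names, not the HBase data block encoder. */
final class PrefixKeySketch {

    /** One encoded key: how many bytes it shares with the previous key, plus the rest. */
    static final class Entry {
        final int sharedPrefixLength;
        final byte[] suffix;

        Entry(int sharedPrefixLength, byte[] suffix) {
            this.sharedPrefixLength = sharedPrefixLength;
            this.suffix = suffix;
        }
    }

    /** Encodes each key (assumed sorted) as a delta against the previous key. */
    static List<Entry> encode(List<byte[]> sortedKeys) {
        List<Entry> out = new ArrayList<>();
        byte[] previous = new byte[0];
        for (byte[] key : sortedKeys) {
            int shared = 0;
            int limit = Math.min(previous.length, key.length);
            while (shared < limit && previous[shared] == key[shared]) {
                shared++;
            }
            out.add(new Entry(shared, Arrays.copyOfRange(key, shared, key.length)));
            previous = key;
        }
        return out;
    }

    /** Rebuilds the full keys by reusing the shared prefix of the previous key. */
    static List<byte[]> decode(List<Entry> entries) {
        List<byte[]> out = new ArrayList<>();
        byte[] previous = new byte[0];
        for (Entry e : entries) {
            byte[] key = new byte[e.sharedPrefixLength + e.suffix.length];
            System.arraycopy(previous, 0, key, 0, e.sharedPrefixLength);
            System.arraycopy(e.suffix, 0, key, e.sharedPrefixLength, e.suffix.length);
            out.add(key);
            previous = key;
        }
        return out;
    }

    public static void main(String[] args) {
        List<byte[]> keys = new ArrayList<>();
        for (String s : new String[] {"row-0001/cf:a", "row-0001/cf:b", "row-0002/cf:a"}) {
            keys.add(s.getBytes(StandardCharsets.UTF_8));
        }
        for (Entry e : encode(keys)) {
            // Prints the shared length and the stored suffix for each key.
            System.out.println(e.sharedPrefixLength + " + "
                + new String(e.suffix, StandardCharsets.UTF_8));
        }
    }
}
```

With long, similar keys and short values, most of each key collapses into the shared-prefix length, which is where the savings described above would come from.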
[jira] [Commented] (HBASE-5141) Memory leak in MonitoredRPCHandlerImpl
[ https://issues.apache.org/jira/browse/HBASE-5141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13182136#comment-13182136 ] Mikhail Bautin commented on HBASE-5141: --- Correction: use this svn command: svn diff http://svn.apache.org/repos/asf/hbase/trunk -r1228739 Memory leak in MonitoredRPCHandlerImpl -- Key: HBASE-5141 URL: https://issues.apache.org/jira/browse/HBASE-5141 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Priority: Blocker Fix For: 0.92.0, 0.94.0 Attachments: HBASE-5141-v2.patch, HBASE-5141.patch, Screen Shot 2012-01-06 at 3.03.09 PM.png I got a pretty reliable way of OOME'ing my region servers. Using a big payload (64MB in my case), a default heap and default number of handlers, it's not too long that all the MonitoredRPCHandlerImpl hold on a 64MB reference and once a compaction kicks in it kills everything. The issue is that even after the RPC call is done, the packet still lives in MonitoredRPCHandlerImpl. Will attach a screen shot of jprofiler's analysis in a moment. This is a blocker for 0.92.0, anyone using a high number of handlers and bigish values will kill themselves. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5141) Memory leak in MonitoredRPCHandlerImpl
[ https://issues.apache.org/jira/browse/HBASE-5141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13182135#comment-13182135 ] Mikhail Bautin commented on HBASE-5141: --- Actually, the committed patch contains more stuff than the patch attached to the JIRA: svn diff http://svn.apache.org/repos/asf/hbase/trunk -r1228740 Was the security version of the patch committed into trunk or something? Memory leak in MonitoredRPCHandlerImpl -- Key: HBASE-5141 URL: https://issues.apache.org/jira/browse/HBASE-5141 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Priority: Blocker Fix For: 0.92.0, 0.94.0 Attachments: HBASE-5141-v2.patch, HBASE-5141.patch, Screen Shot 2012-01-06 at 3.03.09 PM.png I got a pretty reliable way of OOME'ing my region servers. Using a big payload (64MB in my case), a default heap and default number of handlers, it's not too long that all the MonitoredRPCHandlerImpl hold on a 64MB reference and once a compaction kicks in it kills everything. The issue is that even after the RPC call is done, the packet still lives in MonitoredRPCHandlerImpl. Will attach a screen shot of jprofiler's analysis in a moment. This is a blocker for 0.92.0, anyone using a high number of handlers and bigish values will kill themselves. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5134) Remove getRegionServerWithoutRetries and getRegionServerWithRetries from HConnection Interface
[ https://issues.apache.org/jira/browse/HBASE-5134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-5134:
Status: Open (was: Patch Available)

Remove getRegionServerWithoutRetries and getRegionServerWithRetries from HConnection Interface
Key: HBASE-5134
URL: https://issues.apache.org/jira/browse/HBASE-5134
Project: HBase
Issue Type: Improvement
Reporter: stack
Assignee: stack
Attachments: 5134-v2.txt, 5134-v3.txt

It's broken having these meta methods in HConnection. They take ServerCallables, which themselves inevitably have HConnections. It makes for a tangle in the model and frustrates being able to do mocked implementations of HConnection. These methods better belong in something like HConnectionManager, or elsewhere altogether.
[jira] [Updated] (HBASE-5134) Remove getRegionServerWithoutRetries and getRegionServerWithRetries from HConnection Interface
[ https://issues.apache.org/jira/browse/HBASE-5134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-5134:
Attachment: 5134-v3.txt

v3 is the same as v2 except for a one-line change in TestAssignmentManager, where I change the generic params on a mocked method to be specific (the commit of Ming's new closeRegion method broke this). All good now.

Remove getRegionServerWithoutRetries and getRegionServerWithRetries from HConnection Interface
Key: HBASE-5134
URL: https://issues.apache.org/jira/browse/HBASE-5134
Project: HBase
Issue Type: Improvement
Reporter: stack
Assignee: stack
Fix For: 0.94.0
Attachments: 5134-v2.txt, 5134-v3.txt

It's broken having these meta methods in HConnection. They take ServerCallables, which themselves inevitably have HConnections. It makes for a tangle in the model and frustrates being able to do mocked implementations of HConnection. These methods better belong in something like HConnectionManager, or elsewhere altogether.
[jira] [Updated] (HBASE-5134) Remove getRegionServerWithoutRetries and getRegionServerWithRetries from HConnection Interface
[ https://issues.apache.org/jira/browse/HBASE-5134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-5134:
Fix Version/s: 0.94.0
Hadoop Flags: Reviewed
Status: Patch Available (was: Open)

Trying against hadoopqa to see what is broken.

Remove getRegionServerWithoutRetries and getRegionServerWithRetries from HConnection Interface
Key: HBASE-5134
URL: https://issues.apache.org/jira/browse/HBASE-5134
Project: HBase
Issue Type: Improvement
Reporter: stack
Assignee: stack
Fix For: 0.94.0
Attachments: 5134-v2.txt, 5134-v3.txt

It's broken having these meta methods in HConnection. They take ServerCallables, which themselves inevitably have HConnections. It makes for a tangle in the model and frustrates being able to do mocked implementations of HConnection. These methods better belong in something like HConnectionManager, or elsewhere altogether.
[jira] [Commented] (HBASE-5141) Memory leak in MonitoredRPCHandlerImpl
[ https://issues.apache.org/jira/browse/HBASE-5141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13182139#comment-13182139 ] stack commented on HBASE-5141: -- Fixing Memory leak in MonitoredRPCHandlerImpl -- Key: HBASE-5141 URL: https://issues.apache.org/jira/browse/HBASE-5141 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Priority: Blocker Fix For: 0.92.0, 0.94.0 Attachments: HBASE-5141-v2.patch, HBASE-5141.patch, Screen Shot 2012-01-06 at 3.03.09 PM.png I got a pretty reliable way of OOME'ing my region servers. Using a big payload (64MB in my case), a default heap and default number of handlers, it's not too long that all the MonitoredRPCHandlerImpl hold on a 64MB reference and once a compaction kicks in it kills everything. The issue is that even after the RPC call is done, the packet still lives in MonitoredRPCHandlerImpl. Will attach a screen shot of jprofiler's analysis in a moment. This is a blocker for 0.92.0, anyone using a high number of handlers and bigish values will kill themselves. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
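The leak described in the issue (a handler status object that keeps referencing the call's payload after the call has finished) can be illustrated with a small sketch. It is not the committed patch; the class HandlerStatusSketch and its methods are hypothetical and only show the general remedy of dropping the reference on completion.

```java
/** Illustrative only: hypothetical stand-in for a per-handler status object. */
final class HandlerStatusSketch {
    private volatile byte[] packet;            // payload of the in-flight call, if any
    private volatile String state = "WAITING";

    /** Records the payload so that monitoring can describe the running call. */
    void startCall(byte[] payload) {
        this.packet = payload;
        this.state = "RUNNING";
    }

    /** Marks the call done and drops the payload reference so it can be collected. */
    void markComplete() {
        this.state = "COMPLETE";
        this.packet = null;
    }

    boolean holdsPacket() {
        return packet != null;
    }

    String state() {
        return state;
    }

    public static void main(String[] args) {
        HandlerStatusSketch status = new HandlerStatusSketch();
        status.startCall(new byte[64 * 1024]); // small stand-in for a large payload
        status.markComplete();
        System.out.println(status.state() + " holdsPacket=" + status.holdsPacket());
    }
}
```

Without the assignment to null in markComplete, each idle handler would pin its last payload until the next call replaced it, which matches the symptom reported with 64MB payloads.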
[jira] [Commented] (HBASE-5121) MajorCompaction may affect scan's correctness
[ https://issues.apache.org/jira/browse/HBASE-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13182166#comment-13182166 ] chunhui shen commented on HBASE-5121: - @Ted ok, I will make a new patch MajorCompaction may affect scan's correctness - Key: HBASE-5121 URL: https://issues.apache.org/jira/browse/HBASE-5121 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.90.4 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.94.0, 0.92.1, 0.90.6 Attachments: 5121-trunk-combined.txt, 5121.90, hbase-5121-testcase.patch, hbase-5121.patch, hbase-5121v2.patch In our test, there are two families' keyvalue for one row. But we could find a infrequent problem when doing scan's next if majorCompaction happens concurrently. In the client's two continuous doing scan.next(): 1.First time, scan's next returns the result where family A is null. 2.Second time, scan's next returns the result where family B is null. The two next()'s result have the same row. If there are more families, I think the scenario will be more strange... We find the reason is that storescanner.peek() is changed after majorCompaction if there are delete type KeyValue. This change causes the PriorityQueueKeyValueScanner of RegionScanner's heap is not sure to be sorted. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5088) A concurrency issue on SoftValueSortedMap
[ https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-5088: - Attachment: (was: 5088-useMapInterfaces.txt) A concurrency issue on SoftValueSortedMap - Key: HBASE-5088 URL: https://issues.apache.org/jira/browse/HBASE-5088 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4, 0.94.0 Reporter: Jieshan Bean Assignee: Lars Hofhansl Priority: Critical Fix For: 0.92.0, 0.94.0 Attachments: 5088-final.txt, 5088-final2.txt, 5088-final3.txt, 5088.generics.txt, HBase-5088-90.patch, HBase-5088-trunk.patch, HBase5088-90-replaceSoftValueSortedMap.patch, HBase5088-90-replaceTreeMap.patch, HBase5088-trunk-replaceTreeMap.patch, HBase5088Reproduce.java, PerformanceTestResults.png SoftValueSortedMap is backed by a TreeMap. All the methods in this class are synchronized. If we use this method to add/delete elements, it's ok. But in HConnectionManager#getCachedLocation, it use headMap to get a view from SoftValueSortedMap#internalMap. Once we operate on this view map(like add/delete) in other threads, a concurrency issue may occur. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5088) A concurrency issue on SoftValueSortedMap
[ https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-5088: - Attachment: (was: 5088-syncObj.txt) A concurrency issue on SoftValueSortedMap - Key: HBASE-5088 URL: https://issues.apache.org/jira/browse/HBASE-5088 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4, 0.94.0 Reporter: Jieshan Bean Assignee: Lars Hofhansl Priority: Critical Fix For: 0.92.0, 0.94.0 Attachments: 5088-final.txt, 5088-final2.txt, 5088-final3.txt, 5088.generics.txt, HBase-5088-90.patch, HBase-5088-trunk.patch, HBase5088-90-replaceSoftValueSortedMap.patch, HBase5088-90-replaceTreeMap.patch, HBase5088-trunk-replaceTreeMap.patch, HBase5088Reproduce.java, PerformanceTestResults.png SoftValueSortedMap is backed by a TreeMap. All the methods in this class are synchronized. If we use this method to add/delete elements, it's ok. But in HConnectionManager#getCachedLocation, it use headMap to get a view from SoftValueSortedMap#internalMap. Once we operate on this view map(like add/delete) in other threads, a concurrency issue may occur. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5088) A concurrency issue on SoftValueSortedMap
[ https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-5088: - Attachment: (was: 5088-final.txt) A concurrency issue on SoftValueSortedMap - Key: HBASE-5088 URL: https://issues.apache.org/jira/browse/HBASE-5088 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4, 0.94.0 Reporter: Jieshan Bean Assignee: Lars Hofhansl Priority: Critical Fix For: 0.92.0, 0.94.0 Attachments: 5088-final3.txt, HBase-5088-90.patch, HBase-5088-trunk.patch, HBase5088-90-replaceSoftValueSortedMap.patch, HBase5088-90-replaceTreeMap.patch, HBase5088-trunk-replaceTreeMap.patch, HBase5088Reproduce.java, PerformanceTestResults.png SoftValueSortedMap is backed by a TreeMap. All the methods in this class are synchronized. If we use this method to add/delete elements, it's ok. But in HConnectionManager#getCachedLocation, it use headMap to get a view from SoftValueSortedMap#internalMap. Once we operate on this view map(like add/delete) in other threads, a concurrency issue may occur. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5088) A concurrency issue on SoftValueSortedMap
[ https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-5088: - Attachment: (was: 5088-final2.txt) A concurrency issue on SoftValueSortedMap - Key: HBASE-5088 URL: https://issues.apache.org/jira/browse/HBASE-5088 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4, 0.94.0 Reporter: Jieshan Bean Assignee: Lars Hofhansl Priority: Critical Fix For: 0.92.0, 0.94.0 Attachments: 5088-final3.txt, HBase-5088-90.patch, HBase-5088-trunk.patch, HBase5088-90-replaceSoftValueSortedMap.patch, HBase5088-90-replaceTreeMap.patch, HBase5088-trunk-replaceTreeMap.patch, HBase5088Reproduce.java, PerformanceTestResults.png SoftValueSortedMap is backed by a TreeMap. All the methods in this class are synchronized. If we use this method to add/delete elements, it's ok. But in HConnectionManager#getCachedLocation, it use headMap to get a view from SoftValueSortedMap#internalMap. Once we operate on this view map(like add/delete) in other threads, a concurrency issue may occur. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5088) A concurrency issue on SoftValueSortedMap
[ https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-5088: - Attachment: (was: 5088.generics.txt) A concurrency issue on SoftValueSortedMap - Key: HBASE-5088 URL: https://issues.apache.org/jira/browse/HBASE-5088 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4, 0.94.0 Reporter: Jieshan Bean Assignee: Lars Hofhansl Priority: Critical Fix For: 0.92.0, 0.94.0 Attachments: 5088-final3.txt, HBase-5088-90.patch, HBase-5088-trunk.patch, HBase5088-90-replaceSoftValueSortedMap.patch, HBase5088-90-replaceTreeMap.patch, HBase5088-trunk-replaceTreeMap.patch, HBase5088Reproduce.java, PerformanceTestResults.png SoftValueSortedMap is backed by a TreeMap. All the methods in this class are synchronized. If we use this method to add/delete elements, it's ok. But in HConnectionManager#getCachedLocation, it use headMap to get a view from SoftValueSortedMap#internalMap. Once we operate on this view map(like add/delete) in other threads, a concurrency issue may occur. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5088) A concurrency issue on SoftValueSortedMap
[ https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-5088:
Attachment: 5088-0.90.txt

This is what I committed to 0.90. While I worked on that, I noticed that I can get rid of more concrete uses of SoftValueSortedMap in HConnectionManager (in fact all uses except creation, which is nice). I'll do an addendum for this in 0.92 and trunk.

A concurrency issue on SoftValueSortedMap
Key: HBASE-5088
URL: https://issues.apache.org/jira/browse/HBASE-5088
Project: HBase
Issue Type: Bug
Components: client
Affects Versions: 0.90.4, 0.94.0
Reporter: Jieshan Bean
Assignee: Lars Hofhansl
Priority: Critical
Fix For: 0.92.0, 0.94.0
Attachments: 5088-0.90.txt, 5088-final3.txt, HBase-5088-90.patch, HBase-5088-trunk.patch, HBase5088-90-replaceSoftValueSortedMap.patch, HBase5088-90-replaceTreeMap.patch, HBase5088-trunk-replaceTreeMap.patch, HBase5088Reproduce.java, PerformanceTestResults.png

SoftValueSortedMap is backed by a TreeMap. All the methods in this class are synchronized. If we use these methods to add/delete elements, it's OK. But HConnectionManager#getCachedLocation uses headMap to get a view from SoftValueSortedMap#internalMap. Once we operate on this view map (like add/delete) in other threads, a concurrency issue may occur.
[jira] [Updated] (HBASE-5088) A concurrency issue on SoftValueSortedMap
[ https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-5088: - Fix Version/s: 0.90.6 A concurrency issue on SoftValueSortedMap - Key: HBASE-5088 URL: https://issues.apache.org/jira/browse/HBASE-5088 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4, 0.94.0 Reporter: Jieshan Bean Assignee: Lars Hofhansl Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5088-0.90.txt, 5088-final3.txt, HBase-5088-90.patch, HBase-5088-trunk.patch, HBase5088-90-replaceSoftValueSortedMap.patch, HBase5088-90-replaceTreeMap.patch, HBase5088-trunk-replaceTreeMap.patch, HBase5088Reproduce.java, PerformanceTestResults.png SoftValueSortedMap is backed by a TreeMap. All the methods in this class are synchronized. If we use this method to add/delete elements, it's ok. But in HConnectionManager#getCachedLocation, it use headMap to get a view from SoftValueSortedMap#internalMap. Once we operate on this view map(like add/delete) in other threads, a concurrency issue may occur. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5088) A concurrency issue on SoftValueSortedMap
[ https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-5088: - Attachment: 5088-0.92-trunk-addendum.txt This is the addendum. Now the code is pretty clean w.r.t. using Map interfaces. A concurrency issue on SoftValueSortedMap - Key: HBASE-5088 URL: https://issues.apache.org/jira/browse/HBASE-5088 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4, 0.94.0 Reporter: Jieshan Bean Assignee: Lars Hofhansl Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5088-0.90.txt, 5088-0.92-trunk-addendum.txt, 5088-final3.txt, HBase-5088-90.patch, HBase-5088-trunk.patch, HBase5088-90-replaceSoftValueSortedMap.patch, HBase5088-90-replaceTreeMap.patch, HBase5088-trunk-replaceTreeMap.patch, HBase5088Reproduce.java, PerformanceTestResults.png SoftValueSortedMap is backed by a TreeMap. All the methods in this class are synchronized. If we use this method to add/delete elements, it's ok. But in HConnectionManager#getCachedLocation, it use headMap to get a view from SoftValueSortedMap#internalMap. Once we operate on this view map(like add/delete) in other threads, a concurrency issue may occur. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap
[ https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13182175#comment-13182175 ]

Lars Hofhansl commented on HBASE-5088:

Removed some of the attachments that were just confusing. The addendum also fixes minor formatting inconsistencies that I had introduced. All is good now.

A concurrency issue on SoftValueSortedMap
Key: HBASE-5088
URL: https://issues.apache.org/jira/browse/HBASE-5088
Project: HBase
Issue Type: Bug
Components: client
Affects Versions: 0.90.4, 0.94.0
Reporter: Jieshan Bean
Assignee: Lars Hofhansl
Priority: Critical
Fix For: 0.92.0, 0.94.0, 0.90.6
Attachments: 5088-0.90.txt, 5088-0.92-trunk-addendum.txt, 5088-final3.txt, HBase-5088-90.patch, HBase-5088-trunk.patch, HBase5088-90-replaceSoftValueSortedMap.patch, HBase5088-90-replaceTreeMap.patch, HBase5088-trunk-replaceTreeMap.patch, HBase5088Reproduce.java, PerformanceTestResults.png

SoftValueSortedMap is backed by a TreeMap. All the methods in this class are synchronized. If we use these methods to add/delete elements, it's OK. But HConnectionManager#getCachedLocation uses headMap to get a view from SoftValueSortedMap#internalMap. Once we operate on this view map (like add/delete) in other threads, a concurrency issue may occur.
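The hazard in the description is that a headMap view shares the backing TreeMap, so synchronizing only the wrapper's own methods does not protect work done through the view. A minimal sketch of one way to keep the view under the same lock, using the JDK's Collections.synchronizedSortedMap (whose documentation says to synchronize on the parent map when working with its views). This is not the committed patch; HeadMapViewSketch and its String keys and values are hypothetical stand-ins for the location cache.

```java
import java.util.Collections;
import java.util.SortedMap;
import java.util.TreeMap;

/** Illustrative only: hypothetical stand-in for a cache keyed by region start row. */
final class HeadMapViewSketch {
    private final SortedMap<String, String> cache =
        Collections.synchronizedSortedMap(new TreeMap<String, String>());

    void put(String startRow, String location) {
        cache.put(startRow, location);
    }

    /** Returns the location whose start row is the greatest one not after row, or null. */
    String lookup(String row) {
        // Hold the parent map's lock for the whole compound read, so no other
        // thread can modify the backing TreeMap between the calls on the view.
        synchronized (cache) {
            if (cache.containsKey(row)) {
                return cache.get(row);
            }
            SortedMap<String, String> head = cache.headMap(row);
            return head.isEmpty() ? null : head.get(head.lastKey());
        }
    }

    public static void main(String[] args) {
        HeadMapViewSketch sketch = new HeadMapViewSketch();
        sketch.put("a", "rs1");
        sketch.put("m", "rs2");
        System.out.println(sketch.lookup("k"));
    }
}
```

The point of the sketch is only that the view and the parent must be guarded by one lock; whether that lock is an explicit sync object or a different map implementation is a separate design choice.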
[jira] [Commented] (HBASE-5121) MajorCompaction may affect scan's correctness
[ https://issues.apache.org/jira/browse/HBASE-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13182178#comment-13182178 ]

chunhui shen commented on HBASE-5121:

@Ted If we change StoreScanner.next() to return an enum, we would have to change all the implementations of InternalScanner.next(), and therefore KeyValueHeap.next() would also have to return an enum. The logic that calls KeyValueHeap.next() would need to change as well. I think that changes too much; is it needed?

MajorCompaction may affect scan's correctness
Key: HBASE-5121
URL: https://issues.apache.org/jira/browse/HBASE-5121
Project: HBase
Issue Type: Bug
Components: regionserver
Affects Versions: 0.90.4
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
Fix For: 0.94.0, 0.92.1, 0.90.6
Attachments: 5121-trunk-combined.txt, 5121.90, hbase-5121-testcase.patch, hbase-5121.patch, hbase-5121v2.patch

In our test, there are KeyValues of two families for one row. But we found an infrequent problem in scan's next() when a major compaction happens concurrently. In two consecutive scan.next() calls on the client:
1. The first time, next() returns a result where family A is null.
2. The second time, next() returns a result where family B is null.
The two next() results have the same row. If there are more families, I think the scenario will be stranger...
We found the reason is that storescanner.peek() changes after a major compaction if there are delete-type KeyValues. This change means the PriorityQueue<KeyValueScanner> of RegionScanner's heap is no longer guaranteed to be sorted.
[jira] [Commented] (HBASE-5052) The path where a dynamically loaded coprocessor jar is copied on the local file system depends on the region name (and implicitly, the start key)
[ https://issues.apache.org/jira/browse/HBASE-5052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13182182#comment-13182182 ] Hudson commented on HBASE-5052: --- Integrated in HBase-0.92 #234 (See [https://builds.apache.org/job/HBase-0.92/234/]) HBASE-5052 The path where a dynamically loaded coprocessor jar is copied on the local file system depends on the region name (and implicitly, the start key) stack : Files : * /hbase/branches/0.92/CHANGES.txt * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/RegionCoprocessorHost.java The path where a dynamically loaded coprocessor jar is copied on the local file system depends on the region name (and implicitly, the start key) - Key: HBASE-5052 URL: https://issues.apache.org/jira/browse/HBASE-5052 Project: HBase Issue Type: Bug Components: coprocessors Affects Versions: 0.92.0 Reporter: Andrei Dragomir Assignee: Andrei Dragomir Fix For: 0.92.0 Attachments: HBASE-5052.patch When loading a coprocessor from hdfs, the jar file gets copied to a path on the local filesystem, which depends on the region name, and the region start key. The name is cleaned, but not enough, so when you have filesystem unfriendly characters (/?:, etc), the coprocessor is not loaded, and an error is thrown -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
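A minimal sketch of building a local file name that no longer depends on filesystem-unfriendly characters in the region name. It is not the change committed to RegionCoprocessorHost.java, which this message does not show; LocalJarNameSketch, safeLocalName and the sample region name are hypothetical.

```java
/** Illustrative only: hypothetical helper, not the committed HBASE-5052 fix. */
final class LocalJarNameSketch {

    /** Builds a file name containing only letters, digits, '.', '_' and '-'. */
    static String safeLocalName(String regionName, String jarName) {
        String raw = regionName + "." + jarName;
        // Replace every character outside the conservative whitelist.
        String cleaned = raw.replaceAll("[^A-Za-z0-9._-]", "_");
        // Prefix a hash of the raw name so that names differing only in
        // replaced characters are unlikely to collide after cleaning.
        return Integer.toHexString(raw.hashCode()) + "." + cleaned;
    }

    public static void main(String[] args) {
        // A start key containing '/', '?' and ':' as in the issue description.
        System.out.println(safeLocalName("t1,a/b?c:d,1325900000000", "cp.jar"));
    }
}
```

A whitelist is used rather than a blacklist because the set of characters that are unsafe differs between local file systems.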
[jira] [Commented] (HBASE-5121) MajorCompaction may affect scan's correctness
[ https://issues.apache.org/jira/browse/HBASE-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13182184#comment-13182184 ] Zhihong Yu commented on HBASE-5121: --- @Chunhui: I agree. That's why I think creating a new exception is acceptable. Maybe Todd or Stack has a better idea. MajorCompaction may affect scan's correctness - Key: HBASE-5121 URL: https://issues.apache.org/jira/browse/HBASE-5121 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.90.4 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.94.0, 0.92.1, 0.90.6 Attachments: 5121-trunk-combined.txt, 5121.90, hbase-5121-testcase.patch, hbase-5121.patch, hbase-5121v2.patch In our test, each row has KeyValues in two families. We occasionally see a problem in scan's next() when a major compaction happens concurrently. In two consecutive client calls to scan.next(): 1. The first time, next() returns a result where family A is null. 2. The second time, next() returns a result where family B is null. Both results are for the same row. If there are more families, I think the scenario will be even stranger... We found the reason is that StoreScanner.peek() changes after a major compaction if there are delete-type KeyValues. Because of this change, the PriorityQueue<KeyValueScanner> in RegionScanner's heap is no longer guaranteed to be sorted.
[jira] [Updated] (HBASE-5121) MajorCompaction may affect scan's correctness
[ https://issues.apache.org/jira/browse/HBASE-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-5121: - Attachment: 5121-suggest.txt I was looking at the patch a bit. Maybe there is a simpler solution: You say that when this scenario happens the KV just vanishes. What your logic in KeyValueHeap is essentially doing is retrying with the next KV on the heap. So we can just tell the KeyValueHeap that there are more KVs in this case (returning true for mayContainMoreRows). With this your test passes. (It is entirely possible that my reasoning is incorrect, and it just accidentally lets the test pass.) MajorCompaction may affect scan's correctness - Key: HBASE-5121 URL: https://issues.apache.org/jira/browse/HBASE-5121 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.90.4 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.94.0, 0.92.1, 0.90.6 Attachments: 5121-suggest.txt, 5121-trunk-combined.txt, 5121.90, hbase-5121-testcase.patch, hbase-5121.patch, hbase-5121v2.patch In our test, each row has KeyValues in two families. We occasionally see a problem in scan's next() when a major compaction happens concurrently. In two consecutive client calls to scan.next(): 1. The first time, next() returns a result where family A is null. 2. The second time, next() returns a result where family B is null. Both results are for the same row. If there are more families, I think the scenario will be even stranger... We found the reason is that StoreScanner.peek() changes after a major compaction if there are delete-type KeyValues. Because of this change, the PriorityQueue<KeyValueScanner> in RegionScanner's heap is no longer guaranteed to be sorted.
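The root cause named in the HBASE-5121 description is a general property of java.util.PriorityQueue: the queue orders elements only when they are inserted or removed, so if the value an element is compared by changes while the element sits in the queue, the head may no longer be the smallest element. The sketch below illustrates that property with a stand-in class. FakeScanner, peekKey and updateKey are hypothetical names, not HBase classes, and the remove-and-re-insert step is a generic remedy rather than what any of the attached patches does.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

/** Illustrative only: shows why a changed peek() value can leave a heap unsorted. */
final class HeapReorderSketch {
    private HeapReorderSketch() {}

    /** Stand-in for a scanner whose peek() value can change underneath the heap. */
    static final class FakeScanner {
        final String name;
        int peekKey;

        FakeScanner(String name, int peekKey) {
            this.name = name;
            this.peekKey = peekKey;
        }
    }

    /** Creates a heap ordered by each scanner's current peekKey. */
    static PriorityQueue<FakeScanner> newHeap() {
        return new PriorityQueue<>(Comparator.comparingInt((FakeScanner s) -> s.peekKey));
    }

    /** Changes a key without breaking heap order: remove, mutate, re-insert. */
    static void updateKey(PriorityQueue<FakeScanner> heap, FakeScanner s, int newKey) {
        heap.remove(s);
        s.peekKey = newKey;
        heap.add(s);
    }

    public static void main(String[] args) {
        PriorityQueue<FakeScanner> heap = newHeap();
        FakeScanner a = new FakeScanner("a", 1);
        FakeScanner b = new FakeScanner("b", 2);
        heap.add(a);
        heap.add(b);
        a.peekKey = 10; // mutated in place: the heap is not told and keeps "a" at the head
        System.out.println("head after in-place change: " + heap.peek().name);
        updateKey(heap, a, 10); // re-inserting restores the ordering
        System.out.println("head after re-insert: " + heap.peek().name);
    }
}
```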
[jira] [Commented] (HBASE-5121) MajorCompaction may affect scan's correctness
[ https://issues.apache.org/jira/browse/HBASE-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13182190#comment-13182190 ] Hadoop QA commented on HBASE-5121: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12509823/5121-suggest.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -151 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 79 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestImportTsv org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/700//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/700//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/700//console This message is automatically generated. MajorCompaction may affect scan's correctness - Key: HBASE-5121 URL: https://issues.apache.org/jira/browse/HBASE-5121 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.90.4 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.94.0, 0.92.1, 0.90.6 Attachments: 5121-suggest.txt, 5121-trunk-combined.txt, 5121.90, hbase-5121-testcase.patch, hbase-5121.patch, hbase-5121v2.patch In our test, each row has KeyValues in two families. We occasionally see a problem in scan's next() when a major compaction happens concurrently. In two consecutive client calls to scan.next(): 1. The first time, next() returns a result where family A is null. 2. The second time, next() returns a result where family B is null. Both results are for the same row. If there are more families, I think the scenario will be even stranger... We found the reason is that StoreScanner.peek() changes after a major compaction if there are delete-type KeyValues. Because of this change, the PriorityQueue<KeyValueScanner> in RegionScanner's heap is no longer guaranteed to be sorted.
[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap
[ https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13182192#comment-13182192 ] Hudson commented on HBASE-5088: --- Integrated in HBase-0.92 #235 (See [https://builds.apache.org/job/HBase-0.92/235/]) HBASE-5088 addendum larsh : Files : * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/util/SoftValueSortedMap.java A concurrency issue on SoftValueSortedMap - Key: HBASE-5088 URL: https://issues.apache.org/jira/browse/HBASE-5088 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4, 0.94.0 Reporter: Jieshan Bean Assignee: Lars Hofhansl Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5088-0.90.txt, 5088-0.92-trunk-addendum.txt, 5088-final3.txt, HBase-5088-90.patch, HBase-5088-trunk.patch, HBase5088-90-replaceSoftValueSortedMap.patch, HBase5088-90-replaceTreeMap.patch, HBase5088-trunk-replaceTreeMap.patch, HBase5088Reproduce.java, PerformanceTestResults.png SoftValueSortedMap is backed by a TreeMap. All the methods in this class are synchronized. If we use this method to add/delete elements, it's ok. But in HConnectionManager#getCachedLocation, it use headMap to get a view from SoftValueSortedMap#internalMap. Once we operate on this view map(like add/delete) in other threads, a concurrency issue may occur. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
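The HBASE-5088 description turns on a documented property of java.util.TreeMap: headMap returns a view backed by the original map, so a caller holding that view reads the shared tree after the synchronized method that produced it has already returned. The sketch below shows the view behaviour next to one possible mitigation, copying under the lock. SynchronizedSortedCache and its methods are hypothetical names, and the copy approach is an assumption for illustration; it is not necessarily what the committed patch does.

```java
import java.util.SortedMap;
import java.util.TreeMap;

/** Illustrative only: a synchronized wrapper around a TreeMap. */
final class SynchronizedSortedCache<K, V> {
    private final TreeMap<K, V> internalMap = new TreeMap<>();

    synchronized V put(K key, V value) {
        return internalMap.put(key, value);
    }

    synchronized V remove(K key) {
        return internalMap.remove(key);
    }

    /**
     * Unsafe variant: the returned view stays backed by internalMap, so the
     * caller touches the shared tree outside this object's lock.
     */
    synchronized SortedMap<K, V> headMapView(K toKey) {
        return internalMap.headMap(toKey);
    }

    /** Safer variant: the copy is made while the lock is held and is then independent. */
    synchronized SortedMap<K, V> headMapSnapshot(K toKey) {
        return new TreeMap<>(internalMap.headMap(toKey));
    }

    public static void main(String[] args) {
        SynchronizedSortedCache<Integer, String> cache = new SynchronizedSortedCache<>();
        cache.put(1, "a");
        cache.put(2, "b");
        SortedMap<Integer, String> view = cache.headMapView(3);
        SortedMap<Integer, String> snapshot = cache.headMapSnapshot(3);
        cache.put(0, "z"); // a later write shows up in the view but not in the snapshot
        System.out.println("view size: " + view.size() + ", snapshot size: " + snapshot.size());
    }
}
```

Copying costs time proportional to the size of the head map on every call, which is the trade-off against sharing a lock between the map and its views.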
[jira] [Updated] (HBASE-5137) MasterFileSystem.splitLog() should abort even if waitOnSafeMode() throws IOException
[ https://issues.apache.org/jira/browse/HBASE-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5137: -- Attachment: HBASE-5137.patch Patch for 0.90 addressing Ted's comment about adding braces. It does not handle the interrupted exception, though. @Ted Please check if it is OK. MasterFileSystem.splitLog() should abort even if waitOnSafeMode() throws IOException Key: HBASE-5137 URL: https://issues.apache.org/jira/browse/HBASE-5137 Project: HBase Issue Type: Bug Affects Versions: 0.90.4 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Attachments: 5137-trunk.txt, HBASE-5137.patch, HBASE-5137.patch I am not sure if this bug was already raised in JIRA. In our test cluster we had a scenario where the RS had gone down and ServerShutdownHandler started with splitLog. But because HDFS was down, the waitOnSafeMode check threw an IOException.
{code}
try {
  // If FS is in safe mode, just wait till out of it.
  FSUtils.waitOnSafeMode(conf, conf.getInt(HConstants.THREAD_WAKE_FREQUENCY, 1000));
  splitter.splitLog();
} catch (OrphanHLogAfterSplitException e) {
{code}
We catch the exception:
{code}
} catch (IOException e) {
  checkFileSystem();
  LOG.error("Failed splitting " + logDir.toString(), e);
}
{code}
So the HLog split itself did not happen. We found that about 4 regions that had recently been split on the crashed RS were lost. Can we abort the Master in such scenarios? Please suggest.
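The problem reported in HBASE-5137 is that a failure of the safe-mode wait and a failure of the split itself land in the same catch block, so the split is silently skipped. A minimal sketch of separating the two cases follows. The three interfaces are hypothetical stand-ins for the real HBase types, and the control flow is an assumption about what "abort" could look like, not the content of the attached patches.

```java
import java.io.IOException;

/** Illustrative only: separates "could not wait on safe mode" from "split failed". */
final class SplitLogSketch {
    private SplitLogSketch() {}

    /** Hypothetical stand-in for the safe-mode wait. */
    interface SafeModeWaiter {
        void waitOnSafeMode() throws IOException;
    }

    /** Hypothetical stand-in for the log splitter. */
    interface LogSplitter {
        void splitLog() throws IOException;
    }

    /** Hypothetical stand-in for the component that can abort the master. */
    interface Abortable {
        void abort(String why, Throwable cause);
    }

    /** Returns true only if the log split actually ran to completion. */
    static boolean splitLog(SafeModeWaiter waiter, LogSplitter splitter, Abortable master) {
        try {
            waiter.waitOnSafeMode();
        } catch (IOException e) {
            // The filesystem state is unknown, so the split is not attempted;
            // abort instead of logging and carrying on.
            master.abort("Could not wait on safe mode; log split not attempted", e);
            return false;
        }
        try {
            splitter.splitLog();
            return true;
        } catch (IOException e) {
            // The quoted code already covers this path (checkFileSystem, LOG.error);
            // the sketch only reports the failure to the caller.
            return false;
        }
    }

    public static void main(String[] args) {
        boolean done = splitLog(
            () -> { throw new IOException("HDFS is down"); },
            () -> System.out.println("splitting"),
            (why, cause) -> System.out.println("ABORT: " + why));
        System.out.println("split completed: " + done);
    }
}
```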
[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap
[ https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13182199#comment-13182199 ] ramkrishna.s.vasudevan commented on HBASE-5088: --- @Lars Good on you for committing it to 0.90 :).. A concurrency issue on SoftValueSortedMap - Key: HBASE-5088 URL: https://issues.apache.org/jira/browse/HBASE-5088 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4, 0.94.0 Reporter: Jieshan Bean Assignee: Lars Hofhansl Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5088-0.90.txt, 5088-0.92-trunk-addendum.txt, 5088-final3.txt, HBase-5088-90.patch, HBase-5088-trunk.patch, HBase5088-90-replaceSoftValueSortedMap.patch, HBase5088-90-replaceTreeMap.patch, HBase5088-trunk-replaceTreeMap.patch, HBase5088Reproduce.java, PerformanceTestResults.png SoftValueSortedMap is backed by a TreeMap. All the methods in this class are synchronized. If we use this method to add/delete elements, it's ok. But in HConnectionManager#getCachedLocation, it use headMap to get a view from SoftValueSortedMap#internalMap. Once we operate on this view map(like add/delete) in other threads, a concurrency issue may occur. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-5143) Fix config typo in pluggable load balancer factory
Fix config typo in pluggable load balancer factory -- Key: HBASE-5143 URL: https://issues.apache.org/jira/browse/HBASE-5143 Project: HBase Issue Type: Sub-task Components: master Reporter: Harsh J Priority: Critical HBASE-4240 made LoadBalancer pluggable. The configuration key it loads seems to be wrongly named and carries a typo: hbase.maser.loadBalancer.class. It could rather be hbase.master.loadbalancer.class. Luckily 0.92 is not out yet, so we should fix it ASAP, before folks start using it. Attaching patch.
[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap
[ https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13182204#comment-13182204 ] Hudson commented on HBASE-5088: --- Integrated in HBase-TRUNK-security #67 (See [https://builds.apache.org/job/HBase-TRUNK-security/67/]) HBASE-5088 addendum larsh : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/SoftValueSortedMap.java A concurrency issue on SoftValueSortedMap - Key: HBASE-5088 URL: https://issues.apache.org/jira/browse/HBASE-5088 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4, 0.94.0 Reporter: Jieshan Bean Assignee: Lars Hofhansl Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5088-0.90.txt, 5088-0.92-trunk-addendum.txt, 5088-final3.txt, HBase-5088-90.patch, HBase-5088-trunk.patch, HBase5088-90-replaceSoftValueSortedMap.patch, HBase5088-90-replaceTreeMap.patch, HBase5088-trunk-replaceTreeMap.patch, HBase5088Reproduce.java, PerformanceTestResults.png SoftValueSortedMap is backed by a TreeMap. All the methods in this class are synchronized. If we use this method to add/delete elements, it's ok. But in HConnectionManager#getCachedLocation, it use headMap to get a view from SoftValueSortedMap#internalMap. Once we operate on this view map(like add/delete) in other threads, a concurrency issue may occur. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5052) The path where a dynamically loaded coprocessor jar is copied on the local file system depends on the region name (and implicitly, the start key)
[ https://issues.apache.org/jira/browse/HBASE-5052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13182203#comment-13182203 ] Hudson commented on HBASE-5052: --- Integrated in HBase-TRUNK-security #67 (See [https://builds.apache.org/job/HBase-TRUNK-security/67/]) HBASE-5052 The path where a dynamically loaded coprocessor jar is copied on the local file system depends on the region name (and implicitly, the start key) stack : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/RegionCoprocessorHost.java The path where a dynamically loaded coprocessor jar is copied on the local file system depends on the region name (and implicitly, the start key) - Key: HBASE-5052 URL: https://issues.apache.org/jira/browse/HBASE-5052 Project: HBase Issue Type: Bug Components: coprocessors Affects Versions: 0.92.0 Reporter: Andrei Dragomir Assignee: Andrei Dragomir Fix For: 0.92.0 Attachments: HBASE-5052.patch When loading a coprocessor from hdfs, the jar file gets copied to a path on the local filesystem, which depends on the region name, and the region start key. The name is cleaned, but not enough, so when you have filesystem unfriendly characters (/?:, etc), the coprocessor is not loaded, and an error is thrown -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5141) Memory leak in MonitoredRPCHandlerImpl
[ https://issues.apache.org/jira/browse/HBASE-5141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13182202#comment-13182202 ] Hudson commented on HBASE-5141: --- Integrated in HBase-TRUNK-security #67 (See [https://builds.apache.org/job/HBase-TRUNK-security/67/]) HBASE-5141 Memory leak in MonitoredRPCHandlerImpl -- REDO HBASE-5141 Memory leak in MonitoredRPCHandlerImpl -- REVERT. OVER-COMMITTED. REVERTING ALL SO CAN REDO COMMIT HBASE-5141 Memory leak in MonitoredRPCHandlerImpl stack : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/monitoring/MonitoredRPCHandlerImpl.java stack : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ClosedRegionHandler.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/monitoring/MonitoredRPCHandlerImpl.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java stack : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ClosedRegionHandler.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/monitoring/MonitoredRPCHandlerImpl.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java Memory leak in MonitoredRPCHandlerImpl -- Key: HBASE-5141 URL: https://issues.apache.org/jira/browse/HBASE-5141 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Assignee: 
Jean-Daniel Cryans Priority: Blocker Fix For: 0.92.0, 0.94.0 Attachments: HBASE-5141-v2.patch, HBASE-5141.patch, Screen Shot 2012-01-06 at 3.03.09 PM.png I got a pretty reliable way of OOME'ing my region servers. Using a big payload (64MB in my case), a default heap and default number of handlers, it's not too long that all the MonitoredRPCHandlerImpl hold on a 64MB reference and once a compaction kicks in it kills everything. The issue is that even after the RPC call is done, the packet still lives in MonitoredRPCHandlerImpl. Will attach a screen shot of jprofiler's analysis in a moment. This is a blocker for 0.92.0, anyone using a high number of handlers and bigish values will kill themselves. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
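The leak described in HBASE-5141 is a reference that outlives the call it belongs to: the per-handler monitoring object keeps pointing at the request packet after the RPC has finished, so the payload cannot be garbage collected. The sketch below shows the general remedy of clearing the reference in a finally block. MonitoredCallSketch and its methods are hypothetical names and do not reproduce MonitoredRPCHandlerImpl or the attached patches.

```java
/** Illustrative only: a per-handler status object that must not pin the request payload. */
final class MonitoredCallSketch {
    // Reference kept for monitoring while the call is running.
    private volatile Object currentPacket;

    void beginCall(Object packet) {
        currentPacket = packet;
    }

    /** Drops the reference so the payload becomes collectable once the call is over. */
    void endCall() {
        currentPacket = null;
    }

    boolean holdsPacket() {
        return currentPacket != null;
    }

    /** Runs the call and always clears the packet reference, even if the call throws. */
    void handle(Object packet, Runnable call) {
        beginCall(packet);
        try {
            call.run();
        } finally {
            endCall();
        }
    }

    public static void main(String[] args) {
        MonitoredCallSketch status = new MonitoredCallSketch();
        status.handle(new byte[1024], () -> System.out.println("handling call"));
        System.out.println("still holds packet: " + status.holdsPacket());
    }
}
```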
[jira] [Commented] (HBASE-4240) Allow Loadbalancer to be pluggable.
[ https://issues.apache.org/jira/browse/HBASE-4240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13182205#comment-13182205 ] Harsh J commented on HBASE-4240: Hi, This introduced a badly named config. Please see HBASE-5143 for a fix. Allow Loadbalancer to be pluggable. --- Key: HBASE-4240 URL: https://issues.apache.org/jira/browse/HBASE-4240 Project: HBase Issue Type: New Feature Components: master Affects Versions: 0.94.0 Reporter: Elliott Clark Assignee: Elliott Clark Fix For: 0.92.0 Attachments: HBASE-4240-0.patch, HBASE-4240-1.patch, HBASE-4240-2.patch, HBASE-4240-3.patch Everyone seems to want something different from a load balancer. People want low latency, simplicity, and total control. It seems like at some point the load balancer can't be all things to all people. Something akin to what hadoop JT's pluggable scheduler seems like it will enable all solutions without making the code much more complex. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5143) Fix config typo in pluggable load balancer factory
[ https://issues.apache.org/jira/browse/HBASE-5143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HBASE-5143: --- Attachment: HBASE-5143.patch Patch fixes this typo and uses a better name (in sync with other names, no camel casing). Please apply to 0.92 as well. Fix config typo in pluggable load balancer factory -- Key: HBASE-5143 URL: https://issues.apache.org/jira/browse/HBASE-5143 Project: HBase Issue Type: Sub-task Components: master Reporter: Harsh J Priority: Critical Attachments: HBASE-5143.patch HBASE-4240 made LoadBalancer pluggable. The configuration key it loads seems to be wrongly named and carries a typo: hbase.maser.loadBalancer.class. It could rather be hbase.master.loadbalancer.class. Luckily 0.92 is not out yet, so we should fix it ASAP, before folks start using it. Attaching patch.
[jira] [Commented] (HBASE-5143) Fix config typo in pluggable load balancer factory
[ https://issues.apache.org/jira/browse/HBASE-5143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13182208#comment-13182208 ] ramkrishna.s.vasudevan commented on HBASE-5143: --- @Harsh Thanks for the patch. +1 Fix config typo in pluggable load balancer factory -- Key: HBASE-5143 URL: https://issues.apache.org/jira/browse/HBASE-5143 Project: HBase Issue Type: Sub-task Components: master Reporter: Harsh J Priority: Critical Attachments: HBASE-5143.patch HBASE-4240 made LoadBalancer pluggable. The configuration key it loads seems to be wrongly named and carries a typo: hbase.maser.loadBalancer.class. It could rather be hbase.master.loadbalancer.class. Luckily 0.92 is not out yet, so we should fix it ASAP, before folks start using it. Attaching patch.
[jira] [Commented] (HBASE-5137) MasterFileSystem.splitLog() should abort even if waitOnSafeMode() throws IOException
[ https://issues.apache.org/jira/browse/HBASE-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13182209#comment-13182209 ] Zhihong Yu commented on HBASE-5137: --- Second patch is fine. What do you think of the patch for trunk? MasterFileSystem.splitLog() should abort even if waitOnSafeMode() throws IOException Key: HBASE-5137 URL: https://issues.apache.org/jira/browse/HBASE-5137 Project: HBase Issue Type: Bug Affects Versions: 0.90.4 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Attachments: 5137-trunk.txt, HBASE-5137.patch, HBASE-5137.patch I am not sure if this bug was already raised in JIRA. In our test cluster we had a scenario where the RS had gone down and ServerShutdownHandler started with splitLog. But because HDFS was down, the waitOnSafeMode check threw an IOException.
{code}
try {
  // If FS is in safe mode, just wait till out of it.
  FSUtils.waitOnSafeMode(conf, conf.getInt(HConstants.THREAD_WAKE_FREQUENCY, 1000));
  splitter.splitLog();
} catch (OrphanHLogAfterSplitException e) {
{code}
We catch the exception:
{code}
} catch (IOException e) {
  checkFileSystem();
  LOG.error("Failed splitting " + logDir.toString(), e);
}
{code}
So the HLog split itself did not happen. We found that about 4 regions that had recently been split on the crashed RS were lost. Can we abort the Master in such scenarios? Please suggest.
[jira] [Commented] (HBASE-5143) Fix config typo in pluggable load balancer factory
[ https://issues.apache.org/jira/browse/HBASE-5143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13182211#comment-13182211 ] Harsh J commented on HBASE-5143: Thanks Ram, will rebase my work branch on HBASE-3274 once this is in, and continue. I should have an update for incremental review soon (mostly i-to-r, next). Fix config typo in pluggable load balancer factory -- Key: HBASE-5143 URL: https://issues.apache.org/jira/browse/HBASE-5143 Project: HBase Issue Type: Sub-task Components: master Reporter: Harsh J Priority: Critical Attachments: HBASE-5143.patch HBASE-4240 made LoadBalancer pluggable. The configuration key it loads seems to be wrongly named and carries a typo: hbase.maser.loadBalancer.class. It could rather be hbase.master.loadbalancer.class. Luckily 0.92 is not out yet, so we should fix it ASAP, before folks start using it. Attaching patch.
[jira] [Created] (HBASE-5144) Add a test for LoadBalancerFactory
Add a test for LoadBalancerFactory -- Key: HBASE-5144 URL: https://issues.apache.org/jira/browse/HBASE-5144 Project: HBase Issue Type: Sub-task Reporter: Harsh J Priority: Minor We should add a simple class loading test surrounding LoadBalancerFactory to prevent regressions. Perhaps a simple test that loads and uses a custom load balancer should suffice. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
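The kind of test HBASE-5144 asks for can be sketched independently of HBase: a factory reads a class name from configuration, instantiates it reflectively, and a test checks that both the default and a custom implementation load. In the sketch below only the key name hbase.master.loadbalancer.class comes from the HBASE-5143 discussion. The Balancer interface, both implementations, the factory, and the use of java.util.Properties in place of HBase's Configuration are hypothetical stand-ins.

```java
import java.util.Properties;

/** Illustrative only: a reflective factory in the style the issue describes. */
final class BalancerFactorySketch {
    private BalancerFactorySketch() {}

    /** Key name as proposed in HBASE-5143. */
    static final String KEY = "hbase.master.loadbalancer.class";

    /** Hypothetical stand-in for the pluggable interface. */
    interface Balancer {
        String name();
    }

    static final class DefaultBalancer implements Balancer {
        public String name() {
            return "default";
        }
    }

    static final class CustomBalancer implements Balancer {
        public String name() {
            return "custom";
        }
    }

    /** Loads the configured class, falling back to DefaultBalancer when the key is unset. */
    static Balancer create(Properties conf) {
        String className = conf.getProperty(KEY, DefaultBalancer.class.getName());
        try {
            return Class.forName(className)
                .asSubclass(Balancer.class)
                .getDeclaredConstructor()
                .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load balancer class " + className, e);
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(create(conf).name());
        conf.setProperty(KEY, CustomBalancer.class.getName());
        System.out.println(create(conf).name());
    }
}
```

A test along these lines would also have caught the misspelled key: setting the intended key and getting the default implementation back is exactly the regression the issue wants to guard against.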
[jira] [Updated] (HBASE-5144) Add a test for LoadBalancerFactory
[ https://issues.apache.org/jira/browse/HBASE-5144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HBASE-5144: --- Fix Version/s: (was: 0.92.0) 0.94.0 Add a test for LoadBalancerFactory -- Key: HBASE-5144 URL: https://issues.apache.org/jira/browse/HBASE-5144 Project: HBase Issue Type: Sub-task Components: master Reporter: Harsh J Priority: Minor Fix For: 0.94.0 We should add a simple class loading test surrounding LoadBalancerFactory to prevent regressions. Perhaps a simple test that loads and uses a custom load balancer should suffice. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5143) Fix config typo in pluggable load balancer factory
[ https://issues.apache.org/jira/browse/HBASE-5143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HBASE-5143: --- Fix config typo in pluggable load balancer factory -- Key: HBASE-5143 URL: https://issues.apache.org/jira/browse/HBASE-5143 Project: HBase Issue Type: Sub-task Components: master Reporter: Harsh J Priority: Critical Fix For: 0.92.0, 0.94.0 Attachments: HBASE-5143.patch HBASE-4240 made LoadBalancer pluggable. The configuration key it loads seems to be wrongly named and carries a typo: hbase.maser.loadBalancer.class. It could rather be hbase.master.loadbalancer.class. Luckily 0.92 is not out yet, so we should fix it ASAP, before folks start using it. Attaching patch.
[jira] [Created] (HBASE-5145) HMasterCommandLine's -minServers seems to be useless.
HMasterCommandLine's -minServers seems to be useless. - Key: HBASE-5145 URL: https://issues.apache.org/jira/browse/HBASE-5145 Project: HBase Issue Type: Sub-task Components: master Affects Versions: 0.94.0 Reporter: Harsh J HMasterCommandLine gets a number via -minServers opt. and sets it to a config param hbase.regions.server.count.min. This config is not used anywhere else. Perhaps it wants to use hbase.master.wait.on.regionservers.mintostart instead? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira