[jira] [Commented] (HBASE-6033) Adding some fuction to check if a table/region is in compaction
[ https://issues.apache.org/jira/browse/HBASE-6033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282235#comment-13282235 ]

Hudson commented on HBASE-6033:
-------------------------------

Integrated in HBase-TRUNK #2920 (See [https://builds.apache.org/job/HBase-TRUNK/2920/])
HBASE-6033 Adding some fuction to check if a table/region is in compaction (Jimmy) (Revision 1342149)

Result = SUCCESS
tedyu :
Files :
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/generated/AdminProtos.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/CompactSplitThread.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/CompactionRequest.java
* /hbase/trunk/src/main/protobuf/Admin.proto
* /hbase/trunk/src/main/resources/hbase-webapps/master/table.jsp
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionState.java
* /hbase/trunk/src/test/resources/hbase-site.xml

Adding some fuction to check if a table/region is in compaction
---------------------------------------------------------------

Key: HBASE-6033
URL: https://issues.apache.org/jira/browse/HBASE-6033
Project: HBase
Issue Type: New Feature
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Fix For: 0.96.0
Attachments: 6033-v7.txt, hbase-6033_v2.patch, hbase-6033_v3.patch, hbase_6033_v5.patch, hbase_6033_v6.patch, table_ui.png

This feature will be helpful to find out if a major compaction is going on.
We can show if it is in any minor compaction too.

-- This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
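
The HBASE-6033 description asks for a way to report whether a table or region is in a major compaction, a minor one, or both. A minimal sketch of how per-region states might be folded into one table-level answer follows. The `CompactionState` enum and the `combine` and `tableState` helpers are hypothetical names chosen for illustration; they are not taken from the committed patch.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: names are illustrative, not the API added by HBASE-6033.
class CompactionStateSketch {
    enum CompactionState { NONE, MINOR, MAJOR, MAJOR_AND_MINOR }

    static boolean hasMinor(CompactionState s) {
        return s == CompactionState.MINOR || s == CompactionState.MAJOR_AND_MINOR;
    }

    static boolean hasMajor(CompactionState s) {
        return s == CompactionState.MAJOR || s == CompactionState.MAJOR_AND_MINOR;
    }

    // Merge two states: the result reports every kind of compaction seen in either.
    static CompactionState combine(CompactionState a, CompactionState b) {
        boolean minor = hasMinor(a) || hasMinor(b);
        boolean major = hasMajor(a) || hasMajor(b);
        if (major && minor) return CompactionState.MAJOR_AND_MINOR;
        if (major) return CompactionState.MAJOR;
        if (minor) return CompactionState.MINOR;
        return CompactionState.NONE;
    }

    // A table's state is the combination of the states of all its regions.
    static CompactionState tableState(List<CompactionState> regionStates) {
        CompactionState result = CompactionState.NONE;
        for (CompactionState s : regionStates) {
            result = combine(result, s);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tableState(Arrays.asList(
            CompactionState.NONE, CompactionState.MINOR, CompactionState.MAJOR)));
    }
}
```

Folding with `combine` keeps the table-level answer independent of the order in which regions report.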
[jira] [Assigned] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan reassigned HBASE-6070:
---------------------------------------------

Assignee: ramkrishna.s.vasudevan

AM.nodeDeleted and SSH races creating problems for regions under SPLIT
----------------------------------------------------------------------

Key: HBASE-6070
URL: https://issues.apache.org/jira/browse/HBASE-6070
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Fix For: 0.92.2, 0.96.0, 0.94.1

As part of HBASE-5806 we tried to address the problems with Master restart and RS restart while a region SPLIT is in progress. While working on this further we found there is still one race condition.

1. Split has just started and the znode is in RS_SPLIT state.
2. RS goes down.
3. First callback for SSH comes.
4. As part of the fix for HBASE-5806, SSH knows that some region is in RIT.
5. But now the nodeDeleted event comes for the SPLIT node, and there we try to delete the RIT.
6. After this we check in the SSH whether any node is in RIT. As we don't find the region in RIT, the region is never assigned.

When we fixed HBASE-5806, step 6 happened before step 5, so we missed this case. Now we have found it.
Will come up with a patch shortly.
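
The ordering problem in the steps above can be modelled in a few lines. This is only a sketch under simplifying assumptions: the `regionsInTransition` set and the `run` method are hypothetical stand-ins, not the AssignmentManager or ServerShutdownHandler code, and the two events are replayed sequentially rather than on real threads.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the event ordering; not HBase's actual classes.
class SplitRaceSketch {
    // Returns true if the region ends up assigned by the shutdown handler.
    static boolean run(boolean nodeDeletedBeforeRitCheck) {
        Set<String> regionsInTransition = new HashSet<>();
        String region = "region-A";
        regionsInTransition.add(region);          // split started, znode in RS_SPLIT state

        if (nodeDeletedBeforeRitCheck) {
            regionsInTransition.remove(region);   // step 5: nodeDeleted clears the RIT entry
        }

        // Step 6: the shutdown handler assigns the region only if it is still in RIT.
        boolean assigned = regionsInTransition.contains(region);

        if (!nodeDeletedBeforeRitCheck) {
            regionsInTransition.remove(region);   // step 5 arriving late changes nothing
        }
        return assigned;
    }

    public static void main(String[] args) {
        System.out.println("step 5 before step 6, assigned: " + run(true));
        System.out.println("step 6 before step 5, assigned: " + run(false));
    }
}
```

In this model the region is lost only when the nodeDeleted event is processed before the RIT check, which matches the ordering reported in the issue.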
[jira] [Updated] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-6070:
------------------------------------------

Attachment: HBASE-6070_0.94.patch

AM.nodeDeleted and SSH races creating problems for regions under SPLIT
----------------------------------------------------------------------

Key: HBASE-6070
URL: https://issues.apache.org/jira/browse/HBASE-6070
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Fix For: 0.92.2, 0.96.0, 0.94.1
Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.94.patch
[jira] [Updated] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-6070:
------------------------------------------

Attachment: HBASE-6070_0.92.patch

AM.nodeDeleted and SSH races creating problems for regions under SPLIT
----------------------------------------------------------------------

Key: HBASE-6070
URL: https://issues.apache.org/jira/browse/HBASE-6070
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Fix For: 0.92.2, 0.96.0, 0.94.1
Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.94.patch
[jira] [Updated] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-6070:
------------------------------------------

Attachment: HBASE-6070_trunk.patch

Uploaded patches for all branches. Tested in cluster including scenarios for HBASE-5806.
Pls review and provide your comments.

AM.nodeDeleted and SSH races creating problems for regions under SPLIT
----------------------------------------------------------------------

Key: HBASE-6070
URL: https://issues.apache.org/jira/browse/HBASE-6070
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Fix For: 0.92.2, 0.96.0, 0.94.1
Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.94.patch, HBASE-6070_trunk.patch
[jira] [Commented] (HBASE-5352) ACL improvements
[ https://issues.apache.org/jira/browse/HBASE-5352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282382#comment-13282382 ]

Laxman commented on HBASE-5352:
-------------------------------

Enis, Matt: I hope you don't mind if I add some ACL-related sub-tasks here.
I have already added HBASE-6086. Matt clarified that it is a duplicate of HBASE-5372.

There is one more observation I wanted to validate with you. Currently, AccessController doesn't provide an implementation for some methods, such as preFlush, preSplit and many others. That means any unauthorized user can trigger these operations on a table.
Do we need to handle this in a separate JIRA?

ACL improvements
----------------

Key: HBASE-5352
URL: https://issues.apache.org/jira/browse/HBASE-5352
Project: HBase
Issue Type: Improvement
Components: security
Affects Versions: 0.92.1, 0.94.0
Reporter: Enis Soztutar
Assignee: Enis Soztutar

In this issue I would like to open discussion for a few minor ACL related improvements. The proposed changes are as follows:

1. Introduce something like an AccessControllerProtocol.checkPermissions(Permission[] permissions) API, so that clients can check access rights before carrying out the operations. We need this kind of operation for HCATALOG-245, which introduces authorization providers for hbase over hcat. We cannot use getUserPermissions() since it requires ADMIN permissions on the global/table level.
2. getUserPermissions(tableName)/grant/revoke and drop/modify table operations should not check for global CREATE/ADMIN rights, but table CREATE/ADMIN rights. The reasoning is that if a user is able to admin or read from a table, she should be able to read the table's permissions. We can choose whether we want only READ or ADMIN permissions for getUserPermission(). Since we check for global permissions first for table permissions, configuring table access using global permissions will continue to work.
3. Grant/Revoke global permissions - HBASE-5342 (included for completeness)

Of these three, we may want to backport the first one to 0.92, since without it Hive/Hcatalog cannot use Hbase's authorization mechanism effectively.
I will create subissues and convert HBASE-5342 to a subtask when we get some feedback and opinions for going further.
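
Proposals 1 and 2 above can be illustrated with a small model: a bulk permission check that consults global grants first and then table-level grants. This is a sketch only. The `AclCheckSketch` class, its `Action` enum and its method signatures are hypothetical and do not reflect the real AccessController or AccessControllerProtocol interfaces.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; names are illustrative, not the HBase security API.
class AclCheckSketch {
    enum Action { READ, WRITE, CREATE, ADMIN }

    private final Set<Action> globalGrants;
    private final Map<String, Set<Action>> tableGrants = new HashMap<>();

    AclCheckSketch(Set<Action> globalGrants) {
        this.globalGrants = globalGrants;
    }

    void grantOnTable(String table, Action action) {
        tableGrants.computeIfAbsent(table, t -> EnumSet.noneOf(Action.class)).add(action);
    }

    // Global grants are consulted first, then the table-level grants.
    boolean isAllowed(String table, Action action) {
        if (globalGrants.contains(action)) return true;
        Set<Action> perTable = tableGrants.get(table);
        return perTable != null && perTable.contains(action);
    }

    // Bulk check in the spirit of the proposed checkPermissions(Permission[]):
    // succeeds only if every requested action is allowed on the table.
    boolean checkPermissions(String table, Action... actions) {
        for (Action a : actions) {
            if (!isAllowed(table, a)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        AclCheckSketch acl = new AclCheckSketch(EnumSet.of(Action.READ));
        acl.grantOnTable("t1", Action.ADMIN);
        System.out.println(acl.checkPermissions("t1", Action.READ, Action.ADMIN));
        System.out.println(acl.checkPermissions("t2", Action.ADMIN));
    }
}
```

Because the global check runs first, a setup that grants access only through global permissions keeps working, which is the compatibility point made in proposal 2.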
[jira] [Updated] (HBASE-6071) getRegionServerWithRetires, should log unsuccessful attempts and exceptions.
[ https://issues.apache.org/jira/browse/HBASE-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Igal Shilman updated HBASE-6071:
--------------------------------

Attachment: HConnectionManager_HBASE-6071-0.90.0.patch

getRegionServerWithRetires, should log unsuccessful attempts and exceptions.
----------------------------------------------------------------------------

Key: HBASE-6071
URL: https://issues.apache.org/jira/browse/HBASE-6071
Project: HBase
Issue Type: Improvement
Components: client, ipc
Affects Versions: 0.90.4
Reporter: Igal Shilman
Priority: Minor
Attachments: HConnectionManager_HBASE-6071-0.90.0.patch

HConnectionImplementation.getRegionServerWithRetries might terminate with an exception other than a DoNotRetryIOException, thus silently dropping the exceptions from previous attempts.
[~ted_yu] suggested ([here|http://mail-archives.apache.org/mod_mbox/hbase-user/201205.mbox/%3CCAFebPXBq9V9BVdzRTNr-MB3a1Lz78SZj6gvP6On0b%2Bajt9StAg%40mail.gmail.com%3E]) adding a log message inside the catch block describing the exception type and details.
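
The suggestion to log each failed attempt can be sketched as a generic retry loop. This is not the code in the attached patch: `callWithRetries`, the `log` callback and the `DoNotRetryException` stand-in are hypothetical names used only to show the shape of the change.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.function.Consumer;

// Hypothetical sketch of a retry loop that records every failed attempt.
class RetryLoggingSketch {
    // Stand-in for an exception type that must not be retried (not HBase's class).
    static class DoNotRetryException extends IOException {
        DoNotRetryException(String msg) { super(msg); }
    }

    static <T> T callWithRetries(Callable<T> call, int maxAttempts, Consumer<String> log)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (DoNotRetryException e) {
                throw e; // give up immediately, no further attempts
            } catch (Exception e) {
                // Log inside the catch block so earlier failures are not silently lost.
                log.accept("attempt " + attempt + " of " + maxAttempts + " failed: "
                    + e.getClass().getSimpleName() + ": " + e.getMessage());
                last = e;
            }
        }
        throw new IOException("all " + maxAttempts + " attempts failed", last);
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String result = callWithRetries(() -> {
            if (++calls[0] < 3) throw new IOException("connection refused");
            return "ok";
        }, 5, System.err::println);
        System.out.println(result);
    }
}
```

Passing the logger in as a callback keeps the sketch independent of any particular logging framework.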
[jira] [Updated] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-6070:
------------------------------------------

Status: Patch Available (was: Open)

AM.nodeDeleted and SSH races creating problems for regions under SPLIT
----------------------------------------------------------------------

Key: HBASE-6070
URL: https://issues.apache.org/jira/browse/HBASE-6070
Project: HBase
Issue Type: Bug
Affects Versions: 0.94.0, 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Fix For: 0.92.2, 0.96.0, 0.94.1
Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.94.patch, HBASE-6070_trunk.patch
[jira] [Updated] (HBASE-6071) getRegionServerWithRetires, should log unsuccessful attempts and exceptions.
[ https://issues.apache.org/jira/browse/HBASE-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Igal Shilman updated HBASE-6071:
--------------------------------

Affects Version/s: (was: 0.90.4) 0.90.0

getRegionServerWithRetires, should log unsuccessful attempts and exceptions.
----------------------------------------------------------------------------

Key: HBASE-6071
URL: https://issues.apache.org/jira/browse/HBASE-6071
Project: HBase
Issue Type: Improvement
Components: client, ipc
Affects Versions: 0.90.0
Reporter: Igal Shilman
Priority: Minor
Attachments: HConnectionManager_HBASE-6071-0.90.0.patch
[jira] [Updated] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Lapan updated HBASE-5416:
-----------------------------

Attachment: Filtered_scans_v5.patch

Fixed issues with limits in next() call.

Improve performance of scans with some kind of filters.
-------------------------------------------------------

Key: HBASE-5416
URL: https://issues.apache.org/jira/browse/HBASE-5416
Project: HBase
Issue Type: Improvement
Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
Attachments: 5416-v5.txt, 5416-v6.txt, Filtered_scans.patch, Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch, Filtered_scans_v5.patch

When a scan is performed, the whole row is loaded into the result list, and after that the filter (if one exists) is applied to decide whether the row is needed.

But when a scan is performed on several CFs and the filter checks only data from a subset of these CFs, data from the CFs not checked by the filter is not needed at the filter stage. It is needed only once we have decided to include the current row. In such a case we can significantly reduce the amount of IO performed by a scan by loading only the values actually checked by the filter.

For example, we have two CFs: flags and snap. Flags is quite small (a bunch of megabytes) and is used to filter large entries from snap. Snap is very large (10s of GB) and is quite costly to scan. If we need only rows with some flag specified, we use SingleColumnValueFilter to limit the result to only a small subset of the region. But the current implementation loads both CFs to perform the scan, when only a small subset is needed.

The attached patch adds one routine to the Filter interface to allow a filter to specify which CFs are needed for its operation. In HRegion, we separate all scanners into two groups: those needed for the filter and the rest (joined). When a new row is considered, only the needed data is loaded, the filter is applied, and only if the filter accepts the row is the rest of the data loaded. On our data, this speeds up such scans 30-50 times. Also, this gives us a way to better normalize the data into separate columns by optimizing the scans performed.
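
The idea of loading the filter's column family first and the remaining (joined) family only for accepted rows can be shown with a toy in-memory model. This is a sketch under strong simplifications: `JoinedScanSketch`, its `load` counter and the single-value-per-family layout are hypothetical and have nothing to do with the real HRegion scanner classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical toy model: one value per (family, row), with a read counter per family.
class JoinedScanSketch {
    final Map<String, Map<String, String>> families = new LinkedHashMap<>();
    final Map<String, Integer> loads = new LinkedHashMap<>();

    void put(String family, String row, String value) {
        families.computeIfAbsent(family, f -> new LinkedHashMap<>()).put(row, value);
    }

    // Every read is counted so the IO saved by the two-group scan is visible.
    private String load(String family, String row) {
        loads.merge(family, 1, Integer::sum);
        return families.get(family).get(row);
    }

    List<String> scan(List<String> rows, String filterFamily, String joinedFamily,
                      Predicate<String> filter) {
        List<String> out = new ArrayList<>();
        for (String row : rows) {
            String flag = load(filterFamily, row);        // only the family the filter needs
            if (!filter.test(flag)) continue;             // rejected: joined family never read
            out.add(row + "=" + load(joinedFamily, row)); // accepted: load the rest
        }
        return out;
    }

    public static void main(String[] args) {
        JoinedScanSketch s = new JoinedScanSketch();
        for (int i = 1; i <= 3; i++) {
            s.put("flags", "r" + i, i == 2 ? "1" : "0");
            s.put("snap", "r" + i, "big-" + i);
        }
        System.out.println(s.scan(Arrays.asList("r1", "r2", "r3"), "flags", "snap", "1"::equals));
        System.out.println(s.loads);
    }
}
```

In this model the large family is read once per accepted row instead of once per scanned row, which is where the reported saving would come from.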
[jira] [Updated] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Lapan updated HBASE-5416:
-----------------------------

Status: Patch Available (was: Open)

Improve performance of scans with some kind of filters.
-------------------------------------------------------

Key: HBASE-5416
URL: https://issues.apache.org/jira/browse/HBASE-5416
Project: HBase
Issue Type: Improvement
Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
Attachments: 5416-v5.txt, 5416-v6.txt, Filtered_scans.patch, Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch, Filtered_scans_v5.patch
[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282497#comment-13282497 ]

Max Lapan commented on HBASE-5416:
----------------------------------

After a long delay, I decided to return to this optimization.
We have been running this patch on our production system (300TB of HBase data, 160 nodes) for the last two months without issues.

Tests of the 2-phase approach showed a much smaller performance improvement than this patch: only a 2 times speedup versus nearly 20 times.

I extended the tests, but I don't feel experienced enough to implement a concurrent, multithreaded test as suggested, sorry.

Improve performance of scans with some kind of filters.
-------------------------------------------------------

Key: HBASE-5416
URL: https://issues.apache.org/jira/browse/HBASE-5416
Project: HBase
Issue Type: Improvement
Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
Attachments: 5416-v5.txt, 5416-v6.txt, Filtered_scans.patch, Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch, Filtered_scans_v5.patch
[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282495#comment-13282495 ]

Hadoop QA commented on HBASE-5416:
----------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12529061/Filtered_scans_v5.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 6 new or modified tests.

-1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1982//console

This message is automatically generated.

Improve performance of scans with some kind of filters.
-------------------------------------------------------

Key: HBASE-5416
URL: https://issues.apache.org/jira/browse/HBASE-5416
Project: HBase
Issue Type: Improvement
Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
Attachments: 5416-v5.txt, 5416-v6.txt, Filtered_scans.patch, Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch, Filtered_scans_v5.patch
[jira] [Updated] (HBASE-6071) getRegionServerWithRetires, should log unsuccessful attempts and exceptions.
[ https://issues.apache.org/jira/browse/HBASE-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Igal Shilman updated HBASE-6071:
--------------------------------

Fix Version/s: 0.90.7
Labels: client ipc (was: )
Status: Patch Available (was: Open)

getRegionServerWithRetires, should log unsuccessful attempts and exceptions.
----------------------------------------------------------------------------

Key: HBASE-6071
URL: https://issues.apache.org/jira/browse/HBASE-6071
Project: HBase
Issue Type: Improvement
Components: client, ipc
Affects Versions: 0.90.0
Reporter: Igal Shilman
Priority: Minor
Labels: client, ipc
Fix For: 0.90.7
Attachments: HConnectionManager_HBASE-6071-0.90.0.patch
[jira] [Created] (HBASE-6088) Region splitting not happened for long time due to ZK exception while creating RS_ZK_SPLITTING node
Gopinathan A created HBASE-6088:
-----------------------------------

Summary: Region splitting not happened for long time due to ZK exception while creating RS_ZK_SPLITTING node
Key: HBASE-6088
URL: https://issues.apache.org/jira/browse/HBASE-6088
Project: HBase
Issue Type: Bug
Affects Versions: 0.94.0
Reporter: Gopinathan A
Fix For: 0.94.1

Region splitting did not happen for a long time because of a ZK exception while creating the RS_ZK_SPLITTING node.

{noformat}
2012-05-24 01:45:41,363 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 26668ms for sessionid 0x1377a75f41d0012, closing socket connection and attempting reconnect
2012-05-24 01:45:41,464 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/unassigned/bd1079bf948c672e493432020dc0e144
{noformat}

{noformat}
2012-05-24 01:45:43,300 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: cleanupCurrentWriter waiting for transactions to get synced total 189377 synced till here 189365
2012-05-24 01:45:48,474 INFO org.apache.hadoop.hbase.regionserver.SplitRequest: Running rollback/cleanup of failed split of ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.; Failed setting SPLITTING znode on ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.
java.io.IOException: Failed setting SPLITTING znode on ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.
	at org.apache.hadoop.hbase.regionserver.SplitTransaction.createDaughters(SplitTransaction.java:242)
	at org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:450)
	at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:67)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /hbase/unassigned/bd1079bf948c672e493432020dc0e144
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
	at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1246)
	at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:321)
	at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:659)
	at org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:811)
	at org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:747)
	at org.apache.hadoop.hbase.regionserver.SplitTransaction.transitionNodeSplitting(SplitTransaction.java:919)
	at org.apache.hadoop.hbase.regionserver.SplitTransaction.createNodeSplitting(SplitTransaction.java:869)
	at org.apache.hadoop.hbase.regionserver.SplitTransaction.createDaughters(SplitTransaction.java:239)
	... 5 more
2012-05-24 01:45:48,476 INFO org.apache.hadoop.hbase.regionserver.SplitRequest: Successful rollback of failed split of ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.
{noformat}

{noformat}
2012-05-24 01:47:28,141 ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node /hbase/unassigned/bd1079bf948c672e493432020dc0e144 already exists and this is not a retry
2012-05-24 01:47:28,142 INFO org.apache.hadoop.hbase.regionserver.SplitRequest: Running rollback/cleanup of failed split of ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.; Failed create of ephemeral /hbase/unassigned/bd1079bf948c672e493432020dc0e144
java.io.IOException: Failed create of ephemeral /hbase/unassigned/bd1079bf948c672e493432020dc0e144
	at org.apache.hadoop.hbase.regionserver.SplitTransaction.createNodeSplitting(SplitTransaction.java:865)
	at org.apache.hadoop.hbase.regionserver.SplitTransaction.createDaughters(SplitTransaction.java:239)
	at org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:450)
	at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:67)
{noformat}

Because of the above exception, region splitting kept failing continuously for more than 5 hours.
[jira] [Commented] (HBASE-6071) getRegionServerWithRetires, should log unsuccessful attempts and exceptions.
[ https://issues.apache.org/jira/browse/HBASE-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282504#comment-13282504 ]

Hadoop QA commented on HBASE-6071:
----------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12528963/HConnectionManager_HBASE-6071-0.90.0.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

-1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1983//console

This message is automatically generated.

getRegionServerWithRetires, should log unsuccessful attempts and exceptions.
----------------------------------------------------------------------------
Key: HBASE-6071
URL: https://issues.apache.org/jira/browse/HBASE-6071
Project: HBase
Issue Type: Improvement
Components: client, ipc
Affects Versions: 0.90.0
Reporter: Igal Shilman
Priority: Minor
Labels: client, ipc
Fix For: 0.90.7
Attachments: HConnectionManager_HBASE-6071-0.90.0.patch

HConnectionImplementation.getRegionServerWithRetries might terminate with an exception other than a DoNotRetryIOException, silently dropping the exceptions from previous attempts.
[~ted_yu] suggested ([here|http://mail-archives.apache.org/mod_mbox/hbase-user/201205.mbox/%3CCAFebPXBq9V9BVdzRTNr-MB3a1Lz78SZj6gvP6On0b%2Bajt9StAg%40mail.gmail.com%3E]) adding a log message inside the catch block describing the exception type and details.
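The suggestion above, logging inside the catch block, might look roughly like the following. This is a standalone sketch and not the HBase code: `RetrySketch`, `callWithRetries` and the nested `DoNotRetryException` are hypothetical stand-ins (the last one for HBase's DoNotRetryIOException), and it uses `java.util.logging` only to stay self-contained. Sleeping between attempts is omitted.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.logging.Level;
import java.util.logging.Logger;

class RetrySketch {
    private static final Logger LOG = Logger.getLogger("RetrySketch");

    // Hypothetical stand-in for HBase's DoNotRetryIOException.
    static class DoNotRetryException extends IOException {
        DoNotRetryException(String message) { super(message); }
    }

    // Calls the callable up to maxAttempts times. Every failed attempt is logged with
    // its exception type and message, and also collected in 'failures'.
    static <T> T callWithRetries(Callable<T> callable, int maxAttempts,
                                 List<Throwable> failures) throws IOException {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return callable.call();
            } catch (DoNotRetryException e) {
                throw e; // not retriable: surface immediately
            } catch (Exception e) {
                LOG.log(Level.WARNING, "Attempt " + attempt + " of " + maxAttempts
                        + " failed: " + e.getClass().getName() + ": " + e.getMessage());
                failures.add(e);
                last = e;
            }
        }
        throw new IOException("Failed after " + maxAttempts + " attempts", last);
    }

    public static void main(String[] args) throws IOException {
        int[] calls = {0};
        List<Throwable> failures = new ArrayList<>();
        String result = callWithRetries(() -> {
            if (++calls[0] < 3) {
                throw new IOException("transient failure " + calls[0]);
            }
            return "ok";
        }, 5, failures);
        System.out.println(result + " after " + failures.size() + " failed attempts");
    }
}
```

Collecting the failures in a list as well as logging them is a design choice of this sketch; it lets the caller inspect earlier exceptions even when a later attempt ends differently.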
[jira] [Updated] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Lapan updated HBASE-5416:
-----------------------------
Status: Open (was: Patch Available)

Improve performance of scans with some kind of filters.
-------------------------------------------------------
Key: HBASE-5416
URL: https://issues.apache.org/jira/browse/HBASE-5416
Project: HBase
Issue Type: Improvement
Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
Attachments: 5416-v5.txt, 5416-v6.txt, Filtered_scans.patch, Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch, Filtered_scans_v5.patch

When a scan is performed, the whole row is loaded into the result list, and only then is the filter (if any) applied to decide whether the row is needed.
But when a scan covers several CFs and the filter checks data from only a subset of them, the data from the CFs the filter does not check is not needed at the filter stage. It is needed only once we have decided to include the current row. In that case we can significantly reduce the amount of IO performed by a scan by loading only the values actually checked by the filter.
For example, we have two CFs: flags and snap. Flags is quite small (a bunch of megabytes) and is used to filter large entries from snap. Snap is very large (tens of GB) and is quite costly to scan. If we need only the rows with some flag set, we use SingleColumnValueFilter to limit the result to a small subset of the region. But the current implementation loads both CFs to perform the scan, even though only a small subset is needed.
The attached patch adds one routine to the Filter interface that lets a filter specify which CFs it needs for its operation. In HRegion, we separate all scanners into two groups: those needed for the filter and the rest (joined). When a new row is considered, only the needed data is loaded and the filter is applied; only if the filter accepts the row is the rest of the data loaded. On our data, this speeds up such scans 30-50 times. It also gives us a way to better normalize the data into separate columns by optimizing the scans performed.
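The two-group idea in the description can be illustrated with a toy model. This is a sketch under stated assumptions, not the patch itself: `FamilyStore`, `scan` and the read counters are hypothetical, each CF is modelled as one value per row, and the filter is a plain `Predicate` instead of an HBase Filter.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class TwoPhaseScanSketch {
    // Hypothetical column-family store that counts how many row values were read.
    static class FamilyStore {
        final Map<String, String> rows = new LinkedHashMap<>();
        int reads = 0;

        String read(String rowKey) {
            reads++;
            return rows.get(rowKey);
        }
    }

    // Phase 1 reads only the CF the filter needs; phase 2 reads the other ("joined")
    // CF, and only for rows the filter accepted.
    static List<String[]> scan(List<String> rowKeys, FamilyStore essential,
                               FamilyStore joined, Predicate<String> filter) {
        List<String[]> results = new ArrayList<>();
        for (String rowKey : rowKeys) {
            String flag = essential.read(rowKey);
            if (flag == null || !filter.test(flag)) {
                continue; // rejected: the large CF is never touched for this row
            }
            String snap = joined.read(rowKey);
            results.add(new String[] {rowKey, flag, snap});
        }
        return results;
    }

    public static void main(String[] args) {
        FamilyStore flags = new FamilyStore();
        FamilyStore snap = new FamilyStore();
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            String key = "row" + i;
            keys.add(key);
            flags.rows.put(key, i % 100 == 0 ? "hot" : "cold");
            snap.rows.put(key, "large-payload-" + i);
        }
        List<String[]> hits = scan(keys, flags, snap, "hot"::equals);
        System.out.println("matches=" + hits.size()
                + " flagReads=" + flags.reads + " snapReads=" + snap.reads);
    }
}
```

In this toy model the saving is simply the ratio of rejected rows; the 30-50x figure quoted above is the reporter's own measurement and is not reproduced by the sketch.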
[jira] [Updated] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Lapan updated HBASE-5416:
-----------------------------
Attachment: (was: Filtered_scans_v5.patch)

Improve performance of scans with some kind of filters.
-------------------------------------------------------
Key: HBASE-5416
URL: https://issues.apache.org/jira/browse/HBASE-5416
Project: HBase
Issue Type: Improvement
Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
Attachments: 5416-v5.txt, 5416-v6.txt, Filtered_scans.patch, Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch
[jira] [Updated] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Lapan updated HBASE-5416:
-----------------------------
Attachment: Filtered_scans_v5.patch

Improve performance of scans with some kind of filters.
-------------------------------------------------------
Key: HBASE-5416
URL: https://issues.apache.org/jira/browse/HBASE-5416
Project: HBase
Issue Type: Improvement
Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
Attachments: 5416-v5.txt, 5416-v6.txt, Filtered_scans.patch, Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch, Filtered_scans_v5.patch
[jira] [Commented] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282516#comment-13282516 ]

Zhihong Yu commented on HBASE-6070:
-----------------------------------

{code}
+// but the RS had went down before completing the split process then will not try to
{code}
'had went down' -> 'had gone down'
{code}
+ if(response == null) return null;
{code}
Add a space after 'if'.
{code}
+ static Result getMetaTableRowResultAsSplittedRegion(final HRegionInfo hri, final ServerName sn)
{code}
The method should be called getMetaTableRowResultAsSplitRegion().

The test failure in TestFromClientSide should be investigated.

AM.nodeDeleted and SSH races creating problems for regions under SPLIT
----------------------------------------------------------------------
Key: HBASE-6070
URL: https://issues.apache.org/jira/browse/HBASE-6070
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Fix For: 0.92.2, 0.96.0, 0.94.1
Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.94.patch, HBASE-6070_trunk.patch

As part of HBASE-5806 we tried to address the problems with Master restart and RS restart while a region SPLIT is in progress. Looking further, we found that one race condition still remains.
1. Split has just started and the znode is in RS_SPLIT state.
2. RS goes down.
3. The first callback for SSH comes.
4. As part of the fix for HBASE-5806, SSH knows that some region is in RIT.
5. But now the nodeDeleted event comes for the SPLIT node, and there we try to delete the RIT.
6. After this, we check in the SSH whether any node is in RIT. As we don't find the region in RIT, the region is never assigned.

When we fixed HBASE-5806, step 6 happened first and then step 5, so we missed this case. Now we have found it. Will come up with a patch shortly.
[jira] [Commented] (HBASE-6088) Region splitting not happened for long time due to ZK exception while creating RS_ZK_SPLITTING node
[ https://issues.apache.org/jira/browse/HBASE-6088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282518#comment-13282518 ]

ramkrishna.s.vasudevan commented on HBASE-6088:
-----------------------------------------------

When we start the split, ZK node creation takes two steps:
- Create the node.
- Write the data RS_ZK_SPLITTING into it.
Only after both steps complete do we make a journal entry. So if writing the data fails, even the rollback cannot clean up the node, because the journal has no entry for it yet.

Region splitting not happened for long time due to ZK exception while creating RS_ZK_SPLITTING node
---------------------------------------------------------------------------------------------------
Key: HBASE-6088
URL: https://issues.apache.org/jira/browse/HBASE-6088
Project: HBase
Issue Type: Bug
Affects Versions: 0.94.0
Reporter: Gopinathan A
Fix For: 0.94.1

Region splitting not happened for long time due to ZK exception while creating RS_ZK_SPLITTING node
{noformat}
2012-05-24 01:45:41,363 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 26668ms for sessionid 0x1377a75f41d0012, closing socket connection and attempting reconnect
2012-05-24 01:45:41,464 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/unassigned/bd1079bf948c672e493432020dc0e144
{noformat}
{noformat}
2012-05-24 01:45:43,300 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: cleanupCurrentWriter waiting for transactions to get synced total 189377 synced till here 189365
2012-05-24 01:45:48,474 INFO org.apache.hadoop.hbase.regionserver.SplitRequest: Running rollback/cleanup of failed split of ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.; Failed setting SPLITTING znode on ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.
java.io.IOException: Failed setting SPLITTING znode on ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.
    at org.apache.hadoop.hbase.regionserver.SplitTransaction.createDaughters(SplitTransaction.java:242)
    at org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:450)
    at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:67)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /hbase/unassigned/bd1079bf948c672e493432020dc0e144
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1246)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:321)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:659)
    at org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:811)
    at org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:747)
    at org.apache.hadoop.hbase.regionserver.SplitTransaction.transitionNodeSplitting(SplitTransaction.java:919)
    at org.apache.hadoop.hbase.regionserver.SplitTransaction.createNodeSplitting(SplitTransaction.java:869)
    at org.apache.hadoop.hbase.regionserver.SplitTransaction.createDaughters(SplitTransaction.java:239)
    ... 5 more
2012-05-24 01:45:48,476 INFO org.apache.hadoop.hbase.regionserver.SplitRequest: Successful rollback of failed split of ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.
{noformat}
{noformat}
2012-05-24 01:47:28,141 ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node /hbase/unassigned/bd1079bf948c672e493432020dc0e144 already exists and this is not a retry
2012-05-24 01:47:28,142 INFO org.apache.hadoop.hbase.regionserver.SplitRequest: Running rollback/cleanup of failed split of ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.; Failed create of ephemeral /hbase/unassigned/bd1079bf948c672e493432020dc0e144
java.io.IOException: Failed create of ephemeral /hbase/unassigned/bd1079bf948c672e493432020dc0e144
    at org.apache.hadoop.hbase.regionserver.SplitTransaction.createNodeSplitting(SplitTransaction.java:865)
    at org.apache.hadoop.hbase.regionserver.SplitTransaction.createDaughters(SplitTransaction.java:239)
    at
[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282520#comment-13282520 ]

Zhihong Yu commented on HBASE-5416:
-----------------------------------

@Max:
The new patch is much larger than the previous version.
Can you provide a more detailed description of the change?

Thanks

Improve performance of scans with some kind of filters.
-------------------------------------------------------
Key: HBASE-5416
URL: https://issues.apache.org/jira/browse/HBASE-5416
Project: HBase
Issue Type: Improvement
Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
Attachments: 5416-v5.txt, 5416-v6.txt, Filtered_scans.patch, Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch, Filtered_scans_v5.patch
[jira] [Assigned] (HBASE-6068) Secure HBase cluster : Client not able to call some admin APIs
[ https://issues.apache.org/jira/browse/HBASE-6068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matteo Bertozzi reassigned HBASE-6068:
--------------------------------------
Assignee: Matteo Bertozzi

Secure HBase cluster : Client not able to call some admin APIs
--------------------------------------------------------------
Key: HBASE-6068
URL: https://issues.apache.org/jira/browse/HBASE-6068
Project: HBase
Issue Type: Bug
Components: security
Affects Versions: 0.94.0
Reporter: Anoop Sam John
Assignee: Matteo Bertozzi

In a secure cluster, we allow HBase clients to read certain zk nodes by granting global read permission on them. These nodes are the master address znode, the root server znode and the clusterId znode. In ZKUtil.createACL(), we can see that these node names are handled specially.
But some other client-side admin APIs also make a read call into ZooKeeper from the client. These include the isTableEnabled() call (there may be others; this is the one I have seen). Here the client directly reads a node in ZooKeeper (the node created for this table) and compares its data to determine whether the table is enabled.
So in a secure cluster, any client can read the ZooKeeper nodes it needs for its normal operation, such as the master address and the root server address. But what if the client calls this API (isTableEnabled())?
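The gap described above can be pictured with a small model of a name-based ACL decision. This is a hypothetical sketch, not ZKUtil.createACL(): the class, the method and the znode paths (including the table znode path) are illustrative assumptions, and no ZooKeeper classes are used.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class ZnodeAclSketch {
    // Assumed paths for the three specially handled znodes; real names depend on configuration.
    static final Set<String> WORLD_READABLE = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList(
                    "/hbase/master",              // master address znode
                    "/hbase/root-region-server",  // root server znode
                    "/hbase/hbaseid")));          // clusterId znode

    // Models "global read permission for certain nodes": an unprivileged client
    // may read a znode only if it is on the special-cased list.
    static boolean clientCanRead(String znode) {
        return WORLD_READABLE.contains(znode);
    }

    public static void main(String[] args) {
        System.out.println("master readable: " + clientCanRead("/hbase/master"));
        // A per-table znode (path assumed) is not on the list, so a client-side
        // read such as the one isTableEnabled() performs would be denied.
        System.out.println("table readable:  " + clientCanRead("/hbase/table/t1"));
    }
}
```

Under this model, any client-side API that reads a znode outside the special-cased set fails for ordinary clients, which is the behaviour the issue title reports.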
[jira] [Updated] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Lapan updated HBASE-5416:
-----------------------------
Attachment: Filtered_scans_v5.patch

Improve performance of scans with some kind of filters.
-------------------------------------------------------
Key: HBASE-5416
URL: https://issues.apache.org/jira/browse/HBASE-5416
Project: HBase
Issue Type: Improvement
Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
Attachments: 5416-v5.txt, 5416-v6.txt, Filtered_scans.patch, Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch, Filtered_scans_v5.patch
[jira] [Updated] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Lapan updated HBASE-5416:
-----------------------------
Attachment: (was: Filtered_scans_v5.patch)

Improve performance of scans with some kind of filters.
-------------------------------------------------------
Key: HBASE-5416
URL: https://issues.apache.org/jira/browse/HBASE-5416
Project: HBase
Issue Type: Improvement
Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
Attachments: 5416-v5.txt, 5416-v6.txt, Filtered_scans.patch, Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch, Filtered_scans_v5.patch
[jira] [Updated] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Lapan updated HBASE-5416:
-----------------------------
Status: Patch Available (was: Open)

Improve performance of scans with some kind of filters.
-------------------------------------------------------
Key: HBASE-5416
URL: https://issues.apache.org/jira/browse/HBASE-5416
Project: HBase
Issue Type: Improvement
Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
Attachments: 5416-v5.txt, 5416-v6.txt, Filtered_scans.patch, Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch, Filtered_scans_v5.patch
[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282528#comment-13282528 ]

Max Lapan commented on HBASE-5416:
----------------------------------

The additional code handles the case where InternalScanner::next is called with limit != -1. In this case, we must remember which KeyValueHeap we were populating from when the limit was reached, and resume that population on the next call to next. I also added a test case for this situation.

Improve performance of scans with some kind of filters.
-------------------------------------------------------
Key: HBASE-5416
URL: https://issues.apache.org/jira/browse/HBASE-5416
Project: HBase
Issue Type: Improvement
Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
Attachments: 5416-v5.txt, 5416-v6.txt, Filtered_scans.patch, Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch, Filtered_scans_v5.patch
[jira] [Updated] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-6070:
------------------------------------------
Status: Open (was: Patch Available)

AM.nodeDeleted and SSH races creating problems for regions under SPLIT
----------------------------------------------------------------------
Key: HBASE-6070
URL: https://issues.apache.org/jira/browse/HBASE-6070
Project: HBase
Issue Type: Bug
Affects Versions: 0.94.0, 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Fix For: 0.92.2, 0.96.0, 0.94.1
Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.92_1.patch, HBASE-6070_0.94.patch, HBASE-6070_trunk.patch
[jira] [Updated] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-6070: -- Attachment: HBASE-6070_0.92_1.patch AM.nodeDeleted and SSH races creating problems for regions under SPLIT -- Key: HBASE-6070 URL: https://issues.apache.org/jira/browse/HBASE-6070 Project: HBase Issue Type: Bug Affects Versions: 0.92.1, 0.94.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.92.2, 0.96.0, 0.94.1 Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.92_1.patch, HBASE-6070_0.94.patch, HBASE-6070_trunk.patch
[jira] [Updated] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-6070: -- Attachment: HBASE-6070_0.94_1.patch AM.nodeDeleted and SSH races creating problems for regions under SPLIT -- Key: HBASE-6070 URL: https://issues.apache.org/jira/browse/HBASE-6070 Project: HBase Issue Type: Bug Affects Versions: 0.92.1, 0.94.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.92.2, 0.96.0, 0.94.1 Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.92_1.patch, HBASE-6070_0.94.patch, HBASE-6070_0.94_1.patch, HBASE-6070_trunk.patch, HBASE-6070_trunk_1.patch
[jira] [Updated] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-6070: -- Status: Patch Available (was: Open) AM.nodeDeleted and SSH races creating problems for regions under SPLIT -- Key: HBASE-6070 URL: https://issues.apache.org/jira/browse/HBASE-6070 Project: HBase Issue Type: Bug Affects Versions: 0.94.0, 0.92.1 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.92.2, 0.96.0, 0.94.1 Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.92_1.patch, HBASE-6070_0.94.patch, HBASE-6070_0.94_1.patch, HBASE-6070_trunk.patch, HBASE-6070_trunk_1.patch
[jira] [Updated] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-6070: -- Attachment: HBASE-6070_trunk_1.patch Updated patches fixing the comments. I tried running the failed testcase. It passed every time. AM.nodeDeleted and SSH races creating problems for regions under SPLIT -- Key: HBASE-6070 URL: https://issues.apache.org/jira/browse/HBASE-6070 Project: HBase Issue Type: Bug Affects Versions: 0.92.1, 0.94.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.92.2, 0.96.0, 0.94.1 Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.92_1.patch, HBASE-6070_0.94.patch, HBASE-6070_0.94_1.patch, HBASE-6070_trunk.patch, HBASE-6070_trunk_1.patch
[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282534#comment-13282534 ] Zhihong Yu commented on HBASE-5416: --- Will go over the patch when I get into the office. It would be nice to use https://reviews.apache.org to facilitate reviews. Improve performance of scans with some kind of filters. --- Key: HBASE-5416 URL: https://issues.apache.org/jira/browse/HBASE-5416 Project: HBase Issue Type: Improvement Components: filters, performance, regionserver Affects Versions: 0.90.4 Reporter: Max Lapan Assignee: Max Lapan Attachments: 5416-v5.txt, 5416-v6.txt, Filtered_scans.patch, Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch, Filtered_scans_v5.patch

When a scan is performed, the whole row is loaded into the result list, and then the filter (if one exists) is applied to decide whether the row is needed. But when a scan covers several CFs and the filter checks data from only a subset of them, data from the CFs not checked by the filter is not needed at the filter stage; it is needed only once we have decided to include the current row. In that case we can significantly reduce the amount of IO performed by a scan by loading only the values actually checked by the filter.

For example, we have two CFs: flags and snap. Flags is quite small (a bunch of megabytes) and is used to filter large entries from snap. Snap is very large (10s of GB) and is quite costly to scan. If we need only rows with some flag specified, we use SingleColumnValueFilter to limit the result to a small subset of the region. But the current implementation loads both CFs to perform the scan, when only a small subset is needed.

The attached patch adds one routine to the Filter interface to allow a filter to specify which CFs are needed for its operation. In HRegion, we separate all scanners into two groups: those needed by the filter and the rest (joined). When a new row is considered, only the needed data is loaded and the filter is applied; the rest of the data is loaded only if the filter accepts the row. On our data, this speeds up such scans 30-50 times. It also gives us a way to better normalize the data into separate columns by optimizing the scans performed.
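The two-phase loading described above can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the patch itself: it does not use the HBase API, the class `LazyFamilyScanSketch` and its methods are hypothetical, and it simplifies by iterating only over rows present in the family the filter needs. The read counter exists only to make the saved IO visible.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.function.Predicate;

// Hypothetical in-memory model; names and layout are illustrative, not HBase classes.
final class LazyFamilyScanSketch {
    /** Column family name -> (row key -> value). */
    private final Map<String, SortedMap<String, String>> families = new HashMap<>();
    /** Number of cells read per family, to show how much work the scan did. */
    final Map<String, Integer> reads = new HashMap<>();

    void put(String family, String row, String value) {
        families.computeIfAbsent(family, f -> new TreeMap<>()).put(row, value);
    }

    private String read(String family, String row) {
        reads.merge(family, 1, Integer::sum);
        SortedMap<String, String> cf = families.get(family);
        return cf == null ? null : cf.get(row);
    }

    /**
     * Reads the family the filter needs for every row, and the joined family
     * only for rows the filter accepts.
     */
    List<String> scan(String neededFamily, String joinedFamily, Predicate<String> filter) {
        List<String> out = new ArrayList<>();
        SortedMap<String, String> needed = families.get(neededFamily);
        if (needed == null) {
            return out;
        }
        for (String row : new ArrayList<>(needed.keySet())) {
            String flag = read(neededFamily, row);
            if (!filter.test(flag)) {
                continue; // rejected: the joined family is never read for this row
            }
            out.add(row + "=" + read(joinedFamily, row));
        }
        return out;
    }

    /** Builds four rows, of which only row2 carries the flag the filter looks for. */
    static LazyFamilyScanSketch demo() {
        LazyFamilyScanSketch store = new LazyFamilyScanSketch();
        for (int i = 0; i < 4; i++) {
            store.put("flags", "row" + i, i == 2 ? "on" : "off");
            store.put("snap", "row" + i, "blob" + i);
        }
        return store;
    }

    public static void main(String[] args) {
        LazyFamilyScanSketch store = demo();
        System.out.println(store.scan("flags", "snap", v -> "on".equals(v)));
        System.out.println(store.reads);
    }
}
```

With the demo data, every row is read from the small family but only the single accepted row is read from the large one, which is where the reduction in IO would come from.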
[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282539#comment-13282539 ] Max Lapan commented on HBASE-5416: -- I tried to post it there, but constantly get Internal server error. Improve performance of scans with some kind of filters. --- Key: HBASE-5416 URL: https://issues.apache.org/jira/browse/HBASE-5416 Project: HBase Issue Type: Improvement Components: filters, performance, regionserver Affects Versions: 0.90.4 Reporter: Max Lapan Assignee: Max Lapan Attachments: 5416-v5.txt, 5416-v6.txt, Filtered_scans.patch, Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch, Filtered_scans_v5.patch
[jira] [Updated] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-6070: -- Attachment: (was: HBASE-6070_trunk_1.patch) AM.nodeDeleted and SSH races creating problems for regions under SPLIT -- Key: HBASE-6070 URL: https://issues.apache.org/jira/browse/HBASE-6070 Project: HBase Issue Type: Bug Affects Versions: 0.92.1, 0.94.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.92.2, 0.96.0, 0.94.1 Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.92_1.patch, HBASE-6070_0.94.patch, HBASE-6070_0.94_1.patch, HBASE-6070_trunk.patch
[jira] [Updated] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-6070: -- Attachment: HBASE-6070_trunk_1.patch Just reattaching the patch. AM.nodeDeleted and SSH races creating problems for regions under SPLIT -- Key: HBASE-6070 URL: https://issues.apache.org/jira/browse/HBASE-6070 Project: HBase Issue Type: Bug Affects Versions: 0.92.1, 0.94.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.92.2, 0.96.0, 0.94.1 Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.92_1.patch, HBASE-6070_0.94.patch, HBASE-6070_0.94_1.patch, HBASE-6070_trunk.patch, HBASE-6070_trunk_1.patch
[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.
[ https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282543#comment-13282543 ] Max Lapan commented on HBASE-5416: -- Ahhh, I'm stupid, it works with hbase-git repository. Posted https://reviews.apache.org/r/5225/ Improve performance of scans with some kind of filters. --- Key: HBASE-5416 URL: https://issues.apache.org/jira/browse/HBASE-5416 Project: HBase Issue Type: Improvement Components: filters, performance, regionserver Affects Versions: 0.90.4 Reporter: Max Lapan Assignee: Max Lapan Attachments: 5416-v5.txt, 5416-v6.txt, Filtered_scans.patch, Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch, Filtered_scans_v5.patch
[jira] [Commented] (HBASE-5916) RS restart just before master intialization we make the cluster non operative
[ https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282580#comment-13282580 ] ramkrishna.s.vasudevan commented on HBASE-5916: ---

@Chunhui The suggestion given above can simply be avoided by taking the actual online servers list after getting the logFolders. This will ensure that we do not split any new RS that has checked in. In joinCluster(), as per the existing code, if any new server has checked in and root/meta got assigned to it, we may think that it is a dead server because we have already passed the online servers. Hence we are trying to get the actual online list as per the patch.

The problem that you have mentioned here:

bq. if Regionserver A with startcode 001 is restarted, and then Regionserver A with startcode 002 is in the onlineServers, but Regionserver A with startcode 001 is in the process by SSH, not in the deadServers

We are trying to avoid this in our current v6 patch by not removing from the dead servers any restarted server that comes up during master initialization. Later, after master initialization, we try to clear the dead servers that match the current online servers with the same host name and port. There are other problems during SSH and master initialization that may lead to double assignment or a concurrent modification exception. We will address these in a new JIRA. Please review the current patch and provide your suggestions.

RS restart just before master intialization we make the cluster non operative - Key: HBASE-5916 URL: https://issues.apache.org/jira/browse/HBASE-5916 Project: HBase Issue Type: Bug Affects Versions: 0.92.1, 0.94.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.94.1 Attachments: HBASE-5916_trunk.patch, HBASE-5916_trunk_1.patch, HBASE-5916_trunk_1.patch, HBASE-5916_trunk_2.patch, HBASE-5916_trunk_3.patch, HBASE-5916_trunk_4.patch, HBASE-5916_trunk_v5.patch

Consider a case where my master is getting restarted. An RS that was alive when the master restart started gets restarted before the master initializes the ServerShutdownHandler.

{code}
serverShutdownHandlerEnabled = true;
{code}

In this case, when the RS tries to register with the master, the master will try to expire the server, but the server cannot be expired because the serverShutdownHandler is not yet enabled. This can happen when only one RS gets restarted or when all the RSs get restarted at the same time (before assignRootandMeta).

{code}
LOG.info(message);
if (existingServer.getStartcode() < serverName.getStartcode()) {
  LOG.info("Triggering server recovery; existingServer " + existingServer + " looks stale, new server:" + serverName);
  expireServer(existingServer);
}
{code}

If another RS is brought up, the cluster returns to normal. This may be a very rare corner case.
[jira] [Updated] (HBASE-5916) RS restart just before master intialization we make the cluster non operative
[ https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] rajeshbabu updated HBASE-5916: -- Attachment: HBASE-5916_trunk_v6.patch Attached patch. Please review and provide suggestions/comments RS restart just before master intialization we make the cluster non operative - Key: HBASE-5916 URL: https://issues.apache.org/jira/browse/HBASE-5916 Project: HBase Issue Type: Bug Affects Versions: 0.92.1, 0.94.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.94.1 Attachments: HBASE-5916_trunk.patch, HBASE-5916_trunk_1.patch, HBASE-5916_trunk_1.patch, HBASE-5916_trunk_2.patch, HBASE-5916_trunk_3.patch, HBASE-5916_trunk_4.patch, HBASE-5916_trunk_v5.patch, HBASE-5916_trunk_v6.patch
[jira] [Created] (HBASE-6089) SSH and AM.joinCluster causes Concurrent Modification exception.
ramkrishna.s.vasudevan created HBASE-6089: - Summary: SSH and AM.joinCluster causes Concurrent Modification exception. Key: HBASE-6089 URL: https://issues.apache.org/jira/browse/HBASE-6089 Project: HBase Issue Type: Bug Affects Versions: 0.94.0, 0.92.1 Reporter: ramkrishna.s.vasudevan Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1 The AM.regions map is accessed concurrently by SSH and Master initialization, leading to a ConcurrentModificationException.
[jira] [Updated] (HBASE-5916) RS restart just before master intialization we make the cluster non operative
[ https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] rajeshbabu updated HBASE-5916: -- Status: Open (was: Patch Available) RS restart just before master intialization we make the cluster non operative - Key: HBASE-5916 URL: https://issues.apache.org/jira/browse/HBASE-5916 Project: HBase Issue Type: Bug Affects Versions: 0.94.0, 0.92.1 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.94.1 Attachments: HBASE-5916_trunk.patch, HBASE-5916_trunk_1.patch, HBASE-5916_trunk_1.patch, HBASE-5916_trunk_2.patch, HBASE-5916_trunk_3.patch, HBASE-5916_trunk_4.patch, HBASE-5916_trunk_v5.patch, HBASE-5916_trunk_v6.patch
[jira] [Updated] (HBASE-5916) RS restart just before master intialization we make the cluster non operative
[ https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] rajeshbabu updated HBASE-5916: -- Status: Patch Available (was: Open) RS restart just before master intialization we make the cluster non operative - Key: HBASE-5916 URL: https://issues.apache.org/jira/browse/HBASE-5916 Project: HBase Issue Type: Bug Affects Versions: 0.94.0, 0.92.1 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.94.1 Attachments: HBASE-5916_trunk.patch, HBASE-5916_trunk_1.patch, HBASE-5916_trunk_1.patch, HBASE-5916_trunk_2.patch, HBASE-5916_trunk_3.patch, HBASE-5916_trunk_4.patch, HBASE-5916_trunk_v5.patch, HBASE-5916_trunk_v6.patch
[jira] [Commented] (HBASE-6089) SSH and AM.joinCluster causes Concurrent Modification exception.
[ https://issues.apache.org/jira/browse/HBASE-6089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282582#comment-13282582 ] ramkrishna.s.vasudevan commented on HBASE-6089: --- {code} 2012-05-24 19:26:02,493 DEBUG org.apache.hadoop.hbase.master.ServerManager: New connection to linux146,60020,1337867810895 2012-05-24 19:26:02,552 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=linux146,60020,1337867810895, region=2be5ef20db58b775953cc1107eb51d2d 2012-05-24 19:26:02,592 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=linux146,60020,1337867810895, region=191b0c97f2d2a8262bf790093fdce2ab 2012-05-24 19:26:02,595 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=linux146,60020,1337867810895, region=99d462b47ea5e301175d025204eff014 2012-05-24 19:26:03,957 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning done for linux146,60020,1337867810895 2012-05-24 19:26:14,640 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=linux146,60020,1337867810895, region=2be5ef20db58b775953cc1107eb51d2d 2012-05-24 19:26:14,781 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=linux146,60020,1337867810895, region=2be5ef20db58b775953cc1107eb51d2d 2012-05-24 19:26:14,785 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for et1,,1337864575331.2be5ef20db58b775953cc1107eb51d2d. 
from linux146,60020,1337867810895; deleting unassigned node 2012-05-24 19:26:14,786 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x1377ea1a1fe002d Deleting existing unassigned node for 2be5ef20db58b775953cc1107eb51d2d that is in expected state RS_ZK_REGION_OPENED 2012-05-24 19:26:14,792 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x1377ea1a1fe002d Successfully deleted unassigned node for region 2be5ef20db58b775953cc1107eb51d2d in expected state RS_ZK_REGION_OPENED 2012-05-24 19:26:14,792 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=linux146,60020,1337867810895, region=5a84a4f4eaf2519e36a8ccc2e9c83b04 2012-05-24 19:26:14,792 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: The znode of region et1,,1337864575331.2be5ef20db58b775953cc1107eb51d2d. has been deleted. 2012-05-24 19:26:23,862 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Finished processing of shutdown of linux146,60020,1337866620614 2012-05-24 19:26:51,927 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: [] 2012-05-24 19:26:51,931 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown. java.util.ConcurrentModificationException at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1100) at java.util.TreeMap$EntryIterator.next(TreeMap.java:1136) at java.util.TreeMap$EntryIterator.next(TreeMap.java:1131) at org.apache.hadoop.hbase.master.AssignmentManager.processDeadServersAndRegionsInTransition(AssignmentManager.java:409) at org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:363) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:607) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:374) at java.lang.Thread.run(Thread.java:662) {code} SSH and AM.joinCluster causes Concurrent Modification exception. 
Key: HBASE-6089 URL: https://issues.apache.org/jira/browse/HBASE-6089 Project: HBase Issue Type: Bug Affects Versions: 0.92.1, 0.94.0 Reporter: ramkrishna.s.vasudevan Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1 The AM.regions map is accessed in parallel by SSH and master initialization, leading to a ConcurrentModificationException. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
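The failure mode in the stack trace above (a TreeMap iterator detecting a structural change made by another code path) can be reproduced in isolation. The following is a minimal sketch only, not the HBASE-6089 patch: the class name, the map contents, and the snapshot-under-lock workaround are all assumptions made for illustration, and the modification happens on the same thread to keep the example deterministic.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RegionsMapIteration {
    // Hypothetical stand-in for the AM.regions map (region name -> server name).
    private final TreeMap<String, String> regions = new TreeMap<>();

    public RegionsMapIteration() {
        regions.put("region-a", "server-1");
        regions.put("region-b", "server-1");
        regions.put("region-c", "server-2");
    }

    /** Iterates the live map while it is structurally modified; returns true if a CME was thrown. */
    public boolean iterateWhileModifying() {
        try {
            for (Map.Entry<String, String> e : regions.entrySet()) {
                // Simulates another code path adding a region in the middle of the iteration.
                regions.put("region-new", e.getValue());
            }
            return false;
        } catch (ConcurrentModificationException cme) {
            return true;
        }
    }

    /** Copies the entries while holding the map's lock, then iterates the copy. */
    public List<String> iterateSnapshot() {
        List<Map.Entry<String, String>> snapshot;
        synchronized (regions) {
            snapshot = new ArrayList<>(regions.entrySet());
        }
        List<String> visited = new ArrayList<>();
        for (Map.Entry<String, String> e : snapshot) {
            // Writers take the same lock; the snapshot iteration is unaffected by their changes.
            synchronized (regions) {
                regions.put("added-during-" + e.getKey(), e.getValue());
            }
            visited.add(e.getKey());
        }
        return visited;
    }
}
```

The snapshot approach trades a copy of the entry set for freedom from iterator invalidation; it only helps if every writer synchronizes on the same lock.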
[jira] [Assigned] (HBASE-6089) SSH and AM.joinCluster causes Concurrent Modification exception.
[ https://issues.apache.org/jira/browse/HBASE-6089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan reassigned HBASE-6089: - Assignee: rajeshbabu SSH and AM.joinCluster causes Concurrent Modification exception. Key: HBASE-6089 URL: https://issues.apache.org/jira/browse/HBASE-6089 Project: HBase Issue Type: Bug Affects Versions: 0.92.1, 0.94.0 Reporter: ramkrishna.s.vasudevan Assignee: rajeshbabu Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1 The AM.regions map is accessed in parallel by SSH and master initialization, leading to a ConcurrentModificationException.
[jira] [Commented] (HBASE-5916) RS restart just before master intialization we make the cluster non operative
[ https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282589#comment-13282589 ] Hadoop QA commented on HBASE-5916: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12529160/HBASE-5916_trunk_v6.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 hadoop23. The patch compiles against the hadoop 0.23.x profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 34 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.master.TestClockSkewDetection Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1986//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1986//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1986//console This message is automatically generated. RS restart just before master intialization we make the cluster non operative - Key: HBASE-5916 URL: https://issues.apache.org/jira/browse/HBASE-5916 Project: HBase Issue Type: Bug Affects Versions: 0.92.1, 0.94.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.94.1 Attachments: HBASE-5916_trunk.patch, HBASE-5916_trunk_1.patch, HBASE-5916_trunk_1.patch, HBASE-5916_trunk_2.patch, HBASE-5916_trunk_3.patch, HBASE-5916_trunk_4.patch, HBASE-5916_trunk_v5.patch, HBASE-5916_trunk_v6.patch Consider a case where my master is getting restarted. 
An RS that was alive when the master restart began gets restarted before the master enables the ServerShutdownHandler: {code} serverShutdownHandlerEnabled = true; {code} In this case, when the RS tries to register with the master, the master tries to expire the old server instance, but the server cannot be expired because the ServerShutdownHandler is not yet enabled. This can happen when a single RS is restarted or when all the RSs are restarted at the same time (before assignRootAndMeta). {code} LOG.info(message); if (existingServer.getStartcode() < serverName.getStartcode()) { LOG.info("Triggering server recovery; existingServer " + existingServer + " looks stale, new server: " + serverName); expireServer(existingServer); } {code} If another RS is brought up, the cluster returns to normal. This may be a rare corner case.
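The start-code comparison in the snippet above can be sketched on its own. This is an illustration under stated assumptions, not the ServerManager code: ServerId, parse, and looksStale are hypothetical names, and the only thing taken from the source is the host,port,startcode form of a server name and the rule that an existing registration with a smaller start code is treated as stale.

```java
public class StaleServerCheck {
    /** Hypothetical value type mirroring the host,port,startcode form of a server name. */
    public static final class ServerId {
        final String host;
        final int port;
        final long startcode;

        ServerId(String host, int port, long startcode) {
            this.host = host;
            this.port = port;
            this.startcode = startcode;
        }

        /** Parses a string of the form "host,port,startcode". */
        public static ServerId parse(String s) {
            String[] parts = s.split(",");
            if (parts.length != 3) {
                throw new IllegalArgumentException("expected host,port,startcode: " + s);
            }
            return new ServerId(parts[0], Integer.parseInt(parts[1]), Long.parseLong(parts[2]));
        }
    }

    /**
     * Returns true when the existing registration has the same host and port as the
     * incoming one but an older (smaller) start code, i.e. it belongs to a previous
     * incarnation of the same region server.
     */
    public static boolean looksStale(ServerId existing, ServerId incoming) {
        return existing.host.equals(incoming.host)
            && existing.port == incoming.port
            && existing.startcode < incoming.startcode;
    }
}
```

For example, looksStale(parse("linux146,60020,1337866620614"), parse("linux146,60020,1337867810895")) is true, because both names share a host and port and the first start code is smaller.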
[jira] [Updated] (HBASE-6074) TestHLog is flaky
[ https://issues.apache.org/jira/browse/HBASE-6074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj Das updated HBASE-6074: --- Attachment: TestHLog.patch.txt @Ted, yes, I saw the failures with hbase-0.92/hadoop-1.0.3. In my first trial, the org.apache.hadoop.hbase.regionserver.wal.TestHLog.testEditAdd was failing intermittently. Here are the snippets from the log: {noformat} org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: Lease mismatch on /user/jenkins/hbase/TestHLog/hlog.1336864885313 owned by NN_Recovery but is accessed by DFSClient_-1644967697 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1645) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1620) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:1675) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:1663) at org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:718) at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382) Stacktrace org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: Lease mismatch on /user/jenkins/hbase/TestHLog/hlog.1336864885313 owned by NN_Recovery but is accessed by DFSClient_-1644967697 at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1645) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1620) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:1675) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:1663) at org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:718) at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382) at org.apache.hadoop.ipc.Client.call(Client.java:1066) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225) at $Proxy8.complete(Unknown Source) at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59) at $Proxy8.complete(Unknown Source) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3894) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3809) at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61) at 
org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86) at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:1017) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.close(SequenceFileLogWriter.java:215) at org.apache.hadoop.hbase.regionserver.wal.HLog.close(HLog.java:914) at org.apache.hadoop.hbase.regionserver.wal.TestHLog.testEditAdd(TestHLog.java:480) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) {noformat} I consulted with an HDFS dev, and he thought that there might be a race condition with the shutting down of cluster in testAppendClose (the previous test in the
[jira] [Updated] (HBASE-6074) TestHLog is flaky
[ https://issues.apache.org/jira/browse/HBASE-6074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj Das updated HBASE-6074: --- Attachment: 6074-1.patch 6074-1.patch @Ted, yes, I saw the failures with hbase-0.92/hadoop-1.0.3. In my first trial, the org.apache.hadoop.hbase.regionserver.wal.TestHLog.testEditAdd was failing intermittently. Here are the snippets from the log: {noformat} org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: Lease mismatch on /user/jenkins/hbase/TestHLog/hlog.1336864885313 owned by NN_Recovery but is accessed by DFSClient_-1644967697 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1645) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1620) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:1675) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:1663) at org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:718) at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382) Stacktrace org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: Lease mismatch on /user/jenkins/hbase/TestHLog/hlog.1336864885313 owned by NN_Recovery but is accessed by DFSClient_-1644967697 at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1645) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1620) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:1675) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:1663) at org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:718) at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382) at org.apache.hadoop.ipc.Client.call(Client.java:1066) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225) at $Proxy8.complete(Unknown Source) at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59) at $Proxy8.complete(Unknown Source) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3894) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3809) at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61) at 
org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86) at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:1017) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.close(SequenceFileLogWriter.java:215) at org.apache.hadoop.hbase.regionserver.wal.HLog.close(HLog.java:914) at org.apache.hadoop.hbase.regionserver.wal.TestHLog.testEditAdd(TestHLog.java:480) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) {noformat} I consulted with an
[jira] [Updated] (HBASE-6074) TestHLog is flaky
[ https://issues.apache.org/jira/browse/HBASE-6074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj Das updated HBASE-6074: --- Attachment: (was: 6074-1.patch) TestHLog is flaky - Key: HBASE-6074 URL: https://issues.apache.org/jira/browse/HBASE-6074 Project: HBase Issue Type: Test Components: test Affects Versions: 0.92.0 Reporter: Devaraj Das Attachments: 6074-1.patch When I run TestHLog in a loop, I see failures.
[jira] [Updated] (HBASE-6074) TestHLog is flaky
[ https://issues.apache.org/jira/browse/HBASE-6074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj Das updated HBASE-6074: --- Attachment: (was: TestHLog.patch.txt) TestHLog is flaky - Key: HBASE-6074 URL: https://issues.apache.org/jira/browse/HBASE-6074 Project: HBase Issue Type: Test Components: test Affects Versions: 0.92.0 Reporter: Devaraj Das Attachments: 6074-1.patch When I run TestHLog in a loop, I see failures.
Re: [jira] [Updated] (HBASE-6085) SaslServer intermittently ignoring SaslClient's requests
Anything of note appear in the server side logs at DEBUG level? Have you tried duplicating this in an all-localhost configuration? If it is possible to reproduce in an all-localhost configuration, or on a cluster not otherwise occupied, then we can turn on additional SASL/GSSAPI level debugging that may shed light but will be quite verbose.
[jira] [Created] (HBASE-6090) JMX Registration Error while booting HMaster
Elliott Clark created HBASE-6090: Summary: JMX Registration Error while booting HMaster Key: HBASE-6090 URL: https://issues.apache.org/jira/browse/HBASE-6090 Project: HBase Issue Type: Bug Reporter: Elliott Clark When booting master there are errors about HMaster not being a bean and being unable to turn ServerLoad into an open class.
[jira] [Commented] (HBASE-6084) Server Load does not display correctly on the ui
[ https://issues.apache.org/jira/browse/HBASE-6084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282780#comment-13282780 ] Elliott Clark commented on HBASE-6084: -- The errors on the console are unrelated, so I filed HBASE-6090. Server Load does not display correctly on the ui Key: HBASE-6084 URL: https://issues.apache.org/jira/browse/HBASE-6084 Project: HBase Issue Type: Bug Reporter: Elliott Clark Assignee: Elliott Clark Attachments: HBASE-6084-0.patch The UI uses the toString method, but toString is no longer implemented.
[jira] [Assigned] (HBASE-6090) JMX Registration Error while booting HMaster
[ https://issues.apache.org/jira/browse/HBASE-6090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark reassigned HBASE-6090: Assignee: Elliott Clark JMX Registration Error while booting HMaster Key: HBASE-6090 URL: https://issues.apache.org/jira/browse/HBASE-6090 Project: HBase Issue Type: Bug Reporter: Elliott Clark Assignee: Elliott Clark When booting master there are errors about HMaster not being a bean and being unable to turn ServerLoad into an open class.
[jira] [Commented] (HBASE-6084) Server Load does not display correctly on the ui
[ https://issues.apache.org/jira/browse/HBASE-6084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282784#comment-13282784 ] Gregory Chanan commented on HBASE-6084: --- The JMX issues are tracked in HBASE-5967. Is this a duplicate? Or are you only talking about fixing toString here, with the JMX issues handled separately? I like your idea about copying the format of the old one. Server Load does not display correctly on the ui Key: HBASE-6084 URL: https://issues.apache.org/jira/browse/HBASE-6084 Project: HBase Issue Type: Bug Reporter: Elliott Clark Assignee: Elliott Clark Attachments: HBASE-6084-0.patch The UI uses the toString method, but toString is no longer implemented.
[jira] [Commented] (HBASE-6090) JMX Registration Error while booting HMaster
[ https://issues.apache.org/jira/browse/HBASE-6090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282785#comment-13282785 ] Gregory Chanan commented on HBASE-6090: --- Duplicate of HBASE-5967? JMX Registration Error while booting HMaster Key: HBASE-6090 URL: https://issues.apache.org/jira/browse/HBASE-6090 Project: HBase Issue Type: Bug Reporter: Elliott Clark Assignee: Elliott Clark When booting master there are errors about HMaster not being a bean and being unable to turn ServerLoad into an open class.
[jira] [Updated] (HBASE-5892) [hbck] Refactor parallel WorkItem* to Futures.
[ https://issues.apache.org/jira/browse/HBASE-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HBASE-5892: --- Attachment: hbase-5892.patch [hbck] Refactor parallel WorkItem* to Futures. -- Key: HBASE-5892 URL: https://issues.apache.org/jira/browse/HBASE-5892 Project: HBase Issue Type: Improvement Reporter: Jonathan Hsieh Labels: noob Attachments: hbase-5892.patch This would convert WorkItem* logic (with low level notifies, and rough exception handling) into a more canonical Futures pattern. Currently there are two instances of this pattern (for loading hdfs dirs, for contacting regionservers for assignments, and soon -- for loading hdfs .regioninfo files).
[jira] [Updated] (HBASE-5892) [hbck] Refactor parallel WorkItem* to Futures.
[ https://issues.apache.org/jira/browse/HBASE-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HBASE-5892: --- Status: Patch Available (was: Open) [hbck] Refactor parallel WorkItem* to Futures. -- Key: HBASE-5892 URL: https://issues.apache.org/jira/browse/HBASE-5892 Project: HBase Issue Type: Improvement Reporter: Jonathan Hsieh Labels: noob Attachments: hbase-5892.patch This would convert WorkItem* logic (with low level notifies, and rough exception handling) into a more canonical Futures pattern. Currently there are two instances of this pattern (for loading hdfs dirs, for contacting regionservers for assignments, and soon -- for loading hdfs .regioninfo files).
[jira] [Commented] (HBASE-6090) JMX Registration Error while booting HMaster
[ https://issues.apache.org/jira/browse/HBASE-6090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282791#comment-13282791 ] Elliott Clark commented on HBASE-6090: -- Yep, this is a dupe of HBASE-5967. JMX Registration Error while booting HMaster Key: HBASE-6090 URL: https://issues.apache.org/jira/browse/HBASE-6090 Project: HBase Issue Type: Bug Reporter: Elliott Clark Assignee: Elliott Clark When booting master there are errors about HMaster not being a bean and being unable to turn ServerLoad into an open class.
[jira] [Commented] (HBASE-5892) [hbck] Refactor parallel WorkItem* to Futures.
[ https://issues.apache.org/jira/browse/HBASE-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282790#comment-13282790 ] Andrew Wang commented on HBASE-5892: I tried to do this refactor, essentially switching out Runnable for Callable and adding some more logging in the process. Let me know if it's not what you were thinking of. I didn't do any testing beyond running hbck on my local machine, which seemed to work. [hbck] Refactor parallel WorkItem* to Futures. -- Key: HBASE-5892 URL: https://issues.apache.org/jira/browse/HBASE-5892 Project: HBase Issue Type: Improvement Reporter: Jonathan Hsieh Labels: noob Attachments: hbase-5892.patch This would convert WorkItem* logic (with low level notifies, and rough exception handling) into a more canonical Futures pattern. Currently there are two instances of this pattern (for loading hdfs dirs, for contacting regionservers for assignments, and soon -- for loading hdfs .regioninfo files).
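The Runnable-to-Callable switch described above can be sketched as follows. This is a generic illustration of the Futures pattern and not the contents of hbase-5892.patch: LoadDirWorkItem, runAll, the pool size, and the -1 failure marker are hypothetical, and the "work" is a placeholder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class WorkItemFutures {
    /** Hypothetical work item: returns its result instead of notifying a shared monitor. */
    static class LoadDirWorkItem implements Callable<Integer> {
        private final String dir;

        LoadDirWorkItem(String dir) {
            this.dir = dir;
        }

        @Override
        public Integer call() throws Exception {
            if (dir.isEmpty()) {
                throw new IllegalArgumentException("empty directory name");
            }
            return dir.length(); // placeholder for the real per-directory work
        }
    }

    /** Runs one work item per directory and collects results in submission order. */
    public static List<Integer> runAll(List<String> dirs) throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            List<Callable<Integer>> items = new ArrayList<>();
            for (String d : dirs) {
                items.add(new LoadDirWorkItem(d));
            }
            // invokeAll blocks until every item has completed or failed.
            List<Future<Integer>> futures = executor.invokeAll(items);
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : futures) {
                try {
                    results.add(f.get());
                } catch (ExecutionException e) {
                    // The exception thrown inside call() is available as e.getCause().
                    results.add(-1);
                }
            }
            return results;
        } finally {
            executor.shutdown();
        }
    }
}
```

Compared with hand-written wait/notify, each failure surfaces as an ExecutionException on the corresponding Future, so the caller decides per item how to handle it.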
[jira] [Updated] (HBASE-6071) getRegionServerWithRetires, should log unsuccessful attempts and exceptions.
[ https://issues.apache.org/jira/browse/HBASE-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igal Shilman updated HBASE-6071: Status: Open (was: Patch Available) I didn't know that patches have to be submitted against trunk first, or that diffs have to be created with --no-prefix. getRegionServerWithRetires, should log unsuccessful attempts and exceptions. Key: HBASE-6071 URL: https://issues.apache.org/jira/browse/HBASE-6071 Project: HBase Issue Type: Improvement Components: client, ipc Affects Versions: 0.90.0 Reporter: Igal Shilman Priority: Minor Labels: client, ipc Fix For: 0.90.7 Attachments: HConnectionManager_HBASE-6071-0.90.0.patch HConnectionImplementation.getRegionServerWithRetries might terminate with an exception other than a DoNotRetryIOException, thus silently dropping the exceptions from previous attempts. [~ted_yu] suggested ([here|http://mail-archives.apache.org/mod_mbox/hbase-user/201205.mbox/%3CCAFebPXBq9V9BVdzRTNr-MB3a1Lz78SZj6gvP6On0b%2Bajt9StAg%40mail.gmail.com%3E]) adding a log message inside the catch block describing the exception type and details.
[jira] [Resolved] (HBASE-6090) JMX Registration Error while booting HMaster
[ https://issues.apache.org/jira/browse/HBASE-6090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark resolved HBASE-6090. -- Resolution: Duplicate Dupe of HBASE-5967 JMX Registration Error while booting HMaster Key: HBASE-6090 URL: https://issues.apache.org/jira/browse/HBASE-6090 Project: HBase Issue Type: Bug Reporter: Elliott Clark Assignee: Elliott Clark When booting master there are errors about HMaster not being a bean and being unable to turn ServerLoad into an open class.
[jira] [Updated] (HBASE-6071) getRegionServerWithRetires, should log unsuccessful attempts and exceptions.
[ https://issues.apache.org/jira/browse/HBASE-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igal Shilman updated HBASE-6071: Affects Version/s: 0.92.0 0.94.0 Fix Version/s: (was: 0.90.7) getRegionServerWithRetires, should log unsuccessful attempts and exceptions. Key: HBASE-6071 URL: https://issues.apache.org/jira/browse/HBASE-6071 Project: HBase Issue Type: Improvement Components: client, ipc Affects Versions: 0.90.0, 0.92.0, 0.94.0 Reporter: Igal Shilman Priority: Minor Labels: client, ipc Attachments: HConnectionManager_HBASE-6071-0.90.0.patch HConnectionImplementation.getRegionServerWithRetries might terminate with an exception other than a DoNotRetryIOException, thus silently dropping the exceptions from previous attempts. [~ted_yu] suggested ([here|http://mail-archives.apache.org/mod_mbox/hbase-user/201205.mbox/%3CCAFebPXBq9V9BVdzRTNr-MB3a1Lz78SZj6gvP6On0b%2Bajt9StAg%40mail.gmail.com%3E]) adding a log message inside the catch block describing the exception type and details.
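The suggestion above (log every failed attempt inside the catch block so that earlier exceptions are not lost) might look roughly like this. It is a generic sketch and not the attached patch: RetryWithLogging, callWithRetries, and the in-memory LOG list standing in for a real logger are hypothetical, and the sketch does not model the DoNotRetryIOException short circuit.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

public class RetryWithLogging {
    // Stand-in for a real logger so the example stays self-contained.
    public static final List<String> LOG = new ArrayList<>();

    /**
     * Calls the callable up to maxAttempts times. Every failure is recorded with its
     * attempt number, exception type, and message before the next attempt is made.
     */
    public static <T> T callWithRetries(Callable<T> callable, int maxAttempts) throws IOException {
        Throwable last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return callable.call();
            } catch (Exception e) {
                last = e;
                LOG.add("attempt " + attempt + " of " + maxAttempts + " failed: "
                    + e.getClass().getSimpleName() + ": " + e.getMessage());
            }
        }
        // The final exception is kept as the cause; the earlier ones are in the log.
        throw new IOException("all " + maxAttempts + " attempts failed", last);
    }
}
```

With this shape, a caller that only sees the final exception can still find the type and message of each earlier failure in the log.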
[jira] [Commented] (HBASE-6084) Server Load does not display correctly on the ui
[ https://issues.apache.org/jira/browse/HBASE-6084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282802#comment-13282802 ] Elliott Clark commented on HBASE-6084: -- This was only for the UI, which requires toString. Server Load does not display correctly on the ui Key: HBASE-6084 URL: https://issues.apache.org/jira/browse/HBASE-6084 Project: HBase Issue Type: Bug Reporter: Elliott Clark Assignee: Elliott Clark Attachments: HBASE-6084-0.patch The UI uses the toString method, but toString is no longer implemented.
[jira] [Updated] (HBASE-6084) Server Load does not display correctly on the ui
[ https://issues.apache.org/jira/browse/HBASE-6084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-6084: - Attachment: HBASE-6084-1.patch This patch fixes the UI and adds getters for the values that used to be there. The totals are computed in ServerLoad from the RegionLoad totals. Server Load does not display correctly on the ui Key: HBASE-6084 URL: https://issues.apache.org/jira/browse/HBASE-6084 Project: HBase Issue Type: Bug Reporter: Elliott Clark Assignee: Elliott Clark Attachments: HBASE-6084-0.patch, HBASE-6084-1.patch The UI uses the toString method, but toString is no longer implemented.
[jira] [Commented] (HBASE-5916) RS restart just before master intialization we make the cluster non operative
[ https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282808#comment-13282808 ] Zhihong Yu commented on HBASE-5916: --- The failure in TestClockSkewDetection was due to an NPE. The following change makes it pass: {code} if ((this.services == null || ((HMaster) this.services).isInitialized()) && this.deadservers.cleanPreviousInstance(serverName)) { {code} {code} + * To clear any dead server with same host name and port of online server {code} I think 'any' should be added in front of 'online server'. {code} + public void clearDeadServersWithSameHostNameAndPortOfOnlineServer() { {code} The above method can be package private, right? {code} + while ((sn = ServerName.findServerWithSameHostnamePort(this.deadservers, serverName)) != null) { {code} The above line exceeds 100 chars. {code} + if(actualDeadServers.contains(deadServer.getKey())){ {code} Add spaces after if and before {. RS restart just before master intialization we make the cluster non operative - Key: HBASE-5916 URL: https://issues.apache.org/jira/browse/HBASE-5916 Project: HBase Issue Type: Bug Affects Versions: 0.92.1, 0.94.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.94.1 Attachments: HBASE-5916_trunk.patch, HBASE-5916_trunk_1.patch, HBASE-5916_trunk_1.patch, HBASE-5916_trunk_2.patch, HBASE-5916_trunk_3.patch, HBASE-5916_trunk_4.patch, HBASE-5916_trunk_v5.patch, HBASE-5916_trunk_v6.patch Consider a case where my master is getting restarted. An RS that was alive when the master restart began gets restarted before the master enables the ServerShutdownHandler: {code} serverShutdownHandlerEnabled = true; {code} In this case, when the RS tries to register with the master, the master tries to expire the old server instance, but the server cannot be expired because the ServerShutdownHandler is not yet enabled.
This case may happen when i have only one RS gets restarted or all the RS gets restarted at the same time.(before assignRootandMeta). {code} LOG.info(message); if (existingServer.getStartcode() serverName.getStartcode()) { LOG.info(Triggering server recovery; existingServer + existingServer + looks stale, new server: + serverName); expireServer(existingServer); } {code} If another RS is brought up then the cluster comes back to normalcy. May be a very corner case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
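Editorial sketch: the registration check quoted in the HBASE-5916 description compares start codes of servers that share a host name and port. The following is a minimal, self-contained model of that comparison only. The class `StaleServerCheck`, the type `ServerId`, and both method names are hypothetical stand-ins and are not HBase classes; the handling of `serverShutdownHandlerEnabled` is deliberately left out.

```java
import java.util.List;

/** Hypothetical model of the start-code comparison; not the HBase ServerManager. */
public class StaleServerCheck {

  /** Minimal stand-in for a server identity: host, port, and start code. */
  public static final class ServerId {
    final String host;
    final int port;
    final long startcode;

    public ServerId(String host, int port, long startcode) {
      this.host = host;
      this.port = port;
      this.startcode = startcode;
    }
  }

  /** Returns the online server with the same host and port, or null if none exists. */
  public static ServerId findSameHostPort(List<ServerId> online, ServerId incoming) {
    for (ServerId s : online) {
      if (s.host.equals(incoming.host) && s.port == incoming.port) {
        return s;
      }
    }
    return null;
  }

  /** An existing entry looks stale when the newly registering server has a later start code. */
  public static boolean looksStale(ServerId existing, ServerId incoming) {
    return existing.startcode < incoming.startcode;
  }
}
```

In this model, a restarted region server registers with the same host and port but a later start code, so the older entry is the one that would be handed to an expiry routine.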
[jira] [Commented] (HBASE-6084) Server Load does not display correctly on the ui
[ https://issues.apache.org/jira/browse/HBASE-6084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282812#comment-13282812 ]

Gregory Chanan commented on HBASE-6084:
---

Elliott, I'm a bit confused. When I look at 0.92.1's HServerLoad.toString I see:
{code}
int numberOfRegions = this.regionLoad.size();
StringBuilder sb = new StringBuilder();
sb = Strings.appendKeyValue(sb, "requestsPerSecond", Integer.valueOf(numberOfRequests/msgInterval));
sb = Strings.appendKeyValue(sb, "numberOfOnlineRegions", Integer.valueOf(numberOfRegions));
sb = Strings.appendKeyValue(sb, "usedHeapMB", Integer.valueOf(this.usedHeapMB));
sb = Strings.appendKeyValue(sb, "maxHeapMB", Integer.valueOf(maxHeapMB));
return sb.toString();
{code}
But your toString doesn't match. It looks like you implemented HServerLoad.RegionLoad's toString in ServerLoad?

Server Load does not display correctly on the ui
Key: HBASE-6084
URL: https://issues.apache.org/jira/browse/HBASE-6084
Project: HBase
Issue Type: Bug
Reporter: Elliott Clark
Assignee: Elliott Clark
Attachments: HBASE-6084-0.patch, HBASE-6084-1.patch

The ui uses the toString method and toString does not implement it any more.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6084) Server Load does not display correctly on the ui
[ https://issues.apache.org/jira/browse/HBASE-6084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282814#comment-13282814 ]

Gregory Chanan commented on HBASE-6084:
---

I think what we need to do is the following:
1) Write a ServerLoad.toString that matches HServerLoad.toString.
2) Implement a RegionLoad (not HServerLoad.RegionLoad) that wraps the protobuf RegionLoad, like how ServerLoad wraps the protobuf ServerLoad.
3) Write a RegionLoad.toString that matches HServerLoad.RegionLoad.toString.

Does that seem correct to you or am I missing something? You should be able to do #1 now. I'm almost done with #2; you can track it in HBASE-5933. After #2, you should be able to do #3.

Server Load does not display correctly on the ui
Key: HBASE-6084
URL: https://issues.apache.org/jira/browse/HBASE-6084
Project: HBase
Issue Type: Bug
Reporter: Elliott Clark
Assignee: Elliott Clark
Attachments: HBASE-6084-0.patch, HBASE-6084-1.patch

The ui uses the toString method and toString does not implement it any more.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
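Editorial sketch: step 1 of the plan above asks for a toString that prints the same four key/value pairs as the 0.92.1 HServerLoad.toString quoted earlier in this thread. The following is a minimal, self-contained illustration of that idea. `ServerLoadSketch` is a hypothetical class, its local `appendKeyValue` helper only stands in for the HBase `Strings.appendKeyValue` utility, and the `", "` separator and `=` sign are assumptions about the output format.

```java
/** Hypothetical wrapper; the four keys follow the HServerLoad.toString snippet quoted in the thread. */
public class ServerLoadSketch {
  private final int requestsPerSecond;
  private final int numberOfOnlineRegions;
  private final int usedHeapMB;
  private final int maxHeapMB;

  public ServerLoadSketch(int requestsPerSecond, int numberOfOnlineRegions,
      int usedHeapMB, int maxHeapMB) {
    this.requestsPerSecond = requestsPerSecond;
    this.numberOfOnlineRegions = numberOfOnlineRegions;
    this.usedHeapMB = usedHeapMB;
    this.maxHeapMB = maxHeapMB;
  }

  /** Local stand-in helper: appends "key=value", separated by ", " (assumed format). */
  private static StringBuilder appendKeyValue(StringBuilder sb, String key, Object value) {
    if (sb.length() > 0) {
      sb.append(", ");
    }
    return sb.append(key).append('=').append(value);
  }

  @Override
  public String toString() {
    StringBuilder sb = new StringBuilder();
    appendKeyValue(sb, "requestsPerSecond", requestsPerSecond);
    appendKeyValue(sb, "numberOfOnlineRegions", numberOfOnlineRegions);
    appendKeyValue(sb, "usedHeapMB", usedHeapMB);
    appendKeyValue(sb, "maxHeapMB", maxHeapMB);
    return sb.toString();
  }
}
```

Keeping the old keys first means any extra fields a new implementation wants to expose can be appended afterwards without changing what the UI already shows.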
[jira] [Commented] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282818#comment-13282818 ]

Zhihong Yu commented on HBASE-6070:
---

+1 on patch v2. You may want to verify that the failed test below wasn't related to this change: https://builds.apache.org/job/PreCommit-HBASE-Build/1987/console

AM.nodeDeleted and SSH races creating problems for regions under SPLIT
--
Key: HBASE-6070
URL: https://issues.apache.org/jira/browse/HBASE-6070
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Fix For: 0.92.2, 0.96.0, 0.94.1
Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.92_1.patch, HBASE-6070_0.94.patch, HBASE-6070_0.94_1.patch, HBASE-6070_trunk.patch, HBASE-6070_trunk_1.patch

We tried to address the problems in Master restart and RS restart while a SPLIT of a region is in progress as part of HBASE-5806. While looking further we found there is still one race condition:
1. Split has just started and the znode is in RS_SPLIT state.
2. RS goes down.
3. First callback for SSH comes.
4. As part of the fix for HBASE-5806, SSH knows that some region is in RIT.
5. But now the nodeDeleted event comes for the SPLIT node, and there we try to delete the RIT.
6. After this we try to see in the SSH whether any node is in RIT. As we don't find the region in RIT, the region is never assigned.

When we fixed HBASE-5806, step 6 happened first and then step 5 happened, so we missed it. Now we have found it. Will come up with a patch shortly.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6071) getRegionServerWithRetires, should log unsuccessful attempts and exceptions.
[ https://issues.apache.org/jira/browse/HBASE-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282821#comment-13282821 ]

Zhihong Yu commented on HBASE-6071:
---

--no-prefix is not required now - Hadoop QA is smart.

getRegionServerWithRetires, should log unsuccessful attempts and exceptions.
Key: HBASE-6071
URL: https://issues.apache.org/jira/browse/HBASE-6071
Project: HBase
Issue Type: Improvement
Components: client, ipc
Affects Versions: 0.90.0, 0.92.0, 0.94.0
Reporter: Igal Shilman
Priority: Minor
Labels: client, ipc
Attachments: HConnectionManager_HBASE-6071-0.90.0.patch

HConnectionImplementation.getRegionServerWithRetries might terminate with an exception other than a DoNotRetryIOException, thus silently dropping the exceptions from previous attempts. [~ted_yu] suggested ([here|http://mail-archives.apache.org/mod_mbox/hbase-user/201205.mbox/%3CCAFebPXBq9V9BVdzRTNr-MB3a1Lz78SZj6gvP6On0b%2Bajt9StAg%40mail.gmail.com%3E]) adding a log message inside the catch block describing the exception type and details.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
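Editorial sketch: the HBASE-6071 description asks for a log message inside the catch block of a retry loop so that failed attempts are not dropped silently. The following is a minimal, self-contained illustration of that pattern. It is not the HConnectionImplementation code: `RetryWithLogging`, `callWithRetries`, and `DoNotRetryException` are hypothetical names, and `java.util.logging` is used only to keep the example free of dependencies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper that logs every failed attempt of a retried call. */
public class RetryWithLogging {
  private static final Logger LOG = Logger.getLogger(RetryWithLogging.class.getName());

  /** Stand-in for an exception type that must not be retried. */
  public static class DoNotRetryException extends Exception {
    public DoNotRetryException(String message) {
      super(message);
    }
  }

  /**
   * Calls the task up to maxAttempts times. Each failure is logged with its
   * exception type and message and is attached to the final exception.
   */
  public static <T> T callWithRetries(Callable<T> task, int maxAttempts) throws Exception {
    List<Throwable> failures = new ArrayList<>();
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return task.call();
      } catch (DoNotRetryException e) {
        LOG.log(Level.WARNING, "Attempt " + attempt + " failed, not retrying", e);
        throw e;
      } catch (Exception e) {
        LOG.log(Level.WARNING, "Attempt " + attempt + " of " + maxAttempts + " failed: "
            + e.getClass().getSimpleName() + ": " + e.getMessage(), e);
        failures.add(e);
      }
    }
    // Keep every earlier failure reachable from the exception that is finally thrown.
    Exception last = new Exception("Failed after " + maxAttempts + " attempts");
    for (Throwable t : failures) {
      last.addSuppressed(t);
    }
    throw last;
  }
}
```

Attaching the earlier failures as suppressed exceptions is one design choice among several; a caller that only reads the log still sees one warning per failed attempt.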
[jira] [Commented] (HBASE-5352) ACL improvements
[ https://issues.apache.org/jira/browse/HBASE-5352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282823#comment-13282823 ]

Matteo Bertozzi commented on HBASE-5352:

@Laxman yeah, create a sub-task for that. ACL may not be in sync with new features, so feel free to open a new sub-task to sync the coprocessor with the missing stuff.

ACL improvements
Key: HBASE-5352
URL: https://issues.apache.org/jira/browse/HBASE-5352
Project: HBase
Issue Type: Improvement
Components: security
Affects Versions: 0.92.1, 0.94.0
Reporter: Enis Soztutar
Assignee: Enis Soztutar

In this issue I would like to open discussion for a few minor ACL related improvements. The proposed changes are as follows:
1. Introduce something like AccessControllerProtocol.checkPermissions(Permission[] permissions) API, so that clients can check access rights before carrying out the operations. We need this kind of operation for HCATALOG-245, which introduces authorization providers for hbase over hcat. We cannot use getUserPermissions() since it requires ADMIN permissions on the global/table level.
2. getUserPermissions(tableName)/grant/revoke and drop/modify table operations should not check for global CREATE/ADMIN rights, but table CREATE/ADMIN rights. The reasoning is that if a user is able to admin or read from a table, she should be able to read the table's permissions. We can choose whether we want only READ or ADMIN permissions for getUserPermission(). Since we check for global permissions first for table permissions, configuring table access using global permissions will continue to work.
3. Grant/Revoke global permissions - HBASE-5342 (included for completeness)

Of all 3, we may want to backport the first one to 0.92 since without it, Hive/Hcatalog cannot use Hbase's authorization mechanism effectively. I will create subissues and convert HBASE-5342 to a subtask when we get some feedback, and opinions for going further.
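Editorial sketch: items 1 and 2 of the HBASE-5352 proposal describe a checkPermissions-style call and a lookup that consults global permissions first and table permissions second. The following is a minimal, self-contained model of that lookup order. It is not the AccessController implementation: `PermissionCheckSketch`, its `Action` enum, and every method signature here are hypothetical.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory model: global grants are consulted before table grants. */
public class PermissionCheckSketch {
  public enum Action { READ, WRITE, CREATE, ADMIN }

  // user -> actions granted globally
  private final Map<String, Set<Action>> globalGrants = new HashMap<>();
  // "user@table" -> actions granted on that table
  private final Map<String, Set<Action>> tableGrants = new HashMap<>();

  public void grantGlobal(String user, Action action) {
    globalGrants.computeIfAbsent(user, u -> EnumSet.noneOf(Action.class)).add(action);
  }

  public void grantTable(String user, String table, Action action) {
    tableGrants.computeIfAbsent(user + "@" + table, k -> EnumSet.noneOf(Action.class)).add(action);
  }

  /** Global permissions are checked first, so global grants keep working for table access. */
  public boolean isAllowed(String user, String table, Action action) {
    if (globalGrants.getOrDefault(user, EnumSet.noneOf(Action.class)).contains(action)) {
      return true;
    }
    return tableGrants.getOrDefault(user + "@" + table, EnumSet.noneOf(Action.class))
        .contains(action);
  }

  /** checkPermissions-style call: true only if every requested action is allowed. */
  public boolean checkPermissions(String user, String table, Action... actions) {
    for (Action action : actions) {
      if (!isAllowed(user, table, action)) {
        return false;
      }
    }
    return true;
  }
}
```

A client could call `checkPermissions` before starting an operation and fail early, which is the use case the proposal gives for HCATALOG-245.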
[jira] [Updated] (HBASE-6084) Server Load does not display correctly on the ui
[ https://issues.apache.org/jira/browse/HBASE-6084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-6084: - Attachment: HBASE-6084-2.patch I just added the extra info since we have them. In doing that I forgot to add the old stuff back in. Fixed. Server Load does not display correctly on the ui Key: HBASE-6084 URL: https://issues.apache.org/jira/browse/HBASE-6084 Project: HBase Issue Type: Bug Reporter: Elliott Clark Assignee: Elliott Clark Attachments: HBASE-6084-0.patch, HBASE-6084-1.patch, HBASE-6084-2.patch The ui uses the toString method and toString does not implement it any more. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5916) RS restart just before master intialization we make the cluster non operative
[ https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282882#comment-13282882 ]

chunhui shen commented on HBASE-5916:
-

@ram
bq. In joinCluster(), as per the existing code, if any new server has checked in and the root/meta had got assigned to it, in joinCluster we may think that it is a dead server because we have already passed the online servers.

If we consider it a dead server, what error will be caused? I think none, because it must be a new regionserver (one that was restarted just now) and it carries no regions. Of course, we won't assign regions to it, but I think that does not matter. Correct me if wrong, thanks.

RS restart just before master intialization we make the cluster non operative
-
Key: HBASE-5916
URL: https://issues.apache.org/jira/browse/HBASE-5916
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
Fix For: 0.94.1
Attachments: HBASE-5916_trunk.patch, HBASE-5916_trunk_1.patch, HBASE-5916_trunk_1.patch, HBASE-5916_trunk_2.patch, HBASE-5916_trunk_3.patch, HBASE-5916_trunk_4.patch, HBASE-5916_trunk_v5.patch, HBASE-5916_trunk_v6.patch

Consider a case where my master is getting restarted. An RS that was alive when the master restart began gets restarted before the master initializes the ServerShutDownHandler.
{code}
serverShutdownHandlerEnabled = true;
{code}
In this case, when the RS tries to register with the master, the master will try to expire the server, but the server cannot be expired because the serverShutdownHandler is not yet enabled. This can happen when only one RS gets restarted or when all the RSs get restarted at the same time (before assignRootandMeta).
{code}
LOG.info(message);
if (existingServer.getStartcode() < serverName.getStartcode()) {
  LOG.info("Triggering server recovery; existingServer " + existingServer + " looks stale, new server: " + serverName);
  expireServer(existingServer);
}
{code}
If another RS is brought up then the cluster comes back to normalcy. This may be very much a corner case.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6071) getRegionServerWithRetires, should log unsuccessful attempts and exceptions.
[ https://issues.apache.org/jira/browse/HBASE-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Igal Shilman updated HBASE-6071:

Attachment: HBASE-6071.patch

getRegionServerWithRetires, should log unsuccessful attempts and exceptions.
Key: HBASE-6071
URL: https://issues.apache.org/jira/browse/HBASE-6071
Project: HBase
Issue Type: Improvement
Components: client, ipc
Affects Versions: 0.90.0, 0.92.0, 0.94.0
Reporter: Igal Shilman
Priority: Minor
Labels: client, ipc
Attachments: HBASE-6071.patch, HConnectionManager_HBASE-6071-0.90.0.patch

HConnectionImplementation.getRegionServerWithRetries might terminate with an exception other than a DoNotRetryIOException, thus silently dropping the exceptions from previous attempts. [~ted_yu] suggested ([here|http://mail-archives.apache.org/mod_mbox/hbase-user/201205.mbox/%3CCAFebPXBq9V9BVdzRTNr-MB3a1Lz78SZj6gvP6On0b%2Bajt9StAg%40mail.gmail.com%3E]) adding a log message inside the catch block describing the exception type and details.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-6091) Come up with strawman proposal for RC testing matrix
David S. Wang created HBASE-6091: Summary: Come up with strawman proposal for RC testing matrix Key: HBASE-6091 URL: https://issues.apache.org/jira/browse/HBASE-6091 Project: HBase Issue Type: Bug Components: documentation Affects Versions: 0.96.0 Reporter: David S. Wang Assignee: David S. Wang -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5916) RS restart just before master intialization we make the cluster non operative
[ https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] rajeshbabu updated HBASE-5916: -- Attachment: HBASE-5916_trunk_v7.patch RS restart just before master intialization we make the cluster non operative - Key: HBASE-5916 URL: https://issues.apache.org/jira/browse/HBASE-5916 Project: HBase Issue Type: Bug Affects Versions: 0.92.1, 0.94.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.94.1 Attachments: HBASE-5916_trunk.patch, HBASE-5916_trunk_1.patch, HBASE-5916_trunk_1.patch, HBASE-5916_trunk_2.patch, HBASE-5916_trunk_3.patch, HBASE-5916_trunk_4.patch, HBASE-5916_trunk_v5.patch, HBASE-5916_trunk_v6.patch, HBASE-5916_trunk_v7.patch Consider a case where my master is getting restarted. RS that was alive when the master restart started, gets restarted before the master initializes the ServerShutDownHandler. {code} serverShutdownHandlerEnabled = true; {code} In this case when the RS tries to register with the master, the master will try to expire the server but the server cannot be expired as still the serverShutdownHandler is not enabled. This case may happen when i have only one RS gets restarted or all the RS gets restarted at the same time.(before assignRootandMeta). {code} LOG.info(message); if (existingServer.getStartcode() serverName.getStartcode()) { LOG.info(Triggering server recovery; existingServer + existingServer + looks stale, new server: + serverName); expireServer(existingServer); } {code} If another RS is brought up then the cluster comes back to normalcy. May be a very corner case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5916) RS restart just before master intialization we make the cluster non operative
[ https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282981#comment-13282981 ]

rajeshbabu commented on HBASE-5916:
---

@Zhihong Yu Thanks for the help. The latest patch addresses Zhihong Yu's comments.

RS restart just before master intialization we make the cluster non operative
-
Key: HBASE-5916
URL: https://issues.apache.org/jira/browse/HBASE-5916
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
Fix For: 0.94.1
Attachments: HBASE-5916_trunk.patch, HBASE-5916_trunk_1.patch, HBASE-5916_trunk_1.patch, HBASE-5916_trunk_2.patch, HBASE-5916_trunk_3.patch, HBASE-5916_trunk_4.patch, HBASE-5916_trunk_v5.patch, HBASE-5916_trunk_v6.patch, HBASE-5916_trunk_v7.patch

Consider a case where my master is getting restarted. An RS that was alive when the master restart began gets restarted before the master initializes the ServerShutDownHandler.
{code}
serverShutdownHandlerEnabled = true;
{code}
In this case, when the RS tries to register with the master, the master will try to expire the server, but the server cannot be expired because the serverShutdownHandler is not yet enabled. This can happen when only one RS gets restarted or when all the RSs get restarted at the same time (before assignRootandMeta).
{code}
LOG.info(message);
if (existingServer.getStartcode() < serverName.getStartcode()) {
  LOG.info("Triggering server recovery; existingServer " + existingServer + " looks stale, new server: " + serverName);
  expireServer(existingServer);
}
{code}
If another RS is brought up then the cluster comes back to normalcy. This may be very much a corner case.

-- This message is automatically generated by JIRA.
[jira] [Updated] (HBASE-5916) RS restart just before master intialization we make the cluster non operative
[ https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] rajeshbabu updated HBASE-5916: -- Status: Open (was: Patch Available) RS restart just before master intialization we make the cluster non operative - Key: HBASE-5916 URL: https://issues.apache.org/jira/browse/HBASE-5916 Project: HBase Issue Type: Bug Affects Versions: 0.94.0, 0.92.1 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.94.1 Attachments: HBASE-5916_trunk.patch, HBASE-5916_trunk_1.patch, HBASE-5916_trunk_1.patch, HBASE-5916_trunk_2.patch, HBASE-5916_trunk_3.patch, HBASE-5916_trunk_4.patch, HBASE-5916_trunk_v5.patch, HBASE-5916_trunk_v6.patch, HBASE-5916_trunk_v7.patch Consider a case where my master is getting restarted. RS that was alive when the master restart started, gets restarted before the master initializes the ServerShutDownHandler. {code} serverShutdownHandlerEnabled = true; {code} In this case when the RS tries to register with the master, the master will try to expire the server but the server cannot be expired as still the serverShutdownHandler is not enabled. This case may happen when i have only one RS gets restarted or all the RS gets restarted at the same time.(before assignRootandMeta). {code} LOG.info(message); if (existingServer.getStartcode() serverName.getStartcode()) { LOG.info(Triggering server recovery; existingServer + existingServer + looks stale, new server: + serverName); expireServer(existingServer); } {code} If another RS is brought up then the cluster comes back to normalcy. May be a very corner case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5916) RS restart just before master intialization we make the cluster non operative
[ https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] rajeshbabu updated HBASE-5916: -- Status: Patch Available (was: Open) RS restart just before master intialization we make the cluster non operative - Key: HBASE-5916 URL: https://issues.apache.org/jira/browse/HBASE-5916 Project: HBase Issue Type: Bug Affects Versions: 0.94.0, 0.92.1 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.94.1 Attachments: HBASE-5916_trunk.patch, HBASE-5916_trunk_1.patch, HBASE-5916_trunk_1.patch, HBASE-5916_trunk_2.patch, HBASE-5916_trunk_3.patch, HBASE-5916_trunk_4.patch, HBASE-5916_trunk_v5.patch, HBASE-5916_trunk_v6.patch, HBASE-5916_trunk_v7.patch Consider a case where my master is getting restarted. RS that was alive when the master restart started, gets restarted before the master initializes the ServerShutDownHandler. {code} serverShutdownHandlerEnabled = true; {code} In this case when the RS tries to register with the master, the master will try to expire the server but the server cannot be expired as still the serverShutdownHandler is not enabled. This case may happen when i have only one RS gets restarted or all the RS gets restarted at the same time.(before assignRootandMeta). {code} LOG.info(message); if (existingServer.getStartcode() serverName.getStartcode()) { LOG.info(Triggering server recovery; existingServer + existingServer + looks stale, new server: + serverName); expireServer(existingServer); } {code} If another RS is brought up then the cluster comes back to normalcy. May be a very corner case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6032) Port HFileBlockIndex improvement from HBASE-5987
[ https://issues.apache.org/jira/browse/HBASE-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-6032: -- Attachment: 6032-ports-5987.txt Port HFileBlockIndex improvement from HBASE-5987 Key: HBASE-6032 URL: https://issues.apache.org/jira/browse/HBASE-6032 Project: HBase Issue Type: Task Reporter: Zhihong Yu Attachments: 6032-ports-5987.txt Excerpt from HBASE-5987: First, we propose to lookahead for one more block index so that the HFileScanner would know the start key value of next data block. So if the target key value for the scan(reSeekTo) is smaller than that start kv of next data block, it means the target key value has a very high possibility in the current data block (if not in current data block, then the start kv of next data block should be returned. +Indexing on the start key has some defects here+) and it shall NOT query the HFileBlockIndex in this case. On the contrary, if the target key value is bigger, then it shall query the HFileBlockIndex. This improvement shall help to reduce the hotness of HFileBlockIndex and avoid some unnecessary IdLock Contention or Index Block Cache lookup. This JIRA is to port the fix to HBase trunk, etc. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
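Editorial sketch: the excerpt from HBASE-5987 describes looking ahead to the start key of the next data block so that a reseek can skip the block index when the target key is smaller than that start key. The following is a minimal, self-contained model of that decision. It is not the HFileScanner or HFileBlockIndex code: `LookaheadReseek` is a hypothetical class, keys are plain strings, and the block index is modeled as a sorted array searched with `Arrays.binarySearch`.

```java
import java.util.Arrays;

/** Hypothetical model of the lookahead; block start keys are plain sorted strings. */
public class LookaheadReseek {
  private final String[] blockStartKeys; // sorted start key of every data block
  private int currentBlock = 0;          // block the scanner is positioned on
  private int indexLookups = 0;          // how often the full index was consulted

  public LookaheadReseek(String[] sortedBlockStartKeys) {
    this.blockStartKeys = sortedBlockStartKeys.clone();
  }

  /** Moves forward to the block that may contain target and returns its position. */
  public int reseekTo(String target) {
    boolean hasNext = currentBlock + 1 < blockStartKeys.length;
    // Lookahead: a target below the next block's start key cannot lie in a
    // later block, so the scanner stays where it is and skips the index.
    if (!hasNext || target.compareTo(blockStartKeys[currentBlock + 1]) < 0) {
      return currentBlock;
    }
    // Otherwise fall back to a search of the block index.
    indexLookups++;
    int pos = Arrays.binarySearch(blockStartKeys, target);
    // For a miss, binarySearch returns -(insertionPoint) - 1; the wanted block
    // is the last one whose start key is <= target, i.e. insertionPoint - 1.
    int found = pos >= 0 ? pos : -pos - 2;
    currentBlock = Math.max(found, currentBlock);
    return currentBlock;
  }

  public int getIndexLookups() {
    return indexLookups;
  }
}
```

With start keys a, f, m, t, a reseek to c or to h (while on the second block) is answered from the lookahead alone, and only reseeks that jump past the next start key touch the index.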
[jira] [Commented] (HBASE-6074) TestHLog is flaky
[ https://issues.apache.org/jira/browse/HBASE-6074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282989#comment-13282989 ] Zhihong Yu commented on HBASE-6074: --- I experienced similar issue when I worked on HBASE-5699. I think separating some tests out into their own class(es) is one solution. TestHLog is flaky - Key: HBASE-6074 URL: https://issues.apache.org/jira/browse/HBASE-6074 Project: HBase Issue Type: Test Components: test Affects Versions: 0.92.0 Reporter: Devaraj Das Attachments: 6074-1.patch When I run TestHLog in a loop, I see failures. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6032) Port HFileBlockIndex improvement from HBASE-5987
[ https://issues.apache.org/jira/browse/HBASE-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-6032: -- Status: Patch Available (was: Open) Port HFileBlockIndex improvement from HBASE-5987 Key: HBASE-6032 URL: https://issues.apache.org/jira/browse/HBASE-6032 Project: HBase Issue Type: Task Reporter: Zhihong Yu Attachments: 6032-ports-5987.txt Excerpt from HBASE-5987: First, we propose to lookahead for one more block index so that the HFileScanner would know the start key value of next data block. So if the target key value for the scan(reSeekTo) is smaller than that start kv of next data block, it means the target key value has a very high possibility in the current data block (if not in current data block, then the start kv of next data block should be returned. +Indexing on the start key has some defects here+) and it shall NOT query the HFileBlockIndex in this case. On the contrary, if the target key value is bigger, then it shall query the HFileBlockIndex. This improvement shall help to reduce the hotness of HFileBlockIndex and avoid some unnecessary IdLock Contention or Index Block Cache lookup. This JIRA is to port the fix to HBase trunk, etc. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5916) RS restart just before master intialization we make the cluster non operative
[ https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chunhui shen updated HBASE-5916:

Attachment: HBASE-5916v8.patch

I have made a simple patch (v8) with the solution I mentioned above. @ram Could you test it with your test case? Something may be wrong; thanks for the review.

RS restart just before master intialization we make the cluster non operative
-
Key: HBASE-5916
URL: https://issues.apache.org/jira/browse/HBASE-5916
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
Fix For: 0.94.1
Attachments: HBASE-5916_trunk.patch, HBASE-5916_trunk_1.patch, HBASE-5916_trunk_1.patch, HBASE-5916_trunk_2.patch, HBASE-5916_trunk_3.patch, HBASE-5916_trunk_4.patch, HBASE-5916_trunk_v5.patch, HBASE-5916_trunk_v6.patch, HBASE-5916_trunk_v7.patch, HBASE-5916v8.patch

Consider a case where my master is getting restarted. An RS that was alive when the master restart began gets restarted before the master initializes the ServerShutDownHandler.
{code}
serverShutdownHandlerEnabled = true;
{code}
In this case, when the RS tries to register with the master, the master will try to expire the server, but the server cannot be expired because the serverShutdownHandler is not yet enabled. This can happen when only one RS gets restarted or when all the RSs get restarted at the same time (before assignRootandMeta).
{code}
LOG.info(message);
if (existingServer.getStartcode() < serverName.getStartcode()) {
  LOG.info("Triggering server recovery; existingServer " + existingServer + " looks stale, new server: " + serverName);
  expireServer(existingServer);
}
{code}
If another RS is brought up then the cluster comes back to normalcy. This may be very much a corner case.

-- This message is automatically generated by JIRA.
RE: [jira] [Updated] (HBASE-6085) SaslServer intermittently ignoring SaslClient's requests
Hi Andrew,

Did you intend to send this mail to me? Or to Himanshu?

Regards
Ram

-Original Message-
From: Andrew Purtell [mailto:apurt...@apache.org]
Sent: Friday, May 25, 2012 12:21 AM
To: ramkrishna.s.vasudevan (JIRA)
Cc: issues@hbase.apache.org
Subject: Re: [jira] [Updated] (HBASE-6085) SaslServer intermittently ignoring SaslClient's requests

Anything of note appear in the server side logs at DEBUG level? Have you tried duplicating this in an all-localhost configuration? If it is possible to reproduce in an all-localhost configuration, or on a cluster not otherwise occupied, then we can turn on additional SASL/GSSAPI level debugging that may shed light but will be quite verbose.
[jira] [Commented] (HBASE-6032) Port HFileBlockIndex improvement from HBASE-5987
[ https://issues.apache.org/jira/browse/HBASE-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13283098#comment-13283098 ] Hadoop QA commented on HBASE-6032: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12529563/6032-ports-5987.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 15 new or modified tests. +1 hadoop23. The patch compiles against the hadoop 0.23.x profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 36 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1992//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1992//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1992//console This message is automatically generated. Port HFileBlockIndex improvement from HBASE-5987 Key: HBASE-6032 URL: https://issues.apache.org/jira/browse/HBASE-6032 Project: HBase Issue Type: Task Reporter: Zhihong Yu Attachments: 6032-ports-5987.txt Excerpt from HBASE-5987: First, we propose to lookahead for one more block index so that the HFileScanner would know the start key value of next data block. So if the target key value for the scan(reSeekTo) is smaller than that start kv of next data block, it means the target key value has a very high possibility in the current data block (if not in current data block, then the start kv of next data block should be returned. 
+Indexing on the start key has some defects here+) and it shall NOT query the HFileBlockIndex in this case. On the contrary, if the target key value is bigger, then it shall query the HFileBlockIndex. This improvement shall help to reduce the hotness of HFileBlockIndex and avoid some unnecessary IdLock Contention or Index Block Cache lookup. This JIRA is to port the fix to HBase trunk, etc.
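The lookahead rule in the excerpt above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code from the attached patch: the class `ReseekDecision`, the method `mustQueryBlockIndex`, the use of raw byte arrays as keys, and the handling of the "equal" and "no next block" cases are all hypothetical choices made for the example.

```java
import java.util.Arrays;

// Hypothetical sketch of the reseek lookahead; not the HBASE-5987 patch itself.
final class ReseekDecision {

    // Returns true when the scanner has to consult the block index, i.e. when the
    // target key is not smaller than the first key of the next data block.
    // nextBlockFirstKey is null when the current block is the last one
    // (assumption: in that case the scanner stays in the current block).
    static boolean mustQueryBlockIndex(byte[] targetKey, byte[] nextBlockFirstKey) {
        if (nextBlockFirstKey == null) {
            return false;
        }
        // Keys are compared as unsigned bytes in lexicographic order.
        // Assumption: an equal key is treated like a bigger one.
        return Arrays.compareUnsigned(targetKey, nextBlockFirstKey) >= 0;
    }

    public static void main(String[] args) {
        byte[] nextStart = {0x10, 0x00};
        System.out.println(mustQueryBlockIndex(new byte[] {0x0F, 0x7F}, nextStart));
        System.out.println(mustQueryBlockIndex(new byte[] {0x10, 0x01}, nextStart));
    }
}
```

The point of the design is that the common case (target still inside the current block) is decided with one key comparison and never touches the shared index structure.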
[jira] [Commented] (HBASE-6071) getRegionServerWithRetires, should log unsuccessful attempts and exceptions.
[ https://issues.apache.org/jira/browse/HBASE-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283105#comment-13283105 ] Hadoop QA commented on HBASE-6071: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12529461/HBASE-6071.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 hadoop23. The patch compiles against the hadoop 0.23.x profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 33 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1991//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1991//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1991//console This message is automatically generated. getRegionServerWithRetires, should log unsuccessful attempts and exceptions. Key: HBASE-6071 URL: https://issues.apache.org/jira/browse/HBASE-6071 Project: HBase Issue Type: Improvement Components: client, ipc Affects Versions: 0.92.0, 0.94.0 Reporter: Igal Shilman Priority: Minor Labels: client, ipc Attachments: HBASE-6071.patch, HConnectionManager_HBASE-6071-0.90.0.patch HConnectionImplementation.getRegionServerWithRetries might terminate with an exception other than a DoNotRetryIOException, thus silently dropping the exceptions from previous attempts.
[~ted_yu] suggested ([here|http://mail-archives.apache.org/mod_mbox/hbase-user/201205.mbox/%3CCAFebPXBq9V9BVdzRTNr-MB3a1Lz78SZj6gvP6On0b%2Bajt9StAg%40mail.gmail.com%3E]) adding a log message inside the catch block describing the exception type and details.
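The suggestion can be pictured with a small retry loop that logs each failed attempt inside the catch block. This is a minimal sketch, assuming a plain `Callable` and `java.util.logging`; the class `LoggingRetrier`, the method `callWithRetries`, and the stand-in `DoNotRetryException` are hypothetical and are not the HConnectionImplementation code or the attached patch.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch of a retry loop that logs every unsuccessful attempt.
final class LoggingRetrier {
    private static final Logger LOG = Logger.getLogger(LoggingRetrier.class.getName());

    // Stand-in for HBase's DoNotRetryIOException so the example is self-contained.
    static class DoNotRetryException extends IOException {
        DoNotRetryException(String message) {
            super(message);
        }
    }

    // Calls the callable up to maxAttempts times. Every failure is logged with its
    // attempt number and exception, so earlier failures are not silently dropped.
    static <T> T callWithRetries(Callable<T> callable, int maxAttempts) throws IOException {
        List<Throwable> failures = new ArrayList<>();
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return callable.call();
            } catch (DoNotRetryException e) {
                LOG.log(Level.WARNING, "Attempt " + attempt + " failed; not retrying", e);
                throw e;
            } catch (Exception e) {
                LOG.log(Level.WARNING, "Attempt " + attempt + " of " + maxAttempts + " failed", e);
                failures.add(e);
            }
        }
        // Keep the earlier exceptions reachable from the final one as well.
        IOException last = new IOException("Failed after " + maxAttempts + " attempts");
        for (Throwable failure : failures) {
            last.addSuppressed(failure);
        }
        throw last;
    }
}
```

Attaching the earlier failures as suppressed exceptions is an extra design choice of this sketch; the issue itself only asks for the log message.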
[jira] [Commented] (HBASE-5916) RS restart just before master intialization we make the cluster non operative
[ https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13283109#comment-13283109 ] Hadoop QA commented on HBASE-5916: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12529660/HBASE-5916v8.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 hadoop23. The patch compiles against the hadoop 0.23.x profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 34 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1993//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1993//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1993//console This message is automatically generated. RS restart just before master intialization we make the cluster non operative - Key: HBASE-5916 URL: https://issues.apache.org/jira/browse/HBASE-5916 Project: HBase Issue Type: Bug Affects Versions: 0.92.1, 0.94.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.94.1 Attachments: HBASE-5916_trunk.patch, HBASE-5916_trunk_1.patch, HBASE-5916_trunk_1.patch, HBASE-5916_trunk_2.patch, HBASE-5916_trunk_3.patch, HBASE-5916_trunk_4.patch, HBASE-5916_trunk_v5.patch, HBASE-5916_trunk_v6.patch, HBASE-5916_trunk_v7.patch, HBASE-5916v8.patch Consider a case where my master is getting restarted. 
An RS that was alive when the master restart started gets restarted before the master initializes the ServerShutdownHandler.
{code}
serverShutdownHandlerEnabled = true;
{code}
In this case, when the RS tries to register with the master, the master will try to expire the server, but the server cannot be expired because the ServerShutdownHandler is not yet enabled. This can happen when only one RS is restarted or when all the RSs are restarted at the same time (before assignRootAndMeta).
{code}
LOG.info(message);
if (existingServer.getStartcode() < serverName.getStartcode()) {
  LOG.info("Triggering server recovery; existingServer " + existingServer + " looks stale, new server:" + serverName);
  expireServer(existingServer);
}
{code}
If another RS is brought up, the cluster returns to normal. This may be a rare corner case.
[jira] [Commented] (HBASE-5916) RS restart just before master intialization we make the cluster non operative
[ https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283110#comment-13283110 ] ramkrishna.s.vasudevan commented on HBASE-5916: --- First of all, thanks for your time in preparing a patch. I think that if we don't fetch the new online servers in joinCluster, there is one problem:
{code}
// STEP 1
this.serverManager.expireDeadNotExpiredServers();
// Update meta with new HRI if required. i.e migrate all HRI with HTD to
// HRI with out HTD in meta and update the status in ROOT. This must happen
// before we assign all user regions or else the assignment will fail.
// TODO: Remove this when we do 0.94.
// STEP 2
org.apache.hadoop.hbase.catalog.MetaMigrationRemovingHTD.updateMetaWithNewHRI(this);
// Fixup assignment manager status
status.setStatus("Starting assignment manager");
// STEP 3
this.assignmentManager.joinCluster(onlineServers);
{code}
Here is one scenario; it may be rare, but it is still possible. I have 3 RSs at STEP 1. One of them goes down, and the SSH processes it and tries to assign its regions. Before the assignment is done, a new RS comes up before STEP 3. There is a small chance that the regions from the dead RS are assigned to this new RS. In STEP 3, because we obtained the online servers list earlier, we may end up treating the new RS as an offline server after scanning META. Please correct me if I am wrong. It's a corner case.
RS restart just before master intialization we make the cluster non operative - Key: HBASE-5916 URL: https://issues.apache.org/jira/browse/HBASE-5916 Project: HBase Issue Type: Bug Affects Versions: 0.92.1, 0.94.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.94.1 Attachments: HBASE-5916_trunk.patch, HBASE-5916_trunk_1.patch, HBASE-5916_trunk_1.patch, HBASE-5916_trunk_2.patch, HBASE-5916_trunk_3.patch, HBASE-5916_trunk_4.patch, HBASE-5916_trunk_v5.patch, HBASE-5916_trunk_v6.patch, HBASE-5916_trunk_v7.patch, HBASE-5916v8.patch Consider a case where my master is getting restarted. An RS that was alive when the master restart started gets restarted before the master initializes the ServerShutdownHandler.
{code}
serverShutdownHandlerEnabled = true;
{code}
In this case, when the RS tries to register with the master, the master will try to expire the server, but the server cannot be expired because the ServerShutdownHandler is not yet enabled. This can happen when only one RS is restarted or when all the RSs are restarted at the same time (before assignRootAndMeta).
{code}
LOG.info(message);
if (existingServer.getStartcode() < serverName.getStartcode()) {
  LOG.info("Triggering server recovery; existingServer " + existingServer + " looks stale, new server:" + serverName);
  expireServer(existingServer);
}
{code}
If another RS is brought up, the cluster returns to normal. This may be a rare corner case.
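The STEP 1 call quoted in the comment, expireDeadNotExpiredServers(), suggests that expiry requests arriving before the handler is enabled are queued and replayed later. A toy model of that idea, assuming nothing about the real ServerManager, might look like the following; the class `DeferredExpiry` and all of its members are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model of deferring server expiry until the shutdown handler is enabled.
// Names are hypothetical; this is not the HBase ServerManager.
final class DeferredExpiry {
    private boolean shutdownHandlerEnabled = false;
    private final Set<String> deadNotExpired = new LinkedHashSet<>();
    private final List<String> expired = new ArrayList<>();

    // Expire immediately if possible, otherwise remember the server for later.
    synchronized void expireServer(String serverName) {
        if (!shutdownHandlerEnabled) {
            deadNotExpired.add(serverName);
            return;
        }
        expired.add(serverName);
    }

    // Called once master initialization reaches the point where recovery can run;
    // replays every expiry request that arrived too early.
    synchronized void enableShutdownHandler() {
        shutdownHandlerEnabled = true;
        expired.addAll(deadNotExpired);
        deadNotExpired.clear();
    }

    // Snapshot of the servers that have actually been expired so far.
    synchronized List<String> expiredServers() {
        return new ArrayList<>(expired);
    }
}
```

With such a queue, a stale server detected during startup is not lost; it is simply expired later, once recovery is allowed to run.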
[jira] [Commented] (HBASE-5352) ACL improvements
[ https://issues.apache.org/jira/browse/HBASE-5352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13283116#comment-13283116 ] Laxman commented on HBASE-5352: --- Yes Matt, there are many other apis which are not checked for authorization in AccessController. We may need to analyze all together once and handle them in phases. I will try to provide analysis of all the operations. We will discuss after that. Thanks for your quick response. ACL improvements Key: HBASE-5352 URL: https://issues.apache.org/jira/browse/HBASE-5352 Project: HBase Issue Type: Improvement Components: security Affects Versions: 0.92.1, 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar In this issue I would like to open discussion for a few minor ACL related improvements. The proposed changes are as follows: 1. Introduce something like AccessControllerProtocol.checkPermissions(Permission[] permissions) API, so that clients can check access rights before carrying out the operations. We need this kind of operation for HCATALOG-245, which introduces authorization providers for hbase over hcat. We cannot use getUserPermissions() since it requires ADMIN permissions on the global/table level. 2. getUserPermissions(tableName)/grant/revoke and drop/modify table operations should not check for global CREATE/ADMIN rights, but table CREATE/ADMIN rights. The reasoning is that if a user is able to admin or read from a table, she should be able to read the table's permissions. We can choose whether we want only READ or ADMIN permissions for getUserPermission(). Since we check for global permissions first for table permissions, configuring table access using global permissions will continue to work. 3. Grant/Revoke global permissions - HBASE-5342 (included for completeness) From all 3, we may want to backport the first one to 0.92 since without it, Hive/Hcatalog cannot use Hbase's authorization mechanism effectively. 
I will create subissues and convert HBASE-5342 to a subtask when we get some feedback, and opinions for going further.
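The check order described in item 2 (global permissions first, then table permissions) and the batch check proposed in item 1 can be modelled in a few lines. This is a toy sketch under stated assumptions: `ToyAcl`, its `Action` enum, and the method names are hypothetical and do not reflect the AccessController or AccessControllerProtocol API.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Toy permission model. All names are hypothetical; this is not AccessController.
final class ToyAcl {
    enum Action { READ, WRITE, CREATE, ADMIN }

    private final Map<String, Set<Action>> globalGrants = new HashMap<>();
    private final Map<String, Set<Action>> tableGrants = new HashMap<>();

    void grantGlobal(String user, Action action) {
        globalGrants.computeIfAbsent(user, k -> EnumSet.noneOf(Action.class)).add(action);
    }

    void grantTable(String user, String table, Action action) {
        tableGrants.computeIfAbsent(key(user, table), k -> EnumSet.noneOf(Action.class)).add(action);
    }

    // Global grants are consulted first; otherwise a table-level grant is enough.
    boolean isAllowed(String user, String table, Action action) {
        if (globalGrants.getOrDefault(user, EnumSet.noneOf(Action.class)).contains(action)) {
            return true;
        }
        return tableGrants.getOrDefault(key(user, table), EnumSet.noneOf(Action.class)).contains(action);
    }

    // Batch check in the spirit of the proposed checkPermissions(Permission[]):
    // returns true only if every requested action is allowed.
    boolean checkPermissions(String user, String table, Action... actions) {
        for (Action action : actions) {
            if (!isAllowed(user, table, action)) {
                return false;
            }
        }
        return true;
    }

    private static String key(String user, String table) {
        return user + "/" + table;
    }
}
```

Because the global lookup comes first, a setup that configures table access purely through global grants keeps working, which matches the compatibility argument made in item 2.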
[jira] [Created] (HBASE-6092) Authorize flush, split operations in AccessController
Laxman created HBASE-6092: - Summary: Authorize flush, split operations in AccessController Key: HBASE-6092 URL: https://issues.apache.org/jira/browse/HBASE-6092 Project: HBase Issue Type: Sub-task Components: security Reporter: Laxman Assignee: Laxman Currently, some operations like flush and split are not checked for authorization in AccessController. With the current implementation any unauthorized client can trigger these operations on a table.
[jira] [Assigned] (HBASE-4676) Prefix Compression - Trie data block encoding
[ https://issues.apache.org/jira/browse/HBASE-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu reassigned HBASE-4676: - Assignee: Matt Corgan Prefix Compression - Trie data block encoding - Key: HBASE-4676 URL: https://issues.apache.org/jira/browse/HBASE-4676 Project: HBase Issue Type: New Feature Components: io, performance, regionserver Affects Versions: 0.90.6 Reporter: Matt Corgan Assignee: Matt Corgan Attachments: HBASE-4676-0.94-v1.patch, PrefixTrie_Format_v1.pdf, PrefixTrie_Performance_v1.pdf, SeeksPerSec by blockSize.png, hbase-prefix-trie-0.1.jar The HBase data block format has room for 2 significant improvements for applications that have high block cache hit ratios. First, there is no prefix compression, and the current KeyValue format is somewhat metadata heavy, so there can be tremendous memory bloat for many common data layouts, specifically those with long keys and short values. Second, there is no random access to KeyValues inside data blocks. This means that every time you double the datablock size, average seek time (or average cpu consumption) goes up by a factor of 2. The standard 64KB block size is ~10x slower for random seeks than a 4KB block size, but block sizes as small as 4KB cause problems elsewhere. Using block sizes of 256KB or 1MB or more may be more efficient from a disk access and block-cache perspective in many big-data applications, but doing so is infeasible from a random seek perspective. The PrefixTrie block encoding format attempts to solve both of these problems. Some features: * trie format for row key encoding completely eliminates duplicate row keys and encodes similar row keys into a standard trie structure which also saves a lot of space * the column family is currently stored once at the beginning of each block. this could easily be modified to allow multiple family names per block * all qualifiers in the block are stored in their own trie format which caters nicely to wide rows. 
duplicate qualifiers between rows are eliminated. the size of this trie determines the width of the block's qualifier fixed-width-int
* the minimum timestamp is stored at the beginning of the block, and deltas are calculated from that. the maximum delta determines the width of the block's timestamp fixed-width-int

The block is structured with metadata at the beginning, then a section for the row trie, then the column trie, then the timestamp deltas, and then all the values. Most work is done in the row trie, where every leaf node (corresponding to a row) contains a list of offsets/references corresponding to the cells in that row. Each cell is fixed-width to enable binary searching and is represented by [1 byte operationType, X bytes qualifier offset, X bytes timestamp delta offset]. If all operation types are the same for a block, there will be zero per-cell overhead. Same for timestamps. Same for qualifiers when I get a chance.

So, the compression aspect is very strong, but makes a few small sacrifices on VarInt size to enable faster binary searches in trie fan-out nodes. A more compressed but slower version might build on this by also applying further (suffix, etc) compression on the trie nodes at the cost of slower write speed. Even further compression could be obtained by using all VInts instead of FInts with a sacrifice on random seek speed (though not huge).

One current drawback is the current write speed. While programmed with good constructs like TreeMaps, ByteBuffers, binary searches, etc, it's not programmed with the same level of optimization as the read path. Work will need to be done to optimize the data structures used for encoding and could probably show a 10x increase. It will still be slower than delta encoding, but with a much higher decode speed. I have not yet created a thorough benchmark for write speed nor sequential read speed.
Though the trie is reaching a point where it is internally very efficient (probably within half or a quarter of its max read speed) the way that hbase currently uses it is far from optimal. The KeyValueScanner and related classes that iterate through the trie will eventually need to be smarter and have methods to do things like skipping to the next row of results without scanning every cell in between. When that is accomplished it will also allow much faster compactions because the full row key will not have to be compared as often as it is now. Current code is on github. The trie code is in a separate project than the slightly modified hbase. There is an hbase project there as well with the DeltaEncoding patch applied, and it builds on top of that.
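The timestamp handling described above (store the block minimum once, keep per-cell deltas, and let the largest delta fix the integer width) can be sketched as follows. This is a minimal illustration, assuming non-negative timestamps and big-endian fixed-width integers; the class `TimestampDeltas` and its methods are hypothetical and are not taken from the PrefixTrie code.

```java
// Hypothetical sketch of per-block timestamp delta encoding; not the PrefixTrie code.
final class TimestampDeltas {

    // Smallest number of bytes (0..8) able to hold a non-negative value.
    static int bytesNeeded(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("value must be non-negative");
        }
        return (64 - Long.numberOfLeadingZeros(value) + 7) / 8;
    }

    static long minimum(long[] timestamps) {
        if (timestamps.length == 0) {
            throw new IllegalArgumentException("empty block");
        }
        long min = timestamps[0];
        for (long ts : timestamps) {
            min = Math.min(min, ts);
        }
        return min;
    }

    // Width of the block's fixed-width timestamp field: decided by the largest delta.
    // Assumes non-negative timestamps so that max - min cannot overflow.
    static int deltaWidth(long[] timestamps) {
        long min = minimum(timestamps);
        long max = min;
        for (long ts : timestamps) {
            max = Math.max(max, ts);
        }
        return bytesNeeded(max - min);
    }

    // Writes each delta as a big-endian integer of exactly `width` bytes.
    static byte[] encode(long[] timestamps, long min, int width) {
        byte[] out = new byte[timestamps.length * width];
        for (int i = 0; i < timestamps.length; i++) {
            long delta = timestamps[i] - min;
            for (int b = 0; b < width; b++) {
                out[i * width + b] = (byte) (delta >>> (8 * (width - 1 - b)));
            }
        }
        return out;
    }

    // Random access: the fixed width lets us jump straight to cell `index`.
    static long decode(byte[] encoded, long min, int width, int index) {
        long delta = 0;
        for (int b = 0; b < width; b++) {
            delta = (delta << 8) | (encoded[index * width + b] & 0xFF);
        }
        return min + delta;
    }
}
```

When every cell in the block carries the same timestamp, the largest delta is zero, the width is zero bytes, and the encoded section is empty, which corresponds to the "zero per-cell overhead" case mentioned in the description.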
[jira] [Commented] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283121#comment-13283121 ] ramkrishna.s.vasudevan commented on HBASE-6070: --- @Ted TestServerCustomProtocol.testSingleMethod() passes with the patch. I saw that the same test has also failed in some other precommit build. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1993//testReport/ AM.nodeDeleted and SSH races creating problems for regions under SPLIT -- Key: HBASE-6070 URL: https://issues.apache.org/jira/browse/HBASE-6070 Project: HBase Issue Type: Bug Affects Versions: 0.92.1, 0.94.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.92.2, 0.96.0, 0.94.1 Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.92_1.patch, HBASE-6070_0.94.patch, HBASE-6070_0.94_1.patch, HBASE-6070_trunk.patch, HBASE-6070_trunk_1.patch We tried to address the problems in Master restart and RS restart while a region SPLIT is in progress as part of HBASE-5806. While doing some more work on this, we found that there is still one race condition.
1. Split has just started and the znode is in RS_SPLIT state.
2. RS goes down.
3. First callback for SSH comes.
4. As part of the fix for HBASE-5806, SSH knows that some region is in RIT.
5. But now the nodeDeleted event comes for the SPLIT node, and there we try to delete the RIT.
6. After this we check in the SSH whether any node is in RIT. As we don't find the region in RIT, the region is never assigned.
When we fixed HBASE-5806, step 6 happened first and then step 5 happened, so we missed it. Now we have found it. Will come up with a patch shortly.
[jira] [Assigned] (HBASE-6032) Port HFileBlockIndex improvement from HBASE-5987
[ https://issues.apache.org/jira/browse/HBASE-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu reassigned HBASE-6032: - Assignee: Zhihong Yu Port HFileBlockIndex improvement from HBASE-5987 Key: HBASE-6032 URL: https://issues.apache.org/jira/browse/HBASE-6032 Project: HBase Issue Type: Task Reporter: Zhihong Yu Assignee: Zhihong Yu Fix For: 0.96.0 Attachments: 6032-ports-5987.txt Excerpt from HBASE-5987: First, we propose to lookahead for one more block index so that the HFileScanner would know the start key value of next data block. So if the target key value for the scan(reSeekTo) is smaller than that start kv of next data block, it means the target key value has a very high possibility in the current data block (if not in current data block, then the start kv of next data block should be returned. +Indexing on the start key has some defects here+) and it shall NOT query the HFileBlockIndex in this case. On the contrary, if the target key value is bigger, then it shall query the HFileBlockIndex. This improvement shall help to reduce the hotness of HFileBlockIndex and avoid some unnecessary IdLock Contention or Index Block Cache lookup. This JIRA is to port the fix to HBase trunk, etc.
[jira] [Updated] (HBASE-6032) Port HFileBlockIndex improvement from HBASE-5987
[ https://issues.apache.org/jira/browse/HBASE-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-6032: -- Fix Version/s: 0.96.0 Port HFileBlockIndex improvement from HBASE-5987 Key: HBASE-6032 URL: https://issues.apache.org/jira/browse/HBASE-6032 Project: HBase Issue Type: Task Reporter: Zhihong Yu Fix For: 0.96.0 Attachments: 6032-ports-5987.txt Excerpt from HBASE-5987: First, we propose to lookahead for one more block index so that the HFileScanner would know the start key value of next data block. So if the target key value for the scan(reSeekTo) is smaller than that start kv of next data block, it means the target key value has a very high possibility in the current data block (if not in current data block, then the start kv of next data block should be returned. +Indexing on the start key has some defects here+) and it shall NOT query the HFileBlockIndex in this case. On the contrary, if the target key value is bigger, then it shall query the HFileBlockIndex. This improvement shall help to reduce the hotness of HFileBlockIndex and avoid some unnecessary IdLock Contention or Index Block Cache lookup. This JIRA is to port the fix to HBase trunk, etc.
[jira] [Commented] (HBASE-6032) Port HFileBlockIndex improvement from HBASE-5987
[ https://issues.apache.org/jira/browse/HBASE-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283122#comment-13283122 ] Zhihong Yu commented on HBASE-6032: --- Can someone review the port, please? Port HFileBlockIndex improvement from HBASE-5987 Key: HBASE-6032 URL: https://issues.apache.org/jira/browse/HBASE-6032 Project: HBase Issue Type: Task Reporter: Zhihong Yu Assignee: Zhihong Yu Fix For: 0.96.0 Attachments: 6032-ports-5987.txt Excerpt from HBASE-5987: First, we propose to lookahead for one more block index so that the HFileScanner would know the start key value of next data block. So if the target key value for the scan(reSeekTo) is smaller than that start kv of next data block, it means the target key value has a very high possibility in the current data block (if not in current data block, then the start kv of next data block should be returned. +Indexing on the start key has some defects here+) and it shall NOT query the HFileBlockIndex in this case. On the contrary, if the target key value is bigger, then it shall query the HFileBlockIndex. This improvement shall help to reduce the hotness of HFileBlockIndex and avoid some unnecessary IdLock Contention or Index Block Cache lookup. This JIRA is to port the fix to HBase trunk, etc.
[jira] [Commented] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283123#comment-13283123 ] Zhihong Yu commented on HBASE-6070: --- All right. AM.nodeDeleted and SSH races creating problems for regions under SPLIT -- Key: HBASE-6070 URL: https://issues.apache.org/jira/browse/HBASE-6070 Project: HBase Issue Type: Bug Affects Versions: 0.92.1, 0.94.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.92.2, 0.96.0, 0.94.1 Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.92_1.patch, HBASE-6070_0.94.patch, HBASE-6070_0.94_1.patch, HBASE-6070_trunk.patch, HBASE-6070_trunk_1.patch We tried to address the problems in Master restart and RS restart while a region SPLIT is in progress as part of HBASE-5806. While doing some more work on this, we found that there is still one race condition.
1. Split has just started and the znode is in RS_SPLIT state.
2. RS goes down.
3. First callback for SSH comes.
4. As part of the fix for HBASE-5806, SSH knows that some region is in RIT.
5. But now the nodeDeleted event comes for the SPLIT node, and there we try to delete the RIT.
6. After this we check in the SSH whether any node is in RIT. As we don't find the region in RIT, the region is never assigned.
When we fixed HBASE-5806, step 6 happened first and then step 5 happened, so we missed it. Now we have found it. Will come up with a patch shortly.
[jira] [Commented] (HBASE-5352) ACL improvements
[ https://issues.apache.org/jira/browse/HBASE-5352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13283124#comment-13283124 ] Andrew Purtell commented on HBASE-5352: --- Originally only the superuser could take such actions, so the AccessController did not need to deal with them. Now that the implementation is changing all of these cases need review. I suggest sub issues for each RPC interface. ACL improvements Key: HBASE-5352 URL: https://issues.apache.org/jira/browse/HBASE-5352 Project: HBase Issue Type: Improvement Components: security Affects Versions: 0.92.1, 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar In this issue I would like to open discussion for a few minor ACL related improvements. The proposed changes are as follows: 1. Introduce something like AccessControllerProtocol.checkPermissions(Permission[] permissions) API, so that clients can check access rights before carrying out the operations. We need this kind of operation for HCATALOG-245, which introduces authorization providers for hbase over hcat. We cannot use getUserPermissions() since it requires ADMIN permissions on the global/table level. 2. getUserPermissions(tableName)/grant/revoke and drop/modify table operations should not check for global CREATE/ADMIN rights, but table CREATE/ADMIN rights. The reasoning is that if a user is able to admin or read from a table, she should be able to read the table's permissions. We can choose whether we want only READ or ADMIN permissions for getUserPermission(). Since we check for global permissions first for table permissions, configuring table access using global permissions will continue to work. 3. Grant/Revoke global permissions - HBASE-5342 (included for completeness) From all 3, we may want to backport the first one to 0.92 since without it, Hive/Hcatalog cannot use Hbase's authorization mechanism effectively. 
I will create subissues and convert HBASE-5342 to a subtask when we get some feedback, and opinions for going further.
[jira] [Resolved] (HBASE-6086) Admin operations on a table should be authorized against table permissions instead of global permissions.
[ https://issues.apache.org/jira/browse/HBASE-6086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-6086. --- Resolution: Duplicate Admin operations on a table should be authorized against table permissions instead of global permissions. - Key: HBASE-6086 URL: https://issues.apache.org/jira/browse/HBASE-6086 Project: HBase Issue Type: Sub-task Components: security Affects Versions: 0.94.0 Reporter: Laxman Assignee: Laxman Labels: acl, security Still some inconsistency exists after HBASE-6061. We actually need to authorize against table permissions instead of global permissions here.
{code}
+  private void requireTableAdminPermission(MasterCoprocessorEnvironment e,
+      byte[] tableName) throws IOException {
+    if (isActiveUserTableOwner(e, tableName)) {
+      requirePermission(Permission.Action.CREATE);
+    } else {
+      requirePermission(Permission.Action.ADMIN);
+    }
+  }
{code}
[jira] [Commented] (HBASE-5916) RS restart just before master intialization we make the cluster non operative
[ https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283130#comment-13283130 ] chunhui shen commented on HBASE-5916: - @ram The case you mentioned above is a good one. However, I find that the master's startup logic is becoming more and more complicated. What about doing the following in the SSH process:
{code}
...
if (isCarryingRoot()){}
if (isCarryingMeta()) {}
if (isCarryingRoot() || isCarryingMeta()) {}
int waitedTimeForMasterInitialized = 0;
while (!server.isStopped() && !services.isInitialized()) {
  try {
    if (waitedTimeForMasterInitialized == 0) {
      LOG.info("Master is not initialized, waiting...");
    }
    Thread.sleep(100);
    waitedTimeForMasterInitialized += 100;
  } catch (InterruptedException e) {
    Thread.currentThread().interrupt();
    throw new IOException("Interrupted", e);
  }
}
if (waitedTimeForMasterInitialized > 0) {
  LOG.info("Recovery time calculation: waiting on master to be initialized took " + waitedTimeForMasterInitialized + " ms");
}
{code}
I think we could make the SSH wait until the master is initialized after it has assigned the META region; that way we could skip considering many troublesome concurrent cases. RS restart just before master intialization we make the cluster non operative - Key: HBASE-5916 URL: https://issues.apache.org/jira/browse/HBASE-5916 Project: HBase Issue Type: Bug Affects Versions: 0.92.1, 0.94.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.94.1 Attachments: HBASE-5916_trunk.patch, HBASE-5916_trunk_1.patch, HBASE-5916_trunk_1.patch, HBASE-5916_trunk_2.patch, HBASE-5916_trunk_3.patch, HBASE-5916_trunk_4.patch, HBASE-5916_trunk_v5.patch, HBASE-5916_trunk_v6.patch, HBASE-5916_trunk_v7.patch, HBASE-5916v8.patch Consider a case where my master is getting restarted. An RS that was alive when the master restart started gets restarted before the master initializes the ServerShutdownHandler.
{code}
serverShutdownHandlerEnabled = true;
{code}
In this case, when the RS tries to register with the master, the master will try to expire the server, but the server cannot be expired because the serverShutdownHandler is not yet enabled.
This can happen when only one RS is restarted or when all the RSs are restarted at the same time (before assignRootandMeta).
{code}
LOG.info(message);
if (existingServer.getStartcode() < serverName.getStartcode()) {
  LOG.info("Triggering server recovery; existingServer " + existingServer
      + " looks stale, new server: " + serverName);
  expireServer(existingServer);
}
{code}
If another RS is brought up, the cluster returns to normal. This may be a rare corner case.
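The race described in this issue can be illustrated with a small standalone model. This is only a sketch, not HBase code: the class `StaleServerSketch`, its fields, and its methods are hypothetical, and it assumes that an expiration requested before the shutdown handler is enabled is merely recorded, so the stale entry stays in place and the restarted server is turned away.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the registration race. Not HBase code; every name here is hypothetical. */
class StaleServerSketch {
    // host:port -> start code of the instance the master currently tracks
    private final Map<String, Long> onlineServers = new HashMap<>();
    // stale entries whose expiration was requested before the handler was enabled
    private final List<String> deferredExpirations = new ArrayList<>();
    private boolean shutdownHandlerEnabled = false;

    /** Returns true if the server was registered, false if it was turned away. */
    boolean register(String hostPort, long startCode) {
        Long existing = onlineServers.get(hostPort);
        if (existing != null && existing > startCode) {
            return false; // the registering instance is older than the tracked one
        }
        if (existing != null && existing < startCode) {
            // Same host:port with an older start code: the tracked entry looks stale.
            if (!shutdownHandlerEnabled) {
                // Assumed behavior: the expiration is only recorded, so the stale
                // entry remains and the restarted server cannot register.
                if (!deferredExpirations.contains(hostPort)) {
                    deferredExpirations.add(hostPort);
                }
                return false;
            }
            onlineServers.remove(hostPort);
        }
        onlineServers.put(hostPort, startCode);
        return true;
    }

    /** Enables the handler and expires every entry that was deferred. */
    void enableShutdownHandler() {
        shutdownHandlerEnabled = true;
        for (String hostPort : deferredExpirations) {
            onlineServers.remove(hostPort);
        }
        deferredExpirations.clear();
    }

    public static void main(String[] args) {
        StaleServerSketch master = new StaleServerSketch();
        master.register("rs1:60020", 100L); // instance known before the master restart
        System.out.println(master.register("rs1:60020", 200L)); // false: handler not enabled
        master.enableShutdownHandler();
        System.out.println(master.register("rs1:60020", 200L)); // true: stale entry expired
    }
}
```

In this model the restarted server is rejected for as long as the handler stays disabled, which mirrors the stuck state the report describes when every RS restarts before the handler is enabled.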
[jira] [Commented] (HBASE-5993) Add a no-read Append
[ https://issues.apache.org/jira/browse/HBASE-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283144#comment-13283144 ]

Lars Hofhansl commented on HBASE-5993:
--------------------------------------

Then I do not understand what we are proposing here. An Append that does not read the existing value is a Put, no?
Maybe a patch will make it clear to me.

Add a no-read Append
--------------------

    Key: HBASE-5993
    URL: https://issues.apache.org/jira/browse/HBASE-5993
    Project: HBase
    Issue Type: Improvement
    Components: regionserver
    Affects Versions: 0.94.0
    Reporter: Jacques
    Priority: Critical

HBASE-4102 added an atomic append. For high-performance situations, it would be helpful to be able to do appends that don't actually require a read of the existing value. This would be useful in building a growing set of values. Our original use case was implementing a form of search in HBase where a cell would contain a list of document ids associated with a particular search keyword. However, it seems like it would also provide substantial performance improvements for most Append scenarios.

Within the client API, the simplest way to implement this would be to leverage the existing Append API. If the Append is marked as setReturnResults(false), use this code path. If result return is requested, use the existing Append implementation.
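One possible reading of the dispatch proposed in this issue is sketched below as a standalone toy store. It is not HBase code and the issue does not specify a storage layout: the class `NoReadAppendSketch`, the fragment list per cell, and the `reads` counter are all assumptions made for illustration. The no-read path only records a fragment, while the result-returning path reads, concatenates, and stores the combined value.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy store contrasting a no-read append with a read-then-write append. Hypothetical names. */
class NoReadAppendSketch {
    // cell key -> fragments in write order (assumed layout, not taken from the issue)
    private final Map<String, List<String>> fragments = new HashMap<>();
    int reads = 0; // how many times an existing value was read

    /** Reads a cell by concatenating its fragments. */
    String get(String cell) {
        reads++;
        List<String> parts = fragments.get(cell);
        return parts == null ? "" : String.join("", parts);
    }

    /** Appends a value; returns the new cell value only when returnResults is true. */
    String append(String cell, String value, boolean returnResults) {
        if (!returnResults) {
            // No-read path: record the fragment without touching the existing value.
            fragments.computeIfAbsent(cell, k -> new ArrayList<>()).add(value);
            return null;
        }
        // Result-returning path: read, concatenate, and store the combined value.
        String combined = get(cell) + value;
        List<String> merged = new ArrayList<>();
        merged.add(combined);
        fragments.put(cell, merged);
        return combined;
    }

    public static void main(String[] args) {
        NoReadAppendSketch store = new NoReadAppendSketch();
        store.append("keyword", "doc1,", false);
        store.append("keyword", "doc2,", false);
        System.out.println(store.reads);          // 0
        System.out.println(store.get("keyword")); // doc1,doc2,
    }
}
```

In this model a plain overwrite would replace the cell, whereas the no-read path accumulates fragments that are merged only when the cell is read, which is one way to frame the difference from a Put raised in the comment.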