[jira] [Commented] (HBASE-6033) Adding some function to check if a table/region is in compaction

2012-05-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282235#comment-13282235
 ] 

Hudson commented on HBASE-6033:
---

Integrated in HBase-TRUNK #2920 (See 
[https://builds.apache.org/job/HBase-TRUNK/2920/])
HBASE-6033 Adding some function to check if a table/region is in compaction 
(Jimmy) (Revision 1342149)

 Result = SUCCESS
tedyu : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/generated/AdminProtos.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/CompactSplitThread.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/CompactionRequest.java
* /hbase/trunk/src/main/protobuf/Admin.proto
* /hbase/trunk/src/main/resources/hbase-webapps/master/table.jsp
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionState.java
* /hbase/trunk/src/test/resources/hbase-site.xml


 Adding some function to check if a table/region is in compaction
 ---

 Key: HBASE-6033
 URL: https://issues.apache.org/jira/browse/HBASE-6033
 Project: HBase
  Issue Type: New Feature
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.96.0

 Attachments: 6033-v7.txt, hbase-6033_v2.patch, hbase-6033_v3.patch, 
 hbase_6033_v5.patch, hbase_6033_v6.patch, table_ui.png


 This feature will help find out whether a major compaction is in progress on a 
 table or region. It can also show whether a minor compaction is in progress.
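
 As a rough illustration of the idea, each region could report whether it is in
 a minor compaction, a major compaction, both, or neither, and a table-level
 answer could be the combination over its regions. The sketch below is only a
 model under that assumption: the enum and the helpers `merge` and `forTable`
 are hypothetical names, not necessarily what the committed patch exposes.

```java
import java.util.List;

/** Hypothetical model of a compaction state; not the HBase client API. */
enum CompactionStateSketch {
  NONE, MINOR, MAJOR, MAJOR_AND_MINOR;

  boolean hasMinor() { return this == MINOR || this == MAJOR_AND_MINOR; }

  boolean hasMajor() { return this == MAJOR || this == MAJOR_AND_MINOR; }

  /** Merge two states: the result reports every compaction kind seen in either. */
  static CompactionStateSketch merge(CompactionStateSketch a, CompactionStateSketch b) {
    boolean minor = a.hasMinor() || b.hasMinor();
    boolean major = a.hasMajor() || b.hasMajor();
    if (minor && major) return MAJOR_AND_MINOR;
    if (major) return MAJOR;
    if (minor) return MINOR;
    return NONE;
  }

  /** Table-level state: fold together the states reported by each region. */
  static CompactionStateSketch forTable(List<CompactionStateSketch> regionStates) {
    CompactionStateSketch result = NONE;
    for (CompactionStateSketch state : regionStates) {
      result = merge(result, state);
    }
    return result;
  }
}
```

 With this model, a table whose regions report MINOR and MAJOR would be shown
 as MAJOR_AND_MINOR, and a table with no regions compacting as NONE.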

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT

2012-05-24 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-6070:
-

Assignee: ramkrishna.s.vasudevan

 AM.nodeDeleted and SSH races creating problems for regions under SPLIT
 --

 Key: HBASE-6070
 URL: https://issues.apache.org/jira/browse/HBASE-6070
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.2, 0.96.0, 0.94.1


 We tried to address the problems in Master restart and RS restart while a 
 region SPLIT is in progress as part of HBASE-5806.
 While investigating further we found that one race condition still remains.
 1. Split has just started and the znode is in RS_SPLIT state.
 2. RS goes down.
 3. First callback for SSH comes.
 4. As part of the fix for HBASE-5806, SSH knows that some region is in RIT.
 5. But now the nodeDeleted event comes for the SPLIT node, and there we try to 
 delete the RIT.
 6. After this, SSH checks whether any node is in RIT.  As we 
 don't find the region in RIT, the region is never assigned.
 When we fixed HBASE-5806, step 6 happened first and then step 5 happened, so 
 we missed this case.  Now we have found it. Will come up with a patch shortly.
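
 The ordering problem in the last two events of the list above (the
 nodeDeleted callback and the SSH check for regions in transition) can be
 replayed with a small single-threaded model. This is a toy sketch, not
 AssignmentManager or ServerShutdownHandler code; the class and all method
 names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of the event ordering; names are hypothetical, not HBase classes. */
class SplitRaceSketch {
  private final Set<String> regionsInTransition = new HashSet<>();
  private final Set<String> assigned = new HashSet<>();

  /** A split starts, so the region is recorded as in transition. */
  void splitStarted(String region) {
    regionsInTransition.add(region);
  }

  /** The znode-deleted callback clears the in-transition entry. */
  void nodeDeleted(String region) {
    regionsInTransition.remove(region);
  }

  /** Shutdown handling assigns only regions it still sees in transition. */
  void serverShutdownCheck(String region) {
    if (regionsInTransition.contains(region)) {
      assigned.add(region);
    }
  }

  boolean isAssigned(String region) {
    return assigned.contains(region);
  }

  /** Replays the two events in the given order; returns whether the region got assigned. */
  static boolean replay(boolean nodeDeletedFirst) {
    SplitRaceSketch model = new SplitRaceSketch();
    model.splitStarted("r1");
    if (nodeDeletedFirst) {
      model.nodeDeleted("r1");
      model.serverShutdownCheck("r1");
    } else {
      model.serverShutdownCheck("r1");
      model.nodeDeleted("r1");
    }
    return model.isAssigned("r1");
  }
}
```

 In this model `replay(true)` leaves the region unassigned, which is the
 failure described, while `replay(false)` assigns it, which is the ordering
 that was seen when HBASE-5806 was fixed.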





[jira] [Updated] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT

2012-05-24 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-6070:
--

Attachment: HBASE-6070_0.94.patch

 AM.nodeDeleted and SSH races creating problems for regions under SPLIT
 --

 Key: HBASE-6070
 URL: https://issues.apache.org/jira/browse/HBASE-6070
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.94.patch






[jira] [Updated] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT

2012-05-24 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-6070:
--

Attachment: HBASE-6070_0.92.patch

 AM.nodeDeleted and SSH races creating problems for regions under SPLIT
 --

 Key: HBASE-6070
 URL: https://issues.apache.org/jira/browse/HBASE-6070
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.94.patch






[jira] [Updated] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT

2012-05-24 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-6070:
--

Attachment: HBASE-6070_trunk.patch

Uploaded patches for all branches.  Tested in cluster including scenarios for 
HBASE-5806.  Pls review and provide your comments.

 AM.nodeDeleted and SSH races creating problems for regions under SPLIT
 --

 Key: HBASE-6070
 URL: https://issues.apache.org/jira/browse/HBASE-6070
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.94.patch, 
 HBASE-6070_trunk.patch






[jira] [Commented] (HBASE-5352) ACL improvements

2012-05-24 Thread Laxman (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282382#comment-13282382
 ] 

Laxman commented on HBASE-5352:
---

Enis & Matt, hope you don't mind if I add some sub-tasks related to ACL here.
I already added HBASE-6086; Matt clarified that it is a duplicate of HBASE-5372.

Also, one more observation I wanted to validate with you.

Currently, AccessController doesn't provide an implementation for some methods 
such as preFlush, preSplit and many others. That means any unauthorized user can 
trigger these operations on a table.

Should we handle this in a separate JIRA?

 ACL improvements
 

 Key: HBASE-5352
 URL: https://issues.apache.org/jira/browse/HBASE-5352
 Project: HBase
  Issue Type: Improvement
  Components: security
Affects Versions: 0.92.1, 0.94.0
Reporter: Enis Soztutar
Assignee: Enis Soztutar

 In this issue I would like to open discussion for a few minor ACL related 
 improvements. The proposed changes are as follows: 
 1. Introduce something like 
 AccessControllerProtocol.checkPermissions(Permission[] permissions) API, so 
 that clients can check access rights before carrying out the operations. We 
 need this kind of operation for HCATALOG-245, which introduces authorization 
 providers for hbase over hcat. We cannot use getUserPermissions() since it 
 requires ADMIN permissions on the global/table level.
 2. getUserPermissions(tableName)/grant/revoke and drop/modify table 
 operations should check for table CREATE/ADMIN rights rather than global 
 CREATE/ADMIN rights. The reasoning is that if a user is able to administer or 
 read from a table, she should be able to read the table's permissions. We can 
 choose whether to require READ or ADMIN permission for 
 getUserPermissions(). Since global permissions are checked before table 
 permissions, configuring table access using global permissions will continue 
 to work.
 3. Grant/Revoke global permissions - HBASE-5342 (included for completeness)
 Of these three, we may want to backport the first one to 0.92, since without 
 it Hive/HCatalog cannot use HBase's authorization mechanism effectively. 
 I will create subissues and convert HBASE-5342 to a subtask once we get some 
 feedback and opinions on how to proceed.
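
 To make proposals 1 and 2 concrete, here is a minimal in-memory sketch of a
 check that consults global grants first and table grants second, so a caller
 can ask up front whether a set of actions would be allowed. It is an
 assumption-laden model: the class, the `Action` enum and every method here
 are hypothetical and do not reproduce AccessController or the proposed
 AccessControllerProtocol signature.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory ACL model; not the HBase security API. */
class PermissionCheckSketch {
  enum Action { READ, WRITE, CREATE, ADMIN }

  // user -> actions granted globally
  private final Map<String, Set<Action>> globalGrants = new HashMap<>();
  // table -> user -> actions granted on that table
  private final Map<String, Map<String, Set<Action>>> tableGrants = new HashMap<>();

  void grantGlobal(String user, Action action) {
    globalGrants.computeIfAbsent(user, u -> EnumSet.noneOf(Action.class)).add(action);
  }

  void grantTable(String user, String table, Action action) {
    tableGrants.computeIfAbsent(table, t -> new HashMap<>())
        .computeIfAbsent(user, u -> EnumSet.noneOf(Action.class)).add(action);
  }

  /** Global grants are consulted first, then grants on the specific table. */
  boolean isAllowed(String user, String table, Action action) {
    if (globalGrants.getOrDefault(user, Collections.emptySet()).contains(action)) {
      return true;
    }
    return tableGrants.getOrDefault(table, Collections.emptyMap())
        .getOrDefault(user, Collections.emptySet()).contains(action);
  }

  /** Returns true only if every requested action would be allowed. */
  boolean checkPermissions(String user, String table, Action... required) {
    for (Action action : required) {
      if (!isAllowed(user, table, action)) {
        return false;
      }
    }
    return true;
  }
}
```

 Because the global lookup comes first, a user with a global ADMIN grant passes
 the check on any table, which matches the statement that configuring table
 access through global permissions keeps working.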





[jira] [Updated] (HBASE-6071) getRegionServerWithRetries should log unsuccessful attempts and exceptions.

2012-05-24 Thread Igal Shilman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igal Shilman updated HBASE-6071:


Attachment: HConnectionManager_HBASE-6071-0.90.0.patch

 getRegionServerWithRetries should log unsuccessful attempts and exceptions.
 

 Key: HBASE-6071
 URL: https://issues.apache.org/jira/browse/HBASE-6071
 Project: HBase
  Issue Type: Improvement
  Components: client, ipc
Affects Versions: 0.90.4
Reporter: Igal Shilman
Priority: Minor
 Attachments: HConnectionManager_HBASE-6071-0.90.0.patch


 HConnectionImplementation.getRegionServerWithRetries might terminate with an 
 exception other than a DoNotRetryIOException, thereby silently dropping 
 the exceptions from previous attempts.
 [~ted_yu] suggested 
 ([here|http://mail-archives.apache.org/mod_mbox/hbase-user/201205.mbox/%3CCAFebPXBq9V9BVdzRTNr-MB3a1Lz78SZj6gvP6On0b%2Bajt9StAg%40mail.gmail.com%3E])
  adding a log message inside the catch block describing the exception type 
 and details.
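
 The suggestion can be sketched as a generic retry loop that logs the type and
 message of each failed attempt and keeps the exceptions for the final error.
 This is only an illustration using java.util.logging; the class
 `RetryLoggingSketch` and its method are hypothetical and do not reproduce the
 HConnectionManager code or the attached patch.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical retry helper; it does not reproduce HConnectionManager's code. */
class RetryLoggingSketch {
  private static final Logger LOG = Logger.getLogger(RetryLoggingSketch.class.getName());

  /** Exceptions seen so far, kept so the final failure can report all of them. */
  final List<Throwable> failures = new ArrayList<>();

  <T> T callWithRetries(Callable<T> call, int maxAttempts) throws IOException {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return call.call();
      } catch (Exception e) {
        // Log the type and message of every failed attempt instead of dropping it.
        LOG.log(Level.WARNING, "Attempt " + attempt + " of " + maxAttempts
            + " failed: " + e.getClass().getName() + ": " + e.getMessage());
        failures.add(e);
      }
    }
    throw new IOException("Failed after " + maxAttempts + " attempts: " + failures);
  }
}
```

 A real client would also stop retrying on exceptions that must not be
 retried; that branch is left out here to keep the logging step in focus.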





[jira] [Updated] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT

2012-05-24 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-6070:
--

Status: Patch Available  (was: Open)

 AM.nodeDeleted and SSH races creating problems for regions under SPLIT
 --

 Key: HBASE-6070
 URL: https://issues.apache.org/jira/browse/HBASE-6070
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0, 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.94.patch, 
 HBASE-6070_trunk.patch






[jira] [Updated] (HBASE-6071) getRegionServerWithRetries should log unsuccessful attempts and exceptions.

2012-05-24 Thread Igal Shilman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igal Shilman updated HBASE-6071:


Affects Version/s: (was: 0.90.4)
   0.90.0

 getRegionServerWithRetries should log unsuccessful attempts and exceptions.
 

 Key: HBASE-6071
 URL: https://issues.apache.org/jira/browse/HBASE-6071
 Project: HBase
  Issue Type: Improvement
  Components: client, ipc
Affects Versions: 0.90.0
Reporter: Igal Shilman
Priority: Minor
 Attachments: HConnectionManager_HBASE-6071-0.90.0.patch






[jira] [Updated] (HBASE-5416) Improve performance of scans with some kind of filters.

2012-05-24 Thread Max Lapan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Lapan updated HBASE-5416:
-

Attachment: Filtered_scans_v5.patch

Fixed issues with limits in next() call.

 Improve performance of scans with some kind of filters.
 ---

 Key: HBASE-5416
 URL: https://issues.apache.org/jira/browse/HBASE-5416
 Project: HBase
  Issue Type: Improvement
  Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
 Attachments: 5416-v5.txt, 5416-v6.txt, Filtered_scans.patch, 
 Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch, 
 Filtered_scans_v5.patch


 When a scan is performed, the whole row is loaded into the result list, and 
 only then is the filter (if one exists) applied to decide whether the row is needed.
 But when a scan is performed on several CFs and the filter checks only data from 
 a subset of these CFs, data from the CFs not checked by the filter is not needed 
 at the filter stage; it is needed only once we decide to include the current row. 
 In such a case we can significantly reduce the amount of IO performed by a scan 
 by loading only the values actually checked by the filter.
 For example, we have two CFs: flags and snap. Flags is quite small (a bunch of 
 megabytes) and is used to filter large entries from snap. Snap is very large 
 (10s of GB) and is quite costly to scan. If we need only rows with 
 some flag specified, we use SingleColumnValueFilter to limit the result to only 
 a small subset of the region. But the current implementation loads both CFs to 
 perform the scan, when only a small subset is needed.
 The attached patch adds one routine to the Filter interface to allow a filter to 
 specify which CFs are needed for its operation. In HRegion, we separate all 
 scanners into two groups: those needed for the filter and the rest (joined). When 
 a new row is considered, only the needed data is loaded, the filter is applied, and 
 only if the filter accepts the row is the rest of the data loaded. On our data, this 
 speeds up such scans 30-50 times. It also gives us a way to better 
 normalize the data into separate columns by optimizing the scans performed.
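
 The two-group idea can be illustrated with plain maps standing in for the
 flags and snap column families. The sketch below is a toy model, not HRegion
 or Filter code; the class `EssentialFamilySketch`, its `scan` method and the
 `joinedReads` counter are hypothetical names used only to show that rejected
 rows never touch the large family.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Toy model of loading filter-relevant column families first; not HRegion code. */
class EssentialFamilySketch {
  /** Counts how many times the large ("joined") family was read. */
  int joinedReads = 0;

  /**
   * essential: row key -> value from the small family the filter inspects.
   * joined:    row key -> value from the large family the filter ignores.
   */
  List<String> scan(Map<String, String> essential, Map<String, String> joined,
                    Predicate<String> filter) {
    List<String> results = new ArrayList<>();
    for (Map.Entry<String, String> row : essential.entrySet()) {
      // Phase 1: evaluate the filter using only the essential family.
      if (!filter.test(row.getValue())) {
        continue; // Rejected rows never touch the large family.
      }
      // Phase 2: the row is accepted, so load the remaining data.
      joinedReads++;
      results.add(row.getKey() + "=" + joined.get(row.getKey()));
    }
    return results;
  }
}
```

 If only one of three rows carries the wanted flag, the large family is read
 once instead of three times, which is where the IO saving in the description
 comes from.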





[jira] [Updated] (HBASE-5416) Improve performance of scans with some kind of filters.

2012-05-24 Thread Max Lapan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Lapan updated HBASE-5416:
-

Status: Patch Available  (was: Open)

 Improve performance of scans with some kind of filters.
 ---

 Key: HBASE-5416
 URL: https://issues.apache.org/jira/browse/HBASE-5416
 Project: HBase
  Issue Type: Improvement
  Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
 Attachments: 5416-v5.txt, 5416-v6.txt, Filtered_scans.patch, 
 Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch, 
 Filtered_scans_v5.patch






[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.

2012-05-24 Thread Max Lapan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282497#comment-13282497
 ] 

Max Lapan commented on HBASE-5416:
--

After a long delay, I decided to return to this optimization.
We have been running this patch on our production system (300TB HBase data, 160 
nodes) for the last two months without issues. Tests of the 2-phase approach 
showed a much smaller performance improvement than this patch: only a 2x speedup 
versus nearly 20x.

I extended the tests, but I don't feel experienced enough to implement 
a concurrent, multithreaded test as suggested, sorry. 

 Improve performance of scans with some kind of filters.
 ---

 Key: HBASE-5416
 URL: https://issues.apache.org/jira/browse/HBASE-5416
 Project: HBase
  Issue Type: Improvement
  Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
 Attachments: 5416-v5.txt, 5416-v6.txt, Filtered_scans.patch, 
 Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch, 
 Filtered_scans_v5.patch






[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.

2012-05-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282495#comment-13282495
 ] 

Hadoop QA commented on HBASE-5416:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12529061/Filtered_scans_v5.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1982//console

This message is automatically generated.

 Improve performance of scans with some kind of filters.
 ---

 Key: HBASE-5416
 URL: https://issues.apache.org/jira/browse/HBASE-5416
 Project: HBase
  Issue Type: Improvement
  Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
 Attachments: 5416-v5.txt, 5416-v6.txt, Filtered_scans.patch, 
 Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch, 
 Filtered_scans_v5.patch






[jira] [Updated] (HBASE-6071) getRegionServerWithRetries should log unsuccessful attempts and exceptions.

2012-05-24 Thread Igal Shilman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igal Shilman updated HBASE-6071:


Fix Version/s: 0.90.7
   Labels: client ipc  (was: )
   Status: Patch Available  (was: Open)

 getRegionServerWithRetries should log unsuccessful attempts and exceptions.
 

 Key: HBASE-6071
 URL: https://issues.apache.org/jira/browse/HBASE-6071
 Project: HBase
  Issue Type: Improvement
  Components: client, ipc
Affects Versions: 0.90.0
Reporter: Igal Shilman
Priority: Minor
  Labels: client, ipc
 Fix For: 0.90.7

 Attachments: HConnectionManager_HBASE-6071-0.90.0.patch






[jira] [Created] (HBASE-6088) Region splitting not happened for long time due to ZK exception while creating RS_ZK_SPLITTING node

2012-05-24 Thread Gopinathan A (JIRA)
Gopinathan A created HBASE-6088:
---

 Summary:  Region splitting not happened for long time due to ZK 
exception while creating RS_ZK_SPLITTING node
 Key: HBASE-6088
 URL: https://issues.apache.org/jira/browse/HBASE-6088
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0
Reporter: Gopinathan A
 Fix For: 0.94.1


Region splitting did not happen for a long time because of a ZK exception while 
creating the RS_ZK_SPLITTING node.

{noformat}
2012-05-24 01:45:41,363 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 26668ms for sessionid 0x1377a75f41d0012, closing socket connection and attempting reconnect
2012-05-24 01:45:41,464 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/unassigned/bd1079bf948c672e493432020dc0e144
{noformat}

{noformat}
2012-05-24 01:45:43,300 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: cleanupCurrentWriter  waiting for transactions to get synced  total 189377 synced till here 189365
2012-05-24 01:45:48,474 INFO org.apache.hadoop.hbase.regionserver.SplitRequest: Running rollback/cleanup of failed split of ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.; Failed setting SPLITTING znode on ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.
java.io.IOException: Failed setting SPLITTING znode on ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.
    at org.apache.hadoop.hbase.regionserver.SplitTransaction.createDaughters(SplitTransaction.java:242)
    at org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:450)
    at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:67)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /hbase/unassigned/bd1079bf948c672e493432020dc0e144
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1246)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:321)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:659)
    at org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:811)
    at org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:747)
    at org.apache.hadoop.hbase.regionserver.SplitTransaction.transitionNodeSplitting(SplitTransaction.java:919)
    at org.apache.hadoop.hbase.regionserver.SplitTransaction.createNodeSplitting(SplitTransaction.java:869)
    at org.apache.hadoop.hbase.regionserver.SplitTransaction.createDaughters(SplitTransaction.java:239)
    ... 5 more
2012-05-24 01:45:48,476 INFO org.apache.hadoop.hbase.regionserver.SplitRequest: Successful rollback of failed split of ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.
{noformat}


{noformat}
2012-05-24 01:47:28,141 ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node /hbase/unassigned/bd1079bf948c672e493432020dc0e144 already exists and this is not a retry
2012-05-24 01:47:28,142 INFO org.apache.hadoop.hbase.regionserver.SplitRequest: Running rollback/cleanup of failed split of ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.; Failed create of ephemeral /hbase/unassigned/bd1079bf948c672e493432020dc0e144
java.io.IOException: Failed create of ephemeral /hbase/unassigned/bd1079bf948c672e493432020dc0e144
    at org.apache.hadoop.hbase.regionserver.SplitTransaction.createNodeSplitting(SplitTransaction.java:865)
    at org.apache.hadoop.hbase.regionserver.SplitTransaction.createDaughters(SplitTransaction.java:239)
    at org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:450)
    at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:67)
{noformat}

Due to the above exception, region splitting kept failing continuously for more 
than 5 hours.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6071) getRegionServerWithRetires, should log unsuccessful attempts and exceptions.

2012-05-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282504#comment-13282504
 ] 

Hadoop QA commented on HBASE-6071:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12528963/HConnectionManager_HBASE-6071-0.90.0.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1983//console

This message is automatically generated.

 getRegionServerWithRetires, should log unsuccessful attempts and exceptions.
 

 Key: HBASE-6071
 URL: https://issues.apache.org/jira/browse/HBASE-6071
 Project: HBase
  Issue Type: Improvement
  Components: client, ipc
Affects Versions: 0.90.0
Reporter: Igal Shilman
Priority: Minor
  Labels: client, ipc
 Fix For: 0.90.7

 Attachments: HConnectionManager_HBASE-6071-0.90.0.patch


 HConnectionImplementation.getRegionServerWithRetries might terminate with an 
 exception other than a DoNotRetryIOException, in which case it silently drops 
 the exceptions from previous attempts.
 [~ted_yu] suggested 
 ([here|http://mail-archives.apache.org/mod_mbox/hbase-user/201205.mbox/%3CCAFebPXBq9V9BVdzRTNr-MB3a1Lz78SZj6gvP6On0b%2Bajt9StAg%40mail.gmail.com%3E])
 adding a log message inside the catch block describing the exception type 
 and details.
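
 The suggestion can be illustrated with a minimal, self-contained sketch. It 
 is not the code in HConnectionManager: the class RetryLoggingSketch, the 
 method callWithRetries, and the exception types DoNotRetryException and 
 RetriesExhausted are hypothetical stand-ins, and java.util.logging is used 
 only to keep the example free of dependencies.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.logging.Level;
import java.util.logging.Logger;

class RetryLoggingSketch {
    private static final Logger LOG = Logger.getLogger(RetryLoggingSketch.class.getName());

    /** Hypothetical stand-in for HBase's DoNotRetryIOException. */
    static class DoNotRetryException extends IOException {
        DoNotRetryException(String msg) { super(msg); }
    }

    /** Hypothetical exception that keeps every failure seen across the attempts. */
    static class RetriesExhausted extends IOException {
        final List<Throwable> causes;
        RetriesExhausted(int attempts, List<Throwable> causes) {
            super("Failed after " + attempts + " attempts: " + causes);
            this.causes = causes;
        }
    }

    static <T> T callWithRetries(Callable<T> call, int maxAttempts) throws IOException {
        List<Throwable> failures = new ArrayList<>();
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (DoNotRetryException e) {
                throw e; // not retriable: surface immediately
            } catch (Exception e) {
                // The point of the suggestion: log the type and details of every
                // failed attempt instead of dropping them silently.
                LOG.log(Level.WARNING, "Attempt " + attempt + " of " + maxAttempts
                        + " failed: " + e.getClass().getName() + ": " + e.getMessage());
                failures.add(e);
            }
        }
        throw new RetriesExhausted(maxAttempts, failures);
    }

    public static void main(String[] args) throws IOException {
        int[] calls = {0};
        String result = callWithRetries(() -> {
            if (++calls[0] < 3) {
                throw new IOException("transient failure " + calls[0]);
            }
            return "ok";
        }, 5);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

 In this sketch the failures are both logged as they happen and carried in the 
 final exception, so neither the log reader nor the caller loses the earlier 
 causes.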





[jira] [Updated] (HBASE-5416) Improve performance of scans with some kind of filters.

2012-05-24 Thread Max Lapan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Lapan updated HBASE-5416:
-

Status: Open  (was: Patch Available)

 Improve performance of scans with some kind of filters.
 ---

 Key: HBASE-5416
 URL: https://issues.apache.org/jira/browse/HBASE-5416
 Project: HBase
  Issue Type: Improvement
  Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
 Attachments: 5416-v5.txt, 5416-v6.txt, Filtered_scans.patch, 
 Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch, 
 Filtered_scans_v5.patch


 When a scan is performed, the whole row is loaded into the result list, and 
 only then is the filter (if any) applied to decide whether the row is needed.
 But when a scan covers several CFs and the filter checks data from only a 
 subset of them, data from the CFs not checked by the filter is not needed at 
 the filter stage; it is needed only once we have decided to include the 
 current row. In such cases we can significantly reduce the amount of IO 
 performed by a scan by loading only the values actually checked by the filter.
 For example, we have two CFs: flags and snap. Flags is quite small (a bunch of 
 megabytes) and is used to filter large entries from snap. Snap is very large 
 (10s of GB) and is quite costly to scan. If we need only rows with some flag 
 set, we use SingleColumnValueFilter to limit the result to a small subset of 
 the region. But the current implementation loads both CFs to perform the 
 scan, even though only a small subset is needed.
 The attached patch adds one routine to the Filter interface that lets a 
 filter specify which CFs it needs for its operation. In HRegion, we separate 
 all scanners into two groups: those needed by the filter and the rest 
 (joined). When a new row is considered, only the needed data is loaded and 
 the filter is applied; only if the filter accepts the row is the rest of the 
 data loaded. On our data, this speeds up such scans 30-50 times. It also 
 lets us better normalize the data into separate columns by optimizing the 
 scans performed.
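
 The two-phase idea described above can be sketched in a few lines. This is an 
 illustration only, not the attached patch: the class TwoPhaseScanSketch, the 
 method scan and the counter joinedLoads are hypothetical, and plain in-memory 
 maps stand in for the flags and snap column families.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class TwoPhaseScanSketch {
    /** Number of rows for which the large family was actually read. */
    static int joinedLoads = 0;

    static List<String> scan(Map<String, String> flags,   // small family, checked by the filter
                             Map<String, String> snap,    // large family, loaded lazily
                             Predicate<String> filter) {
        List<String> results = new ArrayList<>();
        for (Map.Entry<String, String> row : flags.entrySet()) {
            // Phase 1: only the small family has been read; apply the filter to it.
            if (!filter.test(row.getValue())) {
                continue; // rejected rows never touch the large family
            }
            // Phase 2: the filter accepted the row, so load the rest of it.
            joinedLoads++;
            results.add(row.getKey() + ": flags=" + row.getValue()
                    + ", snap=" + snap.get(row.getKey()));
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, String> flags = new LinkedHashMap<>();
        Map<String, String> snap = new LinkedHashMap<>();
        for (int i = 0; i < 5; i++) {
            flags.put("row" + i, i == 3 ? "hot" : "cold");
            snap.put("row" + i, "large-value-" + i);
        }
        List<String> hits = scan(flags, snap, "hot"::equals);
        System.out.println(hits + " (large-family loads: " + joinedLoads + ")");
    }
}
```

 With five rows and one matching flag, the large family is read for a single 
 row rather than for all five, which is where the IO saving comes from.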





[jira] [Updated] (HBASE-5416) Improve performance of scans with some kind of filters.

2012-05-24 Thread Max Lapan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Lapan updated HBASE-5416:
-

Attachment: (was: Filtered_scans_v5.patch)

 Improve performance of scans with some kind of filters.
 ---

 Key: HBASE-5416
 URL: https://issues.apache.org/jira/browse/HBASE-5416
 Project: HBase
  Issue Type: Improvement
  Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
 Attachments: 5416-v5.txt, 5416-v6.txt, Filtered_scans.patch, 
 Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch






[jira] [Updated] (HBASE-5416) Improve performance of scans with some kind of filters.

2012-05-24 Thread Max Lapan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Lapan updated HBASE-5416:
-

Attachment: Filtered_scans_v5.patch

 Improve performance of scans with some kind of filters.
 ---

 Key: HBASE-5416
 URL: https://issues.apache.org/jira/browse/HBASE-5416
 Project: HBase
  Issue Type: Improvement
  Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
 Attachments: 5416-v5.txt, 5416-v6.txt, Filtered_scans.patch, 
 Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch, 
 Filtered_scans_v5.patch






[jira] [Commented] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT

2012-05-24 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282516#comment-13282516
 ] 

Zhihong Yu commented on HBASE-6070:
---

{code}
+// but the RS had went down before completing the split process 
then will not try to
{code}
'had went down' - 'had gone down'
{code}
+  if(response == null) return null;
{code}
Space after 'if'
{code}
+  static Result getMetaTableRowResultAsSplittedRegion(final HRegionInfo hri, 
final ServerName sn)
{code}
The method should be called getMetaTableRowResultAsSplitRegion().

Should investigate the test failure in TestFromClientSide

 AM.nodeDeleted and SSH races creating problems for regions under SPLIT
 --

 Key: HBASE-6070
 URL: https://issues.apache.org/jira/browse/HBASE-6070
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.94.patch, 
 HBASE-6070_trunk.patch


 We tried to address the problems in Master restart and RS restart while a 
 region SPLIT is in progress as part of HBASE-5806.
 While investigating further we found that one race condition remains:
 1. Split has just started and the znode is in RS_SPLIT state.
 2. RS goes down.
 3. First callback for SSH comes.
 4. As part of the fix for HBASE-5806, SSH knows that some region is in RIT.
 5. But now the nodeDeleted event comes for the SPLIT node, and there we try 
 to delete the RIT.
 6. After this we check in the SSH whether any node is in RIT. As we don't 
 find the region in RIT, the region is never assigned.
 When we fixed HBASE-5806, step 6 happened first and then step 5, so we missed 
 this case. Now we have found it. Will come up with a patch shortly.





[jira] [Commented] (HBASE-6088) Region splitting not happened for long time due to ZK exception while creating RS_ZK_SPLITTING node

2012-05-24 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282518#comment-13282518
 ] 

ramkrishna.s.vasudevan commented on HBASE-6088:
---

When we start the split, there are two steps in zk node creation:
- Create the node.
- Write the data RS_ZK_SPLITTING into it.
Only after both steps are completed do we make a journal entry.
If writing the data fails, rollback cannot clean up the node, because no 
journal entry has been recorded for it.
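
The effect of the journal ordering can be reproduced with a small simulation. 
This is only a sketch of the failure mode, not SplitTransaction and not 
necessarily the fix adopted for this issue: FakeZk, trySplit, the 
journalAfterCreate flag and the CREATED_SPLITTING_NODE entry are all 
hypothetical names, and an in-memory set stands in for ZooKeeper.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SplitJournalSketch {
    enum JournalEntry { CREATED_SPLITTING_NODE }

    /** Hypothetical in-memory stand-in for the znodes under /hbase/unassigned. */
    static class FakeZk {
        final Set<String> nodes = new HashSet<>();
        boolean failWrites = false;

        void create(String path) {
            if (!nodes.add(path)) {
                throw new IllegalStateException("Node " + path + " already exists");
            }
        }

        void setData(String path, String data) {
            if (failWrites) {
                throw new IllegalStateException("BadVersion for " + path);
            }
        }

        void delete(String path) {
            nodes.remove(path);
        }
    }

    /**
     * Runs the two zk steps. If journalAfterCreate is false, the journal entry is
     * written only after both steps, as described in the comment above.
     */
    static boolean trySplit(FakeZk zk, String path, boolean journalAfterCreate) {
        List<JournalEntry> journal = new ArrayList<>();
        try {
            zk.create(path);                          // step 1: create the node
            if (journalAfterCreate) {
                journal.add(JournalEntry.CREATED_SPLITTING_NODE);
            }
            zk.setData(path, "RS_ZK_SPLITTING");      // step 2: write the state
            if (!journalAfterCreate) {
                journal.add(JournalEntry.CREATED_SPLITTING_NODE);
            }
            return true;
        } catch (IllegalStateException e) {
            // Rollback undoes only what the journal says was done.
            if (journal.contains(JournalEntry.CREATED_SPLITTING_NODE)) {
                zk.delete(path);
            }
            return false;
        }
    }

    public static void main(String[] args) {
        String path = "/hbase/unassigned/region";

        FakeZk late = new FakeZk();
        late.failWrites = true;
        trySplit(late, path, false);                  // write fails, node is left behind
        late.failWrites = false;
        System.out.println("journal after both steps, retry succeeds: "
                + trySplit(late, path, false));

        FakeZk early = new FakeZk();
        early.failWrites = true;
        trySplit(early, path, true);                  // write fails, rollback deletes the node
        early.failWrites = false;
        System.out.println("journal after create, retry succeeds: "
                + trySplit(early, path, true));
    }
}
```

In the first run the leftover node makes every later attempt fail with 
"already exists", which matches the repeated split failures in the logs. 
Recording the entry right after the create step is one way to let rollback 
remove the node; it is shown here as an assumption, not as the committed 
change.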

  Region splitting not happened for long time due to ZK exception while 
 creating RS_ZK_SPLITTING node
 

 Key: HBASE-6088
 URL: https://issues.apache.org/jira/browse/HBASE-6088
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0
Reporter: Gopinathan A
 Fix For: 0.94.1


 Region splitting not happened for long time due to ZK exception while 
 creating RS_ZK_SPLITTING node
 {noformat}
 2012-05-24 01:45:41,363 INFO org.apache.zookeeper.ClientCnxn: Client session 
 timed out, have not heard from server in 26668ms for sessionid 
 0x1377a75f41d0012, closing socket connection and attempting reconnect
 2012-05-24 01:45:41,464 WARN 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient 
 ZooKeeper exception: 
 org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
 = ConnectionLoss for /hbase/unassigned/bd1079bf948c672e493432020dc0e144
 {noformat}
 {noformat}
 2012-05-24 01:45:43,300 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: 
 cleanupCurrentWriter  waiting for transactions to get synced  total 189377 
 synced till here 189365
 2012-05-24 01:45:48,474 INFO 
 org.apache.hadoop.hbase.regionserver.SplitRequest: Running rollback/cleanup 
 of failed split of 
 ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.; Failed 
 setting SPLITTING znode on 
 ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.
 java.io.IOException: Failed setting SPLITTING znode on 
 ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.
   at 
 org.apache.hadoop.hbase.regionserver.SplitTransaction.createDaughters(SplitTransaction.java:242)
   at 
 org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:450)
   at 
 org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:67)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 Caused by: org.apache.zookeeper.KeeperException$BadVersionException: 
 KeeperErrorCode = BadVersion for 
 /hbase/unassigned/bd1079bf948c672e493432020dc0e144
   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
   at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
   at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1246)
   at 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:321)
   at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:659)
   at 
 org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:811)
   at 
 org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:747)
   at 
 org.apache.hadoop.hbase.regionserver.SplitTransaction.transitionNodeSplitting(SplitTransaction.java:919)
   at 
 org.apache.hadoop.hbase.regionserver.SplitTransaction.createNodeSplitting(SplitTransaction.java:869)
   at 
 org.apache.hadoop.hbase.regionserver.SplitTransaction.createDaughters(SplitTransaction.java:239)
   ... 5 more
 2012-05-24 01:45:48,476 INFO 
 org.apache.hadoop.hbase.regionserver.SplitRequest: Successful rollback of 
 failed split of 
 ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.
 {noformat}
 {noformat}
 2012-05-24 01:47:28,141 ERROR 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node 
 /hbase/unassigned/bd1079bf948c672e493432020dc0e144 already exists and this is 
 not a retry
 2012-05-24 01:47:28,142 INFO 
 org.apache.hadoop.hbase.regionserver.SplitRequest: Running rollback/cleanup 
 of failed split of 
 ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.; Failed 
 create of ephemeral /hbase/unassigned/bd1079bf948c672e493432020dc0e144
 java.io.IOException: Failed create of ephemeral 
 /hbase/unassigned/bd1079bf948c672e493432020dc0e144
   at 
 org.apache.hadoop.hbase.regionserver.SplitTransaction.createNodeSplitting(SplitTransaction.java:865)
   at 
 org.apache.hadoop.hbase.regionserver.SplitTransaction.createDaughters(SplitTransaction.java:239)
   at 
 

[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.

2012-05-24 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282520#comment-13282520
 ] 

Zhihong Yu commented on HBASE-5416:
---

@Max:
The new patch is much larger than the previous version. Can you provide a more 
detailed description of the change?

Thanks

 Improve performance of scans with some kind of filters.
 ---

 Key: HBASE-5416
 URL: https://issues.apache.org/jira/browse/HBASE-5416
 Project: HBase
  Issue Type: Improvement
  Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
 Attachments: 5416-v5.txt, 5416-v6.txt, Filtered_scans.patch, 
 Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch, 
 Filtered_scans_v5.patch






[jira] [Assigned] (HBASE-6068) Secure HBase cluster : Client not able to call some admin APIs

2012-05-24 Thread Matteo Bertozzi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi reassigned HBASE-6068:
--

Assignee: Matteo Bertozzi

 Secure HBase cluster : Client not able to call some admin APIs
 --

 Key: HBASE-6068
 URL: https://issues.apache.org/jira/browse/HBASE-6068
 Project: HBase
  Issue Type: Bug
  Components: security
Affects Versions: 0.94.0
Reporter: Anoop Sam John
Assignee: Matteo Bertozzi

 In a secure cluster, we allow HBase clients to read certain zk nodes by 
 granting global read permission on them. These nodes are the master address 
 znode, the root server znode and the clusterId znode. In 
 ZKUtil.createACL(), we can see these node names are handled specially.
 But some other client-side admin APIs also make a read call into zookeeper 
 from the client. These include the isTableEnabled() call (there may be 
 others; this is the one I have seen). Here the client directly reads a node 
 in zookeeper (the node created for this table) and the data is matched to 
 determine whether the table is enabled.
 Now, in a secure cluster, any client can read the zookeeper nodes it needs 
 for normal operation, such as the master address and root server address. 
 But what if the client calls this API [isTableEnabled()]?





[jira] [Updated] (HBASE-5416) Improve performance of scans with some kind of filters.

2012-05-24 Thread Max Lapan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Lapan updated HBASE-5416:
-

Attachment: Filtered_scans_v5.patch

 Improve performance of scans with some kind of filters.
 ---

 Key: HBASE-5416
 URL: https://issues.apache.org/jira/browse/HBASE-5416
 Project: HBase
  Issue Type: Improvement
  Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
 Attachments: 5416-v5.txt, 5416-v6.txt, Filtered_scans.patch, 
 Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch, 
 Filtered_scans_v5.patch






[jira] [Updated] (HBASE-5416) Improve performance of scans with some kind of filters.

2012-05-24 Thread Max Lapan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Lapan updated HBASE-5416:
-

Attachment: (was: Filtered_scans_v5.patch)

 Improve performance of scans with some kind of filters.
 ---

 Key: HBASE-5416
 URL: https://issues.apache.org/jira/browse/HBASE-5416
 Project: HBase
  Issue Type: Improvement
  Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
 Attachments: 5416-v5.txt, 5416-v6.txt, Filtered_scans.patch, 
 Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch, 
 Filtered_scans_v5.patch






[jira] [Updated] (HBASE-5416) Improve performance of scans with some kind of filters.

2012-05-24 Thread Max Lapan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Lapan updated HBASE-5416:
-

Status: Patch Available  (was: Open)

 Improve performance of scans with some kind of filters.
 ---

 Key: HBASE-5416
 URL: https://issues.apache.org/jira/browse/HBASE-5416
 Project: HBase
  Issue Type: Improvement
  Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
 Attachments: 5416-v5.txt, 5416-v6.txt, Filtered_scans.patch, 
 Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch, 
 Filtered_scans_v5.patch






[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.

2012-05-24 Thread Max Lapan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282528#comment-13282528
 ] 

Max Lapan commented on HBASE-5416:
--

The additional code handles the case when InternalScanner::next is called with 
limit != -1. In this case, we must remember which KeyValueHeap we were 
populating when the limit was reached, and resume that population on the next 
call to next.

I also added a test case for this situation.
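
A rough sketch of the resumption logic, under simplifying assumptions: the 
class LimitedNextSketch is hypothetical, two plain iterators stand in for the 
filter-side heap and the joined heap of a single row, and the real 
KeyValueHeap handling in HRegion is more involved than this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class LimitedNextSketch {
    private final Iterator<String> essential; // cells from the families the filter needs
    private final Iterator<String> joined;    // cells from the remaining families
    private Iterator<String> current;         // source being drained; survives across calls

    LimitedNextSketch(List<String> essentialCells, List<String> joinedCells) {
        this.essential = essentialCells.iterator();
        this.joined = joinedCells.iterator();
        this.current = this.essential;
    }

    /**
     * Appends up to limit cells to out (limit == -1 means no limit) and returns
     * true if more cells remain. Because current is a field, a later call
     * resumes from the source that was being drained when the limit was hit.
     */
    boolean next(List<String> out, int limit) {
        int added = 0;
        while (limit == -1 || added < limit) {
            if (!current.hasNext()) {
                if (current == joined) {
                    break;            // both sources are exhausted
                }
                current = joined;     // filter-side cells done, move to the rest
                continue;
            }
            out.add(current.next());
            added++;
        }
        return essential.hasNext() || joined.hasNext();
    }

    public static void main(String[] args) {
        LimitedNextSketch row = new LimitedNextSketch(
                Arrays.asList("flags:a", "flags:b"),
                Arrays.asList("snap:x", "snap:y", "snap:z"));
        List<String> out = new ArrayList<>();
        boolean more = row.next(out, 3);
        System.out.println(out + " more=" + more);
        more = row.next(out, 3);
        System.out.println(out + " more=" + more);
    }
}
```

The first call stops partway through the joined cells; the second call picks 
up from the same source instead of starting the row again.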

 Improve performance of scans with some kind of filters.
 ---

 Key: HBASE-5416
 URL: https://issues.apache.org/jira/browse/HBASE-5416
 Project: HBase
  Issue Type: Improvement
  Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
 Attachments: 5416-v5.txt, 5416-v6.txt, Filtered_scans.patch, 
 Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch, 
 Filtered_scans_v5.patch


 When a scan is performed, the whole row is loaded into the result list, and after that 
 the filter (if one exists) is applied to decide whether the row is needed.
 But when a scan is performed on several CFs and the filter checks only data from 
 a subset of these CFs, data from the CFs not checked by the filter is not needed 
 at the filter stage; it is needed only once we have decided to include the current row. In such 
 a case we can significantly reduce the amount of IO performed by a scan by loading 
 only the values actually checked by the filter.
 For example, we have two CFs: flags and snap. Flags is quite small (a bunch of 
 megabytes) and is used to filter large entries from snap. Snap is very large 
 (10s of GB) and is quite costly to scan. If we need only rows with 
 some flag specified, we use SingleColumnValueFilter to limit the result to only 
 a small subset of the region. But the current implementation loads both CFs to 
 perform the scan, when only a small subset is needed.
 The attached patch adds one routine to the Filter interface to allow a filter to 
 specify which CFs are needed for its operation. In HRegion, we separate all 
 scanners into two groups: those needed for the filter and the rest (joined). When a new 
 row is considered, only the needed data is loaded and the filter is applied; only if 
 the filter accepts the row is the rest of the data loaded. On our data, this speeds up 
 scans of this kind 30-50 times. It also gives us a way to better 
 normalize the data into separate columns by optimizing the scans performed.
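
The two-phase loading described above could be sketched roughly as follows. This is a simplified model under stated assumptions and not the patch itself: the class `TwoPhaseScanSketch`, the `joinedReads` counter, and the use of in-memory maps for the "flags" and "snap" families are hypothetical and serve only to show the order of operations.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Toy model of the two-phase row loading described above; not HBase code. */
final class TwoPhaseScanSketch {

    /** Counts how many rows were read from the large ("joined") family. */
    static int joinedReads = 0;

    /**
     * essential: row key -> value from the small family checked by the filter.
     * joined:    row key -> value from the large family not checked by the filter.
     */
    static List<String> scan(Map<String, String> essential,
                             Map<String, String> joined,
                             Predicate<String> filter) {
        List<String> results = new ArrayList<>();
        for (Map.Entry<String, String> row : essential.entrySet()) {
            // Phase 1: only the family the filter needs has been loaded.
            if (!filter.test(row.getValue())) {
                continue; // rejected: the large family is never touched
            }
            // Phase 2: the filter accepted the row, so load the rest.
            joinedReads++;
            results.add(row.getKey() + "=" + joined.get(row.getKey()));
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, String> flags = new LinkedHashMap<>();
        Map<String, String> snap = new LinkedHashMap<>();
        flags.put("row1", "keep");
        snap.put("row1", "big-1");
        flags.put("row2", "skip");
        snap.put("row2", "big-2");
        flags.put("row3", "keep");
        snap.put("row3", "big-3");

        List<String> rows = scan(flags, snap, "keep"::equals);
        System.out.println(rows + " joinedReads=" + joinedReads);
    }
}
```

In this model the saving comes from `joinedReads` growing only with the number of accepted rows, which is the effect the description attributes to the patch when the filter is selective.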





[jira] [Updated] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT

2012-05-24 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-6070:
--

Status: Open  (was: Patch Available)

 AM.nodeDeleted and SSH races creating problems for regions under SPLIT
 --

 Key: HBASE-6070
 URL: https://issues.apache.org/jira/browse/HBASE-6070
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0, 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.92_1.patch, 
 HBASE-6070_0.94.patch, HBASE-6070_trunk.patch


 We tried to address the problems in Master restart and RS restart while a region 
 SPLIT is in progress as part of HBASE-5806.
 While doing some more work we found that there is still one race condition.
 - Split has just started and the znode is in RS_SPLIT state.
 - RS goes down.
 - First callback for SSH comes.
 - As part of the fix for HBASE-5806, SSH knows that some region is in RIT.
 - But now the nodeDeleted event comes for the SPLIT node, and there we try to 
 delete the RIT.
 - After this we try to see in the SSH whether any node is in RIT.  As we 
 do not find the region in RIT, the region is never assigned.
 When we fixed HBASE-5806, step 6 happened first and then step 5 happened, so 
 we missed it.  Now we have found it. Will come up with a patch shortly.
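
The ordering problem in the steps above can be modelled with a small deterministic sketch. Everything here is hypothetical and only illustrates the two orderings: the class `SplitRaceSketch`, its method names, and the plain sets standing in for the RIT map are not AssignmentManager or ServerShutdownHandler code.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of the nodeDeleted / SSH ordering problem; not HBase code. */
final class SplitRaceSketch {
    /** Regions in transition (RIT), keyed by a made-up region name. */
    private final Set<String> regionsInTransition = new HashSet<>();
    private final Set<String> assigned = new HashSet<>();

    void splitStarted(String region) {
        regionsInTransition.add(region);
    }

    /** Models the nodeDeleted callback: drops the region from RIT. */
    void nodeDeleted(String region) {
        regionsInTransition.remove(region);
    }

    /** Models the SSH check: only a region still found in RIT gets assigned. */
    void sshProcess(String region) {
        if (regionsInTransition.remove(region)) {
            assigned.add(region);
        }
    }

    boolean isAssigned(String region) {
        return assigned.contains(region);
    }

    /** Runs both callbacks in the given order and reports whether the region got assigned. */
    static boolean run(boolean nodeDeletedFirst) {
        SplitRaceSketch am = new SplitRaceSketch();
        am.splitStarted("r1");
        if (nodeDeletedFirst) {
            am.nodeDeleted("r1");
            am.sshProcess("r1");
        } else {
            am.sshProcess("r1");
            am.nodeDeleted("r1");
        }
        return am.isAssigned("r1");
    }

    public static void main(String[] args) {
        System.out.println("nodeDeleted first: assigned=" + run(true));
        System.out.println("SSH check first:   assigned=" + run(false));
    }
}
```

In this model the region is lost only when the nodeDeleted callback runs before the SSH check, which matches the order of steps 5 and 6 in the list.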





[jira] [Updated] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT

2012-05-24 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-6070:
--

Attachment: HBASE-6070_0.92_1.patch

 AM.nodeDeleted and SSH races creating problems for regions under SPLIT
 --

 Key: HBASE-6070
 URL: https://issues.apache.org/jira/browse/HBASE-6070
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.92_1.patch, 
 HBASE-6070_0.94.patch, HBASE-6070_trunk.patch







[jira] [Updated] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT

2012-05-24 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-6070:
--

Attachment: HBASE-6070_0.94_1.patch

 AM.nodeDeleted and SSH races creating problems for regions under SPLIT
 --

 Key: HBASE-6070
 URL: https://issues.apache.org/jira/browse/HBASE-6070
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.92_1.patch, 
 HBASE-6070_0.94.patch, HBASE-6070_0.94_1.patch, HBASE-6070_trunk.patch, 
 HBASE-6070_trunk_1.patch







[jira] [Updated] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT

2012-05-24 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-6070:
--

Status: Patch Available  (was: Open)

 AM.nodeDeleted and SSH races creating problems for regions under SPLIT
 --

 Key: HBASE-6070
 URL: https://issues.apache.org/jira/browse/HBASE-6070
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0, 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.92_1.patch, 
 HBASE-6070_0.94.patch, HBASE-6070_0.94_1.patch, HBASE-6070_trunk.patch, 
 HBASE-6070_trunk_1.patch







[jira] [Updated] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT

2012-05-24 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-6070:
--

Attachment: HBASE-6070_trunk_1.patch

Updated patches addressing the review comments.  I tried running the failed test case.  It 
passed every time.

 AM.nodeDeleted and SSH races creating problems for regions under SPLIT
 --

 Key: HBASE-6070
 URL: https://issues.apache.org/jira/browse/HBASE-6070
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.92_1.patch, 
 HBASE-6070_0.94.patch, HBASE-6070_0.94_1.patch, HBASE-6070_trunk.patch, 
 HBASE-6070_trunk_1.patch







[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.

2012-05-24 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282534#comment-13282534
 ] 

Zhihong Yu commented on HBASE-5416:
---

Will go over the patch when I get into the office.

It would be nice to use https://reviews.apache.org to facilitate reviews.

 Improve performance of scans with some kind of filters.
 ---

 Key: HBASE-5416
 URL: https://issues.apache.org/jira/browse/HBASE-5416
 Project: HBase
  Issue Type: Improvement
  Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
 Attachments: 5416-v5.txt, 5416-v6.txt, Filtered_scans.patch, 
 Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch, 
 Filtered_scans_v5.patch







[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.

2012-05-24 Thread Max Lapan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282539#comment-13282539
 ] 

Max Lapan commented on HBASE-5416:
--

I tried to post it there, but I constantly get an Internal Server Error.

 Improve performance of scans with some kind of filters.
 ---

 Key: HBASE-5416
 URL: https://issues.apache.org/jira/browse/HBASE-5416
 Project: HBase
  Issue Type: Improvement
  Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
 Attachments: 5416-v5.txt, 5416-v6.txt, Filtered_scans.patch, 
 Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch, 
 Filtered_scans_v5.patch







[jira] [Updated] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT

2012-05-24 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-6070:
--

Attachment: (was: HBASE-6070_trunk_1.patch)

 AM.nodeDeleted and SSH races creating problems for regions under SPLIT
 --

 Key: HBASE-6070
 URL: https://issues.apache.org/jira/browse/HBASE-6070
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.92_1.patch, 
 HBASE-6070_0.94.patch, HBASE-6070_0.94_1.patch, HBASE-6070_trunk.patch







[jira] [Updated] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT

2012-05-24 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-6070:
--

Attachment: HBASE-6070_trunk_1.patch

Just reattaching the patch.

 AM.nodeDeleted and SSH races creating problems for regions under SPLIT
 --

 Key: HBASE-6070
 URL: https://issues.apache.org/jira/browse/HBASE-6070
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.92_1.patch, 
 HBASE-6070_0.94.patch, HBASE-6070_0.94_1.patch, HBASE-6070_trunk.patch, 
 HBASE-6070_trunk_1.patch







[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.

2012-05-24 Thread Max Lapan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282543#comment-13282543
 ] 

Max Lapan commented on HBASE-5416:
--

Ahhh, I'm stupid, it works with the hbase-git repository. Posted at 
https://reviews.apache.org/r/5225/

 Improve performance of scans with some kind of filters.
 ---

 Key: HBASE-5416
 URL: https://issues.apache.org/jira/browse/HBASE-5416
 Project: HBase
  Issue Type: Improvement
  Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
 Attachments: 5416-v5.txt, 5416-v6.txt, Filtered_scans.patch, 
 Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch, 
 Filtered_scans_v5.patch







[jira] [Commented] (HBASE-5916) RS restart just before master intialization we make the cluster non operative

2012-05-24 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282580#comment-13282580
 ] 

ramkrishna.s.vasudevan commented on HBASE-5916:
---

@Chunhui
The suggestion given above can simply be avoided by taking the actual online 
servers list after getting the logFolders.  This will ensure that we do not 
split any new RS that has checked in.

In joinCluster(), as per the existing code, if any new server has checked in and 
the root/meta has been assigned to it, joinCluster may think that it is a 
dead server because we have already passed the online servers.  Hence we are 
trying to get the actual online list as per the patch.

The problem that you have mentioned here
bq.if Regionserver A with startcode 001 is restarted, and then Regionserver A 
with startcode 002 is in the onlineServers, but Regionserver A with startcode 
001 is in the process by SSH, not in the deadServers

This we are trying to avoid in our current v6 patch by not removing from the dead 
servers any restarted server that comes up during master initialization. 
Later, after master initialization, we try to clear the dead servers that match 
the current online servers with the same host name and port.

There are other problems during SSH and master initialization that may lead to 
double assignment or a concurrent modification exception.  We will 
address these in a new JIRA.
Please review the current patch and provide your suggestions.

 RS restart just before master intialization we make the cluster non operative
 -

 Key: HBASE-5916
 URL: https://issues.apache.org/jira/browse/HBASE-5916
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.94.1

 Attachments: HBASE-5916_trunk.patch, HBASE-5916_trunk_1.patch, 
 HBASE-5916_trunk_1.patch, HBASE-5916_trunk_2.patch, HBASE-5916_trunk_3.patch, 
 HBASE-5916_trunk_4.patch, HBASE-5916_trunk_v5.patch


 Consider a case where the master is being restarted.  An RS that was alive when 
 the master restart began gets restarted before the master enables the 
 ServerShutdownHandler.
 {code}
 serverShutdownHandlerEnabled = true;
 {code}
 In this case, when the RS tries to register with the master, the master will 
 try to expire the server, but the server cannot be expired because the 
 serverShutdownHandler is not yet enabled.
 This case may happen when I have only one RS and it gets restarted, or when all the RSs 
 get restarted at the same time (before assignRootandMeta).
 {code}
 LOG.info(message);
 if (existingServer.getStartcode() < serverName.getStartcode()) {
   LOG.info("Triggering server recovery; existingServer " +
     existingServer + " looks stale, new server:" + serverName);
   expireServer(existingServer);
 }
 {code}
 If another RS is brought up then the cluster comes back to normalcy.
 This may be a rare corner case.
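
The startcode comparison in the snippet above can be tried out in isolation with a minimal sketch. The class `StaleServerCheckSketch`, its nested `ServerId` type, and the method `looksStale` are hypothetical stand-ins and not the HBase ServerName or ServerManager API; the host, port and startcode values are made up for the example.

```java
/** Toy model of the "existing server looks stale" check; not HBase code. */
final class StaleServerCheckSketch {

    /** Hypothetical stand-in for a server identity: host, port and startcode. */
    static final class ServerId {
        final String host;
        final int port;
        final long startcode;

        ServerId(String host, int port, long startcode) {
            this.host = host;
            this.port = port;
            this.startcode = startcode;
        }

        @Override
        public String toString() {
            return host + "," + port + "," + startcode;
        }
    }

    /**
     * An existing registration looks stale when a server with the same host
     * and port reports in with a newer (larger) startcode.
     */
    static boolean looksStale(ServerId existing, ServerId incoming) {
        return existing.host.equals(incoming.host)
                && existing.port == incoming.port
                && existing.startcode < incoming.startcode;
    }

    public static void main(String[] args) {
        ServerId before = new ServerId("rs-host", 60020, 1000L);
        ServerId after = new ServerId("rs-host", 60020, 2000L);
        System.out.println(before + " stale vs " + after + ": " + looksStale(before, after));
    }
}
```

The sketch covers only the detection step; the problem described above is that the expiry triggered by this check cannot proceed while the shutdown handler is still disabled.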





[jira] [Updated] (HBASE-5916) RS restart just before master intialization we make the cluster non operative

2012-05-24 Thread rajeshbabu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rajeshbabu updated HBASE-5916:
--

Attachment: HBASE-5916_trunk_v6.patch

Attached patch. Please review and provide suggestions/comments.

 RS restart just before master intialization we make the cluster non operative
 -

 Key: HBASE-5916
 URL: https://issues.apache.org/jira/browse/HBASE-5916
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.94.1

 Attachments: HBASE-5916_trunk.patch, HBASE-5916_trunk_1.patch, 
 HBASE-5916_trunk_1.patch, HBASE-5916_trunk_2.patch, HBASE-5916_trunk_3.patch, 
 HBASE-5916_trunk_4.patch, HBASE-5916_trunk_v5.patch, HBASE-5916_trunk_v6.patch







[jira] [Created] (HBASE-6089) SSH and AM.joinCluster causes Concurrent Modification exception.

2012-05-24 Thread ramkrishna.s.vasudevan (JIRA)
ramkrishna.s.vasudevan created HBASE-6089:
-

 Summary: SSH and AM.joinCluster causes Concurrent Modification 
exception.
 Key: HBASE-6089
 URL: https://issues.apache.org/jira/browse/HBASE-6089
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0, 0.92.1
Reporter: ramkrishna.s.vasudevan
 Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1


The AM.regions map is accessed concurrently by SSH and by Master initialization, leading 
to a ConcurrentModificationException.
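
For readers unfamiliar with this failure mode, here is a minimal sketch of how it arises. It assumes the map is an ordinary non-concurrent sorted map, which the report does not state; the class `RegionsMapCmeSketch` and its region and server names are invented. The mutation happens on the same thread so that the outcome is deterministic, whereas in the reported bug it comes from another thread.

```java
import java.util.ConcurrentModificationException;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentSkipListMap;

/** Toy model of iterating a map while it is being modified; not HBase code. */
final class RegionsMapCmeSketch {

    /** Iterates the map and removes an entry mid-iteration; returns true if a CME was thrown. */
    static boolean iterateWhileRemoving(Map<String, String> regions) {
        try {
            for (String region : regions.keySet()) {
                // Stands in for another thread mutating the map during iteration.
                regions.remove("r2");
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    /** Fills the given map with three made-up region -> server entries. */
    static Map<String, String> sample(Map<String, String> m) {
        m.put("r1", "rs-a");
        m.put("r2", "rs-a");
        m.put("r3", "rs-b");
        return m;
    }

    public static void main(String[] args) {
        System.out.println("TreeMap threw CME: "
                + iterateWhileRemoving(sample(new TreeMap<>())));
        System.out.println("ConcurrentSkipListMap threw CME: "
                + iterateWhileRemoving(sample(new ConcurrentSkipListMap<>())));
    }
}
```

A concurrent map is only one possible remedy; synchronizing every access on a common lock, or iterating over a copy taken under that lock, are alternatives, and nothing in this message says which approach the eventual fix takes.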





[jira] [Updated] (HBASE-5916) RS restart just before master intialization we make the cluster non operative

2012-05-24 Thread rajeshbabu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rajeshbabu updated HBASE-5916:
--

Status: Open  (was: Patch Available)

 RS restart just before master intialization we make the cluster non operative
 -

 Key: HBASE-5916
 URL: https://issues.apache.org/jira/browse/HBASE-5916
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0, 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.94.1

 Attachments: HBASE-5916_trunk.patch, HBASE-5916_trunk_1.patch, 
 HBASE-5916_trunk_1.patch, HBASE-5916_trunk_2.patch, HBASE-5916_trunk_3.patch, 
 HBASE-5916_trunk_4.patch, HBASE-5916_trunk_v5.patch, HBASE-5916_trunk_v6.patch







[jira] [Updated] (HBASE-5916) RS restart just before master intialization we make the cluster non operative

2012-05-24 Thread rajeshbabu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rajeshbabu updated HBASE-5916:
--

Status: Patch Available  (was: Open)

 RS restart just before master intialization we make the cluster non operative
 -

 Key: HBASE-5916
 URL: https://issues.apache.org/jira/browse/HBASE-5916
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0, 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.94.1

 Attachments: HBASE-5916_trunk.patch, HBASE-5916_trunk_1.patch, 
 HBASE-5916_trunk_1.patch, HBASE-5916_trunk_2.patch, HBASE-5916_trunk_3.patch, 
 HBASE-5916_trunk_4.patch, HBASE-5916_trunk_v5.patch, HBASE-5916_trunk_v6.patch


 Consider a case where the master is being restarted. An RS that was alive when 
 the master restart began is itself restarted before the master enables the 
 ServerShutdownHandler:
 {code}
 serverShutdownHandlerEnabled = true;
 {code}
 In this case, when the RS tries to register with the master, the master 
 tries to expire the old server instance, but it cannot be expired because the 
 ServerShutdownHandler is not yet enabled.
 This can happen when a single RS is restarted or when all RSs are restarted 
 at the same time (before assignRootAndMeta).
 {code}
 LOG.info(message);
 if (existingServer.getStartcode() < serverName.getStartcode()) {
   LOG.info("Triggering server recovery; existingServer " +
     existingServer + " looks stale, new server:" + serverName);
   expireServer(existingServer);
 }
 {code}
 If another RS is brought up, the cluster returns to normal.
 This may be a rare corner case.





[jira] [Commented] (HBASE-6089) SSH and AM.joinCluster causes Concurrent Modification exception.

2012-05-24 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282582#comment-13282582
 ] 

ramkrishna.s.vasudevan commented on HBASE-6089:
---

{code}
2012-05-24 19:26:02,493 DEBUG org.apache.hadoop.hbase.master.ServerManager: New 
connection to linux146,60020,1337867810895
2012-05-24 19:26:02,552 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling transition=RS_ZK_REGION_OPENING, server=linux146,60020,1337867810895, 
region=2be5ef20db58b775953cc1107eb51d2d
2012-05-24 19:26:02,592 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling transition=RS_ZK_REGION_OPENING, server=linux146,60020,1337867810895, 
region=191b0c97f2d2a8262bf790093fdce2ab
2012-05-24 19:26:02,595 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling transition=RS_ZK_REGION_OPENING, server=linux146,60020,1337867810895, 
region=99d462b47ea5e301175d025204eff014
2012-05-24 19:26:03,957 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Bulk assigning done for linux146,60020,1337867810895
2012-05-24 19:26:14,640 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling transition=RS_ZK_REGION_OPENING, server=linux146,60020,1337867810895, 
region=2be5ef20db58b775953cc1107eb51d2d
2012-05-24 19:26:14,781 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling transition=RS_ZK_REGION_OPENED, server=linux146,60020,1337867810895, 
region=2be5ef20db58b775953cc1107eb51d2d
2012-05-24 19:26:14,785 DEBUG 
org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED 
event for et1,,1337864575331.2be5ef20db58b775953cc1107eb51d2d. from 
linux146,60020,1337867810895; deleting unassigned node
2012-05-24 19:26:14,786 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:6-0x1377ea1a1fe002d Deleting existing unassigned node for 
2be5ef20db58b775953cc1107eb51d2d that is in expected state RS_ZK_REGION_OPENED
2012-05-24 19:26:14,792 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:6-0x1377ea1a1fe002d Successfully deleted unassigned node for region 
2be5ef20db58b775953cc1107eb51d2d in expected state RS_ZK_REGION_OPENED
2012-05-24 19:26:14,792 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling transition=RS_ZK_REGION_OPENING, server=linux146,60020,1337867810895, 
region=5a84a4f4eaf2519e36a8ccc2e9c83b04
2012-05-24 19:26:14,792 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
The znode of region et1,,1337864575331.2be5ef20db58b775953cc1107eb51d2d. has 
been deleted.
2012-05-24 19:26:23,862 INFO 
org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Finished 
processing of shutdown of linux146,60020,1337866620614
2012-05-24 19:26:51,927 FATAL org.apache.hadoop.hbase.master.HMaster: Master 
server abort: loaded coprocessors are: []
2012-05-24 19:26:51,931 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled 
exception. Starting shutdown.
java.util.ConcurrentModificationException
at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1100)
at java.util.TreeMap$EntryIterator.next(TreeMap.java:1136)
at java.util.TreeMap$EntryIterator.next(TreeMap.java:1131)
at 
org.apache.hadoop.hbase.master.AssignmentManager.processDeadServersAndRegionsInTransition(AssignmentManager.java:409)
at 
org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:363)
at 
org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:607)
at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:374)
at java.lang.Thread.run(Thread.java:662)
{code}


 SSH and AM.joinCluster causes Concurrent Modification exception.
 

 Key: HBASE-6089
 URL: https://issues.apache.org/jira/browse/HBASE-6089
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
 Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1


 The AM.regions map is accessed concurrently by SSH and by master initialization, 
 leading to a ConcurrentModificationException.
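
 The failure mode can be reproduced in miniature. The following is a sketch 
 only: RegionsMapSketch and its methods are hypothetical and are not the 
 AssignmentManager code, and the snapshot approach is one common way to avoid 
 the exception, not necessarily the fix adopted for this issue. The modification 
 is made from the iterating thread so the example is deterministic; a TreeMap 
 iterator fails the same way when another thread modifies the map.

```java
import java.util.ConcurrentModificationException;
import java.util.Map;
import java.util.TreeMap;

// Toy model (hypothetical names) of iterating a shared TreeMap while it changes.
class RegionsMapSketch {
    static final TreeMap<String, String> regions = new TreeMap<>();

    /** Iterates the live map while it is modified; returns true if a CME was thrown. */
    static boolean iterateUnsafely() {
        try {
            for (Map.Entry<String, String> e : regions.entrySet()) {
                // Stands in for a second thread (for example SSH) adding an entry.
                regions.put("region-added-during-iteration", e.getValue());
            }
            return false;
        } catch (ConcurrentModificationException cme) {
            return true;
        }
    }

    /** Iterates a copy taken under a lock; returns the number of entries visited. */
    static int iterateOverSnapshot() {
        TreeMap<String, String> snapshot;
        synchronized (regions) {
            snapshot = new TreeMap<>(regions);
        }
        int visited = 0;
        for (Map.Entry<String, String> e : snapshot.entrySet()) {
            synchronized (regions) {
                // Writers take the same lock, so the copy above is never torn.
                regions.put("added-" + e.getKey(), e.getValue());
            }
            visited++;
        }
        return visited;
    }

    public static void main(String[] args) {
        regions.put("region-a", "linux146");
        regions.put("region-b", "linux146");
        System.out.println("CME thrown: " + iterateUnsafely());
        System.out.println("visited via snapshot: " + iterateOverSnapshot());
    }
}
```

 The snapshot trades memory and staleness for safety: the iterating code sees 
 the map as it was when the copy was taken.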





[jira] [Assigned] (HBASE-6089) SSH and AM.joinCluster causes Concurrent Modification exception.

2012-05-24 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-6089:
-

Assignee: rajeshbabu

 SSH and AM.joinCluster causes Concurrent Modification exception.
 

 Key: HBASE-6089
 URL: https://issues.apache.org/jira/browse/HBASE-6089
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: rajeshbabu
 Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1


 The AM.regions map is accessed concurrently by SSH and by master initialization, 
 leading to a ConcurrentModificationException.





[jira] [Commented] (HBASE-5916) RS restart just before master intialization we make the cluster non operative

2012-05-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282589#comment-13282589
 ] 

Hadoop QA commented on HBASE-5916:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12529160/HBASE-5916_trunk_v6.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 hadoop23.  The patch compiles against the hadoop 0.23.x profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 34 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.master.TestClockSkewDetection

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1986//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1986//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1986//console

This message is automatically generated.

 RS restart just before master intialization we make the cluster non operative
 -

 Key: HBASE-5916
 URL: https://issues.apache.org/jira/browse/HBASE-5916
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.94.1

 Attachments: HBASE-5916_trunk.patch, HBASE-5916_trunk_1.patch, 
 HBASE-5916_trunk_1.patch, HBASE-5916_trunk_2.patch, HBASE-5916_trunk_3.patch, 
 HBASE-5916_trunk_4.patch, HBASE-5916_trunk_v5.patch, HBASE-5916_trunk_v6.patch


 Consider a case where the master is being restarted. An RS that was alive when 
 the master restart began is itself restarted before the master enables the 
 ServerShutdownHandler:
 {code}
 serverShutdownHandlerEnabled = true;
 {code}
 In this case, when the RS tries to register with the master, the master 
 tries to expire the old server instance, but it cannot be expired because the 
 ServerShutdownHandler is not yet enabled.
 This can happen when a single RS is restarted or when all RSs are restarted 
 at the same time (before assignRootAndMeta).
 {code}
 LOG.info(message);
 if (existingServer.getStartcode() < serverName.getStartcode()) {
   LOG.info("Triggering server recovery; existingServer " +
     existingServer + " looks stale, new server:" + serverName);
   expireServer(existingServer);
 }
 {code}
 If another RS is brought up, the cluster returns to normal.
 This may be a rare corner case.





[jira] [Updated] (HBASE-6074) TestHLog is flaky

2012-05-24 Thread Devaraj Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated HBASE-6074:
---

Attachment: TestHLog.patch.txt

@Ted, yes, I saw the failures with hbase-0.92/hadoop-1.0.3. In my first trial, 
the org.apache.hadoop.hbase.regionserver.wal.TestHLog.testEditAdd was failing 
intermittently. Here are the snippets from the log:

{noformat}
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: Lease mismatch on 
/user/jenkins/hbase/TestHLog/hlog.1336864885313 owned by NN_Recovery but is 
accessed by DFSClient_-1644967697
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1645)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1620)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:1675)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:1663)
at org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:718)
at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)

Stacktrace

org.apache.hadoop.ipc.RemoteException: 
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: Lease mismatch on 
/user/jenkins/hbase/TestHLog/hlog.1336864885313 owned by NN_Recovery but is 
accessed by DFSClient_-1644967697
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1645)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1620)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:1675)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:1663)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:718)
at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)

at org.apache.hadoop.ipc.Client.call(Client.java:1066)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at $Proxy8.complete(Unknown Source)
at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy8.complete(Unknown Source)
at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3894)
at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3809)
at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61)
at 
org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86)
at 
org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:1017)
at 
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.close(SequenceFileLogWriter.java:215)
at org.apache.hadoop.hbase.regionserver.wal.HLog.close(HLog.java:914)
at 
org.apache.hadoop.hbase.regionserver.wal.TestHLog.testEditAdd(TestHLog.java:480)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)

{noformat}

I consulted with an HDFS dev, and he thought that there might be a race 
condition with the shutting down of cluster in testAppendClose (the previous 
test in the 

[jira] [Updated] (HBASE-6074) TestHLog is flaky

2012-05-24 Thread Devaraj Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated HBASE-6074:
---

Attachment: 6074-1.patch

@Ted, yes, I saw the failures with hbase-0.92/hadoop-1.0.3. In my first trial, 
the org.apache.hadoop.hbase.regionserver.wal.TestHLog.testEditAdd was failing 
intermittently. Here are the snippets from the log:  

{noformat} 
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: Lease mismatch on 
/user/jenkins/hbase/TestHLog/hlog.1336864885313 owned by NN_Recovery but is 
accessed by DFSClient_-1644967697
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1645)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1620)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:1675)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:1663)
at org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:718)
at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)

Stacktrace 

org.apache.hadoop.ipc.RemoteException: 
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: Lease mismatch on 
/user/jenkins/hbase/TestHLog/hlog.1336864885313 owned by NN_Recovery but is 
accessed by DFSClient_-1644967697
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1645)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1620)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:1675)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:1663)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:718)
at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382) 

at org.apache.hadoop.ipc.Client.call(Client.java:1066)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at $Proxy8.complete(Unknown Source)
at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy8.complete(Unknown Source)
at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3894)
at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3809)
at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61)
at 
org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86)
at 
org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:1017)
at 
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.close(SequenceFileLogWriter.java:215)
at org.apache.hadoop.hbase.regionserver.wal.HLog.close(HLog.java:914)
at 
org.apache.hadoop.hbase.regionserver.wal.TestHLog.testEditAdd(TestHLog.java:480)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
{noformat}

I consulted with an 

[jira] [Updated] (HBASE-6074) TestHLog is flaky

2012-05-24 Thread Devaraj Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated HBASE-6074:
---

Attachment: (was: 6074-1.patch)

 TestHLog is flaky
 -

 Key: HBASE-6074
 URL: https://issues.apache.org/jira/browse/HBASE-6074
 Project: HBase
  Issue Type: Test
  Components: test
Affects Versions: 0.92.0
Reporter: Devaraj Das
 Attachments: 6074-1.patch


 When I run TestHLog in a loop, I see failures.





[jira] [Updated] (HBASE-6074) TestHLog is flaky

2012-05-24 Thread Devaraj Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated HBASE-6074:
---

Attachment: (was: TestHLog.patch.txt)

 TestHLog is flaky
 -

 Key: HBASE-6074
 URL: https://issues.apache.org/jira/browse/HBASE-6074
 Project: HBase
  Issue Type: Test
  Components: test
Affects Versions: 0.92.0
Reporter: Devaraj Das
 Attachments: 6074-1.patch


 When I run TestHLog in a loop, I see failures.





Re: [jira] [Updated] (HBASE-6085) SaslServer intermittently ignoring SaslClient's requests

2012-05-24 Thread Andrew Purtell
Anything of note appear in the server side logs at DEBUG level? Have
you tried duplicating this in an all-localhost configuration? If it is
possible to reproduce in an all-localhost configuration, or on a
cluster not otherwise occupied, then we can turn on additional
SASL/GSSAPI level debugging that may shed light but will be quite
verbose.


[jira] [Created] (HBASE-6090) JMX Registration Error while booting HMaster

2012-05-24 Thread Elliott Clark (JIRA)
Elliott Clark created HBASE-6090:


 Summary: JMX Registration Error while booting HMaster
 Key: HBASE-6090
 URL: https://issues.apache.org/jira/browse/HBASE-6090
 Project: HBase
  Issue Type: Bug
Reporter: Elliott Clark


When booting master there are errors about HMaster not being a bean and being 
unable to turn ServerLoad into an open class.





[jira] [Commented] (HBASE-6084) Server Load does not display correctly on the ui

2012-05-24 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282780#comment-13282780
 ] 

Elliott Clark commented on HBASE-6084:
--

The errors on the console are unrelated to this issue, so I filed HBASE-6090.

 Server Load does not display correctly on the ui
 

 Key: HBASE-6084
 URL: https://issues.apache.org/jira/browse/HBASE-6084
 Project: HBase
  Issue Type: Bug
Reporter: Elliott Clark
Assignee: Elliott Clark
 Attachments: HBASE-6084-0.patch


 The UI relies on the toString method, and toString is no longer implemented.





[jira] [Assigned] (HBASE-6090) JMX Registration Error while booting HMaster

2012-05-24 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark reassigned HBASE-6090:


Assignee: Elliott Clark

 JMX Registration Error while booting HMaster
 

 Key: HBASE-6090
 URL: https://issues.apache.org/jira/browse/HBASE-6090
 Project: HBase
  Issue Type: Bug
Reporter: Elliott Clark
Assignee: Elliott Clark

 When booting master there are errors about HMaster not being a bean and being 
 unable to turn ServerLoad into an open class.





[jira] [Commented] (HBASE-6084) Server Load does not display correctly on the ui

2012-05-24 Thread Gregory Chanan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282784#comment-13282784
 ] 

Gregory Chanan commented on HBASE-6084:
---

The JMX issues are tracked in HBASE-5967.  Is this a duplicate?  Or are you 
only talking about fixing toString here and the JMX issues are separate?

I like your idea about copying the format of the old one.

 Server Load does not display correctly on the ui
 

 Key: HBASE-6084
 URL: https://issues.apache.org/jira/browse/HBASE-6084
 Project: HBase
  Issue Type: Bug
Reporter: Elliott Clark
Assignee: Elliott Clark
 Attachments: HBASE-6084-0.patch


 The UI relies on the toString method, and toString is no longer implemented.





[jira] [Commented] (HBASE-6090) JMX Registration Error while booting HMaster

2012-05-24 Thread Gregory Chanan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282785#comment-13282785
 ] 

Gregory Chanan commented on HBASE-6090:
---

Duplicate of HBASE-5967?

 JMX Registration Error while booting HMaster
 

 Key: HBASE-6090
 URL: https://issues.apache.org/jira/browse/HBASE-6090
 Project: HBase
  Issue Type: Bug
Reporter: Elliott Clark
Assignee: Elliott Clark

 When booting master there are errors about HMaster not being a bean and being 
 unable to turn ServerLoad into an open class.





[jira] [Updated] (HBASE-5892) [hbck] Refactor parallel WorkItem* to Futures.

2012-05-24 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HBASE-5892:
---

Attachment: hbase-5892.patch

 [hbck] Refactor parallel WorkItem* to Futures.
 --

 Key: HBASE-5892
 URL: https://issues.apache.org/jira/browse/HBASE-5892
 Project: HBase
  Issue Type: Improvement
Reporter: Jonathan Hsieh
  Labels: noob
 Attachments: hbase-5892.patch


 This would convert WorkItem* logic (with low level notifies, and rough 
 exception handling)  into a more canonical Futures pattern.
 Currently there are two instances of this pattern (for loading hdfs dirs, for 
 contacting regionservers for assignments, and soon -- for loading hdfs 
 .regioninfo files).





[jira] [Updated] (HBASE-5892) [hbck] Refactor parallel WorkItem* to Futures.

2012-05-24 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HBASE-5892:
---

Status: Patch Available  (was: Open)

 [hbck] Refactor parallel WorkItem* to Futures.
 --

 Key: HBASE-5892
 URL: https://issues.apache.org/jira/browse/HBASE-5892
 Project: HBase
  Issue Type: Improvement
Reporter: Jonathan Hsieh
  Labels: noob
 Attachments: hbase-5892.patch


 This would convert WorkItem* logic (with low level notifies, and rough 
 exception handling)  into a more canonical Futures pattern.
 Currently there are two instances of this pattern (for loading hdfs dirs, for 
 contacting regionservers for assignments, and soon -- for loading hdfs 
 .regioninfo files).





[jira] [Commented] (HBASE-6090) JMX Registration Error while booting HMaster

2012-05-24 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282791#comment-13282791
 ] 

Elliott Clark commented on HBASE-6090:
--

Yep, this is a dupe of HBASE-5967.

 JMX Registration Error while booting HMaster
 

 Key: HBASE-6090
 URL: https://issues.apache.org/jira/browse/HBASE-6090
 Project: HBase
  Issue Type: Bug
Reporter: Elliott Clark
Assignee: Elliott Clark

 When booting master there are errors about HMaster not being a bean and being 
 unable to turn ServerLoad into an open class.





[jira] [Commented] (HBASE-5892) [hbck] Refactor parallel WorkItem* to Futures.

2012-05-24 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282790#comment-13282790
 ] 

Andrew Wang commented on HBASE-5892:


I tried to do this refactor, essentially switching out Runnable for Callable 
and adding some more logging in the process. Let me know if it's not what you 
were thinking of.

I didn't do any testing beyond running hbck on my local machine, which seemed 
to work.
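
For readers unfamiliar with the pattern, swapping Runnable for Callable looks 
roughly like the sketch below. It is an illustration only, not the attached 
patch: FuturesSketch, LoadDirWorkItem, and runAll are hypothetical names, and 
the "work" is a placeholder. The point is that each work item returns a value 
or throws, and the caller collects both through Future.get() without 
hand-written wait/notify code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch (hypothetical names) of parallel work items expressed as Callables.
class FuturesSketch {
    /** Placeholder work item: "loads" one directory and returns its name length. */
    static class LoadDirWorkItem implements Callable<Integer> {
        private final String dir;
        LoadDirWorkItem(String dir) { this.dir = dir; }
        @Override
        public Integer call() {
            if (dir.isEmpty()) {
                throw new IllegalArgumentException("empty directory name");
            }
            return dir.length();
        }
    }

    /** Runs all items in parallel, sums the results, and records any failures. */
    static int runAll(List<String> dirs, List<Throwable> failures) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Callable<Integer>> items = new ArrayList<>();
            for (String d : dirs) {
                items.add(new LoadDirWorkItem(d));
            }
            int total = 0;
            // invokeAll blocks until every item has completed or failed.
            for (Future<Integer> f : pool.invokeAll(items)) {
                try {
                    total += f.get();
                } catch (ExecutionException e) {
                    failures.add(e.getCause()); // the item's own exception
                }
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for work items", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Throwable> failures = new ArrayList<>();
        int total = runAll(Arrays.asList("hbase", "", ".META."), failures);
        System.out.println("total=" + total + " failures=" + failures);
    }
}
```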

 [hbck] Refactor parallel WorkItem* to Futures.
 --

 Key: HBASE-5892
 URL: https://issues.apache.org/jira/browse/HBASE-5892
 Project: HBase
  Issue Type: Improvement
Reporter: Jonathan Hsieh
  Labels: noob
 Attachments: hbase-5892.patch


 This would convert WorkItem* logic (with low level notifies, and rough 
 exception handling)  into a more canonical Futures pattern.
 Currently there are two instances of this pattern (for loading hdfs dirs, for 
 contacting regionservers for assignments, and soon -- for loading hdfs 
 .regioninfo files).





[jira] [Updated] (HBASE-6071) getRegionServerWithRetires, should log unsuccessful attempts and exceptions.

2012-05-24 Thread Igal Shilman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igal Shilman updated HBASE-6071:


Status: Open  (was: Patch Available)

I didn't know that patches have to be submitted against trunk first.
I also didn't know that diffs have to be created with --no-prefix.


 getRegionServerWithRetires, should log unsuccessful attempts and exceptions.
 

 Key: HBASE-6071
 URL: https://issues.apache.org/jira/browse/HBASE-6071
 Project: HBase
  Issue Type: Improvement
  Components: client, ipc
Affects Versions: 0.90.0
Reporter: Igal Shilman
Priority: Minor
  Labels: client, ipc
 Fix For: 0.90.7

 Attachments: HConnectionManager_HBASE-6071-0.90.0.patch


 HConnectionImplementation.getRegionServerWithRetries might terminate with an 
 exception other than a DoNotRetryIOException, thus silently dropping 
 exceptions from previous attempts.
 [~ted_yu] suggested 
 ([here|http://mail-archives.apache.org/mod_mbox/hbase-user/201205.mbox/%3CCAFebPXBq9V9BVdzRTNr-MB3a1Lz78SZj6gvP6On0b%2Bajt9StAg%40mail.gmail.com%3E])
  adding a log message inside the catch block describing the exception type 
 and details.
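
A minimal sketch of the suggested logging, assuming a generic retry loop rather than the actual HConnectionImplementation code. RetryLoggingSketch, callWithRetries and NoRetryIOException are hypothetical names (the last one stands in for DoNotRetryIOException so that the example compiles without HBase), and a plain list is used in place of a real logger.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.Callable;

final class RetryLoggingSketch {

  /** Stand-in for DoNotRetryIOException so the sketch compiles without HBase. */
  static final class NoRetryIOException extends IOException {
    NoRetryIOException(String msg) {
      super(msg);
    }
  }

  /** Calls the callable up to maxAttempts times, recording every failed attempt. */
  static <T> T callWithRetries(Callable<T> callable, int maxAttempts, List<String> log)
      throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    Exception last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return callable.call();
      } catch (NoRetryIOException e) {
        throw e; // not retriable: give up immediately
      } catch (Exception e) {
        // Record the exception type and details so earlier failures are not lost.
        log.add("attempt " + attempt + "/" + maxAttempts + " failed: "
            + e.getClass().getSimpleName() + ": " + e.getMessage());
        last = e;
      }
    }
    throw last;
  }
}
```

With this shape, an operation that fails twice and then succeeds leaves two entries in the log even though the caller only sees the successful result.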





[jira] [Resolved] (HBASE-6090) JMX Registration Error while booting HMaster

2012-05-24 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark resolved HBASE-6090.
--

Resolution: Duplicate

Dupe of HBASE-5967

 JMX Registration Error while booting HMaster
 

 Key: HBASE-6090
 URL: https://issues.apache.org/jira/browse/HBASE-6090
 Project: HBase
  Issue Type: Bug
Reporter: Elliott Clark
Assignee: Elliott Clark

 When booting master there are errors about HMaster not being a bean and being 
 unable to turn ServerLoad into an open class.





[jira] [Updated] (HBASE-6071) getRegionServerWithRetires, should log unsuccessful attempts and exceptions.

2012-05-24 Thread Igal Shilman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igal Shilman updated HBASE-6071:


Affects Version/s: 0.92.0
   0.94.0
Fix Version/s: (was: 0.90.7)

 getRegionServerWithRetires, should log unsuccessful attempts and exceptions.
 

 Key: HBASE-6071
 URL: https://issues.apache.org/jira/browse/HBASE-6071
 Project: HBase
  Issue Type: Improvement
  Components: client, ipc
Affects Versions: 0.90.0, 0.92.0, 0.94.0
Reporter: Igal Shilman
Priority: Minor
  Labels: client, ipc
 Attachments: HConnectionManager_HBASE-6071-0.90.0.patch


 HConnectionImplementation.getRegionServerWithRetries might terminate with an 
 exception other than a DoNotRetryIOException, thus silently dropping 
 exceptions from previous attempts.
 [~ted_yu] suggested 
 ([here|http://mail-archives.apache.org/mod_mbox/hbase-user/201205.mbox/%3CCAFebPXBq9V9BVdzRTNr-MB3a1Lz78SZj6gvP6On0b%2Bajt9StAg%40mail.gmail.com%3E])
  adding a log message inside the catch block describing the exception type 
 and details.





[jira] [Commented] (HBASE-6084) Server Load does not display correctly on the ui

2012-05-24 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282802#comment-13282802
 ] 

Elliott Clark commented on HBASE-6084:
--

This was only for the UI, which required toString.

 Server Load does not display correctly on the ui
 

 Key: HBASE-6084
 URL: https://issues.apache.org/jira/browse/HBASE-6084
 Project: HBase
  Issue Type: Bug
Reporter: Elliott Clark
Assignee: Elliott Clark
 Attachments: HBASE-6084-0.patch


 The UI uses the toString method, but it is not implemented any more.





[jira] [Updated] (HBASE-6084) Server Load does not display correctly on the ui

2012-05-24 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HBASE-6084:
-

Attachment: HBASE-6084-1.patch

This patch fixes the ui and adds getters for the values that used to be there.

The totals are computed in ServerLoad from the totals of the RegionLoads.

 Server Load does not display correctly on the ui
 

 Key: HBASE-6084
 URL: https://issues.apache.org/jira/browse/HBASE-6084
 Project: HBase
  Issue Type: Bug
Reporter: Elliott Clark
Assignee: Elliott Clark
 Attachments: HBASE-6084-0.patch, HBASE-6084-1.patch


 The UI uses the toString method, but it is not implemented any more.





[jira] [Commented] (HBASE-5916) RS restart just before master intialization we make the cluster non operative

2012-05-24 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282808#comment-13282808
 ] 

Zhihong Yu commented on HBASE-5916:
---

The failure in TestClockSkewDetection was due to NPE.
The following change makes it pass:
{code}
if ((this.services == null || ((HMaster) this.services).isInitialized())
    && this.deadservers.cleanPreviousInstance(serverName)) {
{code}

{code}
+   * To clear any dead server with same host name and port of online server
{code}
I think 'any' should be added in front of 'online server'.
{code}
+  public void clearDeadServersWithSameHostNameAndPortOfOnlineServer() {
{code}
The above method can be package private, right?
{code}
+  while ((sn = ServerName.findServerWithSameHostnamePort(this.deadservers, 
serverName)) != null) {
{code}
The above line exceeds 100 chars.
{code}
+  if(actualDeadServers.contains(deadServer.getKey())){
{code}
Add spaces after if and before {.


 RS restart just before master intialization we make the cluster non operative
 -

 Key: HBASE-5916
 URL: https://issues.apache.org/jira/browse/HBASE-5916
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.94.1

 Attachments: HBASE-5916_trunk.patch, HBASE-5916_trunk_1.patch, 
 HBASE-5916_trunk_1.patch, HBASE-5916_trunk_2.patch, HBASE-5916_trunk_3.patch, 
 HBASE-5916_trunk_4.patch, HBASE-5916_trunk_v5.patch, HBASE-5916_trunk_v6.patch


 Consider a case where the master is being restarted.  An RS that was alive when 
 the master restart began is itself restarted before the master initializes the 
 ServerShutDownHandler.
 {code}
 serverShutdownHandlerEnabled = true;
 {code}
 In this case, when the RS tries to register with the master, the master 
 tries to expire the old server instance, but the server cannot be expired 
 because the serverShutdownHandler is not yet enabled.
 This can happen when only one RS is restarted, or when all the RSs are 
 restarted at the same time (before assignRootandMeta).
 {code}
 LOG.info(message);
 if (existingServer.getStartcode() < serverName.getStartcode()) {
   LOG.info("Triggering server recovery; existingServer " +
     existingServer + " looks stale, new server:" + serverName);
   expireServer(existingServer);
 }
 {code}
 If another RS is brought up, the cluster returns to normal.
 This may be a rare corner case.





[jira] [Commented] (HBASE-6084) Server Load does not display correctly on the ui

2012-05-24 Thread Gregory Chanan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282812#comment-13282812
 ] 

Gregory Chanan commented on HBASE-6084:
---

Elliott,

I'm a bit confused.  When I look at 0.92.1's HServerLoad.toString I see:
{code}
int numberOfRegions = this.regionLoad.size();
StringBuilder sb = new StringBuilder();
sb = Strings.appendKeyValue(sb, "requestsPerSecond",
  Integer.valueOf(numberOfRequests/msgInterval));
sb = Strings.appendKeyValue(sb, "numberOfOnlineRegions",
  Integer.valueOf(numberOfRegions));
sb = Strings.appendKeyValue(sb, "usedHeapMB",
  Integer.valueOf(this.usedHeapMB));
sb = Strings.appendKeyValue(sb, "maxHeapMB", Integer.valueOf(maxHeapMB));
return sb.toString();
{code}

But your toString doesn't match.  It looks like you implemented 
HServerLoad.RegionLoad's toString in ServerLoad?

 Server Load does not display correctly on the ui
 

 Key: HBASE-6084
 URL: https://issues.apache.org/jira/browse/HBASE-6084
 Project: HBase
  Issue Type: Bug
Reporter: Elliott Clark
Assignee: Elliott Clark
 Attachments: HBASE-6084-0.patch, HBASE-6084-1.patch


 The UI uses the toString method, but it is not implemented any more.





[jira] [Commented] (HBASE-6084) Server Load does not display correctly on the ui

2012-05-24 Thread Gregory Chanan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282814#comment-13282814
 ] 

Gregory Chanan commented on HBASE-6084:
---

I think what we need to do is the following:

1) Write a ServerLoad.toString that matches HServerLoad.toString.
2) Implement a RegionLoad (not HServerLoad.RegionLoad) that wraps the protobuf 
RegionLoad, like how ServerLoad wraps the protobuf ServerLoad
3) Write a RegionLoad.toString that matches HServerLoad.RegionLoad.toString

Does that seem correct to you or am I missing something?

You should be able to do #1 now.
I'm almost done with #2; you can track it in HBASE-5933.
After #2, you should be able to do #3.
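
As a rough illustration of steps 1 and 2 above, a wrapper around a generated message might look like the sketch below. Everything here is an assumption made for the example: ServerLoadMsg is a hypothetical stand-in for the protobuf-generated class, the fields are reduced to a handful, and the "key=value, key=value" output format is assumed rather than copied from Strings.appendKeyValue.

```java
import java.util.List;

final class ServerLoadSketch {

  /** Hypothetical stand-in for the protobuf-generated server load message. */
  static final class ServerLoadMsg {
    final int requestsPerSecond;
    final int usedHeapMB;
    final int maxHeapMB;
    final List<Integer> storefilesPerRegion; // one entry per online region

    ServerLoadMsg(int requestsPerSecond, int usedHeapMB, int maxHeapMB,
        List<Integer> storefilesPerRegion) {
      this.requestsPerSecond = requestsPerSecond;
      this.usedHeapMB = usedHeapMB;
      this.maxHeapMB = maxHeapMB;
      this.storefilesPerRegion = storefilesPerRegion;
    }
  }

  private final ServerLoadMsg msg;

  ServerLoadSketch(ServerLoadMsg msg) {
    this.msg = msg;
  }

  /** Server-wide total derived from the per-region values. */
  int getStorefiles() {
    int total = 0;
    for (int n : msg.storefilesPerRegion) {
      total += n;
    }
    return total;
  }

  /** Reports the same four keys that the 0.92.1 snippet quoted earlier reports. */
  @Override
  public String toString() {
    StringBuilder sb = new StringBuilder();
    append(sb, "requestsPerSecond", msg.requestsPerSecond);
    append(sb, "numberOfOnlineRegions", msg.storefilesPerRegion.size());
    append(sb, "usedHeapMB", msg.usedHeapMB);
    append(sb, "maxHeapMB", msg.maxHeapMB);
    return sb.toString();
  }

  private static void append(StringBuilder sb, String key, int value) {
    if (sb.length() > 0) {
      sb.append(", ");
    }
    sb.append(key).append('=').append(value);
  }
}
```

Keeping the wrapper thin means the UI can continue to call toString while the getters expose the individual values.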

 Server Load does not display correctly on the ui
 

 Key: HBASE-6084
 URL: https://issues.apache.org/jira/browse/HBASE-6084
 Project: HBase
  Issue Type: Bug
Reporter: Elliott Clark
Assignee: Elliott Clark
 Attachments: HBASE-6084-0.patch, HBASE-6084-1.patch


 The UI uses the toString method, but it is not implemented any more.





[jira] [Commented] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT

2012-05-24 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282818#comment-13282818
 ] 

Zhihong Yu commented on HBASE-6070:
---

+1 on patch v2.

You may want to verify that the failed test below wasn't related to this change:
https://builds.apache.org/job/PreCommit-HBASE-Build/1987/console

 AM.nodeDeleted and SSH races creating problems for regions under SPLIT
 --

 Key: HBASE-6070
 URL: https://issues.apache.org/jira/browse/HBASE-6070
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.92_1.patch, 
 HBASE-6070_0.94.patch, HBASE-6070_0.94_1.patch, HBASE-6070_trunk.patch, 
 HBASE-6070_trunk_1.patch


 As part of HBASE-5806 we tried to address the problems that occur when the 
 Master or an RS restarts while a region SPLIT is in progress.
 While investigating further we found that one race condition remains:
 1. Split has just started and the znode is in RS_SPLIT state.
 2. RS goes down.
 3. First callback for SSH comes.
 4. As part of the fix for HBASE-5806, SSH knows that some region is in RIT.
 5. But now the nodeDeleted event comes for the SPLIT node, and there we try to 
 delete the RIT.
 6. After this we check in the SSH whether any node is in RIT.  As we 
 don't find the region in RIT, the region is never assigned.
 When we fixed HBASE-5806, step 6 happened first and then step 5, so 
 we missed this case.  Now that we have found it, we will come up with a patch shortly.





[jira] [Commented] (HBASE-6071) getRegionServerWithRetires, should log unsuccessful attempts and exceptions.

2012-05-24 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282821#comment-13282821
 ] 

Zhihong Yu commented on HBASE-6071:
---

--no-prefix is not required now - Hadoop QA is smart.

 getRegionServerWithRetires, should log unsuccessful attempts and exceptions.
 

 Key: HBASE-6071
 URL: https://issues.apache.org/jira/browse/HBASE-6071
 Project: HBase
  Issue Type: Improvement
  Components: client, ipc
Affects Versions: 0.90.0, 0.92.0, 0.94.0
Reporter: Igal Shilman
Priority: Minor
  Labels: client, ipc
 Attachments: HConnectionManager_HBASE-6071-0.90.0.patch


 HConnectionImplementation.getRegionServerWithRetries might terminate with an 
 exception other than a DoNotRetryIOException, thus silently dropping 
 exceptions from previous attempts.
 [~ted_yu] suggested 
 ([here|http://mail-archives.apache.org/mod_mbox/hbase-user/201205.mbox/%3CCAFebPXBq9V9BVdzRTNr-MB3a1Lz78SZj6gvP6On0b%2Bajt9StAg%40mail.gmail.com%3E])
  adding a log message inside the catch block describing the exception type 
 and details.





[jira] [Commented] (HBASE-5352) ACL improvements

2012-05-24 Thread Matteo Bertozzi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282823#comment-13282823
 ] 

Matteo Bertozzi commented on HBASE-5352:


@Laxman yes, create a sub-task for that. The ACL code may be out of sync with new 
features, so feel free to open a new sub-task to sync the coprocessor with the 
missing pieces.

 ACL improvements
 

 Key: HBASE-5352
 URL: https://issues.apache.org/jira/browse/HBASE-5352
 Project: HBase
  Issue Type: Improvement
  Components: security
Affects Versions: 0.92.1, 0.94.0
Reporter: Enis Soztutar
Assignee: Enis Soztutar

 In this issue I would like to open discussion for a few minor ACL related 
 improvements. The proposed changes are as follows: 
 1. Introduce something like 
 AccessControllerProtocol.checkPermissions(Permission[] permissions) API, so 
 that clients can check access rights before carrying out the operations. We 
 need this kind of operation for HCATALOG-245, which introduces authorization 
 providers for hbase over hcat. We cannot use getUserPermissions() since it 
 requires ADMIN permissions on the global/table level.
 2. getUserPermissions(tableName)/grant/revoke and drop/modify table 
 operations should not check for global CREATE/ADMIN rights, but table 
 CREATE/ADMIN rights. The reasoning is that if a user is able to admin or read 
 from a table, she should be able to read the table's permissions. We can 
 choose whether we want only READ or ADMIN permissions for 
 getUserPermission(). Since we check for global permissions first for table 
 permissions, configuring table access using global permissions will continue 
 to work.  
 3. Grant/Revoke global permissions - HBASE-5342 (included for completeness)
 Of the 3, we may want to backport the first one to 0.92 since without it, 
 Hive/HCatalog cannot use HBase's authorization mechanism effectively. 
 I will create subissues and convert HBASE-5342 to a subtask when we get some 
 feedback and opinions on going further. 
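
The "global permissions first, then table permissions" order described in item 2 can be sketched as below. This is a toy single-user model written only to show the check order; AclCheckSketch, grantGlobal, grantTable and isAllowed are hypothetical names and are not part of the AccessController API.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class AclCheckSketch {

  enum Action { READ, WRITE, CREATE, ADMIN }

  // Permissions of a single, implicit user (an assumption to keep the sketch small).
  private final Set<Action> global = EnumSet.noneOf(Action.class);
  private final Map<String, Set<Action>> perTable = new HashMap<String, Set<Action>>();

  void grantGlobal(Action action) {
    global.add(action);
  }

  void grantTable(String table, Action action) {
    Set<Action> actions = perTable.get(table);
    if (actions == null) {
      actions = EnumSet.noneOf(Action.class);
      perTable.put(table, actions);
    }
    actions.add(action);
  }

  /** A global grant is checked first; otherwise a grant on the table itself is enough. */
  boolean isAllowed(String table, Action action) {
    if (global.contains(action)) {
      return true;
    }
    Set<Action> actions = perTable.get(table);
    return actions != null && actions.contains(action);
  }
}
```

Because the global check comes first, setups that rely only on global grants keep working when table-level grants start to be honoured.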





[jira] [Updated] (HBASE-6084) Server Load does not display correctly on the ui

2012-05-24 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HBASE-6084:
-

Attachment: HBASE-6084-2.patch

I just added the extra info since we have it.  In doing that I forgot to add 
the old stuff back in.

Fixed.

 Server Load does not display correctly on the ui
 

 Key: HBASE-6084
 URL: https://issues.apache.org/jira/browse/HBASE-6084
 Project: HBase
  Issue Type: Bug
Reporter: Elliott Clark
Assignee: Elliott Clark
 Attachments: HBASE-6084-0.patch, HBASE-6084-1.patch, 
 HBASE-6084-2.patch


 The UI uses the toString method, but it is not implemented any more.





[jira] [Commented] (HBASE-5916) RS restart just before master intialization we make the cluster non operative

2012-05-24 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282882#comment-13282882
 ] 

chunhui shen commented on HBASE-5916:
-

@ram
bq. In joinCluster(), as per the existing code, if any new server has checked in 
and the root/meta has been assigned to it, in joinCluster we may think that it is 
a dead server because we have already passed the online servers.

If we consider it a dead server, what error will that cause? 
I think none. It must be a new regionserver (one that has just been restarted), 
so it carries no regions. Of course, we won't assign 
regions to it, but I don't think that matters.

Correct me if I'm wrong, thanks.

 RS restart just before master intialization we make the cluster non operative
 -

 Key: HBASE-5916
 URL: https://issues.apache.org/jira/browse/HBASE-5916
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.94.1

 Attachments: HBASE-5916_trunk.patch, HBASE-5916_trunk_1.patch, 
 HBASE-5916_trunk_1.patch, HBASE-5916_trunk_2.patch, HBASE-5916_trunk_3.patch, 
 HBASE-5916_trunk_4.patch, HBASE-5916_trunk_v5.patch, HBASE-5916_trunk_v6.patch


 Consider a case where the master is being restarted.  An RS that was alive when 
 the master restart began is itself restarted before the master initializes the 
 ServerShutDownHandler.
 {code}
 serverShutdownHandlerEnabled = true;
 {code}
 In this case, when the RS tries to register with the master, the master 
 tries to expire the old server instance, but the server cannot be expired 
 because the serverShutdownHandler is not yet enabled.
 This can happen when only one RS is restarted, or when all the RSs are 
 restarted at the same time (before assignRootandMeta).
 {code}
 LOG.info(message);
 if (existingServer.getStartcode() < serverName.getStartcode()) {
   LOG.info("Triggering server recovery; existingServer " +
     existingServer + " looks stale, new server:" + serverName);
   expireServer(existingServer);
 }
 {code}
 If another RS is brought up, the cluster returns to normal.
 This may be a rare corner case.





[jira] [Updated] (HBASE-6071) getRegionServerWithRetires, should log unsuccessful attempts and exceptions.

2012-05-24 Thread Igal Shilman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igal Shilman updated HBASE-6071:


Attachment: HBASE-6071.patch

 getRegionServerWithRetires, should log unsuccessful attempts and exceptions.
 

 Key: HBASE-6071
 URL: https://issues.apache.org/jira/browse/HBASE-6071
 Project: HBase
  Issue Type: Improvement
  Components: client, ipc
Affects Versions: 0.90.0, 0.92.0, 0.94.0
Reporter: Igal Shilman
Priority: Minor
  Labels: client, ipc
 Attachments: HBASE-6071.patch, 
 HConnectionManager_HBASE-6071-0.90.0.patch


 HConnectionImplementation.getRegionServerWithRetries might terminate with an 
 exception other than a DoNotRetryIOException, thus silently dropping 
 exceptions from previous attempts.
 [~ted_yu] suggested 
 ([here|http://mail-archives.apache.org/mod_mbox/hbase-user/201205.mbox/%3CCAFebPXBq9V9BVdzRTNr-MB3a1Lz78SZj6gvP6On0b%2Bajt9StAg%40mail.gmail.com%3E])
  adding a log message inside the catch block describing the exception type 
 and details.





[jira] [Created] (HBASE-6091) Come up with strawman proposal for RC testing matrix

2012-05-24 Thread David S. Wang (JIRA)
David S. Wang created HBASE-6091:


 Summary: Come up with strawman proposal for RC testing matrix
 Key: HBASE-6091
 URL: https://issues.apache.org/jira/browse/HBASE-6091
 Project: HBase
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.96.0
Reporter: David S. Wang
Assignee: David S. Wang








[jira] [Updated] (HBASE-5916) RS restart just before master intialization we make the cluster non operative

2012-05-24 Thread rajeshbabu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rajeshbabu updated HBASE-5916:
--

Attachment: HBASE-5916_trunk_v7.patch

 RS restart just before master intialization we make the cluster non operative
 -

 Key: HBASE-5916
 URL: https://issues.apache.org/jira/browse/HBASE-5916
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.94.1

 Attachments: HBASE-5916_trunk.patch, HBASE-5916_trunk_1.patch, 
 HBASE-5916_trunk_1.patch, HBASE-5916_trunk_2.patch, HBASE-5916_trunk_3.patch, 
 HBASE-5916_trunk_4.patch, HBASE-5916_trunk_v5.patch, 
 HBASE-5916_trunk_v6.patch, HBASE-5916_trunk_v7.patch


 Consider a case where the master is being restarted.  An RS that was alive when 
 the master restart began is itself restarted before the master initializes the 
 ServerShutDownHandler.
 {code}
 serverShutdownHandlerEnabled = true;
 {code}
 In this case, when the RS tries to register with the master, the master 
 tries to expire the old server instance, but the server cannot be expired 
 because the serverShutdownHandler is not yet enabled.
 This can happen when only one RS is restarted, or when all the RSs are 
 restarted at the same time (before assignRootandMeta).
 {code}
 LOG.info(message);
 if (existingServer.getStartcode() < serverName.getStartcode()) {
   LOG.info("Triggering server recovery; existingServer " +
     existingServer + " looks stale, new server:" + serverName);
   expireServer(existingServer);
 }
 {code}
 If another RS is brought up, the cluster returns to normal.
 This may be a rare corner case.





[jira] [Commented] (HBASE-5916) RS restart just before master intialization we make the cluster non operative

2012-05-24 Thread rajeshbabu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282981#comment-13282981
 ] 

rajeshbabu commented on HBASE-5916:
---

@Zhihong Yu
Thanks for the help.

The latest patch addresses Zhihong Yu's comments.

 RS restart just before master intialization we make the cluster non operative
 -

 Key: HBASE-5916
 URL: https://issues.apache.org/jira/browse/HBASE-5916
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.94.1

 Attachments: HBASE-5916_trunk.patch, HBASE-5916_trunk_1.patch, 
 HBASE-5916_trunk_1.patch, HBASE-5916_trunk_2.patch, HBASE-5916_trunk_3.patch, 
 HBASE-5916_trunk_4.patch, HBASE-5916_trunk_v5.patch, 
 HBASE-5916_trunk_v6.patch, HBASE-5916_trunk_v7.patch


 Consider a case where the master is being restarted.  An RS that was alive when 
 the master restart began is itself restarted before the master initializes the 
 ServerShutDownHandler.
 {code}
 serverShutdownHandlerEnabled = true;
 {code}
 In this case, when the RS tries to register with the master, the master 
 tries to expire the old server instance, but the server cannot be expired 
 because the serverShutdownHandler is not yet enabled.
 This can happen when only one RS is restarted, or when all the RSs are 
 restarted at the same time (before assignRootandMeta).
 {code}
 LOG.info(message);
 if (existingServer.getStartcode() < serverName.getStartcode()) {
   LOG.info("Triggering server recovery; existingServer " +
     existingServer + " looks stale, new server:" + serverName);
   expireServer(existingServer);
 }
 {code}
 If another RS is brought up, the cluster returns to normal.
 This may be a rare corner case.





[jira] [Updated] (HBASE-5916) RS restart just before master intialization we make the cluster non operative

2012-05-24 Thread rajeshbabu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rajeshbabu updated HBASE-5916:
--

Status: Open  (was: Patch Available)

 RS restart just before master intialization we make the cluster non operative
 -

 Key: HBASE-5916
 URL: https://issues.apache.org/jira/browse/HBASE-5916
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0, 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.94.1

 Attachments: HBASE-5916_trunk.patch, HBASE-5916_trunk_1.patch, 
 HBASE-5916_trunk_1.patch, HBASE-5916_trunk_2.patch, HBASE-5916_trunk_3.patch, 
 HBASE-5916_trunk_4.patch, HBASE-5916_trunk_v5.patch, 
 HBASE-5916_trunk_v6.patch, HBASE-5916_trunk_v7.patch


 Consider a case where the master is being restarted.  An RS that was alive when 
 the master restart began is itself restarted before the master initializes the 
 ServerShutDownHandler.
 {code}
 serverShutdownHandlerEnabled = true;
 {code}
 In this case, when the RS tries to register with the master, the master 
 tries to expire the old server instance, but the server cannot be expired 
 because the serverShutdownHandler is not yet enabled.
 This can happen when only one RS is restarted, or when all the RSs are 
 restarted at the same time (before assignRootandMeta).
 {code}
 LOG.info(message);
 if (existingServer.getStartcode() < serverName.getStartcode()) {
   LOG.info("Triggering server recovery; existingServer " +
     existingServer + " looks stale, new server:" + serverName);
   expireServer(existingServer);
 }
 {code}
 If another RS is brought up, the cluster returns to normal.
 This may be a rare corner case.





[jira] [Updated] (HBASE-5916) RS restart just before master intialization we make the cluster non operative

2012-05-24 Thread rajeshbabu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rajeshbabu updated HBASE-5916:
--

Status: Patch Available  (was: Open)

 RS restart just before master intialization we make the cluster non operative
 -

 Key: HBASE-5916
 URL: https://issues.apache.org/jira/browse/HBASE-5916
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0, 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.94.1

 Attachments: HBASE-5916_trunk.patch, HBASE-5916_trunk_1.patch, 
 HBASE-5916_trunk_1.patch, HBASE-5916_trunk_2.patch, HBASE-5916_trunk_3.patch, 
 HBASE-5916_trunk_4.patch, HBASE-5916_trunk_v5.patch, 
 HBASE-5916_trunk_v6.patch, HBASE-5916_trunk_v7.patch


 Consider a case where the master is being restarted.  An RS that was alive when 
 the master restart began is itself restarted before the master initializes the 
 ServerShutDownHandler.
 {code}
 serverShutdownHandlerEnabled = true;
 {code}
 In this case, when the RS tries to register with the master, the master 
 tries to expire the old server instance, but the server cannot be expired 
 because the serverShutdownHandler is not yet enabled.
 This can happen when only one RS is restarted, or when all the RSs are 
 restarted at the same time (before assignRootandMeta).
 {code}
 LOG.info(message);
 if (existingServer.getStartcode() < serverName.getStartcode()) {
   LOG.info("Triggering server recovery; existingServer " +
     existingServer + " looks stale, new server:" + serverName);
   expireServer(existingServer);
 }
 {code}
 If another RS is brought up, the cluster returns to normal.
 This may be a rare corner case.





[jira] [Updated] (HBASE-6032) Port HFileBlockIndex improvement from HBASE-5987

2012-05-24 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-6032:
--

Attachment: 6032-ports-5987.txt

 Port HFileBlockIndex improvement from HBASE-5987
 

 Key: HBASE-6032
 URL: https://issues.apache.org/jira/browse/HBASE-6032
 Project: HBase
  Issue Type: Task
Reporter: Zhihong Yu
 Attachments: 6032-ports-5987.txt


 Excerpt from HBASE-5987:
 First, we propose to lookahead for one more block index so that the 
 HFileScanner would know the start key value of next data block. So if the 
 target key value for the scan(reSeekTo) is smaller than that start kv of 
 next data block, it means the target key value has a very high possibility in 
 the current data block (if not in current data block, then the start kv of 
 next data block should be returned. +Indexing on the start key has some 
 defects here+) and it shall NOT query the HFileBlockIndex in this case. On 
 the contrary, if the target key value is bigger, then it shall query the 
 HFileBlockIndex. This improvement shall help to reduce the hotness of 
 HFileBlockIndex and avoid some unnecessary IdLock Contention or Index Block 
 Cache lookup.
 This JIRA is to port the fix to HBase trunk, etc.
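
The lookahead idea in the excerpt can be illustrated with a small sketch. It is not the HFileScanner or HFileBlockIndex API: the class LookaheadSeek, its fields, and the use of String keys are assumptions made for illustration, and it assumes re-seeks only move forward.

```java
import java.util.Arrays;

/** Hypothetical sketch of the lookahead idea; not the HFileScanner API. */
final class LookaheadSeek {
  /** Start key of each data block, sorted ascending; stands in for the block index. */
  private final String[] blockStartKeys;
  private int currentBlock = 0;
  private int indexLookups = 0;

  LookaheadSeek(String[] sortedBlockStartKeys) {
    this.blockStartKeys = sortedBlockStartKeys.clone();
  }

  /** Forward re-seek: returns the number of the block that should hold targetKey. */
  int reseekTo(String targetKey) {
    boolean hasNext = currentBlock + 1 < blockStartKeys.length;
    // If the target sorts before the next block's start key, it can only be
    // in the current block, so the index is not consulted at all.
    if (!hasNext || targetKey.compareTo(blockStartKeys[currentBlock + 1]) < 0) {
      return currentBlock;
    }
    // Otherwise fall back to a (counted) index lookup.
    indexLookups++;
    int pos = Arrays.binarySearch(blockStartKeys, targetKey);
    // Not found: binarySearch returns -(insertionPoint) - 1, and the block
    // holding the key is the one just before the insertion point.
    currentBlock = pos >= 0 ? pos : -pos - 2;
    return currentBlock;
  }

  int indexLookups() { return indexLookups; }
}
```

With block start keys a, g, m, t, re-seeking to c and then f never touches the index; only a re-seek to h or beyond triggers a lookup, which is the reduction in index traffic the excerpt is after.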





[jira] [Commented] (HBASE-6074) TestHLog is flaky

2012-05-24 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282989#comment-13282989
 ] 

Zhihong Yu commented on HBASE-6074:
---

I experienced a similar issue when I worked on HBASE-5699.

I think separating some tests out into their own class(es) is one solution.

 TestHLog is flaky
 -

 Key: HBASE-6074
 URL: https://issues.apache.org/jira/browse/HBASE-6074
 Project: HBase
  Issue Type: Test
  Components: test
Affects Versions: 0.92.0
Reporter: Devaraj Das
 Attachments: 6074-1.patch


 When I run TestHLog in a loop, I see failures.





[jira] [Updated] (HBASE-6032) Port HFileBlockIndex improvement from HBASE-5987

2012-05-24 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-6032:
--

Status: Patch Available  (was: Open)

 Port HFileBlockIndex improvement from HBASE-5987
 

 Key: HBASE-6032
 URL: https://issues.apache.org/jira/browse/HBASE-6032
 Project: HBase
  Issue Type: Task
Reporter: Zhihong Yu
 Attachments: 6032-ports-5987.txt






[jira] [Updated] (HBASE-5916) RS restart just before master intialization we make the cluster non operative

2012-05-24 Thread chunhui shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-5916:


Attachment: HBASE-5916v8.patch

I have made a simple patch (v8) with the solution I mentioned above.

@ram
Could you test it with your test case?

There may be something wrong in it; thanks for the review.

 RS restart just before master intialization we make the cluster non operative
 -

 Key: HBASE-5916
 URL: https://issues.apache.org/jira/browse/HBASE-5916
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.94.1

 Attachments: HBASE-5916_trunk.patch, HBASE-5916_trunk_1.patch, 
 HBASE-5916_trunk_1.patch, HBASE-5916_trunk_2.patch, HBASE-5916_trunk_3.patch, 
 HBASE-5916_trunk_4.patch, HBASE-5916_trunk_v5.patch, 
 HBASE-5916_trunk_v6.patch, HBASE-5916_trunk_v7.patch, HBASE-5916v8.patch






RE: [jira] [Updated] (HBASE-6085) SaslServer intermittently ignoring SaslClient's requests

2012-05-24 Thread Ramkrishna.S.Vasudevan
Hi Andrew

Did you intend to send this mail to me? Or to Himanshu?

Regards
Ram

 -Original Message-
 From: Andrew Purtell [mailto:apurt...@apache.org]
 Sent: Friday, May 25, 2012 12:21 AM
 To: ramkrishna.s.vasudevan (JIRA)
 Cc: issues@hbase.apache.org
 Subject: Re: [jira] [Updated] (HBASE-6085) SaslServer intermittently
 ignoring SaslClient's requests
 
 Anything of note appear in the server side logs at DEBUG level? Have
 you tried duplicating this in an all-localhost configuration? If it is
 possible to reproduce in an all-localhost configuration, or on a
 cluster not otherwise occupied, then we can turn on additional
 SASL/GSSAPI level debugging that may shed light but will be quite
 verbose.



[jira] [Commented] (HBASE-6032) Port HFileBlockIndex improvement from HBASE-5987

2012-05-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13283098#comment-13283098
 ] 

Hadoop QA commented on HBASE-6032:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12529563/6032-ports-5987.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 15 new or modified tests.

+1 hadoop23.  The patch compiles against the hadoop 0.23.x profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 36 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1992//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1992//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1992//console

This message is automatically generated.

 Port HFileBlockIndex improvement from HBASE-5987
 

 Key: HBASE-6032
 URL: https://issues.apache.org/jira/browse/HBASE-6032
 Project: HBase
  Issue Type: Task
Reporter: Zhihong Yu
 Attachments: 6032-ports-5987.txt






[jira] [Commented] (HBASE-6071) getRegionServerWithRetires, should log unsuccessful attempts and exceptions.

2012-05-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13283105#comment-13283105
 ] 

Hadoop QA commented on HBASE-6071:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12529461/HBASE-6071.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 hadoop23.  The patch compiles against the hadoop 0.23.x profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 33 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
 

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1991//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1991//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1991//console

This message is automatically generated.

 getRegionServerWithRetires, should log unsuccessful attempts and exceptions.
 

 Key: HBASE-6071
 URL: https://issues.apache.org/jira/browse/HBASE-6071
 Project: HBase
  Issue Type: Improvement
  Components: client, ipc
Affects Versions: 0.92.0, 0.94.0
Reporter: Igal Shilman
Priority: Minor
  Labels: client, ipc
 Attachments: HBASE-6071.patch, 
 HConnectionManager_HBASE-6071-0.90.0.patch


 HConnectionImplementation.getRegionServerWithRetries might terminate with an 
 exception other than a DoNotRetryIOException, thus silently dropping 
 exceptions from previous attempts.
 [~ted_yu] suggested 
 ([here|http://mail-archives.apache.org/mod_mbox/hbase-user/201205.mbox/%3CCAFebPXBq9V9BVdzRTNr-MB3a1Lz78SZj6gvP6On0b%2Bajt9StAg%40mail.gmail.com%3E])
  adding a log message inside the catch block describing the exception type 
 and details.
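
The suggestion amounts to logging inside the catch block of the retry loop. The following is a rough sketch under stated assumptions: LoggingRetry and callWithRetries are hypothetical names, java.util.logging stands in for whatever logger the client uses, and the special handling of DoNotRetryIOException is left out. It does not reproduce HConnectionImplementation.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.logging.Logger;

/** Hypothetical retry helper; it does not reproduce HConnectionImplementation. */
final class LoggingRetry {
  private static final Logger LOG = Logger.getLogger(LoggingRetry.class.getName());

  /** Runs the call up to maxAttempts times, recording and logging each failure. */
  static <T> T callWithRetries(Callable<T> call, int maxAttempts, List<Throwable> failures)
      throws IOException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    Exception last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return call.call();
      } catch (Exception e) {
        last = e;
        failures.add(e);
        // Log the exception type and details so earlier attempts are not lost.
        LOG.warning("Attempt " + attempt + " of " + maxAttempts + " failed: "
            + e.getClass().getName() + ": " + e.getMessage());
      }
    }
    // Every attempt failed; keep the last failure as the cause.
    throw new IOException("Failed after " + maxAttempts + " attempts", last);
  }
}
```

Collecting the failures in a caller-supplied list is a design choice of this sketch; it lets the caller inspect every intermediate exception even when a later attempt succeeds.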





[jira] [Commented] (HBASE-5916) RS restart just before master intialization we make the cluster non operative

2012-05-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13283109#comment-13283109
 ] 

Hadoop QA commented on HBASE-5916:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12529660/HBASE-5916v8.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 hadoop23.  The patch compiles against the hadoop 0.23.x profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 34 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1993//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1993//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1993//console

This message is automatically generated.

 RS restart just before master intialization we make the cluster non operative
 -

 Key: HBASE-5916
 URL: https://issues.apache.org/jira/browse/HBASE-5916
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.94.1

 Attachments: HBASE-5916_trunk.patch, HBASE-5916_trunk_1.patch, 
 HBASE-5916_trunk_1.patch, HBASE-5916_trunk_2.patch, HBASE-5916_trunk_3.patch, 
 HBASE-5916_trunk_4.patch, HBASE-5916_trunk_v5.patch, 
 HBASE-5916_trunk_v6.patch, HBASE-5916_trunk_v7.patch, HBASE-5916v8.patch






[jira] [Commented] (HBASE-5916) RS restart just before master intialization we make the cluster non operative

2012-05-24 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13283110#comment-13283110
 ] 

ramkrishna.s.vasudevan commented on HBASE-5916:
---

First of all, thanks for taking the time to prepare a patch.
I think there is one problem if we don't fetch the new list of online servers 
in joinCluster:
{code}
  STEP 1: this.serverManager.expireDeadNotExpiredServers();

// Update meta with new HRI if required. i.e migrate all HRI with HTD to
// HRI with out HTD in meta and update the status in ROOT. This must happen
// before we assign all user regions or else the assignment will fail.
// TODO: Remove this when we do 0.94.
  STEP 2: org.apache.hadoop.hbase.catalog.MetaMigrationRemovingHTD.
  updateMetaWithNewHRI(this);

// Fixup assignment manager status
status.setStatus("Starting assignment manager");
  STEP 3: this.assignmentManager.joinCluster(onlineServers);
{code}
Here is one scenario; it may be very rare, but it is still possible.
I have 3 RSs at STEP 1.
One of them goes down, and the SSH processes it and tries to assign its regions.
Before the assignment is done, a new RS comes up, before STEP 3.
There is a small chance that the regions from the dead RS are assigned to this new 
RS.  In STEP 3, because we already hold the earlier list of online servers, we may 
end up treating the new RS as an offline server after scanning META.  Please 
correct me if I am wrong.  It is a corner case.

 RS restart just before master intialization we make the cluster non operative
 -

 Key: HBASE-5916
 URL: https://issues.apache.org/jira/browse/HBASE-5916
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.94.1

 Attachments: HBASE-5916_trunk.patch, HBASE-5916_trunk_1.patch, 
 HBASE-5916_trunk_1.patch, HBASE-5916_trunk_2.patch, HBASE-5916_trunk_3.patch, 
 HBASE-5916_trunk_4.patch, HBASE-5916_trunk_v5.patch, 
 HBASE-5916_trunk_v6.patch, HBASE-5916_trunk_v7.patch, HBASE-5916v8.patch






[jira] [Commented] (HBASE-5352) ACL improvements

2012-05-24 Thread Laxman (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13283116#comment-13283116
 ] 

Laxman commented on HBASE-5352:
---

Yes Matt, there are many other APIs which are not checked for authorization in 
AccessController. We may need to analyze them all together once and handle them in 
phases. I will try to provide an analysis of all the operations. We can discuss 
after that.

Thanks for your quick response.

 ACL improvements
 

 Key: HBASE-5352
 URL: https://issues.apache.org/jira/browse/HBASE-5352
 Project: HBase
  Issue Type: Improvement
  Components: security
Affects Versions: 0.92.1, 0.94.0
Reporter: Enis Soztutar
Assignee: Enis Soztutar

 In this issue I would like to open discussion for a few minor ACL related 
 improvements. The proposed changes are as follows: 
 1. Introduce something like 
 AccessControllerProtocol.checkPermissions(Permission[] permissions) API, so 
 that clients can check access rights before carrying out the operations. We 
 need this kind of operation for HCATALOG-245, which introduces authorization 
 providers for hbase over hcat. We cannot use getUserPermissions() since it 
 requires ADMIN permissions on the global/table level.
 2. getUserPermissions(tableName)/grant/revoke and drop/modify table 
 operations should not check for global CREATE/ADMIN rights, but table 
 CREATE/ADMIN rights. The reasoning is that if a user is able to admin or read 
 from a table, she should be able to read the table's permissions. We can 
 choose whether we want only READ or ADMIN permissions for 
 getUserPermission(). Since we check for global permissions first for table 
 permissions, configuring table access using global permissions will continue 
 to work.  
 3. Grant/Revoke global permissions - HBASE-5342 (included for completeness)
 Of the three, we may want to backport the first one to 0.92, since without it 
 Hive/HCatalog cannot use HBase's authorization mechanism effectively. 
 I will create subissues and convert HBASE-5342 to a subtask when we get some 
 feedback, and opinions for going further. 
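
As a rough illustration of proposals 1 and 2, the sketch below checks global grants first and then table grants, and offers a batch check in the spirit of checkPermissions(Permission[]). Everything in it is hypothetical: PermissionCheckSketch, its Action enum, and its methods are illustration-only names and do not come from AccessController or AccessControllerProtocol.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical permission store; names do not come from AccessController. */
final class PermissionCheckSketch {
  enum Action { READ, WRITE, CREATE, ADMIN }

  private final Map<String, Set<Action>> globalGrants = new HashMap<>();
  private final Map<String, Set<Action>> tableGrants = new HashMap<>();

  void grantGlobal(String user, Action action) {
    globalGrants.computeIfAbsent(user, k -> EnumSet.noneOf(Action.class)).add(action);
  }

  void grantTable(String user, String table, Action action) {
    tableGrants.computeIfAbsent(user + "@" + table, k -> EnumSet.noneOf(Action.class))
        .add(action);
  }

  /** Global grants are consulted first, then grants on the specific table. */
  boolean isAllowed(String user, String table, Action action) {
    if (globalGrants.getOrDefault(user, EnumSet.noneOf(Action.class)).contains(action)) {
      return true;
    }
    return tableGrants.getOrDefault(user + "@" + table, EnumSet.noneOf(Action.class))
        .contains(action);
  }

  /** Batch check: true only if every required action is allowed. */
  boolean checkPermissions(String user, String table, Action... required) {
    for (Action action : required) {
      if (!isAllowed(user, table, action)) {
        return false;
      }
    }
    return true;
  }
}
```

Because global grants are checked before table grants, a setup that only uses global permissions keeps working, which is the compatibility argument made in point 2.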





[jira] [Created] (HBASE-6092) Authorize flush, split operations in AccessController

2012-05-24 Thread Laxman (JIRA)
Laxman created HBASE-6092:
-

 Summary: Authorize flush, split operations in AccessController
 Key: HBASE-6092
 URL: https://issues.apache.org/jira/browse/HBASE-6092
 Project: HBase
  Issue Type: Sub-task
  Components: security
Reporter: Laxman
Assignee: Laxman


Currently, some operations like flush and split are not checked for 
authorization in AccessController. With the current implementation any 
unauthorized client can trigger these operations on a table.





[jira] [Assigned] (HBASE-4676) Prefix Compression - Trie data block encoding

2012-05-24 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu reassigned HBASE-4676:
-

Assignee: Matt Corgan

 Prefix Compression - Trie data block encoding
 -

 Key: HBASE-4676
 URL: https://issues.apache.org/jira/browse/HBASE-4676
 Project: HBase
  Issue Type: New Feature
  Components: io, performance, regionserver
Affects Versions: 0.90.6
Reporter: Matt Corgan
Assignee: Matt Corgan
 Attachments: HBASE-4676-0.94-v1.patch, PrefixTrie_Format_v1.pdf, 
 PrefixTrie_Performance_v1.pdf, SeeksPerSec by blockSize.png, 
 hbase-prefix-trie-0.1.jar


 The HBase data block format has room for 2 significant improvements for 
 applications that have high block cache hit ratios.  
 First, there is no prefix compression, and the current KeyValue format is 
 somewhat metadata heavy, so there can be tremendous memory bloat for many 
 common data layouts, specifically those with long keys and short values.
 Second, there is no random access to KeyValues inside data blocks.  This 
 means that every time you double the datablock size, average seek time (or 
 average cpu consumption) goes up by a factor of 2.  The standard 64KB block 
 size is ~10x slower for random seeks than a 4KB block size, but block sizes 
 as small as 4KB cause problems elsewhere.  Using block sizes of 256KB or 1MB 
 or more may be more efficient from a disk access and block-cache perspective 
 in many big-data applications, but doing so is infeasible from a random seek 
 perspective.
 The PrefixTrie block encoding format attempts to solve both of these 
 problems.  Some features:
 * trie format for row key encoding completely eliminates duplicate row keys 
 and encodes similar row keys into a standard trie structure which also saves 
 a lot of space
 * the column family is currently stored once at the beginning of each block.  
 this could easily be modified to allow multiple family names per block
 * all qualifiers in the block are stored in their own trie format which 
 caters nicely to wide rows.  duplicate qualifiers between rows are eliminated. 
  the size of this trie determines the width of the block's qualifier 
 fixed-width-int
 * the minimum timestamp is stored at the beginning of the block, and deltas 
 are calculated from that.  the maximum delta determines the width of the 
 block's timestamp fixed-width-int
 The block is structured with metadata at the beginning, then a section for 
 the row trie, then the column trie, then the timestamp deltas, and then 
 all the values.  Most work is done in the row trie, where every leaf node 
 (corresponding to a row) contains a list of offsets/references corresponding 
 to the cells in that row.  Each cell is fixed-width to enable binary 
 searching and is represented by [1 byte operationType, X bytes qualifier 
 offset, X bytes timestamp delta offset].
 If all operation types are the same for a block, there will be zero per-cell 
 overhead.  Same for timestamps.  Same for qualifiers when I get a chance.  
 So, the compression aspect is very strong, but makes a few small sacrifices 
 on VarInt size to enable faster binary searches in trie fan-out nodes.
 A more compressed but slower version might build on this by also applying 
 further (suffix, etc) compression on the trie nodes at the cost of slower 
 write speed.  Even further compression could be obtained by using all VInts 
 instead of FInts with a sacrifice on random seek speed (though not huge).
 One current drawback is the current write speed.  While programmed with good 
 constructs like TreeMaps, ByteBuffers, binary searches, etc, it's not 
 programmed with the same level of optimization as the read path.  Work will 
 need to be done to optimize the data structures used for encoding and could 
 probably show a 10x increase.  It will still be slower than delta encoding, 
 but with a much higher decode speed.  I have not yet created a thorough 
 benchmark for write speed nor sequential read speed.
 Though the trie is reaching a point where it is internally very efficient 
 (probably within half or a quarter of its max read speed) the way that hbase 
 currently uses it is far from optimal.  The KeyValueScanner and related 
 classes that iterate through the trie will eventually need to be smarter and 
 have methods to do things like skipping to the next row of results without 
 scanning every cell in between.  When that is accomplished it will also allow 
 much faster compactions because the full row key will not have to be compared 
 as often as it is now.
 Current code is on github.  The trie code is in a separate project from the 
 slightly modified hbase.  There is an hbase project there as well with the 
 DeltaEncoding patch applied, and it builds on top of that.
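
The timestamp handling described above (one minimum per block, deltas stored as fixed-width ints whose width follows from the largest delta) can be sketched as follows. This is not the PrefixTrie code: TimestampDeltaSketch and its methods are hypothetical, and the sketch assumes the difference between the largest and smallest timestamp in a block fits in a long.

```java
/** Hypothetical sketch of min-timestamp plus fixed-width deltas; not the PrefixTrie code. */
final class TimestampDeltaSketch {
  /** Smallest number of bytes that can hold maxDelta (0 when every delta is zero). */
  static int widthInBytes(long maxDelta) {
    int width = 0;
    while (maxDelta != 0) {
      width++;
      maxDelta >>>= 8;
    }
    return width;
  }

  /** Smallest timestamp in the block; deltas are measured from it. */
  static long min(long[] timestamps) {
    long min = Long.MAX_VALUE;
    for (long t : timestamps) {
      min = Math.min(min, t);
    }
    return min;
  }

  /** Encodes each delta from the block minimum as a big-endian fixed-width int. */
  static byte[] encode(long[] timestamps, long min, int width) {
    byte[] out = new byte[timestamps.length * width];
    for (int i = 0; i < timestamps.length; i++) {
      long delta = timestamps[i] - min;
      for (int b = width - 1; b >= 0; b--) {
        out[i * width + b] = (byte) delta;
        delta >>>= 8;
      }
    }
    return out;
  }

  /** Random access: decodes the i-th timestamp without reading the others. */
  static long decode(byte[] encoded, int width, long min, int i) {
    long delta = 0;
    for (int b = 0; b < width; b++) {
      delta = (delta << 8) | (encoded[i * width + b] & 0xFF);
    }
    return min + delta;
  }
}
```

When every timestamp in a block is identical, the width is zero and the encoded array is empty, which corresponds to the "zero per-cell overhead" case mentioned in the description. The fixed width is what allows the i-th entry to be located by plain arithmetic.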
 

[jira] [Commented] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT

2012-05-24 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13283121#comment-13283121
 ] 

ramkrishna.s.vasudevan commented on HBASE-6070:
---

@Ted
TestServerCustomProtocol.testSingleMethod() passes with the patch.  I saw that 
the same test has failed in some other precommit build as well.

-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1993//testReport/

 AM.nodeDeleted and SSH races creating problems for regions under SPLIT
 --

 Key: HBASE-6070
 URL: https://issues.apache.org/jira/browse/HBASE-6070
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.92_1.patch, 
 HBASE-6070_0.94.patch, HBASE-6070_0.94_1.patch, HBASE-6070_trunk.patch, 
 HBASE-6070_trunk_1.patch


 We tried to address the problems in Master restart and RS restart while a region 
 SPLIT is in progress as part of HBASE-5806.
 While looking further we found that there is still one race condition.
 - Split has just started and the znode is in RS_SPLIT state.
 - RS goes down.
 - First callback for SSH comes.
 - As part of the fix for HBASE-5806, SSH knows that some region is in RIT.
 - But now the nodeDeleted event comes for the SPLIT node, and there we try to 
 delete the RIT.
 - After this we check in the SSH whether any node is in RIT.  As we 
 don't find the region in RIT, the region is never assigned.
 When we fixed HBASE-5806, step 6 happened first and then step 5 happened, so 
 we missed this case.  Now we have found it and will come up with a patch shortly.





[jira] [Assigned] (HBASE-6032) Port HFileBlockIndex improvement from HBASE-5987

2012-05-24 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu reassigned HBASE-6032:
-

Assignee: Zhihong Yu

 Port HFileBlockIndex improvement from HBASE-5987
 

 Key: HBASE-6032
 URL: https://issues.apache.org/jira/browse/HBASE-6032
 Project: HBase
  Issue Type: Task
Reporter: Zhihong Yu
Assignee: Zhihong Yu
 Fix For: 0.96.0

 Attachments: 6032-ports-5987.txt






[jira] [Updated] (HBASE-6032) Port HFileBlockIndex improvement from HBASE-5987

2012-05-24 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-6032:
--

Fix Version/s: 0.96.0

 Port HFileBlockIndex improvement from HBASE-5987
 

 Key: HBASE-6032
 URL: https://issues.apache.org/jira/browse/HBASE-6032
 Project: HBase
  Issue Type: Task
Reporter: Zhihong Yu
 Fix For: 0.96.0

 Attachments: 6032-ports-5987.txt







[jira] [Commented] (HBASE-6032) Port HFileBlockIndex improvement from HBASE-5987

2012-05-24 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13283122#comment-13283122
 ] 

Zhihong Yu commented on HBASE-6032:
---

Can someone review the port, please?

 Port HFileBlockIndex improvement from HBASE-5987
 

 Key: HBASE-6032
 URL: https://issues.apache.org/jira/browse/HBASE-6032
 Project: HBase
  Issue Type: Task
Reporter: Zhihong Yu
Assignee: Zhihong Yu
 Fix For: 0.96.0

 Attachments: 6032-ports-5987.txt







[jira] [Commented] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT

2012-05-24 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13283123#comment-13283123
 ] 

Zhihong Yu commented on HBASE-6070:
---

All right.

 AM.nodeDeleted and SSH races creating problems for regions under SPLIT
 --

 Key: HBASE-6070
 URL: https://issues.apache.org/jira/browse/HBASE-6070
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.92_1.patch, 
 HBASE-6070_0.94.patch, HBASE-6070_0.94_1.patch, HBASE-6070_trunk.patch, 
 HBASE-6070_trunk_1.patch


 We tried to address the problems in Master restart and RS restart while a 
 region SPLIT is in progress as part of HBASE-5806.
 While looking further, we found that one race condition still remains:
 1. Split has just started and the znode is in RS_SPLIT state.
 2. RS goes down.
 3. First callback for SSH comes.
 4. As part of the fix for HBASE-5806, SSH knows that some region is in RIT.
 5. But now the nodeDeleted event comes for the SPLIT node, and there we try to 
 delete the RIT.
 6. After this we check in the SSH whether any node is in RIT.  As we 
 don't find the region in RIT, the region is never assigned.
 When we fixed HBASE-5806, step 6 happened first and then step 5 happened, so 
 we missed this case.  Now we have found it. Will come up with a patch shortly.
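
The effect of the ordering between the nodeDeleted callback and the SSH check can be illustrated with a deliberately simplified, single-threaded model. Everything here is hypothetical (the class, the method names, the region name); the real AssignmentManager and ServerShutdownHandler are concurrent and far more involved. The sketch only shows why the outcome depends on which of the last two steps runs first.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical, single-threaded model of the race; not HBase code. */
final class SplitRaceSketch {

    /** Regions in transition (RIT), keyed by a made-up region name. */
    private final Set<String> regionsInTransition = new HashSet<>();

    /** A split starts: the region enters RIT. */
    void startSplit(String region) {
        regionsInTransition.add(region);
    }

    /** Models the nodeDeleted callback: the region is dropped from RIT. */
    void nodeDeleted(String region) {
        regionsInTransition.remove(region);
    }

    /** Models the SSH check: the region is reassigned only if it is still in RIT. */
    boolean sshWouldAssign(String region) {
        return regionsInTransition.contains(region);
    }

    /** Runs both events in the given order and reports whether the region gets assigned. */
    static boolean run(boolean nodeDeletedFirst) {
        SplitRaceSketch state = new SplitRaceSketch();
        state.startSplit("region-a");
        if (nodeDeletedFirst) {
            // nodeDeleted runs before the SSH check: SSH no longer sees the region.
            state.nodeDeleted("region-a");
            return state.sshWouldAssign("region-a");
        }
        // SSH check runs before nodeDeleted: SSH still sees the region.
        boolean assigned = state.sshWouldAssign("region-a");
        state.nodeDeleted("region-a");
        return assigned;
    }
}
```

In this model, run(false) corresponds to the ordering seen while fixing HBASE-5806 and the region is assigned, whereas run(true) corresponds to the newly found ordering and the region is never assigned.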





[jira] [Commented] (HBASE-5352) ACL improvements

2012-05-24 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13283124#comment-13283124
 ] 

Andrew Purtell commented on HBASE-5352:
---

Originally only the superuser could take such actions, so the AccessController 
did not need to deal with them. Now that the implementation is changing, all of 
these cases need review. I suggest sub-issues for each RPC interface. 

 ACL improvements
 

 Key: HBASE-5352
 URL: https://issues.apache.org/jira/browse/HBASE-5352
 Project: HBase
  Issue Type: Improvement
  Components: security
Affects Versions: 0.92.1, 0.94.0
Reporter: Enis Soztutar
Assignee: Enis Soztutar

 In this issue I would like to open discussion for a few minor ACL related 
 improvements. The proposed changes are as follows: 
 1. Introduce something like 
 AccessControllerProtocol.checkPermissions(Permission[] permissions) API, so 
 that clients can check access rights before carrying out the operations. We 
 need this kind of operation for HCATALOG-245, which introduces authorization 
 providers for hbase over hcat. We cannot use getUserPermissions() since it 
 requires ADMIN permissions on the global/table level.
 2. getUserPermissions(tableName)/grant/revoke and drop/modify table 
 operations should not check for global CREATE/ADMIN rights, but table 
 CREATE/ADMIN rights. The reasoning is that if a user is able to admin or read 
 from a table, she should be able to read the table's permissions. We can 
 choose whether we want only READ or ADMIN permissions for 
 getUserPermissions(). Since we check global permissions first when checking 
 table permissions, configuring table access using global permissions will 
 continue to work.
 3. Grant/Revoke global permissions - HBASE-5342 (included for completeness)
 Of all three, we may want to backport the first one to 0.92, since without it 
 Hive/HCatalog cannot use HBase's authorization mechanism effectively. 
 I will create subissues and convert HBASE-5342 to a subtask when we get some 
 feedback and opinions on going further. 
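
To make the first proposal more concrete, here is a minimal sketch of what a checkPermissions(Permission[]) style call could look like from the caller's point of view. It is not the AccessControllerProtocol API: the types below are made up for illustration, the check returns a boolean rather than throwing, and permissions are modelled only as a table name plus a set of actions for a single user.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of a "check before acting" permission call; not HBase code. */
final class CheckPermissionsSketch {

    enum Action { READ, WRITE, CREATE, ADMIN }

    /** A requested permission: a table name plus the actions needed on it. */
    static final class Permission {
        final String table;
        final Set<Action> actions = EnumSet.noneOf(Action.class);

        Permission(String table, Action... actions) {
            this.table = table;
            this.actions.addAll(Arrays.asList(actions));
        }
    }

    /** Actions granted to the calling user, per table. */
    private final Map<String, Set<Action>> granted = new HashMap<>();

    void grant(String table, Action... actions) {
        granted.computeIfAbsent(table, t -> EnumSet.noneOf(Action.class))
               .addAll(Arrays.asList(actions));
    }

    /** Returns true only if every requested permission is already granted. */
    boolean checkPermissions(Permission... permissions) {
        for (Permission p : permissions) {
            Set<Action> have = granted.get(p.table);
            if (have == null || !have.containsAll(p.actions)) {
                return false;
            }
        }
        return true;
    }
}
```

A client such as an authorization provider could call checkPermissions with everything an operation needs and decide up front whether to proceed, without needing the ADMIN rights that getUserPermissions() requires.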





[jira] [Resolved] (HBASE-6086) Admin operations on a table should be authorized against table permissions instead of global permissions.

2012-05-24 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell resolved HBASE-6086.
---

Resolution: Duplicate

 Admin operations on a table should be authorized against table permissions 
 instead of global permissions.
 -

 Key: HBASE-6086
 URL: https://issues.apache.org/jira/browse/HBASE-6086
 Project: HBase
  Issue Type: Sub-task
  Components: security
Affects Versions: 0.94.0
Reporter: Laxman
Assignee: Laxman
  Labels: acl, security

 Some inconsistency still exists after HBASE-6061. We actually need to 
 authorize against table permissions instead of global permissions here.
 {code}
 +  private void requireTableAdminPermission(MasterCoprocessorEnvironment e,
 +  byte[] tableName) throws IOException {
 +if (isActiveUserTableOwner(e, tableName)) {
 +  requirePermission(Permission.Action.CREATE);
 +} else {
 +  requirePermission(Permission.Action.ADMIN);
 +}
 +  }
 {code}
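
For comparison, a table-scoped variant of the check in the snippet above might look like the sketch below. It is an illustration only: the class, its methods and the single-user grant model are assumptions, not the AccessController implementation. It keeps the owner/non-owner split from the snippet (CREATE for the owner, ADMIN otherwise) but evaluates it against the table's grants, consulting global grants first so that globally configured access keeps working.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of table-scoped authorization for one user; not HBase code. */
final class TableScopedAuthSketch {

    enum Action { READ, WRITE, CREATE, ADMIN }

    private final Set<Action> globalGrants = EnumSet.noneOf(Action.class);
    private final Map<String, Set<Action>> tableGrants = new HashMap<>();

    void grantGlobal(Action action) {
        globalGrants.add(action);
    }

    void grantTable(String table, Action action) {
        tableGrants.computeIfAbsent(table, t -> EnumSet.noneOf(Action.class)).add(action);
    }

    /** Global grants are consulted first, then grants scoped to the table. */
    boolean isAuthorized(String table, Action action) {
        if (globalGrants.contains(action)) {
            return true;
        }
        Set<Action> scoped = tableGrants.get(table);
        return scoped != null && scoped.contains(action);
    }

    /** Table owners need CREATE on the table; everyone else needs ADMIN on it. */
    boolean mayAdministerTable(String table, boolean isOwner) {
        return isAuthorized(table, isOwner ? Action.CREATE : Action.ADMIN);
    }
}
```

With this shape, a user holding ADMIN on one table can administer that table without holding any global right.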





[jira] [Commented] (HBASE-5916) RS restart just before master intialization we make the cluster non operative

2012-05-24 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13283130#comment-13283130
 ] 

chunhui shen commented on HBASE-5916:
-

@ram
The case you mentioned above is a good one.

However, I find that the master's startup logic is getting more and more 
complicated. 
What about doing the following in the SSH process:
{code}
...
if (isCarryingRoot()) {}
if (isCarryingMeta()) {}
if (isCarryingRoot() || isCarryingMeta()) {}
int waitedTimeForMasterInitialized = 0;
while (!server.isStopped() && !services.isInitialized()) {
  try {
    if (waitedTimeForMasterInitialized == 0) {
      LOG.info("Master is not initialized, waiting...");
    }
    Thread.sleep(100);
    waitedTimeForMasterInitialized += 100;
  } catch (InterruptedException e) {
    Thread.currentThread().interrupt();
    throw new IOException("Interrupted", e);
  }
}
if (waitedTimeForMasterInitialized > 0) {
  LOG.info("Recovery time calculation: waiting on master to be initialized took "
      + waitedTimeForMasterInitialized + "ms");
}
{code}

I think we could make SSH wait until the master is initialized after it has 
assigned the META region; that way we could skip considering many troublesome 
concurrency cases.
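
The wait loop proposed above can also be written as a small self-contained method, which makes it easy to try outside the master. This is a sketch under assumptions: the class and method names are hypothetical, and the server.isStopped() and services.isInitialized() checks are replaced by two BooleanSupplier arguments.

```java
import java.io.IOException;
import java.util.function.BooleanSupplier;

/** Hypothetical stand-alone version of the proposed wait loop; not HBase code. */
final class WaitForMasterSketch {

    /**
     * Polls until initialized reports true or stopped reports true.
     *
     * @return the total time slept, in milliseconds
     * @throws IOException if the waiting thread is interrupted
     */
    static long waitUntilInitialized(BooleanSupplier stopped, BooleanSupplier initialized,
                                     long pollMillis) throws IOException {
        long waited = 0;
        while (!stopped.getAsBoolean() && !initialized.getAsBoolean()) {
            try {
                Thread.sleep(pollMillis);
                waited += pollMillis;
            } catch (InterruptedException e) {
                // Restore the interrupt flag before converting to IOException.
                Thread.currentThread().interrupt();
                throw new IOException("Interrupted", e);
            }
        }
        return waited;
    }
}
```

The returned value plays the role of waitedTimeForMasterInitialized, so the caller can log the recovery delay only when it is greater than zero.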

 RS restart just before master intialization we make the cluster non operative
 -

 Key: HBASE-5916
 URL: https://issues.apache.org/jira/browse/HBASE-5916
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.94.1

 Attachments: HBASE-5916_trunk.patch, HBASE-5916_trunk_1.patch, 
 HBASE-5916_trunk_1.patch, HBASE-5916_trunk_2.patch, HBASE-5916_trunk_3.patch, 
 HBASE-5916_trunk_4.patch, HBASE-5916_trunk_v5.patch, 
 HBASE-5916_trunk_v6.patch, HBASE-5916_trunk_v7.patch, HBASE-5916v8.patch


 Consider a case where the master is being restarted.  An RS that was alive when 
 the master restart began is itself restarted before the master enables the 
 ServerShutdownHandler:
 {code}
 serverShutdownHandlerEnabled = true;
 {code}
 In this case, when the RS tries to register with the master, the master will 
 try to expire the server, but the server cannot be expired because the 
 ServerShutdownHandler is not yet enabled.
 This can happen when only one RS is restarted or when all the RSs are 
 restarted at the same time (before assignRootAndMeta).
 {code}
 LOG.info(message);
 if (existingServer.getStartcode() < serverName.getStartcode()) {
   LOG.info("Triggering server recovery; existingServer " +
     existingServer + " looks stale, new server:" + serverName);
   expireServer(existingServer);
 }
 {code}
 If another RS is brought up, the cluster returns to normal.
 This may be a rare corner case.





[jira] [Commented] (HBASE-5993) Add a no-read Append

2012-05-24 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13283144#comment-13283144
 ] 

Lars Hofhansl commented on HBASE-5993:
--

Then I do not understand what we are proposing here. An Append that does not 
read the existing value is a Put, no?

Maybe a patch will make it clear to me.

 Add a no-read Append
 

 Key: HBASE-5993
 URL: https://issues.apache.org/jira/browse/HBASE-5993
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.94.0
Reporter: Jacques
Priority: Critical

 HBASE-4102 added an atomic append.  For high performance situations, it would 
 be helpful to be able to do appends that don't actually require a read of the 
 existing value.  This would be useful in building a growing set of values.  
 Our original use case was for implementing a form of search in HBase where a 
 cell would contain a list of document ids associated with a particular 
 keyword for search.  However it seems like it would also be useful to provide 
 substantial performance improvements for most Append scenarios.
 Within the client API, the simplest way to implement this would be to 
 leverage the existing Append api.  If the Append is marked as 
 setReturnResults(false), use this code path.  If result return is requested, 
 use the existing Append implementation.  
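
One way to picture the proposed dispatch on setReturnResults(false) is the toy model below. It is an assumption, not something the issue specifies: here a no-read append simply stores the new value as a separate fragment and leaves the merging to the read path, and all names are made up. It is not the HBase Append API or the region server code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy model of a no-read append; not HBase code. */
final class NoReadAppendSketch {

    /** Appended fragments per cell, in arrival order. */
    private final Map<String, List<String>> fragments = new HashMap<>();

    /** Counts how many appends had to assemble the full value. */
    int readsDuringAppend = 0;

    /** Read path: assembles the cell value from its fragments. */
    String read(String cell) {
        List<String> parts = fragments.get(cell);
        return parts == null ? "" : String.join("", parts);
    }

    /**
     * Appends value to cell.
     *
     * @return the new full value when returnResults is true, otherwise null
     */
    String append(String cell, String value, boolean returnResults) {
        fragments.computeIfAbsent(cell, c -> new ArrayList<>()).add(value);
        if (!returnResults) {
            // No-read path: the existing value is never assembled.
            return null;
        }
        // Existing behaviour: the caller wants the result, so the value is read.
        readsDuringAppend++;
        return read(cell);
    }
}
```

In this model the cost of combining values moves from the append to the read, which is the trade-off any no-read design would have to make explicit.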

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira