date:20120524

[
https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

ramkrishna.s.vasudevan reassigned HBASE-6070:
-

Assignee: ramkrishna.s.vasudevan

AM.nodeDeleted and SSH races creating problems for regions under SPLIT
--

Key: HBASE-6070
URL: https://issues.apache.org/jira/browse/HBASE-6070
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Fix For: 0.92.2, 0.96.0, 0.94.1

We tried to address the problems with Master restart and RS restart while a
region SPLIT is in progress as part of HBASE-5806.
While doing some more testing we found that one race condition still remains:
1. Split has just started and the znode is in RS_SPLIT state.
2. RS goes down.
3. First callback for SSH comes.
4. As part of the fix for HBASE-5806, SSH knows that some region is in RIT.
5. But now the nodeDeleted event comes for the SPLIT node, and there we try to
delete the RIT entry.
6. After this we check in SSH whether any node is in RIT. As we don't find the
region in RIT, the region is never assigned.
When we fixed HBASE-5806, step 6 happened first and then step 5, so we missed
this case. Will come up with a patch shortly.
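
The ordering problem described above can be illustrated with a minimal sketch.
This is not HBase code: the Master class, its two sets and the method names are
hypothetical stand-ins for the regions-in-transition (RIT) bookkeeping, the
nodeDeleted callback and the SSH check. It only shows that the region gets
assigned when the SSH check runs first, and is lost when nodeDeleted runs first.

```java
import java.util.HashSet;
import java.util.Set;

public class SplitRaceSketch {
    // Hypothetical stand-in for the master's RIT bookkeeping (not the real AssignmentManager).
    static final class Master {
        final Set<String> regionsInTransition = new HashSet<>();
        final Set<String> assigned = new HashSet<>();

        // Stand-in for the nodeDeleted callback: drops the region from RIT.
        void nodeDeleted(String region) {
            regionsInTransition.remove(region);
        }

        // Stand-in for the SSH check: reassign only regions still found in RIT.
        void sshAssignIfInTransition(String region) {
            if (regionsInTransition.contains(region)) {
                assigned.add(region);
            }
        }
    }

    // Returns true if the region ends up assigned for the given event order.
    static boolean run(boolean nodeDeletedFirst) {
        Master master = new Master();
        String region = "region-under-split"; // hypothetical region name
        master.regionsInTransition.add(region); // split started, RS then goes down
        if (nodeDeletedFirst) {
            master.nodeDeleted(region);             // step 5 before step 6
            master.sshAssignIfInTransition(region);
        } else {
            master.sshAssignIfInTransition(region); // step 6 before step 5
            master.nodeDeleted(region);
        }
        return master.assigned.contains(region);
    }

    public static void main(String[] args) {
        System.out.println("SSH check first:   assigned=" + run(false));
        System.out.println("nodeDeleted first: assigned=" + run(true));
    }
}
```

The sketch runs the two orders sequentially for clarity; in the real system the
order depends on callback timing, which is why the case is easy to miss.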

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT

[
https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

ramkrishna.s.vasudevan updated HBASE-6070:
--

Attachment: HBASE-6070_0.94.patch

AM.nodeDeleted and SSH races creating problems for regions under SPLIT
--

Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.94.patch

[jira] [Updated] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT

[
https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

ramkrishna.s.vasudevan updated HBASE-6070:
--

Attachment: HBASE-6070_0.92.patch

AM.nodeDeleted and SSH races creating problems for regions under SPLIT
--

Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.94.patch

[jira] [Updated] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT

[
https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

ramkrishna.s.vasudevan updated HBASE-6070:
--

Attachment: HBASE-6070_trunk.patch

Uploaded patches for all branches. Tested in a cluster, including the scenarios
for HBASE-5806. Please review and provide your comments.

AM.nodeDeleted and SSH races creating problems for regions under SPLIT
--

Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.94.patch,
HBASE-6070_trunk.patch

[jira] [Commented] (HBASE-5352) ACL improvements

2012-05-24 Thread Laxman (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282382#comment-13282382
]

Laxman commented on HBASE-5352:
---

Enis, Matt: I hope you don't mind if I add some ACL-related sub-tasks here.
I already added HBASE-6086, but Matt clarified that it is a duplicate of
HBASE-5372.

Also one more observation I wanted to validate with you.

Currently, AccessController doesn't provide an implementation for some methods
like preFlush, preSplit and many others. That means any unauthorized user can
trigger these operations on a table.

Do we need to handle this in a separate jira?
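
To make the concern concrete, here is a minimal, self-contained sketch of what
a permission check in such pre-hooks could look like. It does not use the real
AccessController or coprocessor API: the Acl class, the Action enum, the
AccessDeniedException type and the choice of requiring ADMIN for flush and
split are all assumptions made only for illustration.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class PreHookAclSketch {
    enum Action { READ, WRITE, CREATE, ADMIN }

    // Hypothetical exception type, not the HBase class of the same name.
    static class AccessDeniedException extends RuntimeException {
        AccessDeniedException(String message) { super(message); }
    }

    // Hypothetical per-table ACL: user name -> granted actions.
    static final class Acl {
        private final Map<String, Set<Action>> grants = new HashMap<>();

        void grant(String user, Action action) {
            grants.computeIfAbsent(user, u -> EnumSet.noneOf(Action.class)).add(action);
        }

        void require(String user, Action action, String operation) {
            Set<Action> granted = grants.getOrDefault(user, EnumSet.noneOf(Action.class));
            if (!granted.contains(action)) {
                throw new AccessDeniedException(
                    "user " + user + " lacks " + action + " for " + operation);
            }
        }
    }

    // Hypothetical hooks; requiring ADMIN here is an assumption, not a decision from this issue.
    static void preFlush(Acl acl, String user) { acl.require(user, Action.ADMIN, "flush"); }
    static void preSplit(Acl acl, String user) { acl.require(user, Action.ADMIN, "split"); }

    public static void main(String[] args) {
        Acl acl = new Acl();
        acl.grant("alice", Action.ADMIN);
        preFlush(acl, "alice"); // allowed
        try {
            preSplit(acl, "bob"); // bob has no grants
        } catch (AccessDeniedException e) {
            System.out.println("denied: " + e.getMessage());
        }
    }
}
```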

ACL improvements

Key: HBASE-5352
URL: https://issues.apache.org/jira/browse/HBASE-5352
Project: HBase
Issue Type: Improvement
Components: security
Affects Versions: 0.92.1, 0.94.0
Reporter: Enis Soztutar
Assignee: Enis Soztutar

In this issue I would like to open discussion for a few minor ACL related
improvements. The proposed changes are as follows:
1. Introduce something like
AccessControllerProtocol.checkPermissions(Permission[] permissions) API, so
that clients can check access rights before carrying out the operations. We
need this kind of operation for HCATALOG-245, which introduces authorization
providers for hbase over hcat. We cannot use getUserPermissions() since it
requires ADMIN permissions on the global/table level.
2. getUserPermissions(tableName)/grant/revoke and drop/modify table
operations should not check for global CREATE/ADMIN rights, but table
CREATE/ADMIN rights. The reasoning is that if a user is able to admin or read
from a table, she should be able to read the table's permissions. We can
choose whether we want only READ or ADMIN permissions for
getUserPermission(). Since we check for global permissions first for table
permissions, configuring table access using global permissions will continue
to work.
3. Grant/Revoke global permissions - HBASE-5342 (included for completeness)
Of the three, we may want to backport the first one to 0.92, since without it
Hive/HCatalog cannot use HBase's authorization mechanism effectively.
I will create subissues and convert HBASE-5342 to a subtask once we get some
feedback and opinions on going further.

[jira] [Updated] (HBASE-6071) getRegionServerWithRetires, should log unsuccessful attempts and exceptions.


 [ 
https://issues.apache.org/jira/browse/HBASE-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igal Shilman updated HBASE-6071:


Attachment: HConnectionManager_HBASE-6071-0.90.0.patch

 getRegionServerWithRetires, should log unsuccessful attempts and exceptions.
 

 Key: HBASE-6071
 URL: https://issues.apache.org/jira/browse/HBASE-6071
 Project: HBase
  Issue Type: Improvement
  Components: client, ipc
Affects Versions: 0.90.4
Reporter: Igal Shilman
Priority: Minor
 Attachments: HConnectionManager_HBASE-6071-0.90.0.patch


 HConnectionImplementation.getRegionServerWithRetries might terminate with an
 exception other than a DoNotRetryIOException, thus silently dropping the
 exceptions from previous attempts.
 [~ted_yu] suggested
 ([here|http://mail-archives.apache.org/mod_mbox/hbase-user/201205.mbox/%3CCAFebPXBq9V9BVdzRTNr-MB3a1Lz78SZj6gvP6On0b%2Bajt9StAg%40mail.gmail.com%3E])
  adding a log message inside the catch block describing the exception type
 and details.
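
 A minimal sketch of the suggested behaviour follows. It is not the attached
 patch and not the HConnectionManager code: the withRetries helper, the local
 DoNotRetryException class and the in-memory LOG list are hypothetical names
 used only to show a retry loop that records every failed attempt instead of
 dropping it.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

public class RetryLoggingSketch {
    // Hypothetical local marker for failures that must not be retried.
    static class DoNotRetryException extends IOException {
        DoNotRetryException(String message) { super(message); }
    }

    // Log lines are collected in a list here so the behaviour is easy to inspect.
    static final List<String> LOG = new ArrayList<>();

    static <T> T withRetries(Callable<T> call, int maxAttempts) throws IOException {
        List<Throwable> failures = new ArrayList<>();
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (DoNotRetryException e) {
                throw e; // non-retriable: surface immediately
            } catch (Exception e) {
                // Record the exception type and details of every unsuccessful attempt.
                LOG.add("attempt " + attempt + "/" + maxAttempts + " failed: "
                    + e.getClass().getSimpleName() + ": " + e.getMessage());
                failures.add(e);
            }
        }
        // Keep earlier failures attached rather than silently dropping them.
        IOException exhausted = new IOException("failed after " + maxAttempts + " attempts");
        for (Throwable failure : failures) {
            exhausted.addSuppressed(failure);
        }
        throw exhausted;
    }

    public static void main(String[] args) throws IOException {
        int[] calls = {0};
        String result = withRetries(() -> {
            if (++calls[0] < 3) {
                throw new IOException("connection refused");
            }
            return "ok";
        }, 5);
        System.out.println(result);
        LOG.forEach(System.out::println);
    }
}
```

 Attaching the earlier failures as suppressed exceptions is one option; a real
 client would more likely write them to its logger at the time they occur.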


[jira] [Updated] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT

[
https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

ramkrishna.s.vasudevan updated HBASE-6070:
--

Status: Patch Available (was: Open)

AM.nodeDeleted and SSH races creating problems for regions under SPLIT
--

Key: HBASE-6070
URL: https://issues.apache.org/jira/browse/HBASE-6070
Project: HBase
Issue Type: Bug
Affects Versions: 0.94.0, 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Fix For: 0.92.2, 0.96.0, 0.94.1

Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.94.patch,
HBASE-6070_trunk.patch

[jira] [Updated] (HBASE-6071) getRegionServerWithRetires, should log unsuccessful attempts and exceptions.


 [ 
https://issues.apache.org/jira/browse/HBASE-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igal Shilman updated HBASE-6071:


Affects Version/s: (was: 0.90.4)
   0.90.0

 getRegionServerWithRetires, should log unsuccessful attempts and exceptions.
 

 Key: HBASE-6071
 URL: https://issues.apache.org/jira/browse/HBASE-6071
 Project: HBase
  Issue Type: Improvement
  Components: client, ipc
Affects Versions: 0.90.0
Reporter: Igal Shilman
Priority: Minor
 Attachments: HConnectionManager_HBASE-6071-0.90.0.patch



[jira] [Updated] (HBASE-5416) Improve performance of scans with some kind of filters.

[
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Max Lapan updated HBASE-5416:
-

Attachment: Filtered_scans_v5.patch

Fixed issues with limits in next() call.

Improve performance of scans with some kind of filters.
---

Key: HBASE-5416
URL: https://issues.apache.org/jira/browse/HBASE-5416
Project: HBase
Issue Type: Improvement
Components: filters, performance, regionserver
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
Attachments: 5416-v5.txt, 5416-v6.txt, Filtered_scans.patch,
Filtered_scans_v2.patch, Filtered_scans_v3.patch, Filtered_scans_v4.patch,
Filtered_scans_v5.patch

When a scan is performed, the whole row is loaded into the result list, and
after that the filter (if one exists) is applied to decide whether the row is
needed.
But when a scan covers several CFs and the filter checks data from only a
subset of them, data from the CFs not checked by the filter is not needed at
the filter stage; it is needed only once we have decided to include the
current row. In such a case we can significantly reduce the amount of IO
performed by a scan by loading only the values actually checked by the filter.
For example, we have two CFs: flags and snap. Flags is quite small (a bunch of
megabytes) and is used to filter large entries from snap. Snap is very large
(tens of GB) and is quite costly to scan. If we need only rows with some flag
set, we use SingleColumnValueFilter to limit the result to a small subset of
the region. But the current implementation loads both CFs to perform the scan,
when only a small subset is needed.
The attached patch adds one routine to the Filter interface to allow a filter
to specify which CFs are needed for its operation. In HRegion, we separate all
scanners into two groups: those needed for the filter and the rest (joined).
When a new row is considered, only the needed data is loaded, the filter is
applied, and only if the filter accepts the row is the rest of the data
loaded. On our data, this speeds up such scans 30-50 times. It also gives us a
way to better normalize the data into separate columns by optimizing the scans
performed.
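
The idea can be sketched in a few lines of self-contained Java. This is not
the patch itself and does not use the HBase Filter or HRegion classes: the
in-memory Table, its load counter and the scan method are hypothetical, and
each load call merely stands in for the IO of reading one CF of one row.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

public class EssentialFamilySketch {
    // Hypothetical in-memory table: row key -> column family -> value.
    static final class Table {
        final Map<String, Map<String, String>> rows = new TreeMap<>();
        final Map<String, Integer> loadsPerFamily = new HashMap<>();

        void put(String row, String family, String value) {
            rows.computeIfAbsent(row, r -> new HashMap<>()).put(family, value);
        }

        // Each call stands in for the IO needed to read one family of one row.
        String load(String row, String family) {
            loadsPerFamily.merge(family, 1, Integer::sum);
            return rows.get(row).get(family);
        }
    }

    // Load the family the filter checks first; load the joined family only for accepted rows.
    static List<String> scan(Table table, String essential, String joined,
                             Predicate<String> filter) {
        List<String> accepted = new ArrayList<>();
        for (String row : table.rows.keySet()) {
            String checked = table.load(row, essential);
            if (checked == null || !filter.test(checked)) {
                continue; // rejected: the large family is never touched
            }
            accepted.add(row + ":" + table.load(row, joined));
        }
        return accepted;
    }

    public static void main(String[] args) {
        Table table = new Table();
        table.put("r1", "flags", "0"); table.put("r1", "snap", "big1");
        table.put("r2", "flags", "1"); table.put("r2", "snap", "big2");
        table.put("r3", "flags", "0"); table.put("r3", "snap", "big3");
        System.out.println(scan(table, "flags", "snap", "1"::equals));
        System.out.println(table.loadsPerFamily);
    }
}
```

With three rows and one matching flag, the small family is read three times
and the large one only once, which is where the IO saving comes from.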

[jira] [Updated] (HBASE-5416) Improve performance of scans with some kind of filters.

[
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Max Lapan updated HBASE-5416:
-

Status: Patch Available (was: Open)

Improve performance of scans with some kind of filters.
---

[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.

[
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282497#comment-13282497
]

Max Lapan commented on HBASE-5416:
--

After a long delay, I decided to return to this optimization.
We have been running this patch on our production system (300TB of HBase data,
160 nodes) for the last two months without issues. Tests of the 2-phase
approach showed a much smaller performance improvement than this patch: only a
2x speedup versus nearly 20x.

I extended the tests, but I don't feel experienced enough to implement the
concurrent, multithreaded test that was suggested, sorry.

Improve performance of scans with some kind of filters.
---

[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.

[
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282495#comment-13282495
]

Hadoop QA commented on HBASE-5416:
--

-1 overall. Here are the results of testing the latest attachment

http://issues.apache.org/jira/secure/attachment/12529061/Filtered_scans_v5.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 6 new or modified tests.

-1 patch. The patch command could not apply the patch.

Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/1982//console

This message is automatically generated.

Improve performance of scans with some kind of filters.
---

[jira] [Updated] (HBASE-6071) getRegionServerWithRetires, should log unsuccessful attempts and exceptions.


 [ 
https://issues.apache.org/jira/browse/HBASE-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igal Shilman updated HBASE-6071:


Fix Version/s: 0.90.7
   Labels: client ipc  (was: )
   Status: Patch Available  (was: Open)

 getRegionServerWithRetires, should log unsuccessful attempts and exceptions.
 

 Key: HBASE-6071
 URL: https://issues.apache.org/jira/browse/HBASE-6071
 Project: HBase
  Issue Type: Improvement
  Components: client, ipc
Affects Versions: 0.90.0
Reporter: Igal Shilman
Priority: Minor
  Labels: client, ipc
 Fix For: 0.90.7

 Attachments: HConnectionManager_HBASE-6071-0.90.0.patch



[jira] [Created] (HBASE-6088) Region splitting not happened for long time due to ZK exception while creating RS_ZK_SPLITTING node

2012-05-24 Thread Gopinathan A (JIRA)

Gopinathan A created HBASE-6088:
---

 Summary:  Region splitting not happened for long time due to ZK 
exception while creating RS_ZK_SPLITTING node
 Key: HBASE-6088
 URL: https://issues.apache.org/jira/browse/HBASE-6088
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0
Reporter: Gopinathan A
 Fix For: 0.94.1


Region splitting did not happen for a long time due to a ZK exception while
creating the RS_ZK_SPLITTING node.

{noformat}
2012-05-24 01:45:41,363 INFO org.apache.zookeeper.ClientCnxn: Client session 
timed out, have not heard from server in 26668ms for sessionid 
0x1377a75f41d0012, closing socket connection and attempting reconnect
2012-05-24 01:45:41,464 WARN 
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient 
ZooKeeper exception: 
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = 
ConnectionLoss for /hbase/unassigned/bd1079bf948c672e493432020dc0e144
{noformat}

{noformat}
2012-05-24 01:45:43,300 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: 
cleanupCurrentWriter  waiting for transactions to get synced  total 189377 
synced till here 189365
2012-05-24 01:45:48,474 INFO org.apache.hadoop.hbase.regionserver.SplitRequest: 
Running rollback/cleanup of failed split of 
ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.; Failed 
setting SPLITTING znode on 
ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.
java.io.IOException: Failed setting SPLITTING znode on 
ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.
at 
org.apache.hadoop.hbase.regionserver.SplitTransaction.createDaughters(SplitTransaction.java:242)
at 
org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:450)
at 
org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:67)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.zookeeper.KeeperException$BadVersionException: 
KeeperErrorCode = BadVersion for 
/hbase/unassigned/bd1079bf948c672e493432020dc0e144
at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1246)
at 
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:321)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:659)
at 
org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:811)
at 
org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:747)
at 
org.apache.hadoop.hbase.regionserver.SplitTransaction.transitionNodeSplitting(SplitTransaction.java:919)
at 
org.apache.hadoop.hbase.regionserver.SplitTransaction.createNodeSplitting(SplitTransaction.java:869)
at 
org.apache.hadoop.hbase.regionserver.SplitTransaction.createDaughters(SplitTransaction.java:239)
... 5 more
2012-05-24 01:45:48,476 INFO org.apache.hadoop.hbase.regionserver.SplitRequest: 
Successful rollback of failed split of 
ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.
{noformat}


{noformat}
2012-05-24 01:47:28,141 ERROR 
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node 
/hbase/unassigned/bd1079bf948c672e493432020dc0e144 already exists and this is 
not a retry
2012-05-24 01:47:28,142 INFO org.apache.hadoop.hbase.regionserver.SplitRequest: 
Running rollback/cleanup of failed split of 
ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.; Failed 
create of ephemeral /hbase/unassigned/bd1079bf948c672e493432020dc0e144
java.io.IOException: Failed create of ephemeral 
/hbase/unassigned/bd1079bf948c672e493432020dc0e144
at 
org.apache.hadoop.hbase.regionserver.SplitTransaction.createNodeSplitting(SplitTransaction.java:865)
at 
org.apache.hadoop.hbase.regionserver.SplitTransaction.createDaughters(SplitTransaction.java:239)
at 
org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:450)
at 
org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:67)
{noformat}

Due to the above exception, region splitting kept failing continuously for
more than 5 hours.


[jira] [Commented] (HBASE-6071) getRegionServerWithRetires, should log unsuccessful attempts and exceptions.

[
https://issues.apache.org/jira/browse/HBASE-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282504#comment-13282504
]

Hadoop QA commented on HBASE-6071:
--

-1 overall. Here are the results of testing the latest attachment

http://issues.apache.org/jira/secure/attachment/12528963/HConnectionManager_HBASE-6071-0.90.0.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified
tests.
Please justify why no new tests are needed for this
patch.
Also please list what manual steps were performed to
verify this patch.

-1 patch. The patch command could not apply the patch.

Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/1983//console

This message is automatically generated.

getRegionServerWithRetires, should log unsuccessful attempts and exceptions.

Key: HBASE-6071
URL: https://issues.apache.org/jira/browse/HBASE-6071
Project: HBase
Issue Type: Improvement
Components: client, ipc
Affects Versions: 0.90.0
Reporter: Igal Shilman
Priority: Minor
Labels: client, ipc
Fix For: 0.90.7

Attachments: HConnectionManager_HBASE-6071-0.90.0.patch


[jira] [Updated] (HBASE-5416) Improve performance of scans with some kind of filters.

[
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Max Lapan updated HBASE-5416:
-

Status: Open (was: Patch Available)

Improve performance of scans with some kind of filters.
---

[jira] [Updated] (HBASE-5416) Improve performance of scans with some kind of filters.

[
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Max Lapan updated HBASE-5416:
-

Attachment: (was: Filtered_scans_v5.patch)

Improve performance of scans with some kind of filters.
---

[jira] [Updated] (HBASE-5416) Improve performance of scans with some kind of filters.

[
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Max Lapan updated HBASE-5416:
-

Attachment: Filtered_scans_v5.patch

Improve performance of scans with some kind of filters.
---

[jira] [Commented] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT

[
https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282516#comment-13282516
]

Zhihong Yu commented on HBASE-6070:
---

{code}
+// but the RS had went down before completing the split process
then will not try to
{code}
'had went down' - 'had gone down'
{code}
+ if(response == null) return null;
{code}
Space after 'if'
{code}
+ static Result getMetaTableRowResultAsSplittedRegion(final HRegionInfo hri,
final ServerName sn)
{code}
The method should be called getMetaTableRowResultAsSplitRegion().

Should investigate the test failure in TestFromClientSide

AM.nodeDeleted and SSH races creating problems for regions under SPLIT
--

Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.94.patch,
HBASE-6070_trunk.patch

[jira] [Commented] (HBASE-6088) Region splitting not happened for long time due to ZK exception while creating RS_ZK_SPLITTING node


[ 
https://issues.apache.org/jira/browse/HBASE-6088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282518#comment-13282518
 ] 

ramkrishna.s.vasudevan commented on HBASE-6088:
---

When we start a split, there are two steps in the zk node creation:
- Create the node.
- Write the data RS_ZK_SPLITTING into it.
Only after both steps are completed do we make a journal entry.
So if writing the data fails, even on rollback we are not able to clean up the
node, because the journal has no entry for it.
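
A minimal sketch of the problem, under stated assumptions: the set of znode
paths, the JournalEntry enum and the attemptSplit method below are
hypothetical and are not the SplitTransaction code. The journalAfterCreate
flag shows one conceivable remedy (journaling the node creation before the
data write); it is not necessarily what the eventual patch does.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SplitJournalSketch {
    enum JournalEntry { NODE_CREATED, SET_SPLITTING }

    // znodes is a hypothetical stand-in for ZooKeeper: just the set of existing paths.
    static boolean attemptSplit(Set<String> znodes, String path,
                                boolean journalAfterCreate, boolean failWrite) {
        List<JournalEntry> journal = new ArrayList<>();
        try {
            // Step 1: create the node.
            if (!znodes.add(path)) {
                throw new IllegalStateException("node already exists: " + path);
            }
            if (journalAfterCreate) {
                journal.add(JournalEntry.NODE_CREATED);
            }
            // Step 2: write the SPLITTING data into it (may fail).
            if (failWrite) {
                throw new IllegalStateException("failed setting SPLITTING data on " + path);
            }
            journal.add(JournalEntry.SET_SPLITTING);
            return true;
        } catch (IllegalStateException e) {
            // Rollback undoes only what the journal recorded.
            if (!journal.isEmpty()) {
                znodes.remove(path);
            }
            return false;
        }
    }

    public static void main(String[] args) {
        Set<String> leaky = new HashSet<>();
        attemptSplit(leaky, "/unassigned/region", false, true);
        System.out.println("retry without early journal entry: "
            + attemptSplit(leaky, "/unassigned/region", false, false));

        Set<String> clean = new HashSet<>();
        attemptSplit(clean, "/unassigned/region", true, true);
        System.out.println("retry with early journal entry: "
            + attemptSplit(clean, "/unassigned/region", true, false));
    }
}
```

Without the early journal entry the failed attempt leaves the node behind, so
every retry fails on creation, which matches the repeated "already exists"
failures in the logs of this issue.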


[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.

[
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282520#comment-13282520
]

Zhihong Yu commented on HBASE-5416:
---

@Max:
The new patch is much larger than the previous version. Can you provide a more
detailed description of the change?

Thanks

Improve performance of scans with some kind of filters.
---

[jira] [Assigned] (HBASE-6068) Secure HBase cluster : Client not able to call some admin APIs

2012-05-24 Thread Matteo Bertozzi (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-6068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Matteo Bertozzi reassigned HBASE-6068:
--

Assignee: Matteo Bertozzi

Secure HBase cluster : Client not able to call some admin APIs
--

Key: HBASE-6068
URL: https://issues.apache.org/jira/browse/HBASE-6068
Project: HBase
Issue Type: Bug
Components: security
Affects Versions: 0.94.0
Reporter: Anoop Sam John
Assignee: Matteo Bertozzi

In a secure cluster, we allow HBase clients to read certain zk nodes by
granting global read permission on them to everyone. These nodes are the
master address znode, the root server znode and the clusterId znode. In
ZKUtil.createACL(), we can see that these node names are specially handled.
But there are some other client-side admin APIs which make a read call into
ZooKeeper from the client. These include the isTableEnabled() call (there may
be others; this is the one I have seen). Here the client directly reads a node
in ZooKeeper (the node created for this table) and the data is matched to
determine whether the table is enabled or not.
Now, in the secure cluster case, any client can read the ZooKeeper nodes which
it needs for its normal operation, like the master address and root server
address. But what if the client calls this API [isTableEnabled()]?

[jira] [Updated] (HBASE-5416) Improve performance of scans with some kind of filters.

[
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Max Lapan updated HBASE-5416:
-

Attachment: Filtered_scans_v5.patch

Improve performance of scans with some kind of filters.
---

[jira] [Updated] (HBASE-5416) Improve performance of scans with some kind of filters.

[
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Max Lapan updated HBASE-5416:
-

Attachment: (was: Filtered_scans_v5.patch)

Improve performance of scans with some kind of filters.
---

[jira] [Updated] (HBASE-5416) Improve performance of scans with some kind of filters.

[
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Max Lapan updated HBASE-5416:
-

Status: Patch Available (was: Open)

Improve performance of scans with some kind of filters.
---

[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.

[
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282528#comment-13282528
]

Max Lapan commented on HBASE-5416:
--

The additional code handles the case when InternalScanner::next is called with
limit != -1. In this case, we must remember the KeyValueHeap we were
populating when the limit was reached, and resume populating it on the next
call of the method.

I also added a test case for this situation.
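
As a rough illustration of the limit handling (not the patch code), the sketch
below keeps the partially consumed source between calls. The RowScanner class
and its pending queue are hypothetical stand-ins for the scanner and the
remembered KeyValueHeap; the caller is assumed to pass an empty output list.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class ResumableNextSketch {
    static final class RowScanner {
        // Values of the current row not yet returned; remembered across calls.
        private final Deque<String> pending;

        RowScanner(List<String> rowValues) {
            this.pending = new ArrayDeque<>(rowValues);
        }

        // Adds up to limit values to out (no cap if limit == -1); returns true if more remain.
        boolean next(List<String> out, int limit) {
            while (!pending.isEmpty() && (limit == -1 || out.size() < limit)) {
                out.add(pending.poll());
            }
            return !pending.isEmpty();
        }
    }

    public static void main(String[] args) {
        RowScanner scanner = new RowScanner(Arrays.asList("a", "b", "c"));
        List<String> first = new ArrayList<>();
        boolean more = scanner.next(first, 2);
        List<String> second = new ArrayList<>();
        scanner.next(second, 2); // resumes where the first call stopped
        System.out.println(first + " more=" + more + " then " + second);
    }
}
```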

Improve performance of scans with some kind of filters.
---

[jira] [Updated] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT

[
https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

ramkrishna.s.vasudevan updated HBASE-6070:
--

Status: Open (was: Patch Available)

AM.nodeDeleted and SSH races creating problems for regions under SPLIT
--

Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.92_1.patch,
HBASE-6070_0.94.patch, HBASE-6070_trunk.patch

[jira] [Updated] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT

[
https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

ramkrishna.s.vasudevan updated HBASE-6070:
--

Attachment: HBASE-6070_0.92_1.patch

AM.nodeDeleted and SSH races creating problems for regions under SPLIT
--

Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.92_1.patch,
HBASE-6070_0.94.patch, HBASE-6070_trunk.patch

[jira] [Updated] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT

[
https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

ramkrishna.s.vasudevan updated HBASE-6070:
--

Attachment: HBASE-6070_0.94_1.patch

AM.nodeDeleted and SSH races creating problems for regions under SPLIT
--

Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.92_1.patch,
HBASE-6070_0.94.patch, HBASE-6070_0.94_1.patch, HBASE-6070_trunk.patch,
HBASE-6070_trunk_1.patch

[jira] [Updated] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT

[
https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

ramkrishna.s.vasudevan updated HBASE-6070:
--

Status: Patch Available (was: Open)

AM.nodeDeleted and SSH races creating problems for regions under SPLIT
--

Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.92_1.patch,
HBASE-6070_0.94.patch, HBASE-6070_0.94_1.patch, HBASE-6070_trunk.patch,
HBASE-6070_trunk_1.patch

[jira] [Updated] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT

[
https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

ramkrishna.s.vasudevan updated HBASE-6070:
--

Attachment: HBASE-6070_trunk_1.patch

Updated patches addressing the review comments. I tried running the failed
test case; it passed every time.

AM.nodeDeleted and SSH races creating problems for regions under SPLIT
--

Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.92_1.patch,
HBASE-6070_0.94.patch, HBASE-6070_0.94_1.patch, HBASE-6070_trunk.patch,
HBASE-6070_trunk_1.patch

[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.

[
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282534#comment-13282534
]

Zhihong Yu commented on HBASE-5416:
---

Will go over the patch when I get into office.

It would be nice to use https://reviews.apache.org to facilitate reviews.

Improve performance of scans with some kind of filters.
---

[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.

[
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282539#comment-13282539
]

Max Lapan commented on HBASE-5416:
--

I tried to post it there, but I keep getting an Internal Server Error.

Improve performance of scans with some kind of filters.
---

[jira] [Updated] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT

[
https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

ramkrishna.s.vasudevan updated HBASE-6070:
--

Attachment: (was: HBASE-6070_trunk_1.patch)

AM.nodeDeleted and SSH races creating problems for regions under SPLIT
--

Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.92_1.patch,
HBASE-6070_0.94.patch, HBASE-6070_0.94_1.patch, HBASE-6070_trunk.patch

[jira] [Updated] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT

[
https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

ramkrishna.s.vasudevan updated HBASE-6070:
--

Attachment: HBASE-6070_trunk_1.patch

Just reattaching the patch.

AM.nodeDeleted and SSH races creating problems for regions under SPLIT
--

Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.92_1.patch,
HBASE-6070_0.94.patch, HBASE-6070_0.94_1.patch, HBASE-6070_trunk.patch,
HBASE-6070_trunk_1.patch

[jira] [Commented] (HBASE-5416) Improve performance of scans with some kind of filters.

[
https://issues.apache.org/jira/browse/HBASE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282543#comment-13282543
]

Max Lapan commented on HBASE-5416:
--

Ahhh, I'm stupid, it works with hbase-git repository. Posted
https://reviews.apache.org/r/5225/

Improve performance of scans with some kind of filters.
---

[jira] [Commented] (HBASE-5916) RS restart just before master intialization we make the cluster non operative


[ 
https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282580#comment-13282580
 ] 

ramkrishna.s.vasudevan commented on HBASE-5916:
---

@Chunhui
The problem raised in the suggestion above can be avoided simply by taking the
actual online servers list after getting the logFolders. This ensures that we
do not split the logs of any new RS that has checked in.

In joinCluster(), as per the existing code, if a new server has checked in and
root/meta has been assigned to it, joinCluster() may treat it as a dead server
because we have already passed in the earlier list of online servers. Hence,
as per the patch, we fetch the actual online list.

The problem that you have mentioned here:
bq.if Regionserver A with startcode 001 is restarted, and then Regionserver A 
with startcode 002 is in the onlineServers, but Regionserver A with startcode 
001 is in the process by SSH, not in the deadServers

We are trying to avoid this in our current v6 patch by not removing from the
dead servers list any restarted server that comes up during master
initialization. Later, after master initialization, we clear the dead servers
that match a current online server with the same host name and port.

There are other problems during SSH and master initialization that may lead to
double assignment or a ConcurrentModificationException. We will address these
in a new JIRA.
Please review the current patch and provide your suggestions.
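
The ordering argument above (list the log folders first, then take the online servers list) can be sketched as follows. This is only an illustration under stated assumptions, not the patch itself: the class LogSplitOrderingSketch, the method serversToSplit and the use of plain strings for server names are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical sketch: decide which servers' logs to split during master startup. */
class LogSplitOrderingSketch {
    /**
     * Returns the servers that have a log folder but are not online.
     * The online list is deliberately read AFTER the log folders, so a server
     * that checks in while the folders are being listed is seen as online
     * and its logs are not split.
     */
    static Set<String> serversToSplit(Supplier<Set<String>> listLogFolders,
                                      Supplier<Set<String>> listOnlineServers) {
        Set<String> candidates = new HashSet<>(listLogFolders.get());  // step 1: log folders
        Set<String> online = new HashSet<>(listOnlineServers.get());   // step 2: online servers, taken afterwards
        candidates.removeAll(online);
        return candidates;
    }

    public static void main(String[] args) {
        Set<String> folders = new HashSet<>(Arrays.asList("rsA,60020,001", "rsB,60020,001"));
        Set<String> online = new HashSet<>(Arrays.asList("rsB,60020,001"));
        System.out.println(serversToSplit(() -> folders, () -> online));
    }
}
```

If the two reads were done in the opposite order, a server that checked in between them would have a log folder but be missing from the stale online snapshot, and would wrongly be treated as dead.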

 RS restart just before master intialization we make the cluster non operative
 -

 Key: HBASE-5916
 URL: https://issues.apache.org/jira/browse/HBASE-5916
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.94.1

 Attachments: HBASE-5916_trunk.patch, HBASE-5916_trunk_1.patch, 
 HBASE-5916_trunk_1.patch, HBASE-5916_trunk_2.patch, HBASE-5916_trunk_3.patch, 
 HBASE-5916_trunk_4.patch, HBASE-5916_trunk_v5.patch


 Consider a case where the master is being restarted. An RS that was alive
 when the master restart began is itself restarted before the master enables
 the ServerShutdownHandler:
 {code}
 serverShutdownHandlerEnabled = true;
 {code}
 In this case, when the RS tries to register with the master, the master
 tries to expire the old server instance, but it cannot be expired because
 the ServerShutdownHandler is not yet enabled.
 This can happen when a single RS is restarted, or when all the RSs are
 restarted at the same time (before assignRootandMeta).
 {code}
 LOG.info(message);
 if (existingServer.getStartcode() < serverName.getStartcode()) {
   LOG.info("Triggering server recovery; existingServer " +
       existingServer + " looks stale, new server:" + serverName);
   expireServer(existingServer);
 }
 {code}
 If another RS is brought up, the cluster returns to normal.
 This may be a rare corner case.


[jira] [Updated] (HBASE-5916) RS restart just before master intialization we make the cluster non operative


 [ 
https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rajeshbabu updated HBASE-5916:
--

Attachment: HBASE-5916_trunk_v6.patch

Attached patch. Please review and provide suggestions/comments

 RS restart just before master intialization we make the cluster non operative
 -

 Key: HBASE-5916
 URL: https://issues.apache.org/jira/browse/HBASE-5916
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.94.1

 Attachments: HBASE-5916_trunk.patch, HBASE-5916_trunk_1.patch, 
 HBASE-5916_trunk_1.patch, HBASE-5916_trunk_2.patch, HBASE-5916_trunk_3.patch, 
 HBASE-5916_trunk_4.patch, HBASE-5916_trunk_v5.patch, HBASE-5916_trunk_v6.patch



[jira] [Created] (HBASE-6089) SSH and AM.joinCluster causes Concurrent Modification exception.

ramkrishna.s.vasudevan created HBASE-6089:
-

 Summary: SSH and AM.joinCluster causes Concurrent Modification 
exception.
 Key: HBASE-6089
 URL: https://issues.apache.org/jira/browse/HBASE-6089
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0, 0.92.1
Reporter: ramkrishna.s.vasudevan
 Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1


The AM.regions map is accessed concurrently by SSH and master initialization,
leading to a ConcurrentModificationException.
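
As a rough illustration of this failure mode (not the fix that will go into the patch), the sketch below shows how a structural change to a java.util.TreeMap during iteration makes the iterator throw ConcurrentModificationException, and one common way to avoid it by iterating over a copy taken under a lock. The class RegionsMapSketch and its methods are hypothetical, and the unsafe case is reproduced in a single thread so that it is deterministic; in the master the modification would come from another thread.

```java
import java.util.ConcurrentModificationException;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for a shared region -> server map. */
class RegionsMapSketch {
    private final TreeMap<String, String> regions = new TreeMap<>();

    void put(String region, String server) {
        synchronized (regions) {
            regions.put(region, server);
        }
    }

    /**
     * Unsafe pattern: inserts a new key while iterating the live map.
     * With at least two entries, the iterator's next step detects the change
     * and throws. Returns true if the exception was observed.
     */
    boolean unsafeIterationFails() {
        try {
            for (Map.Entry<String, String> e : regions.entrySet()) {
                regions.put(e.getKey() + "-daughter", e.getValue()); // structural change mid-iteration
            }
            return false;
        } catch (ConcurrentModificationException ex) {
            return true;
        }
    }

    /** Safer pattern: copy under the lock, then iterate the private copy. */
    TreeMap<String, String> snapshot() {
        synchronized (regions) {
            return new TreeMap<>(regions);
        }
    }

    public static void main(String[] args) {
        RegionsMapSketch am = new RegionsMapSketch();
        am.put("r1", "s1");
        am.put("r2", "s1");
        System.out.println("unsafe iteration failed: " + am.unsafeIterationFails());
        for (String region : am.snapshot().keySet()) {
            am.put(region + "-copy", "s2"); // safe: the loop walks the snapshot, not the live map
        }
    }
}
```

Copying costs memory proportional to the map size; a concurrent map or holding the lock for the whole iteration are alternatives with different trade-offs.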


[jira] [Updated] (HBASE-5916) RS restart just before master intialization we make the cluster non operative


 [ 
https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rajeshbabu updated HBASE-5916:
--

Status: Open  (was: Patch Available)

 RS restart just before master intialization we make the cluster non operative
 -

 Key: HBASE-5916
 URL: https://issues.apache.org/jira/browse/HBASE-5916
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0, 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.94.1

 Attachments: HBASE-5916_trunk.patch, HBASE-5916_trunk_1.patch, 
 HBASE-5916_trunk_1.patch, HBASE-5916_trunk_2.patch, HBASE-5916_trunk_3.patch, 
 HBASE-5916_trunk_4.patch, HBASE-5916_trunk_v5.patch, HBASE-5916_trunk_v6.patch



[jira] [Updated] (HBASE-5916) RS restart just before master intialization we make the cluster non operative


 [ 
https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rajeshbabu updated HBASE-5916:
--

Status: Patch Available  (was: Open)

 RS restart just before master intialization we make the cluster non operative
 -

 Key: HBASE-5916
 URL: https://issues.apache.org/jira/browse/HBASE-5916
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0, 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.94.1

 Attachments: HBASE-5916_trunk.patch, HBASE-5916_trunk_1.patch, 
 HBASE-5916_trunk_1.patch, HBASE-5916_trunk_2.patch, HBASE-5916_trunk_3.patch, 
 HBASE-5916_trunk_4.patch, HBASE-5916_trunk_v5.patch, HBASE-5916_trunk_v6.patch



[jira] [Commented] (HBASE-6089) SSH and AM.joinCluster causes Concurrent Modification exception.


[ 
https://issues.apache.org/jira/browse/HBASE-6089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282582#comment-13282582
 ] 

ramkrishna.s.vasudevan commented on HBASE-6089:
---

{code}
2012-05-24 19:26:02,493 DEBUG org.apache.hadoop.hbase.master.ServerManager: New connection to linux146,60020,1337867810895
2012-05-24 19:26:02,552 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=linux146,60020,1337867810895, region=2be5ef20db58b775953cc1107eb51d2d
2012-05-24 19:26:02,592 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=linux146,60020,1337867810895, region=191b0c97f2d2a8262bf790093fdce2ab
2012-05-24 19:26:02,595 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=linux146,60020,1337867810895, region=99d462b47ea5e301175d025204eff014
2012-05-24 19:26:03,957 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning done for linux146,60020,1337867810895
2012-05-24 19:26:14,640 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=linux146,60020,1337867810895, region=2be5ef20db58b775953cc1107eb51d2d
2012-05-24 19:26:14,781 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=linux146,60020,1337867810895, region=2be5ef20db58b775953cc1107eb51d2d
2012-05-24 19:26:14,785 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for et1,,1337864575331.2be5ef20db58b775953cc1107eb51d2d. from linux146,60020,1337867810895; deleting unassigned node
2012-05-24 19:26:14,786 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x1377ea1a1fe002d Deleting existing unassigned node for 2be5ef20db58b775953cc1107eb51d2d that is in expected state RS_ZK_REGION_OPENED
2012-05-24 19:26:14,792 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x1377ea1a1fe002d Successfully deleted unassigned node for region 2be5ef20db58b775953cc1107eb51d2d in expected state RS_ZK_REGION_OPENED
2012-05-24 19:26:14,792 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=linux146,60020,1337867810895, region=5a84a4f4eaf2519e36a8ccc2e9c83b04
2012-05-24 19:26:14,792 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: The znode of region et1,,1337864575331.2be5ef20db58b775953cc1107eb51d2d. has been deleted.
2012-05-24 19:26:23,862 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Finished processing of shutdown of linux146,60020,1337866620614
2012-05-24 19:26:51,927 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: []
2012-05-24 19:26:51,931 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
java.util.ConcurrentModificationException
    at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1100)
    at java.util.TreeMap$EntryIterator.next(TreeMap.java:1136)
    at java.util.TreeMap$EntryIterator.next(TreeMap.java:1131)
    at org.apache.hadoop.hbase.master.AssignmentManager.processDeadServersAndRegionsInTransition(AssignmentManager.java:409)
    at org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:363)
    at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:607)
    at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:374)
    at java.lang.Thread.run(Thread.java:662)
{code}


 SSH and AM.joinCluster causes Concurrent Modification exception.
 

 Key: HBASE-6089
 URL: https://issues.apache.org/jira/browse/HBASE-6089
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
 Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1



[jira] [Assigned] (HBASE-6089) SSH and AM.joinCluster causes Concurrent Modification exception.


 [ 
https://issues.apache.org/jira/browse/HBASE-6089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-6089:
-

Assignee: rajeshbabu

 SSH and AM.joinCluster causes Concurrent Modification exception.
 

 Key: HBASE-6089
 URL: https://issues.apache.org/jira/browse/HBASE-6089
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: rajeshbabu
 Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1



[jira] [Commented] (HBASE-5916) RS restart just before master intialization we make the cluster non operative

[
https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282589#comment-13282589
]

Hadoop QA commented on HBASE-5916:
--

-1 overall. Here are the results of testing the latest attachment

http://issues.apache.org/jira/secure/attachment/12529160/HBASE-5916_trunk_v6.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

+1 hadoop23. The patch compiles against the hadoop 0.23.x profile.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

-1 findbugs. The patch appears to introduce 34 new Findbugs (version
1.3.9) warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.master.TestClockSkewDetection

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/1986//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/1986//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/1986//console

This message is automatically generated.

RS restart just before master intialization we make the cluster non operative
-

Key: HBASE-5916
URL: https://issues.apache.org/jira/browse/HBASE-5916
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
Fix For: 0.94.1

Attachments: HBASE-5916_trunk.patch, HBASE-5916_trunk_1.patch,
HBASE-5916_trunk_1.patch, HBASE-5916_trunk_2.patch, HBASE-5916_trunk_3.patch,
HBASE-5916_trunk_4.patch, HBASE-5916_trunk_v5.patch, HBASE-5916_trunk_v6.patch


[jira] [Updated] (HBASE-6074) TestHLog is flaky


 [ 
https://issues.apache.org/jira/browse/HBASE-6074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated HBASE-6074:
---

Attachment: TestHLog.patch.txt

@Ted, yes, I saw the failures with hbase-0.92/hadoop-1.0.3. In my first trial, 
the org.apache.hadoop.hbase.regionserver.wal.TestHLog.testEditAdd was failing 
intermittently. Here are the snippets from the log:

{noformat}
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: Lease mismatch on /user/jenkins/hbase/TestHLog/hlog.1336864885313 owned by NN_Recovery but is accessed by DFSClient_-1644967697
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1645)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1620)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:1675)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:1663)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:718)
    at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)

Stacktrace

org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: Lease mismatch on /user/jenkins/hbase/TestHLog/hlog.1336864885313 owned by NN_Recovery but is accessed by DFSClient_-1644967697
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1645)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1620)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:1675)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:1663)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:718)
    at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)

    at org.apache.hadoop.ipc.Client.call(Client.java:1066)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
    at $Proxy8.complete(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at $Proxy8.complete(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3894)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3809)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61)
    at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86)
    at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:1017)
    at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.close(SequenceFileLogWriter.java:215)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.close(HLog.java:914)
    at org.apache.hadoop.hbase.regionserver.wal.TestHLog.testEditAdd(TestHLog.java:480)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)

{noformat}

I consulted with an HDFS dev, and he thought that there might be a race 
condition with the shutting down of cluster in testAppendClose (the previous 
test in the

[jira] [Updated] (HBASE-6074) TestHLog is flaky


 [ 
https://issues.apache.org/jira/browse/HBASE-6074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated HBASE-6074:
---

Attachment: 6074-1.patch
6074-1.patch


[jira] [Updated] (HBASE-6074) TestHLog is flaky


 [ 
https://issues.apache.org/jira/browse/HBASE-6074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated HBASE-6074:
---

Attachment: (was: 6074-1.patch)

 TestHLog is flaky
 -

 Key: HBASE-6074
 URL: https://issues.apache.org/jira/browse/HBASE-6074
 Project: HBase
  Issue Type: Test
  Components: test
Affects Versions: 0.92.0
Reporter: Devaraj Das
 Attachments: 6074-1.patch


 When I run TestHLog in a loop, I see failures.


[jira] [Updated] (HBASE-6074) TestHLog is flaky


 [ 
https://issues.apache.org/jira/browse/HBASE-6074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated HBASE-6074:
---

Attachment: (was: TestHLog.patch.txt)

 TestHLog is flaky
 -

 Key: HBASE-6074
 URL: https://issues.apache.org/jira/browse/HBASE-6074
 Project: HBase
  Issue Type: Test
  Components: test
Affects Versions: 0.92.0
Reporter: Devaraj Das
 Attachments: 6074-1.patch


 When I run TestHLog in a loop, I see failures.


Re: [jira] [Updated] (HBASE-6085) SaslServer intermittently ignoring SaslClient's requests

2012-05-24 Thread Andrew Purtell

Anything of note appear in the server side logs at DEBUG level? Have
you tried duplicating this in an all-localhost configuration? If it is
possible to reproduce in an all-localhost configuration, or on a
cluster not otherwise occupied, then we can turn on additional
SASL/GSSAPI level debugging that may shed light but will be quite
verbose.

[jira] [Created] (HBASE-6090) JMX Registration Error while booting HMaster

Elliott Clark created HBASE-6090:


 Summary: JMX Registration Error while booting HMaster
 Key: HBASE-6090
 URL: https://issues.apache.org/jira/browse/HBASE-6090
 Project: HBase
  Issue Type: Bug
Reporter: Elliott Clark


When booting master there are errors about HMaster not being a bean and being 
unable to turn ServerLoad into an open class.


[jira] [Commented] (HBASE-6084) Server Load does not display correctly on the ui


[ 
https://issues.apache.org/jira/browse/HBASE-6084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282780#comment-13282780
 ] 

Elliott Clark commented on HBASE-6084:
--

The errors on the console are unrelated to this issue, so I filed HBASE-6090.

 Server Load does not display correctly on the ui
 

 Key: HBASE-6084
 URL: https://issues.apache.org/jira/browse/HBASE-6084
 Project: HBase
  Issue Type: Bug
Reporter: Elliott Clark
Assignee: Elliott Clark
 Attachments: HBASE-6084-0.patch


 The ui uses the toString method and toString does not implement it any more.


[jira] [Assigned] (HBASE-6090) JMX Registration Error while booting HMaster


 [ 
https://issues.apache.org/jira/browse/HBASE-6090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark reassigned HBASE-6090:


Assignee: Elliott Clark

 JMX Registration Error while booting HMaster
 

 Key: HBASE-6090
 URL: https://issues.apache.org/jira/browse/HBASE-6090
 Project: HBase
  Issue Type: Bug
Reporter: Elliott Clark
Assignee: Elliott Clark

 When booting master there are errors about HMaster not being a bean and being 
 unable to turn ServerLoad into an open class.


[jira] [Commented] (HBASE-6084) Server Load does not display correctly on the ui


[ 
https://issues.apache.org/jira/browse/HBASE-6084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282784#comment-13282784
 ] 

Gregory Chanan commented on HBASE-6084:
---

The JMX issues are tracked in HBASE-5967.  Is this a duplicate?  Or are you 
only talking about fixing toString here and the JMX issues are separate?

I like your idea about copying the format of the old one.

 Server Load does not display correctly on the ui
 

 Key: HBASE-6084
 URL: https://issues.apache.org/jira/browse/HBASE-6084
 Project: HBase
  Issue Type: Bug
Reporter: Elliott Clark
Assignee: Elliott Clark
 Attachments: HBASE-6084-0.patch


 The ui uses the toString method and toString does not implement it any more.


[jira] [Commented] (HBASE-6090) JMX Registration Error while booting HMaster


[ 
https://issues.apache.org/jira/browse/HBASE-6090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282785#comment-13282785
 ] 

Gregory Chanan commented on HBASE-6090:
---

Duplicate of HBASE-5967?

 JMX Registration Error while booting HMaster
 

 Key: HBASE-6090
 URL: https://issues.apache.org/jira/browse/HBASE-6090
 Project: HBase
  Issue Type: Bug
Reporter: Elliott Clark
Assignee: Elliott Clark

 When booting master there are errors about HMaster not being a bean and being 
 unable to turn ServerLoad into an open class.


[jira] [Updated] (HBASE-5892) [hbck] Refactor parallel WorkItem* to Futures.

2012-05-24 Thread Andrew Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HBASE-5892:
---

Attachment: hbase-5892.patch

 [hbck] Refactor parallel WorkItem* to Futures.
 --

 Key: HBASE-5892
 URL: https://issues.apache.org/jira/browse/HBASE-5892
 Project: HBase
  Issue Type: Improvement
Reporter: Jonathan Hsieh
  Labels: noob
 Attachments: hbase-5892.patch


 This would convert WorkItem* logic (with low level notifies, and rough 
 exception handling)  into a more canonical Futures pattern.
 Currently there are two instances of this pattern (for loading hdfs dirs, for 
 contacting regionservers for assignments, and soon -- for loading hdfs 
 .regioninfo files).


[jira] [Updated] (HBASE-5892) [hbck] Refactor parallel WorkItem* to Futures.

2012-05-24 Thread Andrew Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HBASE-5892:
---

Status: Patch Available  (was: Open)

 [hbck] Refactor parallel WorkItem* to Futures.
 --

 Key: HBASE-5892
 URL: https://issues.apache.org/jira/browse/HBASE-5892
 Project: HBase
  Issue Type: Improvement
Reporter: Jonathan Hsieh
  Labels: noob
 Attachments: hbase-5892.patch


 This would convert WorkItem* logic (with low level notifies, and rough 
 exception handling)  into a more canonical Futures pattern.
 Currently there are two instances of this pattern (for loading hdfs dirs, for 
 contacting regionservers for assignments, and soon -- for loading hdfs 
 .regioninfo files).


[jira] [Commented] (HBASE-6090) JMX Registration Error while booting HMaster


[ 
https://issues.apache.org/jira/browse/HBASE-6090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282791#comment-13282791
 ] 

Elliott Clark commented on HBASE-6090:
--

Yep, this is a dupe of HBASE-5967.

 JMX Registration Error while booting HMaster
 

 Key: HBASE-6090
 URL: https://issues.apache.org/jira/browse/HBASE-6090
 Project: HBase
  Issue Type: Bug
Reporter: Elliott Clark
Assignee: Elliott Clark

 When booting master there are errors about HMaster not being a bean and being 
 unable to turn ServerLoad into an open class.


[jira] [Commented] (HBASE-5892) [hbck] Refactor parallel WorkItem* to Futures.

2012-05-24 Thread Andrew Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282790#comment-13282790
 ] 

Andrew Wang commented on HBASE-5892:


I tried to do this refactor, essentially switching out Runnable for Callable 
and adding some more logging in the process. Let me know if it's not what you 
were thinking of.

I didn't do any testing beyond running hbck on my local machine, which seemed 
to work.
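
For readers following along, here is a minimal sketch of the Runnable-to-Callable 
switch described above. It is not taken from hbase-5892.patch; the class names 
DirLoadSketch and FutureSketch, the pool size, and the string results are all 
assumptions made for illustration. The idea is that each work item returns its 
result through a Future, so the caller needs no low-level wait/notify and sees 
every failure as an ExecutionException.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical work item; a stand-in, not a class from the attached patch.
class DirLoadSketch implements Callable<String> {
  private final String dir;

  DirLoadSketch(String dir) {
    this.dir = dir;
  }

  @Override
  public String call() {
    if (dir.isEmpty()) {
      throw new IllegalArgumentException("empty directory name");
    }
    return "loaded " + dir;
  }
}

class FutureSketch {
  // Runs one task per directory and gathers results in submission order.
  // Failures end up in 'errors' instead of being lost in a worker thread.
  static List<String> runAll(List<String> dirs, List<String> errors) {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    List<String> results = new ArrayList<>();
    try {
      List<DirLoadSketch> items = new ArrayList<>();
      for (String d : dirs) {
        items.add(new DirLoadSketch(d));
      }
      // invokeAll returns once every task has finished; no wait/notify needed.
      for (Future<String> f : pool.invokeAll(items)) {
        try {
          results.add(f.get());
        } catch (ExecutionException e) {
          // The task's own exception is available as the cause.
          errors.add(e.getCause().getMessage());
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      errors.add("interrupted");
    } finally {
      pool.shutdown();
    }
    return results;
  }
}
```

Collecting the failures in a list rather than rethrowing the first one is only 
one possible choice; it lets the caller report every failed item at once.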

 [hbck] Refactor parallel WorkItem* to Futures.
 --

 Key: HBASE-5892
 URL: https://issues.apache.org/jira/browse/HBASE-5892
 Project: HBase
  Issue Type: Improvement
Reporter: Jonathan Hsieh
  Labels: noob
 Attachments: hbase-5892.patch


 This would convert WorkItem* logic (with low level notifies, and rough 
 exception handling)  into a more canonical Futures pattern.
 Currently there are two instances of this pattern (for loading hdfs dirs, for 
 contacting regionservers for assignments, and soon -- for loading hdfs 
 .regioninfo files).


[jira] [Updated] (HBASE-6071) getRegionServerWithRetires, should log unsuccessful attempts and exceptions.


 [ 
https://issues.apache.org/jira/browse/HBASE-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igal Shilman updated HBASE-6071:


Status: Open  (was: Patch Available)

I didn't know that patches have to be submitted against trunk first.
I also didn't know that diffs have to be created with --no-prefix.


 getRegionServerWithRetires, should log unsuccessful attempts and exceptions.
 

 Key: HBASE-6071
 URL: https://issues.apache.org/jira/browse/HBASE-6071
 Project: HBase
  Issue Type: Improvement
  Components: client, ipc
Affects Versions: 0.90.0
Reporter: Igal Shilman
Priority: Minor
  Labels: client, ipc
 Fix For: 0.90.7

 Attachments: HConnectionManager_HBASE-6071-0.90.0.patch


 HConnectionImplementation.getRegionServerWithRetries might terminate with an 
 exception other than a DoNotRetryIOException, and thus silently drop the 
 exceptions from previous attempts.
 [~ted_yu] suggested 
 ([here|http://mail-archives.apache.org/mod_mbox/hbase-user/201205.mbox/%3CCAFebPXBq9V9BVdzRTNr-MB3a1Lz78SZj6gvP6On0b%2Bajt9StAg%40mail.gmail.com%3E])
  adding a log message inside the catch block describing the exception type 
 and details.
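
 The suggestion above can be sketched roughly as follows. This is not the 
 attached patch and not the real HConnectionImplementation code: the names 
 RetrySketch, callWithRetries and DoNotRetrySketchException are hypothetical, 
 and a plain list stands in for the client's logger. It only shows the shape of 
 the idea, which is to record the type and message of each failed attempt 
 inside the catch block before retrying.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.Callable;

class RetrySketch {
  // Hypothetical stand-in for an exception type that must not be retried.
  static class DoNotRetrySketchException extends IOException {
    DoNotRetrySketchException(String msg) {
      super(msg);
    }
  }

  // Calls 'call' up to maxTries times; 'log' receives one line per failure.
  static <T> T callWithRetries(Callable<T> call, int maxTries, List<String> log)
      throws IOException {
    IOException last = new IOException("no attempts were made");
    for (int tries = 1; tries <= maxTries; tries++) {
      try {
        return call.call();
      } catch (DoNotRetrySketchException e) {
        throw e; // never retried, surfaces immediately
      } catch (Exception e) {
        // Record the type and details so earlier failures are not dropped.
        log.add("attempt " + tries + " of " + maxTries + " failed: "
            + e.getClass().getSimpleName() + ": " + e.getMessage());
        last = (e instanceof IOException) ? (IOException) e : new IOException(e);
      }
    }
    throw last;
  }
}
```

 With this shape, a call that fails twice and then succeeds leaves two log 
 lines behind, one per unsuccessful attempt.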


[jira] [Resolved] (HBASE-6090) JMX Registration Error while booting HMaster


 [ 
https://issues.apache.org/jira/browse/HBASE-6090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark resolved HBASE-6090.
--

Resolution: Duplicate

Dupe of HBASE-5967

 JMX Registration Error while booting HMaster
 

 Key: HBASE-6090
 URL: https://issues.apache.org/jira/browse/HBASE-6090
 Project: HBase
  Issue Type: Bug
Reporter: Elliott Clark
Assignee: Elliott Clark

 When booting master there are errors about HMaster not being a bean and being 
 unable to turn ServerLoad into an open class.


[jira] [Updated] (HBASE-6071) getRegionServerWithRetires, should log unsuccessful attempts and exceptions.


 [ 
https://issues.apache.org/jira/browse/HBASE-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igal Shilman updated HBASE-6071:


Affects Version/s: 0.92.0
   0.94.0
Fix Version/s: (was: 0.90.7)

 getRegionServerWithRetires, should log unsuccessful attempts and exceptions.
 

 Key: HBASE-6071
 URL: https://issues.apache.org/jira/browse/HBASE-6071
 Project: HBase
  Issue Type: Improvement
  Components: client, ipc
Affects Versions: 0.90.0, 0.92.0, 0.94.0
Reporter: Igal Shilman
Priority: Minor
  Labels: client, ipc
 Attachments: HConnectionManager_HBASE-6071-0.90.0.patch


 HConnectionImplementation.getRegionServerWithRetries might terminate with an 
 exception other than a DoNotRetryIOException, and thus silently drop the 
 exceptions from previous attempts.
 [~ted_yu] suggested 
 ([here|http://mail-archives.apache.org/mod_mbox/hbase-user/201205.mbox/%3CCAFebPXBq9V9BVdzRTNr-MB3a1Lz78SZj6gvP6On0b%2Bajt9StAg%40mail.gmail.com%3E])
  adding a log message inside the catch block describing the exception type 
 and details.


[jira] [Commented] (HBASE-6084) Server Load does not display correctly on the ui


[ 
https://issues.apache.org/jira/browse/HBASE-6084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282802#comment-13282802
 ] 

Elliott Clark commented on HBASE-6084:
--

This was only for the UI, which requires toString.

 Server Load does not display correctly on the ui
 

 Key: HBASE-6084
 URL: https://issues.apache.org/jira/browse/HBASE-6084
 Project: HBase
  Issue Type: Bug
Reporter: Elliott Clark
Assignee: Elliott Clark
 Attachments: HBASE-6084-0.patch


 The ui uses the toString method and toString does not implement it any more.


[jira] [Updated] (HBASE-6084) Server Load does not display correctly on the ui


 [ 
https://issues.apache.org/jira/browse/HBASE-6084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HBASE-6084:
-

Attachment: HBASE-6084-1.patch

This patch fixes the ui and adds getters for the values that used to be there.

The totals are computed in ServerLoad from the per-region RegionLoad values.

 Server Load does not display correctly on the ui
 

 Key: HBASE-6084
 URL: https://issues.apache.org/jira/browse/HBASE-6084
 Project: HBase
  Issue Type: Bug
Reporter: Elliott Clark
Assignee: Elliott Clark
 Attachments: HBASE-6084-0.patch, HBASE-6084-1.patch


 The ui uses the toString method and toString does not implement it any more.


[jira] [Commented] (HBASE-5916) RS restart just before master intialization we make the cluster non operative


[ 
https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282808#comment-13282808
 ] 

Zhihong Yu commented on HBASE-5916:
---

The failure in TestClockSkewDetection was due to NPE.
The following change makes it pass:
{code}
if ((this.services == null || ((HMaster) this.services).isInitialized())
    && this.deadservers.cleanPreviousInstance(serverName)) {
{code}

{code}
+   * To clear any dead server with same host name and port of online server
{code}
I think 'any' should be added in front of 'online server'.
{code}
+  public void clearDeadServersWithSameHostNameAndPortOfOnlineServer() {
{code}
The above method can be package private, right ?
{code}
+  while ((sn = ServerName.findServerWithSameHostnamePort(this.deadservers, 
serverName)) != null) {
{code}
The above line exceeds 100 chars.
{code}
+  if(actualDeadServers.contains(deadServer.getKey())){
{code}
Add spaces after if and before {.


 RS restart just before master intialization we make the cluster non operative
 -

 Key: HBASE-5916
 URL: https://issues.apache.org/jira/browse/HBASE-5916
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.94.1

 Attachments: HBASE-5916_trunk.patch, HBASE-5916_trunk_1.patch, 
 HBASE-5916_trunk_1.patch, HBASE-5916_trunk_2.patch, HBASE-5916_trunk_3.patch, 
 HBASE-5916_trunk_4.patch, HBASE-5916_trunk_v5.patch, HBASE-5916_trunk_v6.patch


 Consider a case where the master is being restarted. An RS that was alive when 
 the master restart began is itself restarted before the master enables the 
 ServerShutDownHandler:
 {code}
 serverShutdownHandlerEnabled = true;
 {code}
 In this case, when the RS tries to register with the master, the master 
 tries to expire the server, but the server cannot be expired because the 
 serverShutdownHandler is not yet enabled.
 This can happen when only one RS is restarted, or when all the RSs are 
 restarted at the same time (before assignRootandMeta).
 {code}
 LOG.info(message);
 if (existingServer.getStartcode() < serverName.getStartcode()) {
   LOG.info("Triggering server recovery; existingServer " +
     existingServer + " looks stale, new server:" + serverName);
   expireServer(existingServer);
 }
 {code}
 If another RS is brought up, the cluster returns to normal.
 This may be a rare corner case.
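
 To make the stuck state easier to see, here is a small self-contained sketch 
 of the scenario. It is only a model under stated assumptions, not the 
 ServerManager code or the proposed fix: the class StaleServerSketch, its 
 fields, and the boolean return values are hypothetical. It shows that the 
 startcode comparison detects the stale instance, but the expiry is a no-op 
 until the shutdown handler flag is set.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the registration check; not the real ServerManager.
class StaleServerSketch {
  // Mirrors the flag from the description; false until master init completes.
  boolean serverShutdownHandlerEnabled = false;

  // Names of the servers that were actually expired.
  final List<String> expired = new ArrayList<>();

  // Returns true only if the server could really be expired.
  boolean expireServer(String serverName) {
    if (!serverShutdownHandlerEnabled) {
      // The stuck case: the stale instance is detected but never processed.
      return false;
    }
    expired.add(serverName);
    return true;
  }

  // A lower startcode on the same host and port means the existing
  // registration belongs to an older, stale instance of the server.
  boolean checkAlreadySameHostPort(String existingName, long existingStartcode,
      long newStartcode) {
    if (existingStartcode < newStartcode) {
      return expireServer(existingName);
    }
    return false;
  }
}
```

 In this model, the same registration attempt fails before the flag is set and 
 succeeds afterwards, which matches the ordering problem described above.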


[jira] [Commented] (HBASE-6084) Server Load does not display correctly on the ui


[ 
https://issues.apache.org/jira/browse/HBASE-6084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282812#comment-13282812
 ] 

Gregory Chanan commented on HBASE-6084:
---

Elliott,

I'm a bit confused.  When I look at 0.92.1's HServerLoad.toString I see:
{code}
int numberOfRegions = this.regionLoad.size();
StringBuilder sb = new StringBuilder();
sb = Strings.appendKeyValue(sb, "requestsPerSecond",
  Integer.valueOf(numberOfRequests/msgInterval));
sb = Strings.appendKeyValue(sb, "numberOfOnlineRegions",
  Integer.valueOf(numberOfRegions));
sb = Strings.appendKeyValue(sb, "usedHeapMB",
  Integer.valueOf(this.usedHeapMB));
sb = Strings.appendKeyValue(sb, "maxHeapMB", Integer.valueOf(maxHeapMB));
return sb.toString();
{code}

But your toString doesn't match.  It looks like you implemented 
HServerLoad.RegionLoad's toString in ServerLoad?

 Server Load does not display correctly on the ui
 

 Key: HBASE-6084
 URL: https://issues.apache.org/jira/browse/HBASE-6084
 Project: HBase
  Issue Type: Bug
Reporter: Elliott Clark
Assignee: Elliott Clark
 Attachments: HBASE-6084-0.patch, HBASE-6084-1.patch


 The ui uses the toString method and toString does not implement it any more.


[jira] [Commented] (HBASE-6084) Server Load does not display correctly on the ui


[ 
https://issues.apache.org/jira/browse/HBASE-6084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282814#comment-13282814
 ] 

Gregory Chanan commented on HBASE-6084:
---

I think what we need to do is the following:

1) Write a ServerLoad.toString that matches HServerLoad.toString.
2) Implement a RegionLoad (not HServerLoad.RegionLoad) that wraps the protobuf 
RegionLoad, like how ServerLoad wraps the protobuf ServerLoad
3) Write a RegionLoad.toString that matches HServerLoad.RegionLoad.toString

Does that seem correct to you or am I missing something?

You should be able to do #1 now.
I'm almost done with #2; you can track it in HBASE-5933.
After #2, you should be able to do #3.
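
As a rough illustration of steps 1 and 2, here is a sketch of a wrapper whose 
toString emits key=value pairs and whose totals are summed from per-region 
values. Everything in it is an assumption for illustration: RegionLoadSketch 
and ServerLoadSketch are hypothetical plain classes rather than the protobuf 
wrappers, the extra fields (stores, storefileSizeMB) are examples, and the 
private appendKeyValue helper only imitates the "key=value, key=value" style 
of Strings.appendKeyValue.

```java
import java.util.List;

// Hypothetical per-region record; stands in for a wrapped protobuf RegionLoad.
class RegionLoadSketch {
  final int stores;
  final int storefileSizeMB;

  RegionLoadSketch(int stores, int storefileSizeMB) {
    this.stores = stores;
    this.storefileSizeMB = storefileSizeMB;
  }
}

// Hypothetical server-level wrapper; totals are derived from the regions.
class ServerLoadSketch {
  private final List<RegionLoadSketch> regionLoads;
  private final int usedHeapMB;
  private final int maxHeapMB;

  ServerLoadSketch(List<RegionLoadSketch> regionLoads, int usedHeapMB, int maxHeapMB) {
    this.regionLoads = regionLoads;
    this.usedHeapMB = usedHeapMB;
    this.maxHeapMB = maxHeapMB;
  }

  int getStores() {
    int total = 0;
    for (RegionLoadSketch rl : regionLoads) {
      total += rl.stores;
    }
    return total;
  }

  int getStorefileSizeMB() {
    int total = 0;
    for (RegionLoadSketch rl : regionLoads) {
      total += rl.storefileSizeMB;
    }
    return total;
  }

  // Local imitation of a key=value appender with ", " between pairs.
  private static StringBuilder appendKeyValue(StringBuilder sb, String key, Object value) {
    if (sb.length() > 0) {
      sb.append(", ");
    }
    return sb.append(key).append('=').append(value);
  }

  @Override
  public String toString() {
    StringBuilder sb = new StringBuilder();
    // Old-style fields first, so the UI output keeps its familiar prefix.
    appendKeyValue(sb, "numberOfOnlineRegions", regionLoads.size());
    appendKeyValue(sb, "usedHeapMB", usedHeapMB);
    appendKeyValue(sb, "maxHeapMB", maxHeapMB);
    // Extra totals computed from the per-region values.
    appendKeyValue(sb, "stores", getStores());
    appendKeyValue(sb, "storefileSizeMB", getStorefileSizeMB());
    return sb.toString();
  }
}
```

Keeping the old fields at the front of the string is a design choice of this 
sketch; it means anything that parsed the old output by prefix would still work.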

 Server Load does not display correctly on the ui
 

 Key: HBASE-6084
 URL: https://issues.apache.org/jira/browse/HBASE-6084
 Project: HBase
  Issue Type: Bug
Reporter: Elliott Clark
Assignee: Elliott Clark
 Attachments: HBASE-6084-0.patch, HBASE-6084-1.patch


 The ui uses the toString method and toString does not implement it any more.


[jira] [Commented] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT

[
https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282818#comment-13282818
]

Zhihong Yu commented on HBASE-6070:
---

+1 on patch v2.

You may want to verify that the failed test below wasn't related to this change:
https://builds.apache.org/job/PreCommit-HBASE-Build/1987/console

AM.nodeDeleted and SSH races creating problems for regions under SPLIT
--

Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.92_1.patch,
HBASE-6070_0.94.patch, HBASE-6070_0.94_1.patch, HBASE-6070_trunk.patch,
HBASE-6070_trunk_1.patch

[jira] [Commented] (HBASE-6071) getRegionServerWithRetires, should log unsuccessful attempts and exceptions.


[ 
https://issues.apache.org/jira/browse/HBASE-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282821#comment-13282821
 ] 

Zhihong Yu commented on HBASE-6071:
---

--no-prefix is not required now - Hadoop QA is smart.

 getRegionServerWithRetires, should log unsuccessful attempts and exceptions.
 

 Key: HBASE-6071
 URL: https://issues.apache.org/jira/browse/HBASE-6071
 Project: HBase
  Issue Type: Improvement
  Components: client, ipc
Affects Versions: 0.90.0, 0.92.0, 0.94.0
Reporter: Igal Shilman
Priority: Minor
  Labels: client, ipc
 Attachments: HConnectionManager_HBASE-6071-0.90.0.patch


 HConnectionImplementation.getRegionServerWithRetries might terminate with an 
 exception other than a DoNotRetryIOException, and thus silently drop the 
 exceptions from previous attempts.
 [~ted_yu] suggested 
 ([here|http://mail-archives.apache.org/mod_mbox/hbase-user/201205.mbox/%3CCAFebPXBq9V9BVdzRTNr-MB3a1Lz78SZj6gvP6On0b%2Bajt9StAg%40mail.gmail.com%3E])
  adding a log message inside the catch block describing the exception type 
 and details.


[jira] [Commented] (HBASE-5352) ACL improvements

2012-05-24 Thread Matteo Bertozzi (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282823#comment-13282823
]

Matteo Bertozzi commented on HBASE-5352:

@Laxman yeah, create a sub-task for that. The ACL may not be in sync with new
features, so feel free to open a new sub-task to sync the coprocessor with the
missing stuff.

ACL improvements

[jira] [Updated] (HBASE-6084) Server Load does not display correctly on the ui


 [ 
https://issues.apache.org/jira/browse/HBASE-6084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HBASE-6084:
-

Attachment: HBASE-6084-2.patch

I just added the extra info since we have it.  In doing that I forgot to add 
the old stuff back in.

Fixed.

 Server Load does not display correctly on the ui
 

 Key: HBASE-6084
 URL: https://issues.apache.org/jira/browse/HBASE-6084
 Project: HBase
  Issue Type: Bug
Reporter: Elliott Clark
Assignee: Elliott Clark
 Attachments: HBASE-6084-0.patch, HBASE-6084-1.patch, 
 HBASE-6084-2.patch


 The ui uses the toString method and toString does not implement it any more.


[jira] [Commented] (HBASE-5916) RS restart just before master intialization we make the cluster non operative

2012-05-24 Thread chunhui shen (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282882#comment-13282882
 ] 

chunhui shen commented on HBASE-5916:
-

@ram
bq. In joinCluster(), as per the existing code, if any new server has checked in 
and the root/meta got assigned to it, in joinCluster we may think that it is 
a dead server because we have already passed the online servers.

If we consider it a dead server, what error will that cause? 
I think none. It must be a new regionserver (one that was just restarted), 
so it carries no regions. Of course, we won't assign regions to it, but I 
think that is harmless.

Correct me if I'm wrong, thanks.

 RS restart just before master intialization we make the cluster non operative
 -

 Key: HBASE-5916
 URL: https://issues.apache.org/jira/browse/HBASE-5916
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.94.1

 Attachments: HBASE-5916_trunk.patch, HBASE-5916_trunk_1.patch, 
 HBASE-5916_trunk_1.patch, HBASE-5916_trunk_2.patch, HBASE-5916_trunk_3.patch, 
 HBASE-5916_trunk_4.patch, HBASE-5916_trunk_v5.patch, HBASE-5916_trunk_v6.patch


 Consider a case where the master is being restarted. An RS that was alive when 
 the master restart began is itself restarted before the master enables the 
 ServerShutDownHandler:
 {code}
 serverShutdownHandlerEnabled = true;
 {code}
 In this case, when the RS tries to register with the master, the master 
 tries to expire the server, but the server cannot be expired because the 
 serverShutdownHandler is not yet enabled.
 This can happen when only one RS is restarted, or when all the RSs are 
 restarted at the same time (before assignRootandMeta).
 {code}
 LOG.info(message);
 if (existingServer.getStartcode() < serverName.getStartcode()) {
   LOG.info("Triggering server recovery; existingServer " +
     existingServer + " looks stale, new server:" + serverName);
   expireServer(existingServer);
 }
 {code}
 If another RS is brought up, the cluster returns to normal.
 This may be a rare corner case.


[jira] [Updated] (HBASE-6071) getRegionServerWithRetires, should log unsuccessful attempts and exceptions.


 [ 
https://issues.apache.org/jira/browse/HBASE-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igal Shilman updated HBASE-6071:


Attachment: HBASE-6071.patch

 getRegionServerWithRetires, should log unsuccessful attempts and exceptions.
 

 Key: HBASE-6071
 URL: https://issues.apache.org/jira/browse/HBASE-6071
 Project: HBase
  Issue Type: Improvement
  Components: client, ipc
Affects Versions: 0.90.0, 0.92.0, 0.94.0
Reporter: Igal Shilman
Priority: Minor
  Labels: client, ipc
 Attachments: HBASE-6071.patch, 
 HConnectionManager_HBASE-6071-0.90.0.patch


 HConnectionImplementation.getRegionServerWithRetries might terminate with an 
 exception other than a DoNotRetryIOException, and thus silently drop the 
 exceptions from previous attempts.
 [~ted_yu] suggested 
 ([here|http://mail-archives.apache.org/mod_mbox/hbase-user/201205.mbox/%3CCAFebPXBq9V9BVdzRTNr-MB3a1Lz78SZj6gvP6On0b%2Bajt9StAg%40mail.gmail.com%3E])
  adding a log message inside the catch block describing the exception type 
 and details.


[jira] [Created] (HBASE-6091) Come up with strawman proposal for RC testing matrix

2012-05-24 Thread David S. Wang (JIRA)

David S. Wang created HBASE-6091:


 Summary: Come up with strawman proposal for RC testing matrix
 Key: HBASE-6091
 URL: https://issues.apache.org/jira/browse/HBASE-6091
 Project: HBase
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.96.0
Reporter: David S. Wang
Assignee: David S. Wang





[jira] [Updated] (HBASE-5916) RS restart just before master intialization we make the cluster non operative


 [ 
https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rajeshbabu updated HBASE-5916:
--

Attachment: HBASE-5916_trunk_v7.patch

 RS restart just before master intialization we make the cluster non operative
 -

 Key: HBASE-5916
 URL: https://issues.apache.org/jira/browse/HBASE-5916
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.94.1

 Attachments: HBASE-5916_trunk.patch, HBASE-5916_trunk_1.patch, 
 HBASE-5916_trunk_1.patch, HBASE-5916_trunk_2.patch, HBASE-5916_trunk_3.patch, 
 HBASE-5916_trunk_4.patch, HBASE-5916_trunk_v5.patch, 
 HBASE-5916_trunk_v6.patch, HBASE-5916_trunk_v7.patch


 Consider a case where the master is being restarted. An RS that was alive when 
 the master restart began is itself restarted before the master enables the 
 ServerShutdownHandler:
 {code}
 serverShutdownHandlerEnabled = true;
 {code}
 In this case, when the RS tries to register with the master, the master will 
 try to expire the old server instance, but it cannot be expired because the 
 ServerShutdownHandler is not yet enabled.
 This can happen when a single RS is restarted, or when all the RSs are 
 restarted at the same time (before assignRootAndMeta).
 {code}
 LOG.info(message);
 if (existingServer.getStartcode() < serverName.getStartcode()) {
   LOG.info("Triggering server recovery; existingServer " +
     existingServer + " looks stale, new server:" + serverName);
   expireServer(existingServer);
 }
 {code}
 If another RS is brought up, the cluster returns to normal.
 This may be a rare corner case.
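
One way to picture the problem, and a possible remedy, is a small model in which an expiration request that arrives too early is remembered and replayed later. This is only a sketch under assumptions: DeferredExpiryModel and all of its members are hypothetical, servers are plain strings, and "expiring" a server just moves it to a list. The names serverShutdownHandlerEnabled and expireServer come from the description; expireDeadNotExpiredServers is borrowed from a later comment on this issue.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Simplified, hypothetical model of the master's server bookkeeping. */
final class DeferredExpiryModel {
  private boolean serverShutdownHandlerEnabled = false;
  private final Set<String> onlineServers = new LinkedHashSet<>();
  private final Set<String> deadNotExpiredServers = new LinkedHashSet<>();
  private final List<String> expired = new ArrayList<>();

  void register(String server) {
    onlineServers.add(server);
  }

  /** Expires the server now if the handler is ready; otherwise remembers it. */
  void expireServer(String server) {
    if (!serverShutdownHandlerEnabled) {
      deadNotExpiredServers.add(server); // deferred instead of silently dropped
      return;
    }
    if (onlineServers.remove(server)) {
      expired.add(server); // stands in for submitting a shutdown handler
    }
  }

  /** Called once master initialization has progressed far enough. */
  void enableServerShutdownHandler() {
    serverShutdownHandlerEnabled = true;
  }

  /** Replays every expiration that arrived while the handler was disabled. */
  void expireDeadNotExpiredServers() {
    List<String> pending = new ArrayList<>(deadNotExpiredServers);
    deadNotExpiredServers.clear();
    for (String server : pending) {
      expireServer(server);
    }
  }

  List<String> expiredServers() {
    return expired;
  }

  Set<String> pendingServers() {
    return deadNotExpiredServers;
  }
}
```

In this model a stale server that re-registers before the handler is enabled stays in the pending set and is expired as soon as the replay method runs.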


[jira] [Commented] (HBASE-5916) RS restart just before master intialization we make the cluster non operative


[ 
https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282981#comment-13282981
 ] 

rajeshbabu commented on HBASE-5916:
---

@Zhihong Yu
Thanks for the help.

The latest patch addresses Zhihong Yu's comments.


[jira] [Updated] (HBASE-5916) RS restart just before master intialization we make the cluster non operative


 [ 
https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rajeshbabu updated HBASE-5916:
--

Status: Open  (was: Patch Available)


[jira] [Updated] (HBASE-5916) RS restart just before master intialization we make the cluster non operative


 [ 
https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rajeshbabu updated HBASE-5916:
--

Status: Patch Available  (was: Open)


[jira] [Updated] (HBASE-6032) Port HFileBlockIndex improvement from HBASE-5987


 [ 
https://issues.apache.org/jira/browse/HBASE-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-6032:
--

Attachment: 6032-ports-5987.txt

 Port HFileBlockIndex improvement from HBASE-5987
 

 Key: HBASE-6032
 URL: https://issues.apache.org/jira/browse/HBASE-6032
 Project: HBase
  Issue Type: Task
Reporter: Zhihong Yu
 Attachments: 6032-ports-5987.txt


 Excerpt from HBASE-5987:
 First, we propose to look ahead one more block index entry so that the 
 HFileScanner knows the start key value of the next data block. If the 
 target key value of the scan (reSeekTo) is smaller than the start kv of the 
 next data block, the target is very likely in the current data block (if it 
 is not, the start kv of the next data block should be returned; +indexing on 
 the start key has some defects here+), so the scanner shall NOT query the 
 HFileBlockIndex in this case. Conversely, if the target key value is bigger, 
 it shall query the HFileBlockIndex. This improvement should reduce the 
 hotness of the HFileBlockIndex and avoid some unnecessary IdLock contention 
 and index block cache lookups.
 This JIRA is to port the fix to HBase trunk, etc.
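
The lookahead idea in the excerpt can be sketched as follows. This is an illustration under assumptions, not the HFileScanner code or the attached patch: LookaheadScanner is a hypothetical class, keys are plain strings, blocks are represented only by their first keys, and a counter stands in for queries against the HFileBlockIndex.

```java
import java.util.Arrays;

/** Hypothetical scanner over sorted blocks; keys are Strings for brevity. */
final class LookaheadScanner {
  private final String[] blockFirstKeys; // first key of each data block, sorted
  private int currentBlock = 0;
  private int indexLookups = 0;

  LookaheadScanner(String[] sortedBlockFirstKeys) {
    if (sortedBlockFirstKeys.length == 0) {
      throw new IllegalArgumentException("at least one block is required");
    }
    this.blockFirstKeys = sortedBlockFirstKeys.clone();
  }

  /** First key of the block after the current one, or null at the last block. */
  private String nextBlockFirstKey() {
    return currentBlock + 1 < blockFirstKeys.length ? blockFirstKeys[currentBlock + 1] : null;
  }

  /** Moves to the block that should hold the key and returns that block's number. */
  int reseekTo(String key) {
    String next = nextBlockFirstKey();
    boolean notBehind = key.compareTo(blockFirstKeys[currentBlock]) >= 0;
    boolean beforeNext = next == null || key.compareTo(next) < 0;
    if (notBehind && beforeNext) {
      return currentBlock; // target is bounded by the next block's start key: skip the index
    }
    indexLookups++; // stands in for a query against the block index
    int pos = Arrays.binarySearch(blockFirstKeys, key);
    // A miss returns -(insertionPoint) - 1; the holding block precedes the insertion point.
    currentBlock = pos >= 0 ? pos : Math.max(0, -pos - 2);
    return currentBlock;
  }

  int indexLookups() {
    return indexLookups;
  }
}
```

With block start keys a, g, m and t, reseeking to c and then f stays in block 0 without touching the index, while reseeking to n needs one index lookup to land in block 2.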


[jira] [Commented] (HBASE-6074) TestHLog is flaky


[ 
https://issues.apache.org/jira/browse/HBASE-6074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13282989#comment-13282989
 ] 

Zhihong Yu commented on HBASE-6074:
---

I experienced a similar issue when I worked on HBASE-5699.

I think separating some tests out into their own class(es) is one solution.

 TestHLog is flaky
 -

 Key: HBASE-6074
 URL: https://issues.apache.org/jira/browse/HBASE-6074
 Project: HBase
  Issue Type: Test
  Components: test
Affects Versions: 0.92.0
Reporter: Devaraj Das
 Attachments: 6074-1.patch


 When I run TestHLog in a loop, I see failures.


[jira] [Updated] (HBASE-6032) Port HFileBlockIndex improvement from HBASE-5987


 [ 
https://issues.apache.org/jira/browse/HBASE-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-6032:
--

Status: Patch Available  (was: Open)


[jira] [Updated] (HBASE-5916) RS restart just before master intialization we make the cluster non operative

2012-05-24 Thread chunhui shen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-5916:


Attachment: HBASE-5916v8.patch

I have made a simple patch (v8) with the solution I mentioned above.

@ram
Could you test it with your test case?

Something may be wrong in it; thanks for the review.


RE: [jira] [Updated] (HBASE-6085) SaslServer intermittently ignoring SaslClient's requests

2012-05-24 Thread Ramkrishna.S.Vasudevan

Hi Andrew

Did you intend to send this mail to me? Or to Himanshu?

Regards
Ram

 -Original Message-
 From: Andrew Purtell [mailto:apurt...@apache.org]
 Sent: Friday, May 25, 2012 12:21 AM
 To: ramkrishna.s.vasudevan (JIRA)
 Cc: issues@hbase.apache.org
 Subject: Re: [jira] [Updated] (HBASE-6085) SaslServer intermittently
 ignoring SaslClient's requests
 
 Anything of note appear in the server side logs at DEBUG level? Have
 you tried duplicating this in an all-localhost configuration? If it is
 possible to reproduce in an all-localhost configuration, or on a
 cluster not otherwise occupied, then we can turn on additional
 SASL/GSSAPI level debugging that may shed light but will be quite
 verbose.

[jira] [Commented] (HBASE-6032) Port HFileBlockIndex improvement from HBASE-5987

[
https://issues.apache.org/jira/browse/HBASE-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13283098#comment-13283098
]

Hadoop QA commented on HBASE-6032:
--

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12529563/6032-ports-5987.txt
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 15 new or modified tests.

+1 hadoop23. The patch compiles against the hadoop 0.23.x profile.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

-1 findbugs. The patch appears to introduce 36 new Findbugs (version
1.3.9) warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

+1 core tests. The patch passed unit tests in .

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/1992//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/1992//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/1992//console

This message is automatically generated.


[jira] [Commented] (HBASE-6071) getRegionServerWithRetires, should log unsuccessful attempts and exceptions.

[
https://issues.apache.org/jira/browse/HBASE-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13283105#comment-13283105
]

Hadoop QA commented on HBASE-6071:
--

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12529461/HBASE-6071.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 hadoop23. The patch compiles against the hadoop 0.23.x profile.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

-1 findbugs. The patch appears to introduce 33 new Findbugs (version
1.3.9) warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests:

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/1991//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/1991//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/1991//console

This message is automatically generated.

getRegionServerWithRetires, should log unsuccessful attempts and exceptions.

Key: HBASE-6071
URL: https://issues.apache.org/jira/browse/HBASE-6071
Project: HBase
Issue Type: Improvement
Components: client, ipc
Affects Versions: 0.92.0, 0.94.0
Reporter: Igal Shilman
Priority: Minor
Labels: client, ipc
Attachments: HBASE-6071.patch,
HConnectionManager_HBASE-6071-0.90.0.patch

[jira] [Commented] (HBASE-5916) RS restart just before master intialization we make the cluster non operative

[
https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13283109#comment-13283109
]

Hadoop QA commented on HBASE-5916:
--

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12529660/HBASE-5916v8.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

+1 hadoop23. The patch compiles against the hadoop 0.23.x profile.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

-1 findbugs. The patch appears to introduce 34 new Findbugs (version
1.3.9) warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests:

org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/1993//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/1993//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/1993//console

This message is automatically generated.

RS restart just before master intialization we make the cluster non operative
-

[jira] [Commented] (HBASE-5916) RS restart just before master intialization we make the cluster non operative


[ 
https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13283110#comment-13283110
 ] 

ramkrishna.s.vasudevan commented on HBASE-5916:
---

First of all, thanks for taking the time to prepare a patch.
I think there is one problem if we don't fetch the new online servers in 
joinCluster:
{code}
// STEP 1
this.serverManager.expireDeadNotExpiredServers();

// Update meta with new HRI if required. i.e migrate all HRI with HTD to
// HRI with out HTD in meta and update the status in ROOT. This must happen
// before we assign all user regions or else the assignment will fail.
// TODO: Remove this when we do 0.94.
// STEP 2
org.apache.hadoop.hbase.catalog.MetaMigrationRemovingHTD.
  updateMetaWithNewHRI(this);

// Fixup assignment manager status
status.setStatus("Starting assignment manager");
// STEP 3
this.assignmentManager.joinCluster(onlineServers);
{code}
Here is one scenario; it may be rare, but it is still possible.
I have 3 RSs at STEP 1.
One of them goes down, and the SSH processes it and tries to assign its regions.
Before the assignment is done, a new RS comes up (before STEP 3).
There is a small chance that the regions from the dead RS are assigned to this 
new RS. In STEP 3, because we have already captured the online servers list, 
we may end up treating the new RS as an offline server after scanning META. 
Please correct me if I am wrong. It is a corner case.
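
The scenario in this comment comes down to acting on a copy of the server list that was taken before the membership changed. A minimal sketch, assuming a hypothetical StaleSnapshotDemo class in which servers are plain strings (none of these names are HBase APIs):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical model: why an early copy of the online-server list can go stale. */
final class StaleSnapshotDemo {
  private final List<String> liveServers = new CopyOnWriteArrayList<>();

  void serverRegistered(String server) {
    liveServers.add(server);
  }

  void serverDied(String server) {
    liveServers.remove(server);
  }

  /** Copy taken early, for example before STEP 1. */
  List<String> snapshot() {
    return new ArrayList<>(liveServers);
  }

  /** Current view, as it would look if fetched again at STEP 3. */
  List<String> liveView() {
    return liveServers;
  }

  /** joinCluster-style check: is the server hosting a region known to be online? */
  static boolean consideredOnline(List<String> knownServers, String regionHost) {
    return knownServers.contains(regionHost);
  }
}
```

If rs4 registers after the snapshot is taken and receives regions from a dead server, the check against the snapshot reports rs4 as offline, while the same check against the live view reports it as online.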


[jira] [Commented] (HBASE-5352) ACL improvements

2012-05-24 Thread Laxman (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13283116#comment-13283116
]

Laxman commented on HBASE-5352:
---

Yes Matt, there are many other APIs that are not checked for authorization in
AccessController. We may need to analyze them all together once and then handle
them in phases. I will try to provide an analysis of all the operations; we can
discuss after that.

Thanks for your quick response.

ACL improvements

[jira] [Created] (HBASE-6092) Authorize flush, split operations in AccessController

2012-05-24 Thread Laxman (JIRA)

Laxman created HBASE-6092:
-

 Summary: Authorize flush, split operations in AccessController
 Key: HBASE-6092
 URL: https://issues.apache.org/jira/browse/HBASE-6092
 Project: HBase
  Issue Type: Sub-task
  Components: security
Reporter: Laxman
Assignee: Laxman


Currently, some operations like flush and split are not checked for 
authorization in AccessController. With the current implementation any 
unauthorized client can trigger these operations on a table.
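
As a rough illustration of the kind of check being asked for, the sketch below guards flush and split behind a permission test. Everything in it is hypothetical: OperationGuard, its Action enum, and the preFlush/preSplit methods are not the AccessController API, and requiring ADMIN for these operations is an assumption made for the example, not a decision recorded in this issue.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical permission guard; not the HBase AccessController. */
final class OperationGuard {
  enum Action { READ, WRITE, ADMIN }

  static final class AccessDeniedException extends RuntimeException {
    AccessDeniedException(String message) { super(message); }
  }

  private final Map<String, Set<Action>> grants = new HashMap<>();

  void grant(String user, Action action) {
    grants.computeIfAbsent(user, u -> EnumSet.noneOf(Action.class)).add(action);
  }

  /** Throws unless the user has been granted the required action. */
  private void requirePermission(String user, Action required, String operation) {
    Set<Action> allowed = grants.get(user);
    if (allowed == null || !allowed.contains(required)) {
      throw new AccessDeniedException(
          "User " + user + " lacks " + required + " permission for " + operation);
    }
  }

  // Assumption for this sketch: flush and split both require ADMIN.
  void preFlush(String user) {
    requirePermission(user, Action.ADMIN, "flush");
  }

  void preSplit(String user) {
    requirePermission(user, Action.ADMIN, "split");
  }
}
```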


[jira] [Assigned] (HBASE-4676) Prefix Compression - Trie data block encoding

[
https://issues.apache.org/jira/browse/HBASE-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Zhihong Yu reassigned HBASE-4676:
-

Assignee: Matt Corgan

Prefix Compression - Trie data block encoding
-

Key: HBASE-4676
URL: https://issues.apache.org/jira/browse/HBASE-4676
Project: HBase
Issue Type: New Feature
Components: io, performance, regionserver
Affects Versions: 0.90.6
Reporter: Matt Corgan
Assignee: Matt Corgan
Attachments: HBASE-4676-0.94-v1.patch, PrefixTrie_Format_v1.pdf,
PrefixTrie_Performance_v1.pdf, SeeksPerSec by blockSize.png,
hbase-prefix-trie-0.1.jar

The HBase data block format has room for 2 significant improvements for
applications that have high block cache hit ratios.
First, there is no prefix compression, and the current KeyValue format is
somewhat metadata heavy, so there can be tremendous memory bloat for many
common data layouts, specifically those with long keys and short values.
Second, there is no random access to KeyValues inside data blocks. This
means that every time you double the datablock size, average seek time (or
average cpu consumption) goes up by a factor of 2. The standard 64KB block
size is ~10x slower for random seeks than a 4KB block size, but block sizes
as small as 4KB cause problems elsewhere. Using block sizes of 256KB or 1MB
or more may be more efficient from a disk access and block-cache perspective
in many big-data applications, but doing so is infeasible from a random seek
perspective.
The PrefixTrie block encoding format attempts to solve both of these
problems. Some features:
* The trie format for row key encoding completely eliminates duplicate row keys
and encodes similar row keys into a standard trie structure, which also saves
a lot of space.
* The column family is currently stored once at the beginning of each block.
This could easily be modified to allow multiple family names per block.
* All qualifiers in the block are stored in their own trie format, which
caters nicely to wide rows. Duplicate qualifiers between rows are eliminated.
The size of this trie determines the width of the block's qualifier
fixed-width-int.
* The minimum timestamp is stored at the beginning of the block, and deltas
are calculated from that. The maximum delta determines the width of the
block's timestamp fixed-width-int.
The block is structured with metadata at the beginning, then a section for
the row trie, then the column trie, then the timestamp deltas, and then
all the values. Most work is done in the row trie, where every leaf node
(corresponding to a row) contains a list of offsets/references corresponding
to the cells in that row. Each cell is fixed-width to enable binary
searching and is represented by [1 byte operationType, X bytes qualifier
offset, X bytes timestamp delta offset].
If all operation types are the same for a block, there will be zero per-cell
overhead. Same for timestamps. Same for qualifiers when I get a chance.
So, the compression aspect is very strong, but makes a few small sacrifices
on VarInt size to enable faster binary searches in trie fan-out nodes.
A more compressed but slower version might build on this by also applying
further (suffix, etc) compression on the trie nodes at the cost of slower
write speed. Even further compression could be obtained by using all VInts
instead of FInts with a sacrifice on random seek speed (though not huge).
One current drawback is the current write speed. While programmed with good
constructs like TreeMaps, ByteBuffers, binary searches, etc, it's not
programmed with the same level of optimization as the read path. Work will
need to be done to optimize the data structures used for encoding and could
probably show a 10x increase. It will still be slower than delta encoding,
but with a much higher decode speed. I have not yet created a thorough
benchmark for write speed nor sequential read speed.
Though the trie is reaching a point where it is internally very efficient
(probably within half or a quarter of its max read speed) the way that hbase
currently uses it is far from optimal. The KeyValueScanner and related
classes that iterate through the trie will eventually need to be smarter and
have methods to do things like skipping to the next row of results without
scanning every cell in between. When that is accomplished it will also allow
much faster compactions because the full row key will not have to be compared
as often as it is now.
Current code is on GitHub. The trie code is in a separate project from the
slightly modified hbase. There is an hbase project there as well with the
DeltaEncoding patch applied, and it builds on top of that.
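
The timestamp handling described above (store the block's minimum timestamp once, then fixed-width deltas sized by the largest delta) can be sketched as below. This is an illustration only, not the PrefixTrie code: TimestampDeltaEncoder and its methods are hypothetical, and non-negative timestamps are assumed.

```java
/** Hypothetical helper showing a minimum timestamp plus fixed-width deltas. */
final class TimestampDeltaEncoder {
  /** Bytes needed to store any value in [0, maxValue]; 0 when maxValue is 0. */
  static int fixedWidthBytes(long maxValue) {
    if (maxValue < 0) {
      throw new IllegalArgumentException("negative value");
    }
    int bits = 64 - Long.numberOfLeadingZeros(maxValue);
    return (bits + 7) / 8;
  }

  /** Smallest timestamp in the block; stored once in the block header. */
  static long minTimestamp(long[] timestamps) {
    if (timestamps.length == 0) {
      throw new IllegalArgumentException("no timestamps");
    }
    long min = timestamps[0];
    for (long ts : timestamps) {
      if (ts < 0) {
        throw new IllegalArgumentException("negative timestamp");
      }
      min = Math.min(min, ts);
    }
    return min;
  }

  /** Per-cell deltas relative to the block's minimum timestamp. */
  static long[] deltas(long[] timestamps) {
    long min = minTimestamp(timestamps);
    long[] result = new long[timestamps.length];
    for (int i = 0; i < timestamps.length; i++) {
      result[i] = timestamps[i] - min;
    }
    return result;
  }

  /** Width in bytes of each stored delta, driven by the largest delta. */
  static int deltaWidthBytes(long[] timestamps) {
    long maxDelta = 0;
    for (long delta : deltas(timestamps)) {
      maxDelta = Math.max(maxDelta, delta);
    }
    return fixedWidthBytes(maxDelta);
  }
}
```

For timestamps 1000, 1005 and 1300 the deltas are 0, 5 and 300, which need 2 bytes each; when every timestamp in the block is identical the width is 0, matching the "zero per-cell overhead" case in the description.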

[jira] [Commented] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT

[
https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13283121#comment-13283121
]

ramkrishna.s.vasudevan commented on HBASE-6070:
---

@Ted
TestServerCustomProtocol.testSingleMethod() passes with the patch. I saw that
the same test has failed in some other precommit builds as well.

-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/1993//testReport/

AM.nodeDeleted and SSH races creating problems for regions under SPLIT
--

Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.92_1.patch,
HBASE-6070_0.94.patch, HBASE-6070_0.94_1.patch, HBASE-6070_trunk.patch,
HBASE-6070_trunk_1.patch

[jira] [Assigned] (HBASE-6032) Port HFileBlockIndex improvement from HBASE-5987


 [ 
https://issues.apache.org/jira/browse/HBASE-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu reassigned HBASE-6032:
-

Assignee: Zhihong Yu

 Port HFileBlockIndex improvement from HBASE-5987
 

 Key: HBASE-6032
 URL: https://issues.apache.org/jira/browse/HBASE-6032
 Project: HBase
  Issue Type: Task
Reporter: Zhihong Yu
Assignee: Zhihong Yu
 Fix For: 0.96.0

 Attachments: 6032-ports-5987.txt


 Excerpt from HBASE-5987:
 First, we propose to look ahead one more block index entry so that the
 HFileScanner knows the start key value of the next data block. If the
 target key value for the scan (reSeekTo) is smaller than the start kv of
 the next data block, the target key value is very likely in the current
 data block (if it is not in the current data block, then the start kv of
 the next data block should be returned. +Indexing on the start key has some
 defects here+), so the scanner shall NOT query the HFileBlockIndex in this
 case. Conversely, if the target key value is bigger, it shall query the
 HFileBlockIndex. This improvement should reduce the hotness of the
 HFileBlockIndex and avoid some unnecessary IdLock contention and index block
 cache lookups.
 This JIRA is to port the fix to HBase trunk, etc.
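
 The lookahead idea in the excerpt can be sketched roughly as follows. This
 is only an illustration under simplifying assumptions, not the code in the
 attached patch: keys are plain strings, the block index is modeled as a
 sorted array of each block's first key, and the class name, fields, and
 indexLookups counter are all hypothetical. It also assumes reseeks only move
 forward, as reSeekTo implies.

```java
import java.util.Arrays;

// Hypothetical model of a scanner that remembers the next block's first key.
class LookaheadReseekSketch {
    // Stand-in for the block index: first key of each data block, sorted.
    private final String[] blockFirstKeys;
    private int currentBlock = 0;
    // Lookahead: first key of the next block, or null if we are in the last block.
    private String nextBlockFirstKey;
    // Counts how often the (simulated) block index had to be consulted.
    int indexLookups = 0;

    LookaheadReseekSketch(String[] blockFirstKeys) {
        this.blockFirstKeys = blockFirstKeys;
        updateLookahead();
    }

    private void updateLookahead() {
        nextBlockFirstKey = currentBlock + 1 < blockFirstKeys.length
                ? blockFirstKeys[currentBlock + 1]
                : null;
    }

    // Returns the index of the block that should hold the target key.
    // Assumes the target is at or after the scanner's current position.
    int reseekTo(String target) {
        // Target sorts before the next block's first key: stay in the
        // current block and skip the index lookup entirely.
        if (nextBlockFirstKey == null || target.compareTo(nextBlockFirstKey) < 0) {
            return currentBlock;
        }
        // Otherwise fall back to the index (here a binary search).
        indexLookups++;
        int pos = Arrays.binarySearch(blockFirstKeys, target);
        // Not found: binarySearch returns -(insertionPoint) - 1, and the
        // containing block is the one just before the insertion point.
        currentBlock = pos >= 0 ? pos : -pos - 2;
        updateLookahead();
        return currentBlock;
    }

    public static void main(String[] args) {
        LookaheadReseekSketch s =
                new LookaheadReseekSketch(new String[] {"a", "g", "m", "t"});
        for (String key : new String[] {"c", "f", "n", "p", "t", "z"}) {
            System.out.println(key + " -> block " + s.reseekTo(key)
                    + ", index lookups so far: " + s.indexLookups);
        }
    }
}
```

 In this model, consecutive reseeks that land in the same block never touch
 the index, which is the effect the excerpt describes as reducing the hotness
 of the HFileBlockIndex.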


[jira] [Updated] (HBASE-6032) Port HFileBlockIndex improvement from HBASE-5987


 [ 
https://issues.apache.org/jira/browse/HBASE-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-6032:
--

Fix Version/s: 0.96.0


[jira] [Commented] (HBASE-6032) Port HFileBlockIndex improvement from HBASE-5987


[ 
https://issues.apache.org/jira/browse/HBASE-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283122#comment-13283122
 ] 

Zhihong Yu commented on HBASE-6032:
---

Can someone review the port, please?


[jira] [Commented] (HBASE-6070) AM.nodeDeleted and SSH races creating problems for regions under SPLIT