[jira] [Commented] (HBASE-5787) Table owner can't disable/delete its own table

2012-04-18 Thread Matteo Bertozzi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256747#comment-13256747
 ] 

Matteo Bertozzi commented on HBASE-5787:


any additional thoughts or comments on the patch?

 Table owner can't disable/delete its own table
 --

 Key: HBASE-5787
 URL: https://issues.apache.org/jira/browse/HBASE-5787
 Project: HBase
  Issue Type: Bug
  Components: security
Affects Versions: 0.92.1, 0.94.0, 0.96.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Minor
  Labels: acl, security
 Attachments: HBASE-5787-tests-wrong-names.patch, HBASE-5787-v0.patch, 
 HBASE-5787-v1.patch


 An user with CREATE privileges can create a table, but can not disable it, 
 because disable operation require ADMIN privileges. Also if a table is 
 already disabled, anyone can remove it.
 {code}
 public void preDeleteTable(ObserverContextMasterCoprocessorEnvironment c,
 byte[] tableName) throws IOException {
   requirePermission(Permission.Action.CREATE);
 }
 public void preDisableTable(ObserverContextMasterCoprocessorEnvironment c,
 byte[] tableName) throws IOException {
   /* TODO: Allow for users with global CREATE permission and the table owner 
 */
   requirePermission(Permission.Action.ADMIN);
 }
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5787) Table owner can't disable/delete its own table

2012-04-18 Thread Matteo Bertozzi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256785#comment-13256785
 ] 

Matteo Bertozzi commented on HBASE-5787:


except for a getConf() - getConfiguration() change nothing is changed in 
AccessController between 0.92 and trunk

 Table owner can't disable/delete its own table
 --

 Key: HBASE-5787
 URL: https://issues.apache.org/jira/browse/HBASE-5787
 Project: HBase
  Issue Type: Bug
  Components: security
Affects Versions: 0.92.1, 0.94.0, 0.96.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Minor
  Labels: acl, security
 Fix For: 0.92.2, 0.94.0, 0.96.0

 Attachments: HBASE-5787-tests-wrong-names.patch, HBASE-5787-v0.patch, 
 HBASE-5787-v1.patch


 An user with CREATE privileges can create a table, but can not disable it, 
 because disable operation require ADMIN privileges. Also if a table is 
 already disabled, anyone can remove it.
 {code}
 public void preDeleteTable(ObserverContextMasterCoprocessorEnvironment c,
 byte[] tableName) throws IOException {
   requirePermission(Permission.Action.CREATE);
 }
 public void preDisableTable(ObserverContextMasterCoprocessorEnvironment c,
 byte[] tableName) throws IOException {
   /* TODO: Allow for users with global CREATE permission and the table owner 
 */
   requirePermission(Permission.Action.ADMIN);
 }
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5792) HLog Performance Evaluation Tool

2012-04-16 Thread Matteo Bertozzi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13254654#comment-13254654
 ] 

Matteo Bertozzi commented on HBASE-5792:


@Jonathan the previous version had a HBaseAdmin call at startup because the 
tool contains also a Put test, but the hbase up si not needed.

 HLog Performance Evaluation Tool
 

 Key: HBASE-5792
 URL: https://issues.apache.org/jira/browse/HBASE-5792
 Project: HBase
  Issue Type: Test
  Components: wal
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Minor
  Labels: performance, wal
 Attachments: HBASE-5792-v0.patch, HBASE-5792-v1.patch


 Related to HDFS-3280 and the HBase WAL slowdown on 0.23+
 It would be nice to have a simple tool like HFilePerformanceEvaluation, ...
 to be able to check easily the HLog performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5792) HLog Performance Evaluation Tool

2012-04-15 Thread Matteo Bertozzi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13254428#comment-13254428
 ] 

Matteo Bertozzi commented on HBASE-5792:


Don't know if it makes sense catch the InterruptException, since we're trying 
to test write performance with nThreads... if we lost one thread, the test 
doesn't reflect what we've asked for.

 HLog Performance Evaluation Tool
 

 Key: HBASE-5792
 URL: https://issues.apache.org/jira/browse/HBASE-5792
 Project: HBase
  Issue Type: Test
  Components: wal
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Minor
  Labels: performance, wal
 Attachments: HBASE-5792-v0.patch, HBASE-5792-v1.patch


 Related to HDFS-3280 and the HBase WAL slowdown on 0.23+
 It would be nice to have a simple tool like HFilePerformanceEvaluation, ...
 to be able to check easily the HLog performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

2012-04-10 Thread Matteo Bertozzi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250868#comment-13250868
 ] 

Matteo Bertozzi commented on HBASE-5666:


The SOCKET_RETRY_WAIT_MS is 200ms but yes, is better sleeping with interrupt 
since the code can accept interrupt. The only real difference is that you've to 
wait the timeout if you want kill the inizialization.

The retry loop is tricky to understand since RecoverableZookeeper is used...
So if you give 0 as timeout, you're supposed to try once... 
but recoverableZookeeper.exists() retries in case of CONNECTIONLOSS, 
SESSIONEXPIRED and OPERATIONTIMEOUT.
The idea here is to retry for x millisec until znode become available while 
(recoverableZookeeper.exists() == null)

If the client comes up during this time I think that should crash anyway 
because the HRegion is still in the initialize() method...

 RegionServer doesn't retry to check if base node is available
 -

 Key: HBASE-5666
 URL: https://issues.apache.org/jira/browse/HBASE-5666
 Project: HBase
  Issue Type: Bug
  Components: regionserver, zookeeper
Affects Versions: 0.92.1, 0.94.0, 0.96.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
 Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, 
 HBASE-5666-v3.patch, HBASE-5666-v4.patch, HBASE-5666-v5.patch, 
 HBASE-5666-v6.patch, HBASE-5666-v7.patch, hbase-1-regionserver.log, 
 hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, 
 hbase-regionserver.log, hbase-zookeeper.log


 I've a script that starts hbase and a couple of region servers in distributed 
 mode (hbase.cluster.distributed = true)
 {code}
 $HBASE_HOME/bin/start-hbase.sh
 $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
 {code}
 but the region servers are not able to start...
 It seems that during the RS start the the znode is still not available, and 
 HRegionServer.initializeZooKeeper() check just once if the base not is 
 available.
 {code}
 2012-03-28 21:54:05,013 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value 
 configured in 'zookeeper.znode.parent'. There could be a mismatch with the 
 one configured in the master.
 2012-03-28 21:54:08,598 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
 localhost,60202,133296824: Initialization of RS failed.  Hence aborting 
 RS.
 java.io.IOException: Received the shutdown message while waiting.
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
   at java.lang.Thread.run(Thread.java:662)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

2012-04-10 Thread Matteo Bertozzi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250907#comment-13250907
 ] 

Matteo Bertozzi commented on HBASE-5666:


{quote}
Does this need to be public? + public boolean checkIfBaseNodeAvailable(int 
timeout) {?
{quote}
ZooKeeperNodeTracker.checkIfBaseNodeAvailable(timeout) is used by the HRegion 
so needs to be public

 RegionServer doesn't retry to check if base node is available
 -

 Key: HBASE-5666
 URL: https://issues.apache.org/jira/browse/HBASE-5666
 Project: HBase
  Issue Type: Bug
  Components: regionserver, zookeeper
Affects Versions: 0.92.1, 0.94.0, 0.96.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
 Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, 
 HBASE-5666-v3.patch, HBASE-5666-v4.patch, HBASE-5666-v5.patch, 
 HBASE-5666-v6.patch, HBASE-5666-v7.patch, hbase-1-regionserver.log, 
 hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, 
 hbase-regionserver.log, hbase-zookeeper.log


 I've a script that starts hbase and a couple of region servers in distributed 
 mode (hbase.cluster.distributed = true)
 {code}
 $HBASE_HOME/bin/start-hbase.sh
 $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
 {code}
 but the region servers are not able to start...
 It seems that during the RS start the the znode is still not available, and 
 HRegionServer.initializeZooKeeper() check just once if the base not is 
 available.
 {code}
 2012-03-28 21:54:05,013 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value 
 configured in 'zookeeper.znode.parent'. There could be a mismatch with the 
 one configured in the master.
 2012-03-28 21:54:08,598 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
 localhost,60202,133296824: Initialization of RS failed.  Hence aborting 
 RS.
 java.io.IOException: Received the shutdown message while waiting.
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
   at java.lang.Thread.run(Thread.java:662)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

2012-04-10 Thread Matteo Bertozzi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250933#comment-13250933
 ] 

Matteo Bertozzi commented on HBASE-5666:


@Stack The client crash with this message
(easy way to test put a long sleep in ZKUtil.createAndFailSilent() on /hbase 
node)
{code}
12/04/10 20:29:36 ERROR client.HConnectionManager$HConnectionImplementation: 
The node /hbase is not in ZooKeeper. It should have been written by the master. 
Check the value configured in 'zookeeper.znode.parent'. There could be a 
mismatch with the one configured in the master.
{code}

 RegionServer doesn't retry to check if base node is available
 -

 Key: HBASE-5666
 URL: https://issues.apache.org/jira/browse/HBASE-5666
 Project: HBase
  Issue Type: Bug
  Components: regionserver, zookeeper
Affects Versions: 0.92.1, 0.94.0, 0.96.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
 Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, 
 HBASE-5666-v3.patch, HBASE-5666-v4.patch, HBASE-5666-v5.patch, 
 HBASE-5666-v6.patch, HBASE-5666-v7.patch, hbase-1-regionserver.log, 
 hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, 
 hbase-regionserver.log, hbase-zookeeper.log


 I've a script that starts hbase and a couple of region servers in distributed 
 mode (hbase.cluster.distributed = true)
 {code}
 $HBASE_HOME/bin/start-hbase.sh
 $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
 {code}
 but the region servers are not able to start...
 It seems that during the RS start the the znode is still not available, and 
 HRegionServer.initializeZooKeeper() check just once if the base not is 
 available.
 {code}
 2012-03-28 21:54:05,013 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value 
 configured in 'zookeeper.znode.parent'. There could be a mismatch with the 
 one configured in the master.
 2012-03-28 21:54:08,598 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
 localhost,60202,133296824: Initialization of RS failed.  Hence aborting 
 RS.
 java.io.IOException: Received the shutdown message while waiting.
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
   at java.lang.Thread.run(Thread.java:662)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

2012-04-10 Thread Matteo Bertozzi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251100#comment-13251100
 ] 

Matteo Bertozzi commented on HBASE-5666:


@stack
do you mean, change the log message in HConnectionManager?
{code}
Index: src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
===
--- src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
(revision 1311874)
+++ src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
(working copy)
@@ -776,6 +776,7 @@
 if (ZKUtil.checkExists(zkw, zkw.baseZNode) == -1) {
   errorMsg = The node  + zkw.baseZNode+ is not in ZooKeeper. 
 + It should have been written by the master. 
++ Check the master is running? 
 + Check the value configured in 'zookeeper.znode.parent'. 
 + There could be a mismatch with the one configured in the 
master.;
   LOG.error(errorMsg);
{code}

 RegionServer doesn't retry to check if base node is available
 -

 Key: HBASE-5666
 URL: https://issues.apache.org/jira/browse/HBASE-5666
 Project: HBase
  Issue Type: Bug
  Components: regionserver, zookeeper
Affects Versions: 0.92.1, 0.94.0, 0.96.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
 Attachments: HBASE-5666-0.92.patch, HBASE-5666-v1.patch, 
 HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, 
 HBASE-5666-v5.patch, HBASE-5666-v6.patch, HBASE-5666-v7.patch, 
 hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, 
 hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log


 I've a script that starts hbase and a couple of region servers in distributed 
 mode (hbase.cluster.distributed = true)
 {code}
 $HBASE_HOME/bin/start-hbase.sh
 $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
 {code}
 but the region servers are not able to start...
 It seems that during the RS start the the znode is still not available, and 
 HRegionServer.initializeZooKeeper() check just once if the base not is 
 available.
 {code}
 2012-03-28 21:54:05,013 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value 
 configured in 'zookeeper.znode.parent'. There could be a mismatch with the 
 one configured in the master.
 2012-03-28 21:54:08,598 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
 localhost,60202,133296824: Initialization of RS failed.  Hence aborting 
 RS.
 java.io.IOException: Received the shutdown message while waiting.
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
   at java.lang.Thread.run(Thread.java:662)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

2012-04-06 Thread Matteo Bertozzi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13248666#comment-13248666
 ] 

Matteo Bertozzi commented on HBASE-5666:


Since the logical flow is:
 * HMaster start and create the base node
 * Region Servers wait for base node to be available

I prefer to use HBASE-5666-v6.patch adding the retry logic to 
ZKUtil.checkExists() in this way we've the wait for base node step.

In the other case (@stack: fixing the ZookeeperWatcher) the first (master|RS) 
that arrive create the base node... but doesn't sounds right. 

 RegionServer doesn't retry to check if base node is available
 -

 Key: HBASE-5666
 URL: https://issues.apache.org/jira/browse/HBASE-5666
 Project: HBase
  Issue Type: Bug
  Components: regionserver, zookeeper
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
 Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, 
 HBASE-5666-v3.patch, HBASE-5666-v4.patch, HBASE-5666-v5.patch, 
 HBASE-5666-v6.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, 
 hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, 
 hbase-zookeeper.log


 I've a script that starts hbase and a couple of region servers in distributed 
 mode (hbase.cluster.distributed = true)
 {code}
 $HBASE_HOME/bin/start-hbase.sh
 $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
 {code}
 but the region servers are not able to start...
 It seems that during the RS start the the znode is still not available, and 
 HRegionServer.initializeZooKeeper() check just once if the base not is 
 available.
 {code}
 2012-03-28 21:54:05,013 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value 
 configured in 'zookeeper.znode.parent'. There could be a mismatch with the 
 one configured in the master.
 2012-03-28 21:54:08,598 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
 localhost,60202,133296824: Initialization of RS failed.  Hence aborting 
 RS.
 java.io.IOException: Received the shutdown message while waiting.
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
   at java.lang.Thread.run(Thread.java:662)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

2012-04-03 Thread Matteo Bertozzi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245720#comment-13245720
 ] 

Matteo Bertozzi commented on HBASE-5666:


Still looking at the 0.90 code...
The new ZooKeeperWatcher (=0.92) calls the ZKUtil.createAndFailSilent(), to 
create base node and others, only if called by HMaster (canCreateBaseZNode = 
true), while before the code path was the same for everyone.

So now, if HMaster has not reached the create base node point, before the 
HRegionServer checks the existence of base node... the region server crashes...

If we want to keep the previous logic, the first one that arrives create the 
base node  co, we can remove the canCreateBaseZNode flag, else we can use 
HBASE-5666-v4.patch to wait and retry on checkExists().

what do you think?

 RegionServer doesn't retry to check if base node is available
 -

 Key: HBASE-5666
 URL: https://issues.apache.org/jira/browse/HBASE-5666
 Project: HBase
  Issue Type: Bug
  Components: regionserver, zookeeper
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
 Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, 
 HBASE-5666-v3.patch, HBASE-5666-v4.patch, hbase-1-regionserver.log, 
 hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, 
 hbase-regionserver.log, hbase-zookeeper.log


 I've a script that starts hbase and a couple of region servers in distributed 
 mode (hbase.cluster.distributed = true)
 {code}
 $HBASE_HOME/bin/start-hbase.sh
 $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
 {code}
 but the region servers are not able to start...
 It seems that during the RS start the the znode is still not available, and 
 HRegionServer.initializeZooKeeper() check just once if the base not is 
 available.
 {code}
 2012-03-28 21:54:05,013 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value 
 configured in 'zookeeper.znode.parent'. There could be a mismatch with the 
 one configured in the master.
 2012-03-28 21:54:08,598 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
 localhost,60202,133296824: Initialization of RS failed.  Hence aborting 
 RS.
 java.io.IOException: Received the shutdown message while waiting.
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
   at java.lang.Thread.run(Thread.java:662)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

2012-04-03 Thread Matteo Bertozzi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245758#comment-13245758
 ] 

Matteo Bertozzi commented on HBASE-5666:


The problem that I see with the patch v4 and the if (timeout == 0) special case 
is that exists() is different for ZooKeeper and RecoverableZookeeper.

RecoverableZookeeper has some internal retry logic for CONNECTIONLOSS, 
SESSIONEXPIRED, and OPERATIONTIMEOUT, to keep the code simple we can add this 
logic in ZKUtil.checkExist() in this way we can remove the special case, and 
remove the code in RecoverabeZK.

 RegionServer doesn't retry to check if base node is available
 -

 Key: HBASE-5666
 URL: https://issues.apache.org/jira/browse/HBASE-5666
 Project: HBase
  Issue Type: Bug
  Components: regionserver, zookeeper
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
 Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, 
 HBASE-5666-v3.patch, HBASE-5666-v4.patch, hbase-1-regionserver.log, 
 hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, 
 hbase-regionserver.log, hbase-zookeeper.log


 I've a script that starts hbase and a couple of region servers in distributed 
 mode (hbase.cluster.distributed = true)
 {code}
 $HBASE_HOME/bin/start-hbase.sh
 $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
 {code}
 but the region servers are not able to start...
 It seems that during the RS start the the znode is still not available, and 
 HRegionServer.initializeZooKeeper() check just once if the base not is 
 available.
 {code}
 2012-03-28 21:54:05,013 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value 
 configured in 'zookeeper.znode.parent'. There could be a mismatch with the 
 one configured in the master.
 2012-03-28 21:54:08,598 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
 localhost,60202,133296824: Initialization of RS failed.  Hence aborting 
 RS.
 java.io.IOException: Received the shutdown message while waiting.
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
   at java.lang.Thread.run(Thread.java:662)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

2012-04-03 Thread Matteo Bertozzi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245759#comment-13245759
 ] 

Matteo Bertozzi commented on HBASE-5666:


The problem that I see with the patch v4 and the if (timeout == 0) special case 
is that exists() is different for ZooKeeper and RecoverableZookeeper.

RecoverableZookeeper has some internal retry logic for CONNECTIONLOSS, 
SESSIONEXPIRED, and OPERATIONTIMEOUT, to keep the code simple we can add this 
logic in ZKUtil.checkExist() in this way we can remove the special case, and 
remove the code in RecoverabeZK.

 RegionServer doesn't retry to check if base node is available
 -

 Key: HBASE-5666
 URL: https://issues.apache.org/jira/browse/HBASE-5666
 Project: HBase
  Issue Type: Bug
  Components: regionserver, zookeeper
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
 Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, 
 HBASE-5666-v3.patch, HBASE-5666-v4.patch, hbase-1-regionserver.log, 
 hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, 
 hbase-regionserver.log, hbase-zookeeper.log


 I've a script that starts hbase and a couple of region servers in distributed 
 mode (hbase.cluster.distributed = true)
 {code}
 $HBASE_HOME/bin/start-hbase.sh
 $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
 {code}
 but the region servers are not able to start...
 It seems that during the RS start the the znode is still not available, and 
 HRegionServer.initializeZooKeeper() check just once if the base not is 
 available.
 {code}
 2012-03-28 21:54:05,013 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value 
 configured in 'zookeeper.znode.parent'. There could be a mismatch with the 
 one configured in the master.
 2012-03-28 21:54:08,598 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
 localhost,60202,133296824: Initialization of RS failed.  Hence aborting 
 RS.
 java.io.IOException: Received the shutdown message while waiting.
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
   at java.lang.Thread.run(Thread.java:662)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

2012-04-02 Thread Matteo Bertozzi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13244488#comment-13244488
 ] 

Matteo Bertozzi commented on HBASE-5666:


The problem here is that there's no ConnectionLossException... if you take a 
look at the log you can see that there's no KeeperException but zookeeper 
respond that the base node doesn't exists.

 RegionServer doesn't retry to check if base node is available
 -

 Key: HBASE-5666
 URL: https://issues.apache.org/jira/browse/HBASE-5666
 Project: HBase
  Issue Type: Bug
  Components: regionserver, zookeeper
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
 Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, 
 HBASE-5666-v3.patch, HBASE-5666-v4.patch, hbase-1-regionserver.log, 
 hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, 
 hbase-regionserver.log, hbase-zookeeper.log


 I've a script that starts hbase and a couple of region servers in distributed 
 mode (hbase.cluster.distributed = true)
 {code}
 $HBASE_HOME/bin/start-hbase.sh
 $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
 {code}
 but the region servers are not able to start...
 It seems that during the RS start the the znode is still not available, and 
 HRegionServer.initializeZooKeeper() check just once if the base not is 
 available.
 {code}
 2012-03-28 21:54:05,013 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value 
 configured in 'zookeeper.znode.parent'. There could be a mismatch with the 
 one configured in the master.
 2012-03-28 21:54:08,598 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
 localhost,60202,133296824: Initialization of RS failed.  Hence aborting 
 RS.
 java.io.IOException: Received the shutdown message while waiting.
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
   at java.lang.Thread.run(Thread.java:662)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

2012-04-02 Thread Matteo Bertozzi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13244537#comment-13244537
 ] 

Matteo Bertozzi commented on HBASE-5666:


m... maybe i've lost something but, in 0.92 and trunk that code was removed and 
there's just a call to ZKUtil.createAndFailSilent() that doesn't retry. Any 
idea?

https://github.com/apache/hbase/commit/6dc7ccf3779add13188bd73011e0d25bbab77a05

 RegionServer doesn't retry to check if base node is available
 -

 Key: HBASE-5666
 URL: https://issues.apache.org/jira/browse/HBASE-5666
 Project: HBase
  Issue Type: Bug
  Components: regionserver, zookeeper
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
 Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, 
 HBASE-5666-v3.patch, HBASE-5666-v4.patch, hbase-1-regionserver.log, 
 hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, 
 hbase-regionserver.log, hbase-zookeeper.log


 I've a script that starts hbase and a couple of region servers in distributed 
 mode (hbase.cluster.distributed = true)
 {code}
 $HBASE_HOME/bin/start-hbase.sh
 $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
 {code}
 but the region servers are not able to start...
 It seems that during the RS start the the znode is still not available, and 
 HRegionServer.initializeZooKeeper() check just once if the base not is 
 available.
 {code}
 2012-03-28 21:54:05,013 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value 
 configured in 'zookeeper.znode.parent'. There could be a mismatch with the 
 one configured in the master.
 2012-03-28 21:54:08,598 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
 localhost,60202,133296824: Initialization of RS failed.  Hence aborting 
 RS.
 java.io.IOException: Received the shutdown message while waiting.
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
   at java.lang.Thread.run(Thread.java:662)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5665) Repeated split causes HRegionServer failures and breaks table

2012-04-01 Thread Matteo Bertozzi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243771#comment-13243771
 ] 

Matteo Bertozzi commented on HBASE-5665:


Can we also add a couple of methods to the region like isSplittable() and 
isAvailable()
{code}
boolean isAvailable() {
  return !isClosed()  !isClosing();
}

boolean isSplittable() {
  return isAvailable()  !hasReferences();
}
{code}

just to avoid similar problems in future...
For example in HRegionServer both getMostLoadedRegions() and closeUserRegions() 
does the same isAvailable() check...

 Repeated split causes HRegionServer failures and breaks table 
 --

 Key: HBASE-5665
 URL: https://issues.apache.org/jira/browse/HBASE-5665
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.92.0, 0.92.1
Reporter: Cosmin Lehene
Assignee: Cosmin Lehene
Priority: Blocker
 Attachments: HBASE-5665-0.92.patch


 Repeated splits on large tables (2 consecutive would suffice) will 
 essentially break the table (and the cluster), unrecoverable.
 The regionserver doing the split dies and the master will get into an 
 infinite loop trying to assign regions that seem to have the files missing 
 from HDFS.
 The table can be disabled once. upon trying to re-enable it, it will remain 
 in an intermediary state forever.
 I was able to reproduce this on a smaller table consistently.
 {code}
 hbase(main):030:0 (0..1).each{|x| put 't1', #{x}, 'f1:t', 'dd'}
 hbase(main):030:0 (0..1000).each{|x| split 't1', #{x*10}}
 {code}
 Running overlapping splits in parallel (e.g. #{x*10+1}, #{x*10+2}... ) 
 will reproduce the issue almost instantly and consistently. 
 {code}
 2012-03-28 10:57:16,320 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
 Offlined parent region t1,,1332957435767.2fb0473f4e71339e88dab0ee0d4dffa1. in 
 META
 2012-03-28 10:57:16,321 DEBUG 
 org.apache.hadoop.hbase.regionserver.CompactSplitThread: Split requested for 
 t1,5,1332957435767.648d30de55a5cec6fc2f56dcb3c7eee1..  
 compaction_queue=(0:1), split_queue=10
 2012-03-28 10:57:16,343 INFO 
 org.apache.hadoop.hbase.regionserver.SplitRequest: Running rollback/cleanup 
 of failed split of t1,,1332957435767.2fb0473f4e71339e88dab0ee0d4dffa1.; 
 Failed ld2,60020,1332957343833-daughterOpener=2469c5650ea2aeed631eb85d3cdc3124
 java.io.IOException: Failed 
 ld2,60020,1332957343833-daughterOpener=2469c5650ea2aeed631eb85d3cdc3124
 at 
 org.apache.hadoop.hbase.regionserver.SplitTransaction.openDaughters(SplitTransaction.java:363)
 at 
 org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:451)
 at 
 org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:67)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 Caused by: java.io.FileNotFoundException: File does not exist: 
 /hbase/t1/589c44cabba419c6ad8c9b427e5894e3.2fb0473f4e71339e88dab0ee0d4dffa1/f1/d62a852c25ad44e09518e102ca557237
 at 
 org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1822)
 at 
 org.apache.hadoop.hdfs.DFSClient$DFSInputStream.init(DFSClient.java:1813)
 at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:544)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:187)
 at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:456)
 at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:341)
 at 
 org.apache.hadoop.hbase.regionserver.StoreFile$Reader.init(StoreFile.java:1008)
 at 
 org.apache.hadoop.hbase.io.HalfStoreFileReader.init(HalfStoreFileReader.java:65)
 at 
 org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:467)
 at 
 org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:548)
 at 
 org.apache.hadoop.hbase.regionserver.Store.loadStoreFiles(Store.java:284)
 at org.apache.hadoop.hbase.regionserver.Store.init(Store.java:221)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:2511)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:450)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:3229)
 at 
 org.apache.hadoop.hbase.regionserver.SplitTransaction.openDaughterRegion(SplitTransaction.java:504)
 at 
 org.apache.hadoop.hbase.regionserver.SplitTransaction$DaughterOpener.run(SplitTransaction.java:484)
 ... 1 more
 2012-03-28 10:57:16,345 FATAL 
 

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

2012-04-01 Thread Matteo Bertozzi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243855#comment-13243855
 ] 

Matteo Bertozzi commented on HBASE-5666:


@Ted woo good catch
I've just translated the method without thinking...
and this simplified version emphasizes a problem already present in the 
previous version.
If you take a look at the original version, (the LOG.info is under if, ok)
but what happens if the method return and the znode is not available?
no exception is raised... but I think that the caller of waitForXyz() expect 
some exception in case of timeout, in the other case the value that I'm looking 
for must be present...
(this function is just called by one test)

 RegionServer doesn't retry to check if base node is available
 -

 Key: HBASE-5666
 URL: https://issues.apache.org/jira/browse/HBASE-5666
 Project: HBase
  Issue Type: Bug
  Components: regionserver, zookeeper
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
 Attachments: HBASE-5666-v1.patch, hbase-1-regionserver.log, 
 hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, 
 hbase-regionserver.log, hbase-zookeeper.log


 I've a script that starts hbase and a couple of region servers in distributed 
 mode (hbase.cluster.distributed = true)
 {code}
 $HBASE_HOME/bin/start-hbase.sh
 $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
 {code}
 but the region servers are not able to start...
 It seems that during the RS start the the znode is still not available, and 
 HRegionServer.initializeZooKeeper() check just once if the base not is 
 available.
 {code}
 2012-03-28 21:54:05,013 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value 
 configured in 'zookeeper.znode.parent'. There could be a mismatch with the 
 one configured in the master.
 2012-03-28 21:54:08,598 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
 localhost,60202,133296824: Initialization of RS failed.  Hence aborting 
 RS.
 java.io.IOException: Received the shutdown message while waiting.
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
   at java.lang.Thread.run(Thread.java:662)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

2012-03-31 Thread Matteo Bertozzi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243236#comment-13243236
 ] 

Matteo Bertozzi commented on HBASE-5666:


diving into the code, I've also noticed that there's a 
ZKUtil.waitForBaseZNode() that has already the retry logic but:
 - It takes a Configuration object
 - The timeout is set internally to 1ms
 - It's only used by test/.../hbase/util/ProcessBasedLocalHBaseCluster.java


 RegionServer doesn't retry to check if base node is available
 -

 Key: HBASE-5666
 URL: https://issues.apache.org/jira/browse/HBASE-5666
 Project: HBase
  Issue Type: Bug
  Components: regionserver, zookeeper
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
 Attachments: hbase-1-regionserver.log, hbase-2-regionserver.log, 
 hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, 
 hbase-zookeeper.log


 I've a script that starts hbase and a couple of region servers in distributed 
 mode (hbase.cluster.distributed = true)
 {code}
 $HBASE_HOME/bin/start-hbase.sh
 $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
 {code}
 but the region servers are not able to start...
 It seems that during the RS start the the znode is still not available, and 
 HRegionServer.initializeZooKeeper() check just once if the base not is 
 available.
 {code}
 2012-03-28 21:54:05,013 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value 
 configured in 'zookeeper.znode.parent'. There could be a mismatch with the 
 one configured in the master.
 2012-03-28 21:54:08,598 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
 localhost,60202,133296824: Initialization of RS failed.  Hence aborting 
 RS.
 java.io.IOException: Received the shutdown message while waiting.
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
   at java.lang.Thread.run(Thread.java:662)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

2012-03-31 Thread Matteo Bertozzi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243302#comment-13243302
 ] 

Matteo Bertozzi commented on HBASE-5666:


I was thinking to patch ZooKeeperNodeTracker.checkIfBaseNodeAvailable() and 
HConnectionImplementation.checkIfBaseNodeAvailable() instead of only the 
HRegionServer...

what do you think?


 RegionServer doesn't retry to check if base node is available
 -

 Key: HBASE-5666
 URL: https://issues.apache.org/jira/browse/HBASE-5666
 Project: HBase
  Issue Type: Bug
  Components: regionserver, zookeeper
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
 Attachments: hbase-1-regionserver.log, hbase-2-regionserver.log, 
 hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, 
 hbase-zookeeper.log, zk-exists-refactor-v0.patch


 I've a script that starts hbase and a couple of region servers in distributed 
 mode (hbase.cluster.distributed = true)
 {code}
 $HBASE_HOME/bin/start-hbase.sh
 $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
 {code}
 but the region servers are not able to start...
 It seems that during the RS start the the znode is still not available, and 
 HRegionServer.initializeZooKeeper() check just once if the base not is 
 available.
 {code}
 2012-03-28 21:54:05,013 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value 
 configured in 'zookeeper.znode.parent'. There could be a mismatch with the 
 one configured in the master.
 2012-03-28 21:54:08,598 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
 localhost,60202,133296824: Initialization of RS failed.  Hence aborting 
 RS.
 java.io.IOException: Received the shutdown message while waiting.
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
   at java.lang.Thread.run(Thread.java:662)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

2012-03-30 Thread Matteo Bertozzi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13242734#comment-13242734
 ] 

Matteo Bertozzi commented on HBASE-5666:


The exception seems to be present in 0.92 and trunk, but I've only looked at 
trunk code

 RegionServer doesn't retry to check if base node is available
 -

 Key: HBASE-5666
 URL: https://issues.apache.org/jira/browse/HBASE-5666
 Project: HBase
  Issue Type: Bug
  Components: regionserver, zookeeper
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
 Attachments: hbase-1-regionserver.log, hbase-2-regionserver.log, 
 hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, 
 hbase-zookeeper.log


 I've a script that starts hbase and a couple of region servers in distributed 
 mode (hbase.cluster.distributed = true)
 {code}
 $HBASE_HOME/bin/start-hbase.sh
 $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
 {code}
 but the region servers are not able to start...
 It seems that during the RS start the the znode is still not available, and 
 HRegionServer.initializeZooKeeper() check just once if the base not is 
 available.
 {code}
 2012-03-28 21:54:05,013 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value 
 configured in 'zookeeper.znode.parent'. There could be a mismatch with the 
 one configured in the master.
 2012-03-28 21:54:08,598 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
 localhost,60202,133296824: Initialization of RS failed.  Hence aborting 
 RS.
 java.io.IOException: Received the shutdown message while waiting.
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
   at java.lang.Thread.run(Thread.java:662)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5681) Split Region crash if region is still offline after a previous split

2012-03-30 Thread Matteo Bertozzi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13242736#comment-13242736
 ] 

Matteo Bertozzi commented on HBASE-5681:


quite similar, i get both exception. The FileNotFound of HBASE-5665 and 
NotServingRegionException. I'll retry with 5665 patch.

 Split Region crash if region is still offline after a previous split
 

 Key: HBASE-5681
 URL: https://issues.apache.org/jira/browse/HBASE-5681
 Project: HBase
  Issue Type: Bug
  Components: regionserver, zookeeper
Affects Versions: 0.92.1, 0.96.0, 0.94.1
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Minor
 Attachments: logs0-HBASE-5681.tar.bz2, logs1-HBASE-5681.tar.bz2


 I've a script that starts hbase and a couple of region servers in distributed 
 mode (hbase.cluster.distributed = true) due to HBASE-5666 I need a sleep to 
 ensure that rs are up.
 {code}
 $HBASE_HOME/bin/start-hbase.sh
 sleep 5 # bug HBASE-5666 rs doesn't retry if znode is not available.
 $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
 {code}
 Once hbase is started I run an hbase shell script file (see below)
 everything is fine till last split operation.
 {code}
 # $HBASE_HOME/bin/hbase shell test.hbase
 # test.hbase
 create 'bugtb-t1', 'tcf11', 'tcf12'
 create 'bugtb-t2', 'tcf11', 'tcf12'
 put 'bugtb-t1', '10', 'tcf11:c1', 'a'
 put 'bugtb-t1', '15', 'tcf11:c2', 'b'
 put 'bugtb-t1', '20', 'tcf11:c1', 'c'
 put 'bugtb-t1', '30', 'tcf11:c2', 'd'
 put 'bugtb-t1', '35', 'tcf11:c1', 'e'
 put 'bugtb-t1', '40', 'tcf11:c2', 'f'
 put 'bugtb-t2', '10', 'tcf11:c1', 'a'
 put 'bugtb-t2', '15', 'tcf11:c2', 'b'
 put 'bugtb-t2', '20', 'tcf11:c1', 'c'
 put 'bugtb-t2', '30', 'tcf11:c2', 'd'
 put 'bugtb-t2', '35', 'tcf11:c1', 'e'
 put 'bugtb-t2', '40', 'tcf11:c2', 'f'
 split 'bugtb-t1', '20'
 split 'bugtb-t2', '20'
 split 'bugtb-t1', '40'
 {code}
 During the last split the region is still offline, and you get an 
 exception
 (If you sleep a bit before executing the last split, everything is fine)
 {code}
 ERROR: org.apache.hadoop.ipc.RemoteException: 
 org.apache.hadoop.hbase.NotServingRegionException: Region is not online: 
 bugtb-t1,,1333134892936.4e14c2cf4293156d5b099dc3d5c44890.
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3123)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.splitRegion(HRegionServer.java:2926)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1383)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

2012-03-30 Thread Matteo Bertozzi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13242801#comment-13242801
 ] 

Matteo Bertozzi commented on HBASE-5666:


yep, just checked and the code seems unchanged from trunk to 0.92...
Anyone has a different idea on that?
or I can try to implement a retry loop?

 RegionServer doesn't retry to check if base node is available
 -

 Key: HBASE-5666
 URL: https://issues.apache.org/jira/browse/HBASE-5666
 Project: HBase
  Issue Type: Bug
  Components: regionserver, zookeeper
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
 Attachments: hbase-1-regionserver.log, hbase-2-regionserver.log, 
 hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, 
 hbase-zookeeper.log


 I've a script that starts hbase and a couple of region servers in distributed 
 mode (hbase.cluster.distributed = true)
 {code}
 $HBASE_HOME/bin/start-hbase.sh
 $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
 {code}
 but the region servers are not able to start...
 It seems that during the RS start the the znode is still not available, and 
 HRegionServer.initializeZooKeeper() check just once if the base not is 
 available.
 {code}
 2012-03-28 21:54:05,013 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value 
 configured in 'zookeeper.znode.parent'. There could be a mismatch with the 
 one configured in the master.
 2012-03-28 21:54:08,598 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
 localhost,60202,133296824: Initialization of RS failed.  Hence aborting 
 RS.
 java.io.IOException: Received the shutdown message while waiting.
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
   at java.lang.Thread.run(Thread.java:662)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

2012-03-28 Thread Matteo Bertozzi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13240757#comment-13240757
 ] 

Matteo Bertozzi commented on HBASE-5666:


@Zhihong Yu:
Probably a retry loop is the simplest solution...
But retry for how long? an infinite loop hoping that the node become available 
seems wrong, maybe we can add another parameter to define a ZK_RETRY_TIMEOUT.

 RegionServer doesn't retry to check if base node is available
 -

 Key: HBASE-5666
 URL: https://issues.apache.org/jira/browse/HBASE-5666
 Project: HBase
  Issue Type: Bug
  Components: regionserver, zookeeper
Reporter: Matteo Bertozzi
 Attachments: hbase-1-regionserver.log, hbase-2-regionserver.log, 
 hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, 
 hbase-zookeeper.log


 I've a script that starts hbase and a couple of region servers in distributed 
 mode (hbase.cluster.distributed = true)
 {code}
 $HBASE_HOME/bin/start-hbase.sh
 $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
 {code}
 but the region servers are not able to start...
 It seems that during the RS start the the znode is still not available, and 
 HRegionServer.initializeZooKeeper() check just once if the base not is 
 available.
 {code}
 2012-03-28 21:54:05,013 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value 
 configured in 'zookeeper.znode.parent'. There could be a mismatch with the 
 one configured in the master.
 2012-03-28 21:54:08,598 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
 localhost,60202,133296824: Initialization of RS failed.  Hence aborting 
 RS.
 java.io.IOException: Received the shutdown message while waiting.
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
   at java.lang.Thread.run(Thread.java:662)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-3929) Add option to HFile tool to produce basic stats

2011-10-17 Thread Matteo Bertozzi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13129088#comment-13129088
 ] 

Matteo Bertozzi commented on HBASE-3929:


Currently HFilePrettyPrinter raise a couple of exceptions if the HFile is 
Empty, just because it doesn't check if seekTo() returns true or false, and the 
first call after seekTo() is a scanner.getKeyValue() so you get a NPE...

I've added a v2 patch with the pkv rename, count == 0 handled, and seekTo 
checked to fix the NPE.

 Add option to HFile tool to produce basic stats
 ---

 Key: HBASE-3929
 URL: https://issues.apache.org/jira/browse/HBASE-3929
 Project: HBase
  Issue Type: New Feature
  Components: io
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: 0.94.0

 Attachments: HBASE-3929-v2.patch, hbase-3929-draft.patch, 
 hbase-3929-draft.txt


 In looking at HBASE-3421 I wrote a small tool to scan an HFile and produce 
 some basic statistics about it:
 - min/mean/max key size, value size (uncompressed)
 - min/mean/max number of columns per row (uncompressed)
 - min/mean/max number of bytes per row (uncompressed)
 - the key of the largest row

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-3929) Add option to HFile tool to produce basic stats

2011-10-15 Thread Matteo Bertozzi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13128224#comment-13128224
 ] 

Matteo Bertozzi commented on HBASE-3929:


From 0.92 the HFile.main() contains just a call to HFilePrettyPrinter.run()
So there's no more the Tool code inside the HFile.java

Probably was not the refactor that todd has in mind, but it solve the first 
todd's thought:
'we should refactor all of the HFile Tool stuff out of HFile into a new 
class.'

 Add option to HFile tool to produce basic stats
 ---

 Key: HBASE-3929
 URL: https://issues.apache.org/jira/browse/HBASE-3929
 Project: HBase
  Issue Type: New Feature
  Components: io
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: 0.94.0

 Attachments: hbase-3929-draft.patch, hbase-3929-draft.txt


 In looking at HBASE-3421 I wrote a small tool to scan an HFile and produce 
 some basic statistics about it:
 - min/mean/max key size, value size (uncompressed)
 - min/mean/max number of columns per row (uncompressed)
 - min/mean/max number of bytes per row (uncompressed)
 - the key of the largest row

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira