[jira] [Commented] (HBASE-6289) ROOT region doesn't get re-assigned in ServerShutdownHandler if the RS is still working but only the RS's ZK node expires.

2012-07-02 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13405476#comment-13405476
 ] 

Jieshan Bean commented on HBASE-6289:
-

Can we enhance the verification in CatalogTracker#verifyRegionLocation? 
Currently, we only check whether the region is present on this regionserver.
We could also check whether this server is in the master's online list (we may 
need to introduce a new method in HMasterInterface?).
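
A minimal sketch of the extra check proposed above. The class RegionLocationVerifier, the RegionServerView interface, and the plain set of server names standing in for the master's online list are all hypothetical and are not HBase API; the sketch only shows the two conditions being combined.

```java
import java.util.Set;

/** Hypothetical model of the two-step check; none of these types are HBase API. */
final class RegionLocationVerifier {
  /** Stand-in for asking a region server whether it currently hosts a region. */
  interface RegionServerView {
    boolean isServing(String serverName, String regionName);
  }

  private final RegionServerView regionServers;
  private final Set<String> masterOnlineServers;

  RegionLocationVerifier(RegionServerView regionServers, Set<String> masterOnlineServers) {
    this.regionServers = regionServers;
    this.masterOnlineServers = masterOnlineServers;
  }

  /** Returns true only if the server hosts the region AND the master lists it as online. */
  boolean verifyRegionLocation(String serverName, String regionName) {
    // Existing-style check: does the server say it is serving the region?
    if (!regionServers.isServing(serverName, regionName)) {
      return false;
    }
    // Proposed extra check: does the master still consider this server online?
    return masterOnlineServers.contains(serverName);
  }
}
```

With this shape, a server that still answers RPCs but has been dropped from the master's online list no longer counts as a valid location.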

 ROOT region doesn't get re-assigned in ServerShutdownHandler if the RS is 
 still working but only the RS's ZK node expires.
 --

 Key: HBASE-6289
 URL: https://issues.apache.org/jira/browse/HBASE-6289
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.6, 0.94.0
Reporter: Maryann Xue
Assignee: Maryann Xue
Priority: Critical
 Attachments: HBASE-6289-v2.patch, HBASE-6289-v2.patch, 
 HBASE-6289.patch


 The RS hosting ROOT has a network problem and its ZK node expires first, which 
 kicks off the ServerShutdownHandler. It calls verifyAndAssignRoot() to try to 
 re-assign ROOT. At that time, the RS is actually still working and passes the 
 verifyRootRegionLocation() check, so the ROOT region is skipped for 
 re-assignment.
 {code}
   private void verifyAndAssignRoot()
   throws InterruptedException, IOException, KeeperException {
     long timeout = this.server.getConfiguration().
       getLong("hbase.catalog.verification.timeout", 1000);
     if (!this.server.getCatalogTracker().verifyRootRegionLocation(timeout)) {
       this.services.getAssignmentManager().assignRoot();
     }
   }
 {code}
 After a few moments, this RS encounters a DFS write problem and decides to 
 abort. The RS is soon restarted from the command line and constantly 
 reports:
 {code}
 2012-06-27 23:13:08,627 DEBUG 
 org.apache.hadoop.hbase.regionserver.HRegionServer: 
 NotServingRegionException; Region is not online: -ROOT-,,0
 2012-06-27 23:13:08,627 DEBUG 
 org.apache.hadoop.hbase.regionserver.HRegionServer: 
 NotServingRegionException; Region is not online: -ROOT-,,0
 2012-06-27 23:13:08,628 DEBUG 
 org.apache.hadoop.hbase.regionserver.HRegionServer: 
 NotServingRegionException; Region is not online: -ROOT-,,0
 2012-06-27 23:13:08,628 DEBUG 
 org.apache.hadoop.hbase.regionserver.HRegionServer: 
 NotServingRegionException; Region is not online: -ROOT-,,0
 2012-06-27 23:13:08,630 DEBUG 
 org.apache.hadoop.hbase.regionserver.HRegionServer: 
 NotServingRegionException; Region is not online: -ROOT-,,0
 {code}
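
The sequence described above can be sketched as a toy model. Everything here (RootRaceDemo, ToyRegionServer, and their fields) is hypothetical and only mirrors the shape of the reported scenario; it is not HBase code.

```java
/** Toy model of the reported race; all names are hypothetical. */
final class RootRaceDemo {
  /** A region server whose ZK session and process liveness can diverge. */
  static final class ToyRegionServer {
    boolean processAlive = true;   // still answers RPCs
    boolean zkSessionAlive = true; // ephemeral ZK node still present
    boolean hostsRoot = true;

    boolean answersRootVerification() {
      return processAlive && hostsRoot;
    }
  }

  /** Same shape as verifyAndAssignRoot(): reassign only if verification fails. */
  static boolean verifyAndAssignRoot(ToyRegionServer oldHost) {
    return !oldHost.answersRootVerification(); // true means ROOT was reassigned
  }

  /** Plays the scenario and reports whether ROOT is online anywhere at the end. */
  static boolean rootOnlineAfterScenario() {
    ToyRegionServer rs = new ToyRegionServer();
    rs.zkSessionAlive = false;                    // 1. ZK node expires, shutdown handling starts
    boolean reassigned = verifyAndAssignRoot(rs); // 2. RS still answers, so no reassignment
    rs.processAlive = false;                      // 3. RS aborts on the DFS write problem
    rs.hostsRoot = false;                         //    the restarted process hosts no regions
    return reassigned || rs.answersRootVerification();
  }

  public static void main(String[] args) {
    System.out.println("ROOT online after scenario: " + rootOnlineAfterScenario());
  }
}
```

In this model the verification in step 2 succeeds only because the process is still alive, which is why ROOT ends up unassigned once the process goes away.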

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6289) ROOT region doesn't get re-assigned in ServerShutdownHandler if the RS is still working but only the RS's ZK node expires.

2012-07-02 Thread Maryann Xue (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13405500#comment-13405500
 ] 

Maryann Xue commented on HBASE-6289:


@Jieshan, doable, I think. But currently CatalogTracker acts more in an HBase 
client role and talks to ZooKeeper and region servers only. I don't know if 
this fits its desired semantics.





[jira] [Commented] (HBASE-6289) ROOT region doesn't get re-assigned in ServerShutdownHandler if the RS is still working but only the RS's ZK node expires.

2012-07-02 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13405509#comment-13405509
 ] 

Jieshan Bean commented on HBASE-6289:
-

@Maryann, yes. Though we can take advantage of CatalogTracker's client role 
and ask the Master for this information (HConnection can help achieve this).
@stack, what do you think about this? :)





[jira] [Commented] (HBASE-6289) ROOT region doesn't get re-assigned in ServerShutdownHandler if the RS is still working but only the RS's ZK node expires.

2012-06-29 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13404175#comment-13404175
 ] 

stack commented on HBASE-6289:
--

The verify of root was added by:

{code}
Revision 1127158
Modified Tue May 24 17:25:42 2011 UTC (13 months ago) by stack
Original Path: hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
File length: 13616 byte(s)
Diff to previous 1097275
HBASE-3914 ROOT region appeared in two regionserver's onlineRegions at the same time
{code}

It was added by Jieshan to narrow the window in which SSH and a new master 
startup race each other.

Meta was never 'verified'.  Ever since SSH was created, it has just gone ahead 
and assigned .META. whenever it processes the server that was hosting .META.

So, I'd say, if we are to preserve Jieshan's fix, we need Maryann's patch as 
is?  I'm +1 on commit.





[jira] [Commented] (HBASE-6289) ROOT region doesn't get re-assigned in ServerShutdownHandler if the RS is still working but only the RS's ZK node expires.

2012-06-29 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13404280#comment-13404280
 ] 

Zhihong Ted Yu commented on HBASE-6289:
---

The reason Hadoop QA didn't post back is the following:
{code}
[ERROR] 
/Users/zhihyu/trunk-hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java:[346,38]
 unreported exception java.lang.InterruptedException; must be caught or 
declared to be thrown
[ERROR] 
[ERROR] 
/Users/zhihyu/trunk-hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java:[380,34]
 unreported exception java.lang.InterruptedException; must be caught or 
declared to be thrown
[ERROR] 
[ERROR] 
/Users/zhihyu/trunk-hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java:[419,6]
 exception java.lang.InterruptedException is never thrown in body of 
corresponding try statement
[ERROR] 
[ERROR] 
/Users/zhihyu/trunk-hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java:[693,35]
 unreported exception java.lang.InterruptedException; must be caught or 
declared to be thrown
{code}
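
The errors above follow from javac's checked-exception rules: a call to a method declared with throws InterruptedException must either be caught or be declared by the caller. A minimal sketch of both options follows; the class CheckedExceptionDemo, the method waitForLocation, and the server name it returns are hypothetical stand-ins, not the CatalogTracker code.

```java
/** Illustrates the two ways to satisfy javac for a checked exception; names are hypothetical. */
final class CheckedExceptionDemo {
  /** Stand-in for a method whose signature declares InterruptedException. */
  static String waitForLocation(long timeoutMs) throws InterruptedException {
    Thread.sleep(Math.min(timeoutMs, 1L));
    return "rs1.example.org,60020,1340000000000";
  }

  /** Option 1: declare the exception so callers must deal with it. */
  static String lookupDeclaring(long timeoutMs) throws InterruptedException {
    return waitForLocation(timeoutMs);
  }

  /** Option 2: catch it, restore the interrupt flag, and signal failure with null. */
  static String lookupCatching(long timeoutMs) {
    try {
      return waitForLocation(timeoutMs);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return null;
    }
  }
}
```

The third error in the list is the reverse case: a catch clause for a checked exception is rejected when nothing in the try body can throw it, so such a clause has to be removed once the throwing call moves elsewhere.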





[jira] [Commented] (HBASE-6289) ROOT region doesn't get re-assigned in ServerShutdownHandler if the RS is still working but only the RS's ZK node expires.

2012-06-29 Thread Maryann Xue (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13404365#comment-13404365
 ] 

Maryann Xue commented on HBASE-6289:


@stack thanks for the explanation!
@Ted sorry for my carelessness.





[jira] [Commented] (HBASE-6289) ROOT region doesn't get re-assigned in ServerShutdownHandler if the RS is still working but only the RS's ZK node expires.

2012-06-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13404375#comment-13404375
 ] 

Hadoop QA commented on HBASE-6289:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12534084/HBASE-6289-v2.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 5 javac compiler warnings (more than 
the trunk's current 4 warnings).

-1 findbugs.  The patch appears to introduce 6 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
 

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2298//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2298//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2298//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2298//console

This message is automatically generated.





[jira] [Commented] (HBASE-6289) ROOT region doesn't get re-assigned in ServerShutdownHandler if the RS is still working but only the RS's ZK node expires.

2012-06-28 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13403085#comment-13403085
 ] 

ramkrishna.s.vasudevan commented on HBASE-6289:
---

@Maryann
Good one, Maryann. Nice catch.
I have one suggestion.  Can we just call assignRoot instead of verifying the 
root location, like we do with assignMeta?

{code}
this.services.getAssignmentManager().assignMeta();
{code}
Similarly, can we say:
{code}
if (isCarryingRoot()) { // -ROOT-
  LOG.info("Server " + serverName +
    " was carrying ROOT. Trying to assign.");
  this.services.getAssignmentManager().
    regionOffline(HRegionInfo.ROOT_REGIONINFO);
  this.services.getAssignmentManager().assignRoot();
}
{code}
We can do this because we are sure that the root is down here.  What do you think?





[jira] [Commented] (HBASE-6289) ROOT region doesn't get re-assigned in ServerShutdownHandler if the RS is still working but only the RS's ZK node expires.

2012-06-28 Thread Maryann Xue (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13403105#comment-13403105
 ] 

Maryann Xue commented on HBASE-6289:


@ramkrishna: Yes, I thought of this too, but I saw this comment before 
verifyAndAssignRoot(): "Before assign the ROOT region, ensure it haven't been 
assigned by other place". I'm not sure whether this "ROOT assigned elsewhere" 
situation can actually occur, but we seem to have seen META assigned on several 
region servers at the same time when there was chaos in our lab's network. 
There can be only one search path for any region (including META and ROOT), 
though, regardless of client cache. And this is what I don't understand: why 
do we treat ROOT differently?






[jira] [Commented] (HBASE-6289) ROOT region doesn't get re-assigned in ServerShutdownHandler if the RS is still working but only the RS's ZK node expires.

2012-06-28 Thread Maryann Xue (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13403106#comment-13403106
 ] 

Maryann Xue commented on HBASE-6289:


@ramkrishna: Yes, I thought of this too, but I saw this comment here before 
verifyAndAssignRoot(): "Before assign the ROOT region, ensure it haven't been 
assigned by other place". I'm not sure whether this "ROOT assigned elsewhere" 
situation can actually occur, but we seem to have seen META assigned on several 
region servers at the same time when there was chaos in our lab's network. 
There can be only one search path for any region (including META and ROOT), 
though, regardless of client cache. And this is what I don't understand: why 
do we treat ROOT differently?
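
The "assigned on several region servers at the same time" condition mentioned above can be checked from a snapshot of each server's online regions. This is only an illustrative sketch: the class DoubleAssignmentCheck and the map-based input are assumptions, not an HBase tool or API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical diagnostic over a snapshot of server -> online regions. */
final class DoubleAssignmentCheck {
  /** Returns region -> hosting servers for every region reported by more than one server. */
  static Map<String, List<String>> findDoubleAssigned(Map<String, Set<String>> onlineRegionsByServer) {
    Map<String, List<String>> hosts = new TreeMap<>();
    for (Map.Entry<String, Set<String>> entry : onlineRegionsByServer.entrySet()) {
      for (String region : entry.getValue()) {
        hosts.computeIfAbsent(region, k -> new ArrayList<>()).add(entry.getKey());
      }
    }
    // Keep only regions that at least two servers claim to serve.
    hosts.values().removeIf(servers -> servers.size() < 2);
    return hosts;
  }
}
```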





[jira] [Commented] (HBASE-6289) ROOT region doesn't get re-assigned in ServerShutdownHandler if the RS is still working but only the RS's ZK node expires.

2012-06-28 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13403247#comment-13403247
 ] 

stack commented on HBASE-6289:
--

@Maryann Patch looks good.  A ServerShutdownHandler (SSH) should not accept the 
server it's handling as a legit .META. or -ROOT- location, so your exclude makes 
sense.  You need curly braces here, or put the return on the same line as the 
if, to be within our coding convention.

{code}
+if (exclude != null && exclude.equals(server))
+  return null;
{code}

We can fix this on commit though.

What will happen here if the server returned is the same as the excluded server?

{code}
+  AdminProtocol getRootServerConnection(long timeout, ServerName exclude)
   throws InterruptedException, NotAllMetaRegionsOnlineException, IOException {
-return getCachedConnection(waitForRoot(timeout));
+ServerName server = waitForRoot(timeout);
+if (exclude != null && exclude.equals(server))
+  return null;
+
+return getCachedConnection(server);
   }
{code}

We return null and go around again until the RS dies?  That seems fine but 
maybe we should log this special handling?  Just a suggestion.
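
To make the logging suggestion concrete, here is a minimal, self-contained sketch of the exclude check with a log line added. It is not the patch itself: the class name, the method name usableRootServer, and the use of plain strings in place of ServerName are all assumptions made for illustration, and java.util.logging stands in for whatever logger the real class uses.

```java
import java.util.logging.Logger;

public class RootLocationExclude {
  private static final Logger LOG = Logger.getLogger("RootLocationExclude");

  /**
   * Hypothetical helper: returns the server to connect to for -ROOT-, or
   * null when the reported location is the server being shut down.
   * Strings stand in for ServerName here (an assumption for this sketch).
   */
  static String usableRootServer(String reported, String exclude) {
    if (exclude != null && exclude.equals(reported)) {
      // The special case is logged so it is visible in the master log.
      LOG.info("Ignoring -ROOT- location " + reported
          + " because it is the server being processed as dead");
      return null;
    }
    return reported;
  }

  public static void main(String[] args) {
    String dead = "rs1,60020,1340800000000";
    String live = "rs2,60020,1340800000001";
    System.out.println(usableRootServer(dead, dead));
    System.out.println(usableRootServer(live, dead));
  }
}
```

A null exclude keeps the old behaviour, so callers that have no server to exclude are unaffected.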





[jira] [Commented] (HBASE-6289) ROOT region doesn't get re-assigned in ServerShutdownHandler if the RS is still working but only the RS's ZK node expires.

2012-06-28 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13403256#comment-13403256
 ] 

ramkrishna.s.vasudevan commented on HBASE-6289:
---

@Stack
Any specific reason why we are verifying and then assigning the root region 
alone?






[jira] [Commented] (HBASE-6289) ROOT region doesn't get re-assigned in ServerShutdownHandler if the RS is still working but only the RS's ZK node expires.

2012-06-28 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13403257#comment-13403257
 ] 

Zhihong Ted Yu commented on HBASE-6289:
---

Can we have a test case for this scenario?





[jira] [Commented] (HBASE-6289) ROOT region doesn't get re-assigned in ServerShutdownHandler if the RS is still working but only the RS's ZK node expires.

2012-06-28 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13403307#comment-13403307
 ] 

stack commented on HBASE-6289:
--

bq. Any specific reason why we are verifying and then assigning the root region 
alone?

Are you asking why we assign the root, then the meta, ahead of all other 
assignments?  If so, it's because these need to be assigned for sure before any 
other assignments will complete.  Maybe you were asking something else?
  
@Ted A test would be hard to get in here methinks for the startup code.  Would 
take a bunch of mocking.  We, the hbase core, should make it easier on folks 
mocking up these scenarios by building the necessary underpinnings before we 
can expect the likes of Maryann to deliver a unit test (that's my opinion).





[jira] [Commented] (HBASE-6289) ROOT region doesn't get re-assigned in ServerShutdownHandler if the RS is still working but only the RS's ZK node expires.

2012-06-28 Thread Maryann Xue (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13403613#comment-13403613
 ] 

Maryann Xue commented on HBASE-6289:


@stack Thanks for the comments! If getRootServerConnection() returns null, 
verifyRootRegionLocation() will return false, so assignRoot() gets called. 
Thus, verifyAndAssignRoot() returns successfully and there won't be a loop or 
retry here.
{code}
if (!this.server.getCatalogTracker().verifyRootRegionLocation(timeout, 
this.serverName)) {
  this.services.getAssignmentManager().assignRoot();
}
{code}

I think ramkrishna was asking why we verify only ROOT before trying to assign 
it, while we assign META directly. That's my question as well.
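
The control flow described above can be traced with a small stand-alone sketch. Everything in it is a simplified stand-in rather than HBase code: strings replace ServerName, a boolean replaces the RPC that asks the region server whether it carries -ROOT-, and a counter replaces the real assignRoot() call.

```java
public class VerifyAndAssignSketch {
  // Counts how often the stand-in for assignRoot() was invoked.
  static int assignRootCalls = 0;

  /**
   * Stand-in for verifyRootRegionLocation(timeout, exclude): with the
   * exclude check, no connection is returned for the excluded server,
   * so verification fails even if that server still serves -ROOT-.
   */
  static boolean verifyRootRegionLocation(String rootLocation,
      boolean servesRoot, String exclude) {
    boolean hasConnection = rootLocation != null && !rootLocation.equals(exclude);
    return hasConnection && servesRoot;
  }

  /** Stand-in for verifyAndAssignRoot(): a single check, no loop or retry. */
  static void verifyAndAssignRoot(String rootLocation, boolean servesRoot,
      String deadServer) {
    if (!verifyRootRegionLocation(rootLocation, servesRoot, deadServer)) {
      assignRootCalls++; // the real code would call assignRoot() here
    }
  }

  public static void main(String[] args) {
    // ROOT is still reported on the server whose ZK node expired.
    verifyAndAssignRoot("rs1", true, "rs1");
    System.out.println("assignRoot calls: " + assignRootCalls);
  }
}
```

In this model the expired server is never accepted as a valid ROOT location, so the reassignment happens on the first pass even though the server still answers.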





[jira] [Commented] (HBASE-6289) ROOT region doesn't get re-assigned in ServerShutdownHandler if the RS is still working but only the RS's ZK node expires.

2012-06-28 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13403686#comment-13403686
 ] 

ramkrishna.s.vasudevan commented on HBASE-6289:
---

@Stack
I was asking why there is a verification step while assigning ROOT and the same 
is not done while assigning META.  If the reason is known, then I feel this 
patch can be made much simpler, because we can just call assignRoot.  Maybe I 
am missing something, so I wanted to clarify.  
Thanks.

