[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-06-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13403273#comment-13403273
 ] 

Hudson commented on HBASE-5875:
---

Integrated in HBase-0.94-security #38 (See 
[https://builds.apache.org/job/HBase-0.94-security/38/])
HBASE-5875 Process RIT and Master restart may remove an online server 
considering it as a dead server

Submitted by:Rajesh
Reviewed by:Ram, Ted, Stack (Revision 1353690)

 Result = FAILURE
ramkrishna : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/HMaster.java


 Process RIT and Master restart may remove an online server considering it as 
 a dead server
 --

 Key: HBASE-5875
 URL: https://issues.apache.org/jira/browse/HBASE-5875
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.96.0, 0.94.1

 Attachments: HBASE-5875.patch, HBASE-5875_0.94.patch, 
 HBASE-5875_0.94_1.patch, HBASE-5875_0.94_2.patch, HBASE-5875_trunk.patch, 
 HBASE-5875_trunk.patch, HBASE-5875_trunk_1.patch, HBASE-5875v2.patch


 If on master restart it finds the ROOT/META to be in RIT state, master tries 
 to assign the ROOT region through ProcessRIT.
 Master will trigger the assignment and next will try to verify the Root 
 Region Location.
 Root region location verification is done seeing if the RS has the region in 
 its online list.
 If the master triggered assignment has not yet been completed in RS then the 
 verify root region location will fail.
 Because it failed 
 {code}
 splitLogAndExpireIfOnline(currentRootServer);
 {code}
 we do split log and also remove the server from online server list. Ideally 
 here there is nothing to do in splitlog as no region server was restarted.
 So master, though the server is online, master just invalidates the region 
 server.
 In a special case, if i have only one RS then my cluster will become non 
 operative.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-06-25 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13400481#comment-13400481
 ] 

Zhihong Ted Yu commented on HBASE-5875:
---

I think so.

 Process RIT and Master restart may remove an online server considering it as 
 a dead server
 --

 Key: HBASE-5875
 URL: https://issues.apache.org/jira/browse/HBASE-5875
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.1

 Attachments: HBASE-5875.patch, HBASE-5875_0.94.patch, 
 HBASE-5875_0.94_1.patch, HBASE-5875_trunk.patch, HBASE-5875_trunk.patch, 
 HBASE-5875v2.patch


 If on master restart it finds the ROOT/META to be in RIT state, master tries 
 to assign the ROOT region through ProcessRIT.
 Master will trigger the assignment and next will try to verify the Root 
 Region Location.
 Root region location verification is done seeing if the RS has the region in 
 its online list.
 If the master triggered assignment has not yet been completed in RS then the 
 verify root region location will fail.
 Because it failed 
 {code}
 splitLogAndExpireIfOnline(currentRootServer);
 {code}
 we do split log and also remove the server from online server list. Ideally 
 here there is nothing to do in splitlog as no region server was restarted.
 So master, though the server is online, master just invalidates the region 
 server.
 In a special case, if i have only one RS then my cluster will become non 
 operative.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-06-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13400775#comment-13400775
 ] 

Hudson commented on HBASE-5875:
---

Integrated in HBase-0.94 #280 (See 
[https://builds.apache.org/job/HBase-0.94/280/])
HBASE-5875 Process RIT and Master restart may remove an online server 
considering it as a dead server

Submitted by:Rajesh
Reviewed by:Ram, Ted, Stack (Revision 1353690)

 Result = FAILURE
ramkrishna : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/HMaster.java


 Process RIT and Master restart may remove an online server considering it as 
 a dead server
 --

 Key: HBASE-5875
 URL: https://issues.apache.org/jira/browse/HBASE-5875
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.96.0, 0.94.1

 Attachments: HBASE-5875.patch, HBASE-5875_0.94.patch, 
 HBASE-5875_0.94_1.patch, HBASE-5875_0.94_2.patch, HBASE-5875_trunk.patch, 
 HBASE-5875_trunk.patch, HBASE-5875_trunk_1.patch, HBASE-5875v2.patch


 If on master restart it finds the ROOT/META to be in RIT state, master tries 
 to assign the ROOT region through ProcessRIT.
 Master will trigger the assignment and next will try to verify the Root 
 Region Location.
 Root region location verification is done seeing if the RS has the region in 
 its online list.
 If the master triggered assignment has not yet been completed in RS then the 
 verify root region location will fail.
 Because it failed 
 {code}
 splitLogAndExpireIfOnline(currentRootServer);
 {code}
 we do split log and also remove the server from online server list. Ideally 
 here there is nothing to do in splitlog as no region server was restarted.
 So master, though the server is online, master just invalidates the region 
 server.
 In a special case, if i have only one RS then my cluster will become non 
 operative.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-06-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13400840#comment-13400840
 ] 

Hudson commented on HBASE-5875:
---

Integrated in HBase-TRUNK #3070 (See 
[https://builds.apache.org/job/HBase-TRUNK/3070/])
HBASE-5875 Process RIT and Master restart may remove an online server 
considering it as a dead server (Rajesh)

Submitted by:Rajesh 
Reviewed by:Ram Ted, Stack (Revision 1353688)

 Result = FAILURE
ramkrishna : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java


 Process RIT and Master restart may remove an online server considering it as 
 a dead server
 --

 Key: HBASE-5875
 URL: https://issues.apache.org/jira/browse/HBASE-5875
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.96.0, 0.94.1

 Attachments: HBASE-5875.patch, HBASE-5875_0.94.patch, 
 HBASE-5875_0.94_1.patch, HBASE-5875_0.94_2.patch, HBASE-5875_trunk.patch, 
 HBASE-5875_trunk.patch, HBASE-5875_trunk_1.patch, HBASE-5875v2.patch


 If on master restart it finds the ROOT/META to be in RIT state, master tries 
 to assign the ROOT region through ProcessRIT.
 Master will trigger the assignment and next will try to verify the Root 
 Region Location.
 Root region location verification is done seeing if the RS has the region in 
 its online list.
 If the master triggered assignment has not yet been completed in RS then the 
 verify root region location will fail.
 Because it failed 
 {code}
 splitLogAndExpireIfOnline(currentRootServer);
 {code}
 we do split log and also remove the server from online server list. Ideally 
 here there is nothing to do in splitlog as no region server was restarted.
 So master, though the server is online, master just invalidates the region 
 server.
 In a special case, if i have only one RS then my cluster will become non 
 operative.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-06-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13401046#comment-13401046
 ] 

Hudson commented on HBASE-5875:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #68 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/68/])
HBASE-5875 Process RIT and Master restart may remove an online server 
considering it as a dead server (Rajesh)

Submitted by:Rajesh 
Reviewed by:Ram Ted, Stack (Revision 1353688)

 Result = FAILURE
ramkrishna : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java


 Process RIT and Master restart may remove an online server considering it as 
 a dead server
 --

 Key: HBASE-5875
 URL: https://issues.apache.org/jira/browse/HBASE-5875
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.96.0, 0.94.1

 Attachments: HBASE-5875.patch, HBASE-5875_0.94.patch, 
 HBASE-5875_0.94_1.patch, HBASE-5875_0.94_2.patch, HBASE-5875_trunk.patch, 
 HBASE-5875_trunk.patch, HBASE-5875_trunk_1.patch, HBASE-5875v2.patch


 If on master restart it finds the ROOT/META to be in RIT state, master tries 
 to assign the ROOT region through ProcessRIT.
 Master will trigger the assignment and next will try to verify the Root 
 Region Location.
 Root region location verification is done seeing if the RS has the region in 
 its online list.
 If the master triggered assignment has not yet been completed in RS then the 
 verify root region location will fail.
 Because it failed 
 {code}
 splitLogAndExpireIfOnline(currentRootServer);
 {code}
 we do split log and also remove the server from online server list. Ideally 
 here there is nothing to do in splitlog as no region server was restarted.
 So master, though the server is online, master just invalidates the region 
 server.
 In a special case, if i have only one RS then my cluster will become non 
 operative.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-06-24 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13400169#comment-13400169
 ] 

Zhihong Ted Yu commented on HBASE-5875:
---

@Rajesh:
Once Hadoop QA runs through a patch, the attachment itself is marked.
You need to attach (the same) patch again.

 Process RIT and Master restart may remove an online server considering it as 
 a dead server
 --

 Key: HBASE-5875
 URL: https://issues.apache.org/jira/browse/HBASE-5875
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.1

 Attachments: HBASE-5875.patch, HBASE-5875_0.94.patch, 
 HBASE-5875_0.94_1.patch, HBASE-5875_trunk.patch, HBASE-5875v2.patch


 If on master restart it finds the ROOT/META to be in RIT state, master tries 
 to assign the ROOT region through ProcessRIT.
 Master will trigger the assignment and next will try to verify the Root 
 Region Location.
 Root region location verification is done seeing if the RS has the region in 
 its online list.
 If the master triggered assignment has not yet been completed in RS then the 
 verify root region location will fail.
 Because it failed 
 {code}
 splitLogAndExpireIfOnline(currentRootServer);
 {code}
 we do split log and also remove the server from online server list. Ideally 
 here there is nothing to do in splitlog as no region server was restarted.
 So master, though the server is online, master just invalidates the region 
 server.
 In a special case, if i have only one RS then my cluster will become non 
 operative.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-06-24 Thread rajeshbabu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13400175#comment-13400175
 ] 

rajeshbabu commented on HBASE-5875:
---

@Ted,
Thanks for information.Upload the same patch and submit for Hadoop QA.


 Process RIT and Master restart may remove an online server considering it as 
 a dead server
 --

 Key: HBASE-5875
 URL: https://issues.apache.org/jira/browse/HBASE-5875
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.1

 Attachments: HBASE-5875.patch, HBASE-5875_0.94.patch, 
 HBASE-5875_0.94_1.patch, HBASE-5875_trunk.patch, HBASE-5875v2.patch


 If on master restart it finds the ROOT/META to be in RIT state, master tries 
 to assign the ROOT region through ProcessRIT.
 Master will trigger the assignment and next will try to verify the Root 
 Region Location.
 Root region location verification is done seeing if the RS has the region in 
 its online list.
 If the master triggered assignment has not yet been completed in RS then the 
 verify root region location will fail.
 Because it failed 
 {code}
 splitLogAndExpireIfOnline(currentRootServer);
 {code}
 we do split log and also remove the server from online server list. Ideally 
 here there is nothing to do in splitlog as no region server was restarted.
 So master, though the server is online, master just invalidates the region 
 server.
 In a special case, if i have only one RS then my cluster will become non 
 operative.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-06-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13400181#comment-13400181
 ] 

Hadoop QA commented on HBASE-5875:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12533223/HBASE-5875_trunk.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 11 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2242//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2242//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2242//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2242//console

This message is automatically generated.

 Process RIT and Master restart may remove an online server considering it as 
 a dead server
 --

 Key: HBASE-5875
 URL: https://issues.apache.org/jira/browse/HBASE-5875
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.1

 Attachments: HBASE-5875.patch, HBASE-5875_0.94.patch, 
 HBASE-5875_0.94_1.patch, HBASE-5875_trunk.patch, HBASE-5875_trunk.patch, 
 HBASE-5875v2.patch


 If on master restart it finds the ROOT/META to be in RIT state, master tries 
 to assign the ROOT region through ProcessRIT.
 Master will trigger the assignment and next will try to verify the Root 
 Region Location.
 Root region location verification is done seeing if the RS has the region in 
 its online list.
 If the master triggered assignment has not yet been completed in RS then the 
 verify root region location will fail.
 Because it failed 
 {code}
 splitLogAndExpireIfOnline(currentRootServer);
 {code}
 we do split log and also remove the server from online server list. Ideally 
 here there is nothing to do in splitlog as no region server was restarted.
 So master, though the server is online, master just invalidates the region 
 server.
 In a special case, if i have only one RS then my cluster will become non 
 operative.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-06-24 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13400284#comment-13400284
 ] 

ramkrishna.s.vasudevan commented on HBASE-5875:
---

bq.TestMiniClusterLoadParallel,TestAtomicOperation,TestCacheOnWriteInSchema,TestCompactSelection,TestFSUtils
All these testcases are running fine in the latest precommit build.  
Is it ok Ted?

 Process RIT and Master restart may remove an online server considering it as 
 a dead server
 --

 Key: HBASE-5875
 URL: https://issues.apache.org/jira/browse/HBASE-5875
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.1

 Attachments: HBASE-5875.patch, HBASE-5875_0.94.patch, 
 HBASE-5875_0.94_1.patch, HBASE-5875_trunk.patch, HBASE-5875_trunk.patch, 
 HBASE-5875v2.patch


 If on master restart it finds the ROOT/META to be in RIT state, master tries 
 to assign the ROOT region through ProcessRIT.
 Master will trigger the assignment and next will try to verify the Root 
 Region Location.
 Root region location verification is done seeing if the RS has the region in 
 its online list.
 If the master triggered assignment has not yet been completed in RS then the 
 verify root region location will fail.
 Because it failed 
 {code}
 splitLogAndExpireIfOnline(currentRootServer);
 {code}
 we do split log and also remove the server from online server list. Ideally 
 here there is nothing to do in splitlog as no region server was restarted.
 So master, though the server is online, master just invalidates the region 
 server.
 In a special case, if i have only one RS then my cluster will become non 
 operative.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-06-21 Thread rajeshbabu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13398464#comment-13398464
 ] 

rajeshbabu commented on HBASE-5875:
---

Attached patch for trunk. Please review and provide comments/suggestions.

 Process RIT and Master restart may remove an online server considering it as 
 a dead server
 --

 Key: HBASE-5875
 URL: https://issues.apache.org/jira/browse/HBASE-5875
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.1

 Attachments: HBASE-5875.patch, HBASE-5875_0.94.patch, 
 HBASE-5875_0.94_1.patch, HBASE-5875_trunk.patch, HBASE-5875v2.patch


 If on master restart it finds the ROOT/META to be in RIT state, master tries 
 to assign the ROOT region through ProcessRIT.
 Master will trigger the assignment and next will try to verify the Root 
 Region Location.
 Root region location verification is done seeing if the RS has the region in 
 its online list.
 If the master triggered assignment has not yet been completed in RS then the 
 verify root region location will fail.
 Because it failed 
 {code}
 splitLogAndExpireIfOnline(currentRootServer);
 {code}
 we do split log and also remove the server from online server list. Ideally 
 here there is nothing to do in splitlog as no region server was restarted.
 So master, though the server is online, master just invalidates the region 
 server.
 In a special case, if i have only one RS then my cluster will become non 
 operative.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-06-21 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13398481#comment-13398481
 ] 

ramkrishna.s.vasudevan commented on HBASE-5875:
---

@Devs
{code}
+  // Make sure a -ROOT- location is set.
+  if (!isRootLocation()) return false;
+  // This guarantees that the transition assigning -ROOT- has completed
+  this.assignmentManager.waitForAssignment(HRegionInfo.ROOT_REGIONINFO);
+  assigned++;
{code}
and 
{code}
+  // Wait until META region added to region server onlineRegions. See 
HBASE-5875.
+  enableSSHandWaitForMeta();
+  assigned++;
{code}
This will ensure that we wait for ROOT and META.  Now as HBASE-5918 has gone 
in, if any RS goes down inbetween root and META assignment SSH will also be 
triggered.  
The main intention in this patch is to avoid 
{code}
splitLogAndExpireIfOnline(currentRootServer);

splitLogAndExpireIfOnline(currentMetaServer);
{code}
because the above code in case of ROOT and META in rit was removing the current 
active server thinking it as dead in case the ROOT or META is not yet online on 
RS.

 Process RIT and Master restart may remove an online server considering it as 
 a dead server
 --

 Key: HBASE-5875
 URL: https://issues.apache.org/jira/browse/HBASE-5875
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.1

 Attachments: HBASE-5875.patch, HBASE-5875_0.94.patch, 
 HBASE-5875_0.94_1.patch, HBASE-5875_trunk.patch, HBASE-5875v2.patch


 If on master restart it finds the ROOT/META to be in RIT state, master tries 
 to assign the ROOT region through ProcessRIT.
 Master will trigger the assignment and next will try to verify the Root 
 Region Location.
 Root region location verification is done seeing if the RS has the region in 
 its online list.
 If the master triggered assignment has not yet been completed in RS then the 
 verify root region location will fail.
 Because it failed 
 {code}
 splitLogAndExpireIfOnline(currentRootServer);
 {code}
 we do split log and also remove the server from online server list. Ideally 
 here there is nothing to do in splitlog as no region server was restarted.
 So master, though the server is online, master just invalidates the region 
 server.
 In a special case, if i have only one RS then my cluster will become non 
 operative.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-06-21 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13398485#comment-13398485
 ] 

Zhihong Ted Yu commented on HBASE-5875:
---

@Rajesh:
Hadoop QA is not functioning.
Please report back test suite result.

 Process RIT and Master restart may remove an online server considering it as 
 a dead server
 --

 Key: HBASE-5875
 URL: https://issues.apache.org/jira/browse/HBASE-5875
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.1

 Attachments: HBASE-5875.patch, HBASE-5875_0.94.patch, 
 HBASE-5875_0.94_1.patch, HBASE-5875_trunk.patch, HBASE-5875v2.patch


 If on master restart it finds the ROOT/META to be in RIT state, master tries 
 to assign the ROOT region through ProcessRIT.
 Master will trigger the assignment and next will try to verify the Root 
 Region Location.
 Root region location verification is done seeing if the RS has the region in 
 its online list.
 If the master triggered assignment has not yet been completed in RS then the 
 verify root region location will fail.
 Because it failed 
 {code}
 splitLogAndExpireIfOnline(currentRootServer);
 {code}
 we do split log and also remove the server from online server list. Ideally 
 here there is nothing to do in splitlog as no region server was restarted.
 So master, though the server is online, master just invalidates the region 
 server.
 In a special case, if i have only one RS then my cluster will become non 
 operative.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-06-21 Thread rajeshbabu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13398545#comment-13398545
 ] 

rajeshbabu commented on HBASE-5875:
---

@Ted,
I will run test suite locally and publish test result.

 Process RIT and Master restart may remove an online server considering it as 
 a dead server
 --

 Key: HBASE-5875
 URL: https://issues.apache.org/jira/browse/HBASE-5875
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.1

 Attachments: HBASE-5875.patch, HBASE-5875_0.94.patch, 
 HBASE-5875_0.94_1.patch, HBASE-5875_trunk.patch, HBASE-5875v2.patch


 If on master restart it finds the ROOT/META to be in RIT state, master tries 
 to assign the ROOT region through ProcessRIT.
 Master will trigger the assignment and next will try to verify the Root 
 Region Location.
 Root region location verification is done seeing if the RS has the region in 
 its online list.
 If the master triggered assignment has not yet been completed in RS then the 
 verify root region location will fail.
 Because it failed 
 {code}
 splitLogAndExpireIfOnline(currentRootServer);
 {code}
 we do split log and also remove the server from online server list. Ideally 
 here there is nothing to do in splitlog as no region server was restarted.
 So master, though the server is online, master just invalidates the region 
 server.
 In a special case, if i have only one RS then my cluster will become non 
 operative.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-06-21 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13398595#comment-13398595
 ] 

Zhihong Ted Yu commented on HBASE-5875:
---

Patch looks good.

 Process RIT and Master restart may remove an online server considering it as 
 a dead server
 --

 Key: HBASE-5875
 URL: https://issues.apache.org/jira/browse/HBASE-5875
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.1

 Attachments: HBASE-5875.patch, HBASE-5875_0.94.patch, 
 HBASE-5875_0.94_1.patch, HBASE-5875_trunk.patch, HBASE-5875v2.patch


 If on master restart it finds the ROOT/META to be in RIT state, master tries 
 to assign the ROOT region through ProcessRIT.
 Master will trigger the assignment and next will try to verify the Root 
 Region Location.
 Root region location verification is done seeing if the RS has the region in 
 its online list.
 If the master triggered assignment has not yet been completed in RS then the 
 verify root region location will fail.
 Because it failed 
 {code}
 splitLogAndExpireIfOnline(currentRootServer);
 {code}
 we do split log and also remove the server from online server list. Ideally 
 here there is nothing to do in splitlog as no region server was restarted.
 So master, though the server is online, master just invalidates the region 
 server.
 In a special case, if i have only one RS then my cluster will become non 
 operative.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-06-21 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13398734#comment-13398734
 ] 

ramkrishna.s.vasudevan commented on HBASE-5875:
---

I will integrate this tomorrow if there are no objections/comments.

 Process RIT and Master restart may remove an online server considering it as 
 a dead server
 --

 Key: HBASE-5875
 URL: https://issues.apache.org/jira/browse/HBASE-5875
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.1

 Attachments: HBASE-5875.patch, HBASE-5875_0.94.patch, 
 HBASE-5875_0.94_1.patch, HBASE-5875_trunk.patch, HBASE-5875v2.patch


 If on master restart it finds the ROOT/META to be in RIT state, master tries 
 to assign the ROOT region through ProcessRIT.
 Master will trigger the assignment and next will try to verify the Root 
 Region Location.
 Root region location verification is done seeing if the RS has the region in 
 its online list.
 If the master triggered assignment has not yet been completed in RS then the 
 verify root region location will fail.
 Because it failed 
 {code}
 splitLogAndExpireIfOnline(currentRootServer);
 {code}
 we do split log and also remove the server from online server list. Ideally 
 here there is nothing to do in splitlog as no region server was restarted.
 So master, though the server is online, master just invalidates the region 
 server.
 In a special case, if i have only one RS then my cluster will become non 
 operative.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-06-21 Thread rajeshbabu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13399075#comment-13399075
 ] 

rajeshbabu commented on HBASE-5875:
---

Test suite result :
{code}
Results :

Failed tests:   
testExceptionFromCoprocessorDuringPut(org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort):
 The put should have failed, as the coprocessor is buggy
  testDrainingServerOffloading(org.apache.hadoop.hbase.TestDrainingServer): 
expected:1 but was:0
  testTaskResigned(org.apache.hadoop.hbase.master.TestSplitLogManager): 
version1=2, version=2
  
testNullReturn(org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol): 
Results should contain region 
test,bbb,1340328821040.9fe2c292d7f212976859364f8aef27a3. for row 'bbb'
  
testRowMutationMultiThreads(org.apache.hadoop.hbase.regionserver.TestAtomicOperation):
 expected:0 but was:3
  testPermMask(org.apache.hadoop.hbase.util.TestFSUtils): expected:rwx-- 
but was:rwxrwxrwx

Tests in error: 
  
testWholesomeSplit(org.apache.hadoop.hbase.regionserver.TestSplitTransaction): 
Failed delete of 
/mnt/F/hbaseTrunkNew/hbase-server/target/test-data/a9504511-b767-40bb-8c4b-4550baa22da2/org.apache.hadoop.hbase.regionserver.TestSplitTransaction/table/7fcde0d5873845498b313524c3416091
  testRollback(org.apache.hadoop.hbase.regionserver.TestSplitTransaction): 
Failed delete of 
/mnt/F/hbaseTrunkNew/hbase-server/target/test-data/74d5334b-a9d3-4213-b568-8315e066df68/org.apache.hadoop.hbase.regionserver.TestSplitTransaction/table/9d8fa21602ce5ba40d1fa704094c8e25
  
testOffPeakCompactionRatio(org.apache.hadoop.hbase.regionserver.TestCompactSelection):
 Target HLog directory already exists: 
/mnt/F/hbaseTrunkNew/hbase-server/target/test-data/89a77fb2-2048-414c-8f94-6b9a43a51937/TestCompactSelection/logs
  
testMultiRowMutationMultiThreads(org.apache.hadoop.hbase.regionserver.TestAtomicOperation):
 java.io.FileNotFoundException: 
/mnt/F/hbaseTrunkNew/hbase-server/target/classes/hbase-default.xml (Too many 
open files)
  
testCacheOnWriteInSchema[1](org.apache.hadoop.hbase.regionserver.TestCacheOnWriteInSchema):
 Target HLog directory already exists: 
/mnt/F/hbaseTrunkNew/hbase-server/target/test-data/1480ac68-4774-454e-9127-e9bfd20864f6/TestCacheOnWriteInSchema/logs
  
testCacheOnWriteInSchema[2](org.apache.hadoop.hbase.regionserver.TestCacheOnWriteInSchema):
 Target HLog directory already exists: 
/mnt/F/hbaseTrunkNew/hbase-server/target/test-data/1480ac68-4774-454e-9127-e9bfd20864f6/TestCacheOnWriteInSchema/logs
  loadTest[0](org.apache.hadoop.hbase.util.TestMiniClusterLoadParallel): test 
timed out after 12 milliseconds
  loadTest[1](org.apache.hadoop.hbase.util.TestMiniClusterLoadParallel): test 
timed out after 12 milliseconds

Tests run: 1577, Failures: 6, Errors: 8, Skipped: 9
{code}

ran failed test cases individually these test cases passes.
{code}
Running org.apache.hadoop.hbase.TestDrainingServer
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 38.338 sec

Running 
org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 31.07 sec

Running org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 23.353 sec

Running org.apache.hadoop.hbase.master.TestSplitLogManager
Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 20.243 sec
{code}
tests in below test cases are failing but these are not related to this issue. 
I will check these.

TestMiniClusterLoadParallel,TestAtomicOperation,TestCacheOnWriteInSchema,TestCompactSelection,TestFSUtils


 Process RIT and Master restart may remove an online server considering it as 
 a dead server
 --

 Key: HBASE-5875
 URL: https://issues.apache.org/jira/browse/HBASE-5875
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.1

 Attachments: HBASE-5875.patch, HBASE-5875_0.94.patch, 
 HBASE-5875_0.94_1.patch, HBASE-5875_trunk.patch, HBASE-5875v2.patch


 If on master restart it finds the ROOT/META to be in RIT state, master tries 
 to assign the ROOT region through ProcessRIT.
 Master will trigger the assignment and next will try to verify the Root 
 Region Location.
 Root region location verification is done seeing if the RS has the region in 
 its online list.
 If the master triggered assignment has not yet been completed in RS then the 
 verify root region location will fail.
 Because it failed 
 {code}
 splitLogAndExpireIfOnline(currentRootServer);
 {code}
 we do split log and also remove the server from online server list. 

[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-05-08 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13270321#comment-13270321
 ] 

ramkrishna.s.vasudevan commented on HBASE-5875:
---

@Chunhui
Thanks for the patch.  I saw that.  Any race is possible in regionOnline() and 
processServerShutdown().  Any corner case? I just thought for the scenarios 
where two OpenedRegionHandler call comes for the same region.  I think it 
should be ok.
Are all the testcases running? Good job.
Let's see what Stack has to say for this? 

 Process RIT and Master restart may remove an online server considering it as 
 a dead server
 --

 Key: HBASE-5875
 URL: https://issues.apache.org/jira/browse/HBASE-5875
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.1

 Attachments: HBASE-5875.patch, HBASE-5875_0.94.patch, 
 HBASE-5875_0.94_1.patch, HBASE-5875v2.patch


 If on master restart it finds the ROOT/META to be in RIT state, master tries 
 to assign the ROOT region through ProcessRIT.
 Master will trigger the assignment and next will try to verify the Root 
 Region Location.
 Root region location verification is done seeing if the RS has the region in 
 its online list.
 If the master triggered assignment has not yet been completed in RS then the 
 verify root region location will fail.
 Because it failed 
 {code}
 splitLogAndExpireIfOnline(currentRootServer);
 {code}
 we do split log and also remove the server from online server list. Ideally 
 here there is nothing to do in splitlog as no region server was restarted.
 So master, though the server is online, master just invalidates the region 
 server.
 In a special case, if i have only one RS then my cluster will become non 
 operative.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-05-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13270475#comment-13270475
 ] 

Hadoop QA commented on HBASE-5875:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12525969/HBASE-5875v2.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 hadoop23.  The patch compiles against the hadoop 0.23.x profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1795//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1795//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1795//console

This message is automatically generated.

 Process RIT and Master restart may remove an online server considering it as 
 a dead server
 --

 Key: HBASE-5875
 URL: https://issues.apache.org/jira/browse/HBASE-5875
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.1

 Attachments: HBASE-5875.patch, HBASE-5875_0.94.patch, 
 HBASE-5875_0.94_1.patch, HBASE-5875v2.patch


 If on master restart it finds the ROOT/META to be in RIT state, master tries 
 to assign the ROOT region through ProcessRIT.
 Master will trigger the assignment and next will try to verify the Root 
 Region Location.
 Root region location verification is done seeing if the RS has the region in 
 its online list.
 If the master triggered assignment has not yet been completed in RS then the 
 verify root region location will fail.
 Because it failed 
 {code}
 splitLogAndExpireIfOnline(currentRootServer);
 {code}
 we do split log and also remove the server from online server list. Ideally 
 here there is nothing to do in splitlog as no region server was restarted.
 So master, though the server is online, master just invalidates the region 
 server.
 In a special case, if i have only one RS then my cluster will become non 
 operative.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-05-08 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13270480#comment-13270480
 ] 

Zhihong Yu commented on HBASE-5875:
---

Chunhui's patch looks good.
Minor comments:
{code}
+  LOG.info(-ROOT- is already onlined after process RIT);
+}else{
 if (!catalogTracker.verifyRootRegionLocation(timeout)) {
{code}
'process RIT' - 'processing RIT'
Please insert spaces around else.
Indentation for the following statements should be increased.

Similar comments apply to the handling of FIRST_META_REGIONINFO

 Process RIT and Master restart may remove an online server considering it as 
 a dead server
 --

 Key: HBASE-5875
 URL: https://issues.apache.org/jira/browse/HBASE-5875
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.1

 Attachments: HBASE-5875.patch, HBASE-5875_0.94.patch, 
 HBASE-5875_0.94_1.patch, HBASE-5875v2.patch


 If on master restart it finds the ROOT/META to be in RIT state, master tries 
 to assign the ROOT region through ProcessRIT.
 Master will trigger the assignment and next will try to verify the Root 
 Region Location.
 Root region location verification is done seeing if the RS has the region in 
 its online list.
 If the master triggered assignment has not yet been completed in RS then the 
 verify root region location will fail.
 Because it failed 
 {code}
 splitLogAndExpireIfOnline(currentRootServer);
 {code}
 we do split log and also remove the server from online server list. Ideally 
 here there is nothing to do in splitlog as no region server was restarted.
 So master, though the server is online, master just invalidates the region 
 server.
 In a special case, if i have only one RS then my cluster will become non 
 operative.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-05-08 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13270701#comment-13270701
 ] 

stack commented on HBASE-5875:
--

On the patch, what Ted says.

Plus, I am not sure why we avoid verifying root and meta locations?  If they 
are online, why not do the verify?

In AM, why move the sync block?

I like that this patch is much smaller.  Much easier to reason about (smile).  
Thanks lads.

Oh, where is the test?  Is it possible to make it into a unit test and include 
it along w/ this patch?

Good stuff

 Process RIT and Master restart may remove an online server considering it as 
 a dead server
 --

 Key: HBASE-5875
 URL: https://issues.apache.org/jira/browse/HBASE-5875
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.1

 Attachments: HBASE-5875.patch, HBASE-5875_0.94.patch, 
 HBASE-5875_0.94_1.patch, HBASE-5875v2.patch


 If on master restart it finds the ROOT/META to be in RIT state, master tries 
 to assign the ROOT region through ProcessRIT.
 Master will trigger the assignment and next will try to verify the Root 
 Region Location.
 Root region location verification is done seeing if the RS has the region in 
 its online list.
 If the master triggered assignment has not yet been completed in RS then the 
 verify root region location will fail.
 Because it failed 
 {code}
 splitLogAndExpireIfOnline(currentRootServer);
 {code}
 we do split log and also remove the server from online server list. Ideally 
 here there is nothing to do in splitlog as no region server was restarted.
 So master, though the server is online, master just invalidates the region 
 server.
 In a special case, if i have only one RS then my cluster will become non 
 operative.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-05-07 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13269779#comment-13269779
 ] 

ramkrishna.s.vasudevan commented on HBASE-5875:
---

I have addressed the third comment in the recent patch.
{code}
LOG.info(Failed verification of  + Bytes.toStringBinary(regionName) +
+   at address= + address + ;  + t);
{code}
This log is same as the one below it.  It is an existing one.
{code}
if(rit == true){
{code}
Here rit means region in transition and it applies to META also if it is in 
RIT.  So i think changing this name will not make it generic. Again, this is a 
patch for 0.94

 Process RIT and Master restart may remove an online server considering it as 
 a dead server
 --

 Key: HBASE-5875
 URL: https://issues.apache.org/jira/browse/HBASE-5875
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.1

 Attachments: HBASE-5875.patch, HBASE-5875_0.94.patch, 
 HBASE-5875_0.94_1.patch


 If on master restart it finds the ROOT/META to be in RIT state, master tries 
 to assign the ROOT region through ProcessRIT.
 Master will trigger the assignment and next will try to verify the Root 
 Region Location.
 Root region location verification is done seeing if the RS has the region in 
 its online list.
 If the master triggered assignment has not yet been completed in RS then the 
 verify root region location will fail.
 Because it failed 
 {code}
 splitLogAndExpireIfOnline(currentRootServer);
 {code}
 we do split log and also remove the server from online server list. Ideally 
 here there is nothing to do in splitlog as no region server was restarted.
 So master, though the server is online, master just invalidates the region 
 server.
 In a special case, if i have only one RS then my cluster will become non 
 operative.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-05-07 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13270137#comment-13270137
 ] 

chunhui shen commented on HBASE-5875:
-

I think we could change a litter to fix the issue.

What about checking whether region in regionsInTransitionInRS when call 
getRegionInfo for verifyReionLocation? If so, it must not in other 
regionserver, we could wait.

Another solution:
We could skip verifyReionLocation if we found it in assignment map if 
processRegionInTransitionAndBlockUntilAssigned return true, could we ?(To be 
sure, we should change a little in AssignmentManager#regionOnline)

 Process RIT and Master restart may remove an online server considering it as 
 a dead server
 --

 Key: HBASE-5875
 URL: https://issues.apache.org/jira/browse/HBASE-5875
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.1

 Attachments: HBASE-5875.patch, HBASE-5875_0.94.patch, 
 HBASE-5875_0.94_1.patch


 If on master restart it finds the ROOT/META to be in RIT state, master tries 
 to assign the ROOT region through ProcessRIT.
 Master will trigger the assignment and next will try to verify the Root 
 Region Location.
 Root region location verification is done seeing if the RS has the region in 
 its online list.
 If the master triggered assignment has not yet been completed in RS then the 
 verify root region location will fail.
 Because it failed 
 {code}
 splitLogAndExpireIfOnline(currentRootServer);
 {code}
 we do split log and also remove the server from online server list. Ideally 
 here there is nothing to do in splitlog as no region server was restarted.
 So master, though the server is online, master just invalidates the region 
 server.
 In a special case, if i have only one RS then my cluster will become non 
 operative.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-05-07 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13270181#comment-13270181
 ] 

ramkrishna.s.vasudevan commented on HBASE-5875:
---

bq.I think we could change a litter to fix the issue.
Did you mean little? 
Can you come up with your patch based on second solution?
bq.What about checking whether region in regionsInTransitionInRS when call 
getRegionInfo for verifyReionLocation? If so, it must not in other 
regionserver, we could wait.
Here am not sure again how long to wait and how much to retry?

 Process RIT and Master restart may remove an online server considering it as 
 a dead server
 --

 Key: HBASE-5875
 URL: https://issues.apache.org/jira/browse/HBASE-5875
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.1

 Attachments: HBASE-5875.patch, HBASE-5875_0.94.patch, 
 HBASE-5875_0.94_1.patch


 If on master restart it finds the ROOT/META to be in RIT state, master tries 
 to assign the ROOT region through ProcessRIT.
 Master will trigger the assignment and next will try to verify the Root 
 Region Location.
 Root region location verification is done seeing if the RS has the region in 
 its online list.
 If the master triggered assignment has not yet been completed in RS then the 
 verify root region location will fail.
 Because it failed 
 {code}
 splitLogAndExpireIfOnline(currentRootServer);
 {code}
 we do split log and also remove the server from online server list. Ideally 
 here there is nothing to do in splitlog as no region server was restarted.
 So master, though the server is online, master just invalidates the region 
 server.
 In a special case, if i have only one RS then my cluster will become non 
 operative.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-05-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13268481#comment-13268481
 ] 

Hadoop QA commented on HBASE-5875:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12525640/HBASE-5875_0.94.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 9 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1765//console

This message is automatically generated.

 Process RIT and Master restart may remove an online server considering it as 
 a dead server
 --

 Key: HBASE-5875
 URL: https://issues.apache.org/jira/browse/HBASE-5875
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.1

 Attachments: HBASE-5875.patch, HBASE-5875_0.94.patch


 If on master restart it finds the ROOT/META to be in RIT state, master tries 
 to assign the ROOT region through ProcessRIT.
 Master will trigger the assignment and next will try to verify the Root 
 Region Location.
 Root region location verification is done seeing if the RS has the region in 
 its online list.
 If the master triggered assignment has not yet been completed in RS then the 
 verify root region location will fail.
 Because it failed 
 {code}
 splitLogAndExpireIfOnline(currentRootServer);
 {code}
 we do split log and also remove the server from online server list. Ideally 
 here there is nothing to do in splitlog as no region server was restarted.
 So master, though the server is online, master just invalidates the region 
 server.
 In a special case, if i have only one RS then my cluster will become non 
 operative.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-05-04 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13268525#comment-13268525
 ] 

Zhihong Yu commented on HBASE-5875:
---

The following change is for debugging, right ? If so, please change log level 
accordingly:
{code}
+}catch(NotServingRegionException nsre){
+  LOG.info(Failed verification of  + Bytes.toStringBinary(regionName) +
+   at address= + address + ;  + t);
+  throw nsre;
{code}
{code}
+} catch (NotServingRegionException nsre) {
+  if(rit == true){
+// the root region location is available.
{code}
People unfamiliar with processRegionInTransitionAndBlockUntilAssigned() may get 
confused by the code above. rit actually means root region has come out of 
transition. So rit should be named accordingly.
{code}
+  public void setServerShutdownHandlerEnabled(boolean 
setServerShutDownEnabled) {
{code}
The above method should be made package-private. Append 'ForTest' to the end of 
method name would help clarify its purpose.

 Process RIT and Master restart may remove an online server considering it as 
 a dead server
 --

 Key: HBASE-5875
 URL: https://issues.apache.org/jira/browse/HBASE-5875
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.1

 Attachments: HBASE-5875.patch, HBASE-5875_0.94.patch


 If on master restart it finds the ROOT/META to be in RIT state, master tries 
 to assign the ROOT region through ProcessRIT.
 Master will trigger the assignment and next will try to verify the Root 
 Region Location.
 Root region location verification is done seeing if the RS has the region in 
 its online list.
 If the master triggered assignment has not yet been completed in RS then the 
 verify root region location will fail.
 Because it failed 
 {code}
 splitLogAndExpireIfOnline(currentRootServer);
 {code}
 we do split log and also remove the server from online server list. Ideally 
 here there is nothing to do in splitlog as no region server was restarted.
 So master, though the server is online, master just invalidates the region 
 server.
 In a special case, if i have only one RS then my cluster will become non 
 operative.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-05-03 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267392#comment-13267392
 ] 

ramkrishna.s.vasudevan commented on HBASE-5875:
---

bq.What is the above referring to? Which part of the code?

In assignRootAndMeta()
{code}
boolean rit = this.assignmentManager.
  
processRegionInTransitionAndBlockUntilAssigned(HRegionInfo.ROOT_REGIONINFO);

{code}


 Process RIT and Master restart may remove an online server considering it as 
 a dead server
 --

 Key: HBASE-5875
 URL: https://issues.apache.org/jira/browse/HBASE-5875
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.1

 Attachments: HBASE-5875.patch


 If on master restart it finds the ROOT/META to be in RIT state, master tries 
 to assign the ROOT region through ProcessRIT.
 Master will trigger the assignment and next will try to verify the Root 
 Region Location.
 Root region location verification is done seeing if the RS has the region in 
 its online list.
 If the master triggered assignment has not yet been completed in RS then the 
 verify root region location will fail.
 Because it failed 
 {code}
 splitLogAndExpireIfOnline(currentRootServer);
 {code}
 we do split log and also remove the server from online server list. Ideally 
 here there is nothing to do in splitlog as no region server was restarted.
 So master, though the server is online, master just invalidates the region 
 server.
 In a special case, if i have only one RS then my cluster will become non 
 operative.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-05-03 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267521#comment-13267521
 ] 

ramkrishna.s.vasudevan commented on HBASE-5875:
---

I have reproduced the scenario addressing the title of the JIRA with a testcase.
I have tried follow a approach that Bijieshan had suggested in 
https://issues.apache.org/jira/browse/HBASE-5875?focusedCommentId=13264874page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13264874
to solve the problem. Tomorrow i can upload the testcase.

 Process RIT and Master restart may remove an online server considering it as 
 a dead server
 --

 Key: HBASE-5875
 URL: https://issues.apache.org/jira/browse/HBASE-5875
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.1

 Attachments: HBASE-5875.patch


 If on master restart it finds the ROOT/META to be in RIT state, master tries 
 to assign the ROOT region through ProcessRIT.
 Master will trigger the assignment and next will try to verify the Root 
 Region Location.
 Root region location verification is done seeing if the RS has the region in 
 its online list.
 If the master triggered assignment has not yet been completed in RS then the 
 verify root region location will fail.
 Because it failed 
 {code}
 splitLogAndExpireIfOnline(currentRootServer);
 {code}
 we do split log and also remove the server from online server list. Ideally 
 here there is nothing to do in splitlog as no region server was restarted.
 So master, though the server is online, master just invalidates the region 
 server.
 In a special case, if i have only one RS then my cluster will become non 
 operative.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-05-02 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13266377#comment-13266377
 ] 

chunhui shen commented on HBASE-5875:
-

bq.If the master triggered assignment has not yet been completed in RS then the 
verify root region location will fail.

Why does it happend?
In assignRootAndMeta:
{code}
boolean rit = this.assignmentManager.
  
processRegionInTransitionAndBlockUntilAssigned(HRegionInfo.ROOT_REGIONINFO);
{code}
We will block until master completed the assignment.

 Process RIT and Master restart may remove an online server considering it as 
 a dead server
 --

 Key: HBASE-5875
 URL: https://issues.apache.org/jira/browse/HBASE-5875
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.1

 Attachments: HBASE-5875.patch


 If on master restart it finds the ROOT/META to be in RIT state, master tries 
 to assign the ROOT region through ProcessRIT.
 Master will trigger the assignment and next will try to verify the Root 
 Region Location.
 Root region location verification is done seeing if the RS has the region in 
 its online list.
 If the master triggered assignment has not yet been completed in RS then the 
 verify root region location will fail.
 Because it failed 
 {code}
 splitLogAndExpireIfOnline(currentRootServer);
 {code}
 we do split log and also remove the server from online server list. Ideally 
 here there is nothing to do in splitlog as no region server was restarted.
 So master, though the server is online, master just invalidates the region 
 server.
 In a special case, if i have only one RS then my cluster will become non 
 operative.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-05-02 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13266446#comment-13266446
 ] 

chunhui shen commented on HBASE-5875:
-

@ram
I'm clear now about the time gap.

What about do the following check 
{code}
if (assignmentManager.getRegionServerOfRegion(HRegionInfo.ROOT_REGIONINFO) == 
null) {
  ServerName currentRootServer = null;
  if (!catalogTracker.verifyRootRegionLocation(timeout)) {
currentRootServer = this.catalogTracker.getRootLocation();
splitLogAndExpireIfOnline(currentRootServer);
this.assignmentManager.assignRoot();
// Make sure a -ROOT- location is set.
if (!isRootLocation())
  return false;
// This guarantees that the transition assigning -ROOT- has completed
this.assignmentManager.waitForAssignment(HRegionInfo.ROOT_REGIONINFO);
assigned++;
  } else {
// Region already assigned. We didn't assign it. Add to in-memory state.
this.assignmentManager.regionOnline(HRegionInfo.ROOT_REGIONINFO,
this.catalogTracker.getRootLocation());
  }
} else {
  // Root region has been assigned through processRegionInTransition
}
{code}

 Process RIT and Master restart may remove an online server considering it as 
 a dead server
 --

 Key: HBASE-5875
 URL: https://issues.apache.org/jira/browse/HBASE-5875
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.1

 Attachments: HBASE-5875.patch


 If on master restart it finds the ROOT/META to be in RIT state, master tries 
 to assign the ROOT region through ProcessRIT.
 Master will trigger the assignment and next will try to verify the Root 
 Region Location.
 Root region location verification is done seeing if the RS has the region in 
 its online list.
 If the master triggered assignment has not yet been completed in RS then the 
 verify root region location will fail.
 Because it failed 
 {code}
 splitLogAndExpireIfOnline(currentRootServer);
 {code}
 we do split log and also remove the server from online server list. Ideally 
 here there is nothing to do in splitlog as no region server was restarted.
 So master, though the server is online, master just invalidates the region 
 server.
 In a special case, if i have only one RS then my cluster will become non 
 operative.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-05-02 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13266466#comment-13266466
 ] 

ramkrishna.s.vasudevan commented on HBASE-5875:
---

@Chunhui
verifyRootRegionLocation() need not be done? What if the RS went down just 
after processing the znode to OPENED? So only SSH will come and try to assign 
root?

I am not sure if accepting that ROOT has been assigned without 
verifyRootRegionLocation() is ok? But your approach is simple.  

My idea behind the patch was without ROOT and META the cluster is non 
operative.  Hence i went with that approach. Appreciate your time. 

 Process RIT and Master restart may remove an online server considering it as 
 a dead server
 --

 Key: HBASE-5875
 URL: https://issues.apache.org/jira/browse/HBASE-5875
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.1

 Attachments: HBASE-5875.patch


 If on master restart it finds the ROOT/META to be in RIT state, master tries 
 to assign the ROOT region through ProcessRIT.
 Master will trigger the assignment and next will try to verify the Root 
 Region Location.
 Root region location verification is done seeing if the RS has the region in 
 its online list.
 If the master triggered assignment has not yet been completed in RS then the 
 verify root region location will fail.
 Because it failed 
 {code}
 splitLogAndExpireIfOnline(currentRootServer);
 {code}
 we do split log and also remove the server from online server list. Ideally 
 here there is nothing to do in splitlog as no region server was restarted.
 So master, though the server is online, master just invalidates the region 
 server.
 In a special case, if i have only one RS then my cluster will become non 
 operative.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-05-02 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13266522#comment-13266522
 ] 

chunhui shen commented on HBASE-5875:
-

bq.What if the RS went down just after processing the znode to OPENED? So only 
SSH will come and try to assign root?
Yes, SSH will assign root. Also it remind me to the bug HBASE-5918, would you 
take a see?

With the current patch, I think there is possibility of data loss mentioned in 
HBASE-4880.

My approach is just a thought, since ROOT region is online in the 
AssignmentManager when initializing, it must been assigned.
However, it also has a hole where remove hregioninfo from RIT but not add the 
region to AssignmentManager.regions in AssignmentManager#regionOnline().

 Process RIT and Master restart may remove an online server considering it as 
 a dead server
 --

 Key: HBASE-5875
 URL: https://issues.apache.org/jira/browse/HBASE-5875
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.1

 Attachments: HBASE-5875.patch


 If on master restart it finds the ROOT/META to be in RIT state, master tries 
 to assign the ROOT region through ProcessRIT.
 Master will trigger the assignment and next will try to verify the Root 
 Region Location.
 Root region location verification is done seeing if the RS has the region in 
 its online list.
 If the master triggered assignment has not yet been completed in RS then the 
 verify root region location will fail.
 Because it failed 
 {code}
 splitLogAndExpireIfOnline(currentRootServer);
 {code}
 we do split log and also remove the server from online server list. Ideally 
 here there is nothing to do in splitlog as no region server was restarted.
 So master, though the server is online, master just invalidates the region 
 server.
 In a special case, if i have only one RS then my cluster will become non 
 operative.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-05-02 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13266539#comment-13266539
 ] 

ramkrishna.s.vasudevan commented on HBASE-5875:
---

@Chunhui
Yes, i took a look at HBASE-5819.
HBASE-5816 is also due to the serverShutdownHandlerEnabled variable. I think 
'serverShutdownHandlerEnabled' usage should be more clear.

I think that the problem of HBASE-4880 is applicable for the user regions but 
for root and META the updates are done by the region servers themselves and the 
problem of HBASE-4880 should not be there. Because if after updating the META 
entry to ROOT if transitioning to OPENED fails, and even if closing the region 
also fails, any way till the META is available the system is not going to 
function. Correct me if am wrong Chunhui.


 Process RIT and Master restart may remove an online server considering it as 
 a dead server
 --

 Key: HBASE-5875
 URL: https://issues.apache.org/jira/browse/HBASE-5875
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.1

 Attachments: HBASE-5875.patch


 If on master restart it finds the ROOT/META to be in RIT state, master tries 
 to assign the ROOT region through ProcessRIT.
 Master will trigger the assignment and next will try to verify the Root 
 Region Location.
 Root region location verification is done seeing if the RS has the region in 
 its online list.
 If the master triggered assignment has not yet been completed in RS then the 
 verify root region location will fail.
 Because it failed 
 {code}
 splitLogAndExpireIfOnline(currentRootServer);
 {code}
 we do split log and also remove the server from online server list. Ideally 
 here there is nothing to do in splitlog as no region server was restarted.
 So master, though the server is online, master just invalidates the region 
 server.
 In a special case, if i have only one RS then my cluster will become non 
 operative.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-05-02 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267093#comment-13267093
 ] 

chunhui shen commented on HBASE-5875:
-

@ram
Thanks for the explaination, I think patch is OK for the issue.

 Process RIT and Master restart may remove an online server considering it as 
 a dead server
 --

 Key: HBASE-5875
 URL: https://issues.apache.org/jira/browse/HBASE-5875
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.1

 Attachments: HBASE-5875.patch


 If on master restart it finds the ROOT/META to be in RIT state, master tries 
 to assign the ROOT region through ProcessRIT.
 Master will trigger the assignment and next will try to verify the Root 
 Region Location.
 Root region location verification is done seeing if the RS has the region in 
 its online list.
 If the master triggered assignment has not yet been completed in RS then the 
 verify root region location will fail.
 Because it failed 
 {code}
 splitLogAndExpireIfOnline(currentRootServer);
 {code}
 we do split log and also remove the server from online server list. Ideally 
 here there is nothing to do in splitlog as no region server was restarted.
 So master, though the server is online, master just invalidates the region 
 server.
 In a special case, if i have only one RS then my cluster will become non 
 operative.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-05-02 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267168#comment-13267168
 ] 

Zhihong Yu commented on HBASE-5875:
---

+1 on patch.

Before integrating to 0.92 branch, please run test suite.

 Process RIT and Master restart may remove an online server considering it as 
 a dead server
 --

 Key: HBASE-5875
 URL: https://issues.apache.org/jira/browse/HBASE-5875
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.1

 Attachments: HBASE-5875.patch


 If on master restart it finds the ROOT/META to be in RIT state, master tries 
 to assign the ROOT region through ProcessRIT.
 Master will trigger the assignment and next will try to verify the Root 
 Region Location.
 Root region location verification is done seeing if the RS has the region in 
 its online list.
 If the master triggered assignment has not yet been completed in RS then the 
 verify root region location will fail.
 Because it failed 
 {code}
 splitLogAndExpireIfOnline(currentRootServer);
 {code}
 we do split log and also remove the server from online server list. Ideally 
 here there is nothing to do in splitlog as no region server was restarted.
 So master, though the server is online, master just invalidates the region 
 server.
 In a special case, if i have only one RS then my cluster will become non 
 operative.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-05-02 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267209#comment-13267209
 ] 

stack commented on HBASE-5875:
--

The patch looks dodgy -- saying a region is online, if it is root or meta, 
seems incorrect.

bq. Consider the case where my ROOT node is found in RIT. Hence the processRIT 
will trigger the assignment.

What is the above referring to?  Which part of the code?

bq. It so happened that when i try to verifyRootRegionLocation the root node is 
created but the OpenRegionHandler has not added the ROOT region in its 
memory(very very corner case and this happened once while testing). So the 
verifyRootRegionLocation returns false and hence the master thinks it an server 
to be expired. 

Can the master not detect this corner case just by looking at whats in zk?



 Process RIT and Master restart may remove an online server considering it as 
 a dead server
 --

 Key: HBASE-5875
 URL: https://issues.apache.org/jira/browse/HBASE-5875
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.1

 Attachments: HBASE-5875.patch


 If on master restart it finds the ROOT/META to be in RIT state, master tries 
 to assign the ROOT region through ProcessRIT.
 Master will trigger the assignment and next will try to verify the Root 
 Region Location.
 Root region location verification is done seeing if the RS has the region in 
 its online list.
 If the master triggered assignment has not yet been completed in RS then the 
 verify root region location will fail.
 Because it failed 
 {code}
 splitLogAndExpireIfOnline(currentRootServer);
 {code}
 we do split log and also remove the server from online server list. Ideally 
 here there is nothing to do in splitlog as no region server was restarted.
 So master, though the server is online, master just invalidates the region 
 server.
 In a special case, if i have only one RS then my cluster will become non 
 operative.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-04-30 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264874#comment-13264874
 ] 

Jieshan Bean commented on HBASE-5875:
-

Look into the method of CatalogTracker#verifyRootRegionLocation:
{noformat}
public boolean verifyRootRegionLocation(final long timeout)
  throws InterruptedException, IOException {
AdminProtocol connection = null;
try {
  connection = waitForRootServerConnection(timeout);
} catch (NotAllMetaRegionsOnlineException e) {
  // Pass
} catch (ServerNotRunningYetException e) {
  // Pass -- remote server is not up so can't be carrying root
} catch (UnknownHostException e) {
  // Pass -- server name doesn't resolve so it can't be assigned anything.
}
return (connection == null)? false:
  verifyRegionLocation(connection,
this.rootRegionTracker.getRootRegionLocation(), ROOT_REGION_NAME);
  }
{noformat}
I'm thinking about an approach which can handle this issue according to 
different exception. 
e.g. if we got an ServerNotRunningYetException, we can process 
splitLogAndExpireIfOnline.
But if we got an NotServingRegionException, we should not do that.



 Process RIT and Master restart may remove an online server considering it as 
 a dead server
 --

 Key: HBASE-5875
 URL: https://issues.apache.org/jira/browse/HBASE-5875
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.1


 If on master restart it finds the ROOT/META to be in RIT state, master tries 
 to assign the ROOT region through ProcessRIT.
 Master will trigger the assignment and next will try to verify the Root 
 Region Location.
 Root region location verification is done seeing if the RS has the region in 
 its online list.
 If the master triggered assignment has not yet been completed in RS then the 
 verify root region location will fail.
 Because it failed 
 {code}
 splitLogAndExpireIfOnline(currentRootServer);
 {code}
 we do split log and also remove the server from online server list. Ideally 
 here there is nothing to do in splitlog as no region server was restarted.
 So master, though the server is online, master just invalidates the region 
 server.
 In a special case, if i have only one RS then my cluster will become non 
 operative.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-04-30 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264876#comment-13264876
 ] 

ramkrishna.s.vasudevan commented on HBASE-5875:
---

@Jieshan
As Ted also suggested if we go by the exception then we need to add unnecessary 
retry logic, sleep time and also need to modify the api 
verifyRootRegionLocation which is used in many places.

 Process RIT and Master restart may remove an online server considering it as 
 a dead server
 --

 Key: HBASE-5875
 URL: https://issues.apache.org/jira/browse/HBASE-5875
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.1


 If on master restart it finds the ROOT/META to be in RIT state, master tries 
 to assign the ROOT region through ProcessRIT.
 Master will trigger the assignment and next will try to verify the Root 
 Region Location.
 Root region location verification is done seeing if the RS has the region in 
 its online list.
 If the master triggered assignment has not yet been completed in RS then the 
 verify root region location will fail.
 Because it failed 
 {code}
 splitLogAndExpireIfOnline(currentRootServer);
 {code}
 we do split log and also remove the server from online server list. Ideally 
 here there is nothing to do in splitlog as no region server was restarted.
 So master, though the server is online, master just invalidates the region 
 server.
 In a special case, if i have only one RS then my cluster will become non 
 operative.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-04-30 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264900#comment-13264900
 ] 

ramkrishna.s.vasudevan commented on HBASE-5875:
---

Patch for trunk.  TestCases passed.

 Process RIT and Master restart may remove an online server considering it as 
 a dead server
 --

 Key: HBASE-5875
 URL: https://issues.apache.org/jira/browse/HBASE-5875
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.1

 Attachments: HBASE-5875.patch


 If on master restart it finds the ROOT/META to be in RIT state, master tries 
 to assign the ROOT region through ProcessRIT.
 Master will trigger the assignment and next will try to verify the Root 
 Region Location.
 Root region location verification is done seeing if the RS has the region in 
 its online list.
 If the master triggered assignment has not yet been completed in RS then the 
 verify root region location will fail.
 Because it failed 
 {code}
 splitLogAndExpireIfOnline(currentRootServer);
 {code}
 we do split log and also remove the server from online server list. Ideally 
 here there is nothing to do in splitlog as no region server was restarted.
 So master, though the server is online, master just invalidates the region 
 server.
 In a special case, if i have only one RS then my cluster will become non 
 operative.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-04-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264933#comment-13264933
 ] 

Hadoop QA commented on HBASE-5875:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12525060/HBASE-5875.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 hadoop23.  The patch compiles against the hadoop 0.23.x profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 2 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.io.hfile.TestForceCacheImportantBlocks

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1689//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1689//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1689//console

This message is automatically generated.

 Process RIT and Master restart may remove an online server considering it as 
 a dead server
 --

 Key: HBASE-5875
 URL: https://issues.apache.org/jira/browse/HBASE-5875
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.1

 Attachments: HBASE-5875.patch


 If on master restart it finds the ROOT/META to be in RIT state, master tries 
 to assign the ROOT region through ProcessRIT.
 Master will trigger the assignment and next will try to verify the Root 
 Region Location.
 Root region location verification is done seeing if the RS has the region in 
 its online list.
 If the master triggered assignment has not yet been completed in RS then the 
 verify root region location will fail.
 Because it failed 
 {code}
 splitLogAndExpireIfOnline(currentRootServer);
 {code}
 we do split log and also remove the server from online server list. Ideally 
 here there is nothing to do in splitlog as no region server was restarted.
 So master, though the server is online, master just invalidates the region 
 server.
 In a special case, if i have only one RS then my cluster will become non 
 operative.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-04-30 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264938#comment-13264938
 ] 

ramkrishna.s.vasudevan commented on HBASE-5875:
---

Testcase failure seems unrelated to this fix.

 Process RIT and Master restart may remove an online server considering it as 
 a dead server
 --

 Key: HBASE-5875
 URL: https://issues.apache.org/jira/browse/HBASE-5875
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.1

 Attachments: HBASE-5875.patch


 If on master restart it finds the ROOT/META to be in RIT state, master tries 
 to assign the ROOT region through ProcessRIT.
 Master will trigger the assignment and next will try to verify the Root 
 Region Location.
 Root region location verification is done seeing if the RS has the region in 
 its online list.
 If the master triggered assignment has not yet been completed in RS then the 
 verify root region location will fail.
 Because it failed 
 {code}
 splitLogAndExpireIfOnline(currentRootServer);
 {code}
 we do split log and also remove the server from online server list. Ideally 
 here there is nothing to do in splitlog as no region server was restarted.
 So master, though the server is online, master just invalidates the region 
 server.
 In a special case, if i have only one RS then my cluster will become non 
 operative.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-04-26 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262658#comment-13262658
 ] 

ramkrishna.s.vasudevan commented on HBASE-5875:
---

I would like to get some suggestions in this
{code}
boolean rit = this.assignmentManager.
  
processRegionInTransitionAndBlockUntilAssigned(HRegionInfo.ROOT_REGIONINFO);
ServerName currentRootServer = null;
if (!catalogTracker.verifyRootRegionLocation(timeout)) {
  currentRootServer = this.catalogTracker.getRootLocation();
{code}
Consider the case where my ROOT node is found in RIT.  Hence the processRIT 
will trigger the assignment.

It so happened that when i try to verifyRootRegionLocation the root node is 
created but the OpenRegionHandler has not added the ROOT region in its 
memory(very very corner case and this happened once while testing).  So the 
verifyRootRegionLocation returns false and hence the master thinks it an server 
to be expired.  So we just remove an normal active RS from the master memory 
thinking it as dead.  So i lose a RS itself from the master's list of online 
servers.  How can we handle this scenario?

Can we retry the verifyRootRegionLocation if it returns false and the boolean 
variable 'rit' is true?
Or can we update the root region node in the RS side after updating the online 
server list? Suggestions welcome...

 Process RIT and Master restart may remove an online server considering it as 
 a dead server
 --

 Key: HBASE-5875
 URL: https://issues.apache.org/jira/browse/HBASE-5875
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.1


 If on master restart it finds the ROOT/META to be in RIT state, master tries 
 to assign the ROOT region through ProcessRIT.
 Master will trigger the assignment and next will try to verify the Root 
 Region Location.
 Root region location verification is done seeing if the RS has the region in 
 its online list.
 If the master triggered assignment has not yet been completed in RS then the 
 verify root region location will fail.
 Because it failed 
 {code}
 splitLogAndExpireIfOnline(currentRootServer);
 {code}
 we do split log and also remove the server from online server list. Ideally 
 here there is nothing to do in splitlog as no region server was restarted.
 So master, though the server is online, master just invalidates the region 
 server.
 In a special case, if i have only one RS then my cluster will become non 
 operative.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-04-26 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262698#comment-13262698
 ] 

Zhihong Yu commented on HBASE-5875:
---

bq. Or can we update the root region node in the RS side after updating the 
online server list?
Let's try this approach first.

The other approach would involve retry count, sleep interval, etc.

 Process RIT and Master restart may remove an online server considering it as 
 a dead server
 --

 Key: HBASE-5875
 URL: https://issues.apache.org/jira/browse/HBASE-5875
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.1


 If on master restart it finds the ROOT/META to be in RIT state, master tries 
 to assign the ROOT region through ProcessRIT.
 Master will trigger the assignment and next will try to verify the Root 
 Region Location.
 Root region location verification is done seeing if the RS has the region in 
 its online list.
 If the master triggered assignment has not yet been completed in RS then the 
 verify root region location will fail.
 Because it failed 
 {code}
 splitLogAndExpireIfOnline(currentRootServer);
 {code}
 we do split log and also remove the server from online server list. Ideally 
 here there is nothing to do in splitlog as no region server was restarted.
 So master, though the server is online, master just invalidates the region 
 server.
 In a special case, if i have only one RS then my cluster will become non 
 operative.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server

2012-04-25 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13261759#comment-13261759
 ] 

Lars Hofhansl commented on HBASE-5875:
--

Can we move this to 0.94.1?

 Process RIT and Master restart may remove an online server considering it as 
 a dead server
 --

 Key: HBASE-5875
 URL: https://issues.apache.org/jira/browse/HBASE-5875
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.0


 If on master restart it finds the ROOT/META to be in RIT state, master tries 
 to assign the ROOT region through ProcessRIT.
 Master will trigger the assignment and next will try to verify the Root 
 Region Location.
 Root region location verification is done seeing if the RS has the region in 
 its online list.
 If the master triggered assignment has not yet been completed in RS then the 
 verify root region location will fail.
 Because it failed 
 {code}
 splitLogAndExpireIfOnline(currentRootServer);
 {code}
 we do split log and also remove the server from online server list. Ideally 
 here there is nothing to do in splitlog as no region server was restarted.
 So master, though the server is online, master just invalidates the region 
 server.
 In a special case, if i have only one RS then my cluster will become non 
 operative.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira