[jira] [Commented] (HBASE-5677) The master never does balance because duplicate openhandled the one region

2012-04-20 xufeng (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13258171#comment-13258171 ]

xufeng commented on HBASE-5677:
---

@stack
Yes, we should close this issue.
I will create a new issue to backport HBASE-5454 to the 0.90 and 0.92.2 versions,
and submit a patch that adds the checkInitialized() call to createTable for trunk
and 0.94.

 The master never does balance because duplicate openhandled the one region
 --

 Key: HBASE-5677
 URL: https://issues.apache.org/jira/browse/HBASE-5677
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
 Environment: 0.90
Reporter: xufeng
Assignee: xufeng
 Fix For: 0.90.7, 0.92.2

 Attachments: 5677-proposal.txt, 5677-proposal.txt, 
 Backport-HBASE-5454-to-90.patch, Backport-HBASE-5454-to-92.patch, 
 HBASE-5677-90-v1.patch, surefire-report_no_patched_v1.html, 
 surefire-report_patched_v1.html


 If a region is assigned while the master is still initializing (before
 processFailover runs), the region's OPENED event is handled twice, because
 the unassigned node in ZooKeeper is handled again in
 AssignmentManager#processFailover().
 This leaves the region in RIT, so the master never runs the balancer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5677) The master never does balance because duplicate openhandled the one region

2012-04-17 xufeng (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256184#comment-13256184 ]

xufeng commented on HBASE-5677:
---

Please review; if there are no problems, can we integrate it into 0.90 and 0.92?

 The master never does balance because duplicate openhandled the one region
 --

 Key: HBASE-5677
 URL: https://issues.apache.org/jira/browse/HBASE-5677
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
 Environment: 0.90
Reporter: xufeng
Assignee: xufeng
 Fix For: 0.90.7, 0.92.2

 Attachments: 5677-proposal.txt, 5677-proposal.txt, 
 Backport-HBASE-5454-to-90.patch, Backport-HBASE-5454-to-92.patch, 
 HBASE-5677-90-v1.patch, surefire-report_no_patched_v1.html, 
 surefire-report_patched_v1.html


 If a region is assigned while the master is still initializing (before
 processFailover runs), the region's OPENED event is handled twice, because
 the unassigned node in ZooKeeper is handled again in
 AssignmentManager#processFailover().
 This leaves the region in RIT, so the master never runs the balancer.





[jira] [Commented] (HBASE-5677) The master never does balance because duplicate openhandled the one region

2012-04-13 xufeng (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253196#comment-13253196 ]

xufeng commented on HBASE-5677:
---

@Lars
Sorry, there is something I do not understand.
I think this issue can be fixed by HBASE-5454.
Why do we need the 5677-proposal.txt patch for it?



 The master never does balance because duplicate openhandled the one region
 --

 Key: HBASE-5677
 URL: https://issues.apache.org/jira/browse/HBASE-5677
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
 Environment: 0.90
Reporter: xufeng
Assignee: xufeng
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: 5677-proposal.txt, 5677-proposal.txt, 5677-proposal.txt, 
 HBASE-5677-90-v1.patch, surefire-report_no_patched_v1.html, 
 surefire-report_patched_v1.html


 If a region is assigned while the master is still initializing (before
 processFailover runs), the region's OPENED event is handled twice, because
 the unassigned node in ZooKeeper is handled again in
 AssignmentManager#processFailover().
 This leaves the region in RIT, so the master never runs the balancer.





[jira] [Commented] (HBASE-5677) The master never does balance because duplicate openhandled the one region

2012-04-13 xufeng (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253198#comment-13253198 ]

xufeng commented on HBASE-5677:
---

Should we integrate HBASE-5454 into 0.90?
I applied the HBASE-5454 patch to 0.90 in my cluster, and it works.

 The master never does balance because duplicate openhandled the one region
 --

 Key: HBASE-5677
 URL: https://issues.apache.org/jira/browse/HBASE-5677
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
 Environment: 0.90
Reporter: xufeng
Assignee: xufeng
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: 5677-proposal.txt, 5677-proposal.txt, 5677-proposal.txt, 
 HBASE-5677-90-v1.patch, surefire-report_no_patched_v1.html, 
 surefire-report_patched_v1.html


 If a region is assigned while the master is still initializing (before
 processFailover runs), the region's OPENED event is handled twice, because
 the unassigned node in ZooKeeper is handled again in
 AssignmentManager#processFailover().
 This leaves the region in RIT, so the master never runs the balancer.





[jira] [Commented] (HBASE-5454) Refuse operations from Admin before master is initialized

2012-04-13 xufeng (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-5454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253245#comment-13253245 ]

xufeng commented on HBASE-5454:
---

@chunhui
Does the check also need to be added to HMaster#createTable?
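A minimal sketch of the pattern being asked about, under simplifying assumptions: every admin entry point, createTable included, calls one checkInitialized() guard that throws a "please hold" exception until initialization has finished. SketchMaster and MasterNotReadyException are hypothetical stand-ins, not HBase's HMaster or PleaseHoldException.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a "please hold" error such as PleaseHoldException.
class MasterNotReadyException extends IOException {
    MasterNotReadyException(String msg) {
        super(msg);
    }
}

// Simplified model of a master that refuses admin operations until it is initialized.
class SketchMaster {
    private final AtomicBoolean initialized = new AtomicBoolean(false);
    private final List<String> tables = new ArrayList<>();

    // Called once failover processing and region assignment are finished.
    void finishInitialization() {
        initialized.set(true);
    }

    // Single guard shared by every admin entry point.
    private void checkInitialized() throws MasterNotReadyException {
        if (!initialized.get()) {
            throw new MasterNotReadyException("Master is initializing");
        }
    }

    synchronized void createTable(String name) throws IOException {
        checkInitialized();   // the question above: guard createTable too
        if (tables.contains(name)) {
            throw new IOException("Table already exists: " + name);
        }
        tables.add(name);
    }

    synchronized void enableTable(String name) throws IOException {
        checkInitialized();
        if (!tables.contains(name)) {
            throw new IOException("No such table: " + name);
        }
        // ... assign the table's regions here ...
    }

    synchronized List<String> listTables() {
        return new ArrayList<>(tables);
    }
}
```

Keeping the check in one private method means createTable, enableTable and any operation added later share exactly the same guard, so it is harder for one of them to miss it.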

 Refuse operations from Admin before master is initialized
 -

 Key: HBASE-5454
 URL: https://issues.apache.org/jira/browse/HBASE-5454
 Project: HBase
  Issue Type: Improvement
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.94.0

 Attachments: hbase-5454.patch, hbase-5454v2.patch


 In our testing environment, while the master was initializing, we found
 conflicts between master#assignAllUserRegions and the EnableTable event:
 region assignment threw an exception, so the master aborted itself.
 We think the master should refuse operations from Admin, such as CreateTable
 and EnableTable, while it is initializing. This could reduce errors.





[jira] [Commented] (HBASE-5677) The master never does balance because duplicate openhandled the one region

2012-04-13 xufeng (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253961#comment-13253961 ]

xufeng commented on HBASE-5677:
---

@Lars
in 0.94+ this is fixed, correct? 
Yes.

you like to backport HBASE-5454 to 0.90 and 0.92, right? 
OK.
But I also have a question about HBASE-5454 (why wasn't checkInitialized() added
to HMaster#createTable?); I commented on it in HBASE-5454.

I am at home now, so I have no environment to test it and create the backport patches for 0.90 and 0.92.
I plan to do it on Monday.

 The master never does balance because duplicate openhandled the one region
 --

 Key: HBASE-5677
 URL: https://issues.apache.org/jira/browse/HBASE-5677
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
 Environment: 0.90
Reporter: xufeng
Assignee: xufeng
 Fix For: 0.90.7, 0.92.2

 Attachments: 5677-proposal.txt, 5677-proposal.txt, 
 HBASE-5677-90-v1.patch, surefire-report_no_patched_v1.html, 
 surefire-report_patched_v1.html


 If a region is assigned while the master is still initializing (before
 processFailover runs), the region's OPENED event is handled twice, because
 the unassigned node in ZooKeeper is handled again in
 AssignmentManager#processFailover().
 This leaves the region in RIT, so the master never runs the balancer.





[jira] [Commented] (HBASE-5677) The master never does balance because duplicate openhandled the one region

2012-04-12 xufeng (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252358#comment-13252358 ]

xufeng commented on HBASE-5677:
---

Tested with the trunk version; it is OK.
The master does nothing if it has not finished initializing.

 The master never does balance because duplicate openhandled the one region
 --

 Key: HBASE-5677
 URL: https://issues.apache.org/jira/browse/HBASE-5677
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
 Environment: 0.90
Reporter: xufeng
Assignee: xufeng
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: 5677-proposal.txt, HBASE-5677-90-v1.patch, 
 surefire-report_no_patched_v1.html, surefire-report_patched_v1.html


 If a region is assigned while the master is still initializing (before
 processFailover runs), the region's OPENED event is handled twice, because
 the unassigned node in ZooKeeper is handled again in
 AssignmentManager#processFailover().
 This leaves the region in RIT, so the master never runs the balancer.





[jira] [Commented] (HBASE-5677) The master never does balance because duplicate openhandled the one region

2012-04-12 xufeng (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252356#comment-13252356 ]

xufeng commented on HBASE-5677:
---

@Ted @stack @Lars
I tested it using the trunk version,
and got this in the shell and in my test case:
{noformat}
12/04/12 19:38:35 INFO client.HBaseAdmin: Started enable of Table02
org.apache.hadoop.hbase.PleaseHoldException:
org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
{noformat}

PleaseHoldException was added in HBASE-5454, whose patch was
integrated into trunk and 0.94.
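On the client side, a "Master is initializing" response is naturally treated as retryable rather than fatal. A minimal sketch, assuming a hypothetical HoldException type and AdminCall interface (not the HBaseAdmin API); the attempt count and pause are illustrative values:

```java
import java.io.IOException;

// Hypothetical stand-in for a "master is initializing, try again later" error.
class HoldException extends IOException {
    HoldException(String msg) {
        super(msg);
    }
}

// An admin operation that the master may refuse while it initializes.
interface AdminCall<T> {
    T run() throws IOException;
}

final class HoldRetry {
    private HoldRetry() {}

    // Retries only when the master asks the client to hold; any other
    // IOException is surfaced immediately. The pause grows linearly per attempt.
    static <T> T callWithRetries(AdminCall<T> call, int maxAttempts, long pauseMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        HoldException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.run();
            } catch (HoldException e) {
                last = e;                                // master not ready yet
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis * attempt);
                }
            }
        }
        throw last;                                      // still initializing: give up
    }
}
```

Only the hold error is retried; other IOExceptions still fail fast, so real errors such as a missing table are not hidden behind retries.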

 The master never does balance because duplicate openhandled the one region
 --

 Key: HBASE-5677
 URL: https://issues.apache.org/jira/browse/HBASE-5677
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
 Environment: 0.90
Reporter: xufeng
Assignee: xufeng
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: 5677-proposal.txt, HBASE-5677-90-v1.patch, 
 surefire-report_no_patched_v1.html, surefire-report_patched_v1.html


 If a region is assigned while the master is still initializing (before
 processFailover runs), the region's OPENED event is handled twice, because
 the unassigned node in ZooKeeper is handled again in
 AssignmentManager#processFailover().
 This leaves the region in RIT, so the master never runs the balancer.





[jira] [Commented] (HBASE-5677) The master never does balance because duplicate openhandled the one region

2012-04-12 xufeng (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253024#comment-13253024 ]

xufeng commented on HBASE-5677:
---

@Lars
I did not change anything in trunk.

 The master never does balance because duplicate openhandled the one region
 --

 Key: HBASE-5677
 URL: https://issues.apache.org/jira/browse/HBASE-5677
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
 Environment: 0.90
Reporter: xufeng
Assignee: xufeng
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: 5677-proposal.txt, 5677-proposal.txt, 5677-proposal.txt, 
 HBASE-5677-90-v1.patch, surefire-report_no_patched_v1.html, 
 surefire-report_patched_v1.html


 If a region is assigned while the master is still initializing (before
 processFailover runs), the region's OPENED event is handled twice, because
 the unassigned node in ZooKeeper is handled again in
 AssignmentManager#processFailover().
 This leaves the region in RIT, so the master never runs the balancer.





[jira] [Commented] (HBASE-5677) The master never does balance because duplicate openhandled the one region

2012-04-11 xufeng (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251379#comment-13251379 ]

xufeng commented on HBASE-5677:
---

@Lars
This issue is caused by the client. I don't think it is similar to HBASE-5615, at least
not in 0.90.

 The master never does balance because duplicate openhandled the one region
 --

 Key: HBASE-5677
 URL: https://issues.apache.org/jira/browse/HBASE-5677
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
 Environment: 0.90
Reporter: xufeng
Assignee: xufeng
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: HBASE-5677-90-v1.patch, 
 surefire-report_no_patched_v1.html, surefire-report_patched_v1.html


 If a region is assigned while the master is still initializing (before
 processFailover runs), the region's OPENED event is handled twice, because
 the unassigned node in ZooKeeper is handled again in
 AssignmentManager#processFailover().
 This leaves the region in RIT, so the master never runs the balancer.





[jira] [Commented] (HBASE-5677) The master never does balance because duplicate openhandled the one region

2012-04-09 xufeng (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13249753#comment-13249753 ]

xufeng commented on HBASE-5677:
---

@Ted 
I tested 0.92 in my cluster using the reproduction steps,
then ran the hbck tool to check the health of the cluster and found many
"multiply assigned" errors.
I think the problem also exists in 0.92.

 The master never does balance because duplicate openhandled the one region
 --

 Key: HBASE-5677
 URL: https://issues.apache.org/jira/browse/HBASE-5677
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
 Environment: 0.90
Reporter: xufeng
Assignee: xufeng

 If a region is assigned while the master is still initializing (before
 processFailover runs), the region's OPENED event is handled twice, because
 the unassigned node in ZooKeeper is handled again in
 AssignmentManager#processFailover().
 This leaves the region in RIT, so the master never runs the balancer.





[jira] [Commented] (HBASE-5677) The master never does balance because duplicate openhandled the one region

2012-04-09 xufeng (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13249754#comment-13249754 ]

xufeng commented on HBASE-5677:
---

I checked out the latest 0.92 branch (revision 1311105) from
https://svn.apache.org/repos/asf/hbase/branches/0.92
and compiled it.

 The master never does balance because duplicate openhandled the one region
 --

 Key: HBASE-5677
 URL: https://issues.apache.org/jira/browse/HBASE-5677
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
 Environment: 0.90
Reporter: xufeng
Assignee: xufeng

 If a region is assigned while the master is still initializing (before
 processFailover runs), the region's OPENED event is handled twice, because
 the unassigned node in ZooKeeper is handled again in
 AssignmentManager#processFailover().
 This leaves the region in RIT, so the master never runs the balancer.





[jira] [Commented] (HBASE-5677) The master never does balance because duplicate openhandled the one region

2012-04-01 xufeng (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243668#comment-13243668 ]

xufeng commented on HBASE-5677:
---

We can reproduce this issue with 0.90 by the following steps:

step1: Start a cluster and create a table that has many regions.
step2: Disable the table created in step1 from the shell.
step3: Kill the active master.
step4: The backup master becomes the active one; while the new master is checking in
regionservers, enable the table from the shell.

result: the duplicate handling problem happens.


I think the master should not provide service until it has completed
initialization.
We could add a method to HMasterInterface,
like:
{noformat}
// In HMasterInterface:
public boolean isMasterAvailable();

// In HMaster: the master is running and it can provide service
public boolean isMasterAvailable() {
  return !isStopped() && isActiveMaster() && isInitialized();
}
{noformat}


When the client calls getMaster, it can check this.
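A minimal sketch of this proposal, assuming a hypothetical MasterStatus interface that exposes the three checks named above; the client-side lookup refuses to return a master that is not available. This illustrates the idea and is not the actual HMasterInterface or HConnection API.

```java
import java.io.IOException;

// Hypothetical view of the master's state; method names mirror the proposal above.
interface MasterStatus {
    boolean isStopped();
    boolean isActiveMaster();
    boolean isInitialized();

    // The master is running, is the active one, and has finished initialization.
    default boolean isMasterAvailable() {
        return !isStopped() && isActiveMaster() && isInitialized();
    }
}

// Hypothetical client-side lookup that checks availability before handing out the master.
final class MasterLookup {
    private MasterLookup() {}

    static MasterStatus getMaster(MasterStatus candidate) throws IOException {
        if (!candidate.isMasterAvailable()) {
            throw new IOException("Master is not available yet (stopped, standby, or initializing)");
        }
        return candidate;
    }
}
```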

Please give me your suggestions, thanks.

 The master never does balance because duplicate openhandled the one region
 --

 Key: HBASE-5677
 URL: https://issues.apache.org/jira/browse/HBASE-5677
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
 Environment: 0.90
Reporter: xufeng
Assignee: xufeng

 If a region is assigned while the master is still initializing (before
 processFailover runs), the region's OPENED event is handled twice, because
 the unassigned node in ZooKeeper is handled again in
 AssignmentManager#processFailover().
 This leaves the region in RIT, so the master never runs the balancer.





[jira] [Commented] (HBASE-5673) The OOM problem of IPC client call cause all handle block

2012-03-31 xufeng (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-5673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243080#comment-13243080 ]

xufeng commented on HBASE-5673:
---

@Stack @Ted
I analyzed the problem with my patch.
This is the result:
I wrapped all exceptions in IOException, and this IOException cannot be handled in
the private method CatalogTracker#getCachedConnection(ServerName sn), which returns HRegionInterface,
so the master aborts and the test cases fail.
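The pitfall can be shown with a small model: a caller that treats one IOException subtype as recoverable stops recognizing it once every failure is re-wrapped in a plain IOException. ConnectException is an assumed stand-in for whatever types getCachedConnection actually handles; wrapPreservingType() shows one way to wrap only non-IOException throwables, such as the OutOfMemoryError from this issue, while leaving IOExceptions untouched.

```java
import java.io.IOException;
import java.net.ConnectException;

final class WrapDemo {
    private WrapDemo() {}

    // Naive wrapping: every failure becomes a plain IOException, hiding its real type.
    static IOException wrapAll(Throwable t) {
        return new IOException(t);
    }

    // Type-preserving wrapping: IOExceptions pass through unchanged; only other
    // throwables (for example an OutOfMemoryError from Thread.start) are wrapped.
    static IOException wrapPreservingType(Throwable t) {
        if (t instanceof IOException) {
            return (IOException) t;
        }
        return new IOException("Unexpected failure: " + t, t);
    }

    // Model of a caller that tolerates a connection failure but treats anything
    // else as fatal (the ConnectException choice is an assumption for illustration).
    static String callerOutcome(IOException e) {
        try {
            throw e;
        } catch (ConnectException ce) {
            return "recovered";
        } catch (IOException other) {
            return "abort";
        }
    }
}
```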


In the future, I will submit the patch together with the test results.

 The OOM problem of IPC client call  cause all handle block
 --

 Key: HBASE-5673
 URL: https://issues.apache.org/jira/browse/HBASE-5673
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
 Environment: 0.90.6
Reporter: xufeng
Assignee: xufeng
 Fix For: 0.90.7, 0.92.2, 0.94.1

 Attachments: HBASE-5673-90-V2.patch, HBASE-5673-90.patch


 If HBaseClient hits an "unable to create new native thread" error, the call
 will never complete, because it is lost in the calls queue.





[jira] [Commented] (HBASE-5677) The master never does balance because duplicate openhandled the one region

2012-03-30 xufeng (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242118#comment-13242118 ]

xufeng commented on HBASE-5677:
---

If a region is assigned while the master is still initializing (before
processFailover runs), the region's OPENED event is handled twice, because
the unassigned node in ZooKeeper is handled again in
AssignmentManager#processFailover().

I am using version 0.90.
I found this issue in my cluster.

1. The system did not run the balancer:
{noformat}
Not running balancer because 2 region(s) in transition: 
{f4ff609df50e5bc9049fe202bb90f22e=hbase0205test,0038613850505050,1333033465665.f4ff609df50e5bc9049fe202bb90f22e.
 
state=OPEN, ts=1333036748502, 
febe5bb42ec841f7a9086d3b7bf0637c=hbase0205test,0038613802020202,1333033465474.febe5bb42ec841f7a9086d3b7bf0637c...
{noformat}
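The message above comes from the balancer refusing to run while any region is in transition. A simplified model, assuming a plain map of encoded region name to state (the names are illustrative, not AssignmentManager's fields): a single entry that never leaves the map blocks balancing on every cycle.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified model of the master's balancer guard.
class BalancerGuard {
    // Encoded region name -> state, for regions currently in transition (RIT).
    final Map<String, String> regionsInTransition = new ConcurrentHashMap<>();

    // Returns true if a balancing pass would run.
    boolean balance() {
        if (!regionsInTransition.isEmpty()) {
            System.out.println("Not running balancer because "
                    + regionsInTransition.size() + " region(s) in transition");
            return false;
        }
        // ... compute and execute region moves here ...
        return true;
    }
}
```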

2. I chose f4ff609df50e5bc9049fe202bb90f22e as a sample to track.

3. In the master log I found:
logA:
{noformat}
Line 17884: [2012-03-29 15:05:08,082] [DEBUG] 
[MASTER_OPEN_REGION-158-1-130-18:2-1] 
[org.apache.hadoop.hbase.master.handler.OpenedRegionHandler 138] The master has 
opened the region 
hbase0205test,0038613850505050,1333033465665.f4ff609df50e5bc9049fe202bb90f22e. 
that was online on serverName=158-1-130-18,20020,1332952904731, 
load=(requests=, regions=728, usedHeap=141, maxHeap=8165)
{noformat}

logB:
{noformat}
Line 17885: [2012-03-29 15:05:08,082] [DEBUG] [master-158-1-130-18:2] 
[org.apache.hadoop.hbase.master.handler.OpenedRegionHandler 138] Handling 
OPENED event for 
hbase0205test,0038613850505050,1333033465665.f4ff609df50e5bc9049fe202bb90f22e. 
from serverName=158-1-130-18,20020,1332952904731, load=(requests=245, 
regions=758, usedHeap=145, maxHeap=8165); deleting unassigned node
Line 17897: [2012-03-29 15:05:08,084] [DEBUG] [master-158-1-130-18:2] 
[org.apache.hadoop.hbase.zookeeper.ZKAssign 511] master:2-0x236552a09e20353 
Deleting existing unassigned node for f4ff609df50e5bc9049fe202bb90f22e that is 
in expected state RS_ZK_REGION_OPENED
Line 17898: [2012-03-29 15:05:08,092] [WARN ] [master-158-1-130-18:2] 
[org.apache.hadoop.hbase.master.handler.OpenedRegionHandler 123] The znode of 
the region 
hbase0205test,0038613850505050,1333033465665.f4ff609df50e5bc9049fe202bb90f22e. 
would have already been deleted
Line 17899: [2012-03-29 15:05:08,092] [ERROR] [master-158-1-130-18:2] 
[org.apache.hadoop.hbase.master.handler.OpenedRegionHandler 97] The znode of 
region 
hbase0205test,0038613850505050,1333033465665.f4ff609df50e5bc9049fe202bb90f22e. 
could not be deleted.
{noformat}

4. logA and logB should not appear at the same time, because they belong to the
same code in the region open flow.

5. So I am sure that this region has been handled twice.
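The double handling in logA/logB can be modeled with two maps, one standing in for the unassigned znodes and one for the master's regions in transition. This is a simplified model, not the real AssignmentManager or ZKAssign code, and it assumes the failover path acts on a stale view of the node: the first handling deletes the node and clears RIT, then failover re-adds the region as OPEN, fails to delete the already-deleted node, and leaves the region in RIT with state OPEN, matching the balancer message in step 1. The fix discussed in this thread avoids the race upstream, by refusing admin operations such as enabling a table until the master is initialized (HBASE-5454).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified model of the double handling; this is not HBase code.
class OpenedRegionModel {
    // Unassigned znodes keyed by encoded region name (stands in for ZooKeeper).
    final Map<String, String> unassignedNodes = new ConcurrentHashMap<>();
    // Regions in transition as tracked by the master.
    final Map<String, String> regionsInTransition = new ConcurrentHashMap<>();

    // OPENED handling: delete the znode, then take the region out of RIT.
    // If the znode is already gone, the handler gives up and RIT is left untouched.
    boolean handleOpened(String region) {
        if (unassignedNodes.remove(region) == null) {
            return false;   // "The znode of region ... could not be deleted."
        }
        regionsInTransition.remove(region);
        return true;
    }

    // Failover processing acting on a stale view: it saw the region's node in
    // OPENED state earlier, so it re-adds the region to RIT and handles it again.
    void failoverReprocess(String region) {
        regionsInTransition.put(region, "OPEN");
        handleOpened(region);
    }
}
```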

6. These logs explain what I wrote in the description:
Enable the table:
{noformat}
Line 16925: [2012-03-29 15:04:59,875] [DEBUG] 
[158-1-130-18:2-org.apache.hadoop.hbase.master.handler.EnableTableHandler$BulkEnabler-0]
 [org.apache.hadoop.hbase.zookeeper.ZKAssign 289] 
master:2-0x236552a09e20353 Creating (or updating) unassigned node for 
f4ff609df50e5bc9049fe202bb90f22e with OFFLINE state
{noformat}

Failover:
{noformat}
[2012-03-29 15:05:00,906] [INFO ] [master-158-1-130-18:2] 
[org.apache.hadoop.hbase.master.AssignmentManager 284] Failed-over master needs 
to process 66 regions in transition
{noformat}

 The master never does balance because duplicate openhandled the one region
 --

 Key: HBASE-5677
 URL: https://issues.apache.org/jira/browse/HBASE-5677
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
 Environment: 0.90
Reporter: xufeng
Assignee: xufeng

 If a region is assigned while the master is still initializing (before
 processFailover runs), the region's OPENED event is handled twice.
 This leaves the region in RIT, so the master never runs the balancer.





[jira] [Commented] (HBASE-5673) The OOM problem of IPC client call cause all handle block

2012-03-30 xufeng (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-5673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242175#comment-13242175 ]

xufeng commented on HBASE-5673:
---

Build failed!
Did my patch cause it?


 The OOM problem of IPC client call  cause all handle block
 --

 Key: HBASE-5673
 URL: https://issues.apache.org/jira/browse/HBASE-5673
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
 Environment: 0.90.6
Reporter: xufeng
Assignee: xufeng
 Fix For: 0.90.7, 0.92.2, 0.94.1

 Attachments: HBASE-5673-90-V2.patch, HBASE-5673-90.patch


 If HBaseClient hits an "unable to create new native thread" error, the call
 will never complete, because it is lost in the calls queue.





[jira] [Commented] (HBASE-5673) The OOM problem of IPC client call cause all handle block

2012-03-30 xufeng (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-5673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242914#comment-13242914 ]

xufeng commented on HBASE-5673:
---

@Stack
I will check why it happened.

@Ted
How do I run a single test case with Maven?
I ran the test in 0.94 with the following command line:
mvn clean -Dtest=TestMultiVersionstest test
but got this result:
Results :

Tests run: 0, Failures: 0, Errors: 0, Skipped: 0

[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:2.12-TRUNK-HBASE-2:test 
(default-test) on project hbase: No tests were executed!  (Set 
-DfailIfNoTests=false to ignore this error.) - [Help 1]

 The OOM problem of IPC client call  cause all handle block
 --

 Key: HBASE-5673
 URL: https://issues.apache.org/jira/browse/HBASE-5673
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
 Environment: 0.90.6
Reporter: xufeng
Assignee: xufeng
 Fix For: 0.90.7, 0.92.2, 0.94.1

 Attachments: HBASE-5673-90-V2.patch, HBASE-5673-90.patch


 If HBaseClient hits an "unable to create new native thread" error, the call
 will never complete, because it is lost in the calls queue.





[jira] [Commented] (HBASE-5673) The OOM problem of IPC client call cause all handle block

2012-03-29 xufeng (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-5673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241065#comment-13241065 ]

xufeng commented on HBASE-5673:
---

I found this issue in my cluster.

1. I found that a regionserver could not report to the master because of a socket timeout.
{noformat}
[2012-03-26 14:48:09,815] [INFO ] [regionserver20020] 
[org.apache.hadoop.hbase.regionserver.HRegionServer 1469] Attempting connect to 
Master server at DDB03:2
[2012-03-26 14:49:09,818] [INFO ] [regionserver20020] 
[org.apache.hadoop.ipc.HbaseRPC 360] Problem connecting to server: 
DDB03/192.168.28.53:2
[2012-03-26 14:49:09,819] [WARN ] [regionserver20020] 
[org.apache.hadoop.hbase.regionserver.HRegionServer 1483] Unable to connect to 
master. Retrying. Error was:
java.net.SocketTimeoutException: Call to DDB03/192.168.28.53:2 failed on 
socket timeout exception: java.net.SocketTimeoutException: 6 millis timeout 
while waiting for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/192.168.28.53:59520 
remote=DDB03/192.168.28.53:2]
{noformat}

2. In the master's jstack output, I found that one handler is waiting and
the others are blocked (in waitForMeta).
{noformat}

IPC Server handler 90 on 2 daemon prio=10 tid=0x7f219c54 
nid=0x4c3f in Object.wait() [0x7f21963a7000]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:485)
at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:757)
...

IPC Server handler 87 on 2 daemon prio=10 tid=0x7f219c53a000 
nid=0x4c37 waiting for monitor entry [0x7f21966aa000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:397)
- waiting to lock 0x000612486960 (a 
java.util.concurrent.atomic.AtomicBoolean)
at 
org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnectionDefault(CatalogTracker.java:437)
...
{noformat}

3. I also confirmed that the waiting handler causes the others to block; the waiting
handler is waiting for the call to complete.
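The jstack pattern (one handler WAITING inside a synchronized section, the others BLOCKED on the same monitor) can be reproduced with plain Java threads. In this simplified model, metaLock stands in for the monitor taken in CatalogTracker.waitForMeta and call stands in for the HBaseClient call that is never notified; both names are illustrative. Object.wait() releases only the monitor of the object being waited on, so the outer lock stays held.

```java
import java.util.concurrent.CountDownLatch;

// Simplified model of the handler pile-up seen in the jstack output.
final class ConvoyDemo {
    private ConvoyDemo() {}

    // Starts a holder whose "call" never completes, then a second thread that
    // needs the same monitor. Returns the second thread's observed state.
    static Thread.State run(long timeoutMillis) throws InterruptedException {
        final Object metaLock = new Object();   // stands in for the waitForMeta monitor
        final Object call = new Object();       // stands in for a call that is never notified
        final CountDownLatch holding = new CountDownLatch(1);

        Thread holder = new Thread(() -> {
            synchronized (metaLock) {
                holding.countDown();
                synchronized (call) {
                    try {
                        while (true) {
                            call.wait();        // nobody notifies: stays WAITING, keeps metaLock
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            }
        }, "handler-90");
        Thread other = new Thread(() -> {
            synchronized (metaLock) {
                // unreachable while the holder is stuck
            }
        }, "handler-87");
        holder.setDaemon(true);
        other.setDaemon(true);

        holder.start();
        holding.await();                         // holder now owns metaLock
        other.start();

        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (other.getState() != Thread.State.BLOCKED
                && System.currentTimeMillis() < deadline) {
            Thread.sleep(5);
        }
        Thread.State observed = other.getState();
        holder.interrupt();                      // release the model so both threads exit
        return observed;
    }
}
```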

4. But when the "unable to create new native thread" error happened, the catch (IOException)
block could not catch it:
{noformat}
protected synchronized void setupIOstreams() throws IOException {
  ...
  try {
    ...
    start();
  } catch (IOException e) {
    markClosed(e);
    close();

    throw e;
  }
...
{noformat}
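Why the catch block above is skipped: OutOfMemoryError is an Error, not an Exception, so catch (IOException e) never matches it and markClosed()/close() do not run. A minimal demonstration; ConnectionModel and startReceiver() are hypothetical stand-ins that fail the way Thread.start() did in the stack trace:

```java
import java.io.IOException;

// Simplified model of the connection setup in step 4 (not HBaseClient itself).
class ConnectionModel {
    boolean closed = false;

    // Hypothetical stand-in for start(): fails the way Thread.start() did in the log.
    void startReceiver() throws IOException {
        throw new OutOfMemoryError("unable to create new native thread");
    }

    // Same shape as setupIOstreams(): cleanup only runs for IOException.
    void setupIOstreams() throws IOException {
        try {
            startReceiver();
        } catch (IOException e) {
            closed = true;   // stands in for markClosed(e); close();
            throw e;
        }
    }
}
```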


5. Thus the call is lost in the calls queue and never completes:
{noformat}
public Writable call(...)
{
  ...
  synchronized (call) {
    while (!call.done) {
      try {
        call.wait();   // wait for the result
      } catch (InterruptedException ignored) {
        // save the fact that we were interrupted
        interrupted = true;
      }
    }
    ...
  }

{noformat}
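One way to make sure a waiter cannot hang, sketched under simplifying assumptions (it is not necessarily what the HBASE-5673 patch does): catch Throwable around connection setup and, on failure, complete every pending call with an IOException and notify its waiter. PendingCall, SetupStep and CallQueue are hypothetical names, and only the failure path is modeled.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of an RPC call that a handler thread waits on.
class PendingCall {
    private boolean done = false;
    private IOException error;

    synchronized void fail(IOException e) {
        error = e;
        done = true;
        notifyAll();   // wake the thread blocked in await()
    }

    // Same wait loop as HBaseClient.call(), but it can now terminate on failure.
    synchronized void await() throws IOException, InterruptedException {
        while (!done) {
            wait();
        }
        if (error != null) {
            throw error;
        }
    }
}

// Connection setup step that may fail with an IOException or an Error.
interface SetupStep {
    void run() throws IOException;
}

class CallQueue {
    private final List<PendingCall> calls = new ArrayList<>();

    synchronized void add(PendingCall c) {
        calls.add(c);
    }

    // Runs connection setup; on ANY failure, including Errors such as
    // OutOfMemoryError, fails every pending call so no waiter hangs forever.
    void setup(SetupStep startReceiver) throws IOException {
        try {
            startReceiver.run();
        } catch (Throwable t) {
            IOException e = (t instanceof IOException)
                    ? (IOException) t
                    : new IOException("Connection setup failed: " + t, t);
            List<PendingCall> toFail;
            synchronized (this) {
                toFail = new ArrayList<>(calls);
                calls.clear();
            }
            for (PendingCall c : toFail) {
                c.fail(e);
            }
            throw e;
        }
    }
}
```

IOExceptions pass through unchanged and only other throwables are wrapped because, as the comment about CatalogTracker#getCachedConnection notes, callers may depend on the original exception type.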

 The OOM problem of IPC client call  cause all handle block
 --

 Key: HBASE-5673
 URL: https://issues.apache.org/jira/browse/HBASE-5673
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
 Environment: 0.90.6
Reporter: xufeng
Assignee: xufeng

 If HBaseClient hits an "unable to create new native thread" error, the call
 will never complete, because it is lost in the calls queue.





[jira] [Commented] (HBASE-5673) The OOM problem of IPC client call cause all handle block

2012-03-29 Thread xufeng (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13241069#comment-13241069
 ] 

xufeng commented on HBASE-5673:
---

Step 4 is missing some log information:
{noformat}
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:640)
at 
org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:351)
{noformat}
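This trace is the key to step 4: `Thread.start()` reports the failure as `java.lang.OutOfMemoryError`, which is an `Error` rather than an `IOException`, so the `catch (IOException e)` clause in `setupIOstreams()` never runs and `markClosed` never fails the queued calls. The sketch below models this with a hypothetical `ConnectionModel` class (not HBase code) whose `start()` throws the same error. The second method shows one possible remedy, assumed here rather than taken from a committed patch: catch `Throwable`, wrap it in an `IOException`, and clean up before rethrowing.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Simplified model of a client connection with queued calls; not the real HBaseClient. */
class ConnectionModel {
  final List<String> pendingCalls = new ArrayList<>();
  final List<String> failedCalls = new ArrayList<>();
  IOException closeReason;

  ConnectionModel(String... calls) {
    Collections.addAll(pendingCalls, calls);
  }

  /** Stands in for Thread.start() when the OS refuses to create another thread. */
  void start() throws IOException {
    throw new OutOfMemoryError("unable to create new native thread");
  }

  /** Fails every queued call so no handler waits on it forever. */
  void markClosed(IOException e) {
    closeReason = e;
    failedCalls.addAll(pendingCalls);
    pendingCalls.clear();
  }

  /** Same shape as the quoted code: only IOException triggers cleanup. */
  void setupIOstreamsOriginal() throws IOException {
    try {
      start();
    } catch (IOException e) {
      markClosed(e); // never reached: OutOfMemoryError is not an IOException
      throw e;
    }
  }

  /** Possible remedy: treat any Throwable from setup as a connection failure. */
  void setupIOstreamsCatchAll() throws IOException {
    try {
      start();
    } catch (Throwable t) {
      IOException e = (t instanceof IOException)
          ? (IOException) t
          : new IOException("Unexpected failure setting up IO streams", t);
      markClosed(e);
      throw e;
    }
  }
}
```

Whether a client should catch `Error` at all is a design judgment; the point of the sketch is only that the queued calls must be failed on every exit path.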






[jira] [Commented] (HBASE-5615) the master never does balance because of balancing the parent region

2012-03-25 Thread xufeng (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238029#comment-13238029
 ] 

xufeng commented on HBASE-5615:
---

Thanks for the help, Ramkrishna, Jinchao and Ted.

 the master never does balance because of balancing the parent region
 

 Key: HBASE-5615
 URL: https://issues.apache.org/jira/browse/HBASE-5615
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.7
Reporter: xufeng
Assignee: xufeng
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: 5615-trunk.txt, HBASE-5615-90.patch, HBASE-5615.patch, 
 NoPatched-surefire-report-5615-90.html, Patched_surefire-report-5615-90.html


 The master never does balance because, when the master runs rebuildUserRegions(), 
 it adds the parent region into AssignmentManager#servers.
 If the balancer then moves the parent region, the parent stays in RIT forever, so 
 balance is never executed.





[jira] [Commented] (HBASE-5615) the master never do balance becauseof balance the parent region

2012-03-23 Thread xufeng (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13236464#comment-13236464
 ] 

xufeng commented on HBASE-5615:
---

I reproduced this issue on 0.90.
To reproduce it, META must hold the parent region info for a long time, so before 
testing I disabled this code in the regionserver class:
{noformat}
  public void postOpenDeployTasks(final HRegion r, final CatalogTracker ct,
      final boolean daughter)
  throws KeeperException, IOException {
    // Do checks to see if we need to compact (references or too many files)
    /*if (r.hasReferences() || r.hasTooManyStoreFiles()) {
      getCompactionRequester().requestCompaction(r,
        r.hasReferences()? "Region has references on open" :
          "Region has too many store files");
    }*/
    ...
  }
{noformat}

step1: start a cluster that has two master processes and one regionserver process.
step2: create a table and put some data into it.
step3: split the table with the shell.
step4: kill the active master.
step5: after the backup master becomes active, start another regionserver process.
result: the issue happens.


I also tested my patch many times and it works.

 the master never do balance becauseof  balance the parent region
 

 Key: HBASE-5615
 URL: https://issues.apache.org/jira/browse/HBASE-5615
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.7
Reporter: xufeng
Assignee: xufeng
Priority: Critical
 Attachments: HBASE-5615.patch


 The master never does balance because, when the master runs rebuildUserRegions(), 
 it adds the parent region into AssignmentManager#servers.
 If the balancer then moves the parent region, the parent stays in RIT forever, so 
 balance is never executed.





[jira] [Commented] (HBASE-5615) the master never do balance becauseof balance the parent region

2012-03-22 Thread xufeng (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235426#comment-13235426
 ] 

xufeng commented on HBASE-5615:
---

In my cluster I found this issue.

1. The balancer is never executed because:
{noformat}
[2012-03-21 14:11:47,226] [DEBUG] [158-1-131-48:2-BalancerChore] 
[org.apache.hadoop.hbase.master.HMaster 824] Not running balancer because 4 
region(s) in transition: 
{3139250177b9c55fbce6856e2595b272=hbaseTable3,06640#000149,1332230348477.3139250177b9c55fbce6856e2595b272.
 state=PENDING_CLOSE, ts=1332339058374, 
3d7698062c1ffaa288ffa4b0630205dd=hbaseTable,12284#51,1332214163915.3d7698062c1ffaa288ffa4b0630205dd.
 st...
{noformat}

2. I chose 3139250177b9c55fbce6856e2595b272 as a sample to track.
I found that it had been split:
{noformat}
[2012-03-20 23:40:36,496] [INFO ] [regionserver20020.compactor] 
[org.apache.hadoop.hbase.regionserver.HRegion 563] Closed 
hbaseTable3,06640#000149,1332230348477.3139250177b9c55fbce6856e2595b272.
[2012-03-20 23:40:38,469] [INFO ] [regionserver20020.compactor] 
[org.apache.hadoop.hbase.catalog.MetaEditor 85] Offlined parent region 
hbaseTable3,06640#000149,1332230348477.3139250177b9c55fbce6856e2595b272. in META
[2012-03-20 23:40:39,755] [INFO ] [regionserver20020.compactor] 
[org.apache.hadoop.hbase.regionserver.CompactSplitThread 181] Region split, 
META updated, and report to master. 
Parent=hbaseTable3,06640#000149,1332230348477.3139250177b9c55fbce6856e2595b272.,
 new regions: 
hbaseTable3,06640#000149,1332286834610.bf8baeae598db2a1e87dbd0a234d1539., 
hbaseTable3,06723#000707,1332286834610.64ccaffa46be50a5dbc41540006afcb6.. Split 
took 5sec
{noformat}

3. Then the backup master became active. In the finishInitialization() logs, I found 
this log:
{noformat}
[2012-03-21 11:41:46,692] [DEBUG] [master-158-1-131-48:2] 
[org.apache.hadoop.hbase.master.handler.ServerShutdownHandler 348] Daughter 
hbaseTable3,06640#000149,1332286834610.bf8baeae598db2a1e87dbd0a234d1539. present
{noformat}

4. So I concluded that the parent region (3139250177b9c55fbce6856e2595b272) is also 
still in the META table.

5. If 3139250177b9c55fbce6856e2595b272 is in META, it is added to 
AssignmentManager#regions and AssignmentManager#servers when the master rebuilds the 
user regions.

6. The balancer consults AssignmentManager#servers and decides to move 
3139250177b9c55fbce6856e2595b272:
{noformat}
[2012-03-21 11:46:47,699] [INFO ] [158-1-131-48:2-BalancerChore] 
[org.apache.hadoop.hbase.master.HMaster 849] balance 
hri=hbaseTable3,06640#000149,1332230348477.3139250177b9c55fbce6856e2595b272., 
src=158-1-131-48,20020,1331918756600, dest=158-1-130-11,20020,1331918756573
{noformat}

7. The parent then stays in RIT forever in the PENDING_CLOSE state, so the balancer 
is never executed:
{noformat}
[2012-03-21 13:13:57,201] [WARN ] [PRI IPC Server handler 3 on 20020] 
[org.apache.hadoop.hbase.regionserver.HRegionServer 2211] Received close for 
region we are not serving; 3139250177b9c55fbce6856e2595b272
{noformat}

{noformat}
[2012-03-21 11:55:55,638] [INFO ] [158-1-131-48:2.timeoutMonitor] 
[org.apache.hadoop.hbase.master.AssignmentManager 2327] Regions in transition 
timed out:  
hbaseTable3,06640#000149,1332230348477.3139250177b9c55fbce6856e2595b272. 
state=PENDING_CLOSE, ts=1332330775586
[2012-03-21 11:55:55,639] [INFO ] [158-1-131-48:2.timeoutMonitor] 
[org.apache.hadoop.hbase.master.AssignmentManager 2363] Region has been 
PENDING_CLOSE for too long, running forced unassign again on 
region=hbaseTable3,06640#000149,1332230348477.3139250177b9c55fbce6856e2595b272.
{noformat}
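Steps 5 to 7 form a loop that a small model makes concrete. The sketch below is a toy, hypothetical model (not AssignmentManager code): `MetaRow` carries offline and split flags, loosely mirroring what META records for a split parent, and `rebuild` can optionally skip offlined split parents. Skipping them is one plausible fix, assumed here for illustration and not necessarily the committed patch. Without the filter, the balancer picks the parent, the region server ignores the close, and the timeout monitor retries forever, so the balancer gate never opens again.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified META row; the real region info carries much more. */
class MetaRow {
  final String region;
  final String server;
  final boolean offline;
  final boolean split;

  MetaRow(String region, String server, boolean offline, boolean split) {
    this.region = region;
    this.server = server;
    this.offline = offline;
    this.split = split;
  }
}

/** Toy model of the master and a single region server; not AssignmentManager code. */
class BalanceLoopModel {
  // server -> regions, playing the role of AssignmentManager#servers
  final Map<String, List<String>> servers = new HashMap<>();
  // region -> state, playing the role of the regions-in-transition map
  final Map<String, String> regionsInTransition = new HashMap<>();
  // regions the region server really hosts
  final Set<String> actuallyServed = new HashSet<>();

  /** Rebuilds servers from META, optionally skipping offlined split parents. */
  void rebuild(List<MetaRow> meta, boolean skipSplitParents) {
    servers.clear();
    for (MetaRow row : meta) {
      if (skipSplitParents && row.offline && row.split) {
        continue; // a split parent is hosted nowhere, so never offer it to the balancer
      }
      servers.computeIfAbsent(row.server, s -> new ArrayList<>()).add(row.region);
    }
  }

  /** Region server side: a close is acknowledged only for a region it serves. */
  boolean close(String region) {
    return actuallyServed.remove(region); // false ~ "Received close for region we are not serving"
  }

  /** The gate from the log: no balancing while any region is in transition. */
  boolean mayBalance() {
    return regionsInTransition.isEmpty();
  }

  /** One balancer round: move the first region listed for the server, if allowed. */
  void balance(String server) {
    List<String> regions = servers.get(server);
    if (!mayBalance() || regions == null || regions.isEmpty()) {
      return;
    }
    String region = regions.get(0); // stands in for whatever the real balancer chooses
    regionsInTransition.put(region, "PENDING_CLOSE");
    if (close(region)) {
      regionsInTransition.remove(region); // simplification: the reopen elsewhere is not modeled
    }
  }

  /** Timeout monitor: re-sends the close; for a split parent it can never succeed. */
  void timeoutMonitorTick() {
    regionsInTransition.keySet().removeIf(this::close);
  }
}
```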

 the master never do balance becauseof  balance the parent region
 

 Key: HBASE-5615
 URL: https://issues.apache.org/jira/browse/HBASE-5615
 Project: HBase
  Issue Type: Bug
Reporter: xufeng
Assignee: xufeng
Priority: Critical

 The master never does balance because, when the master runs rebuildUserRegions(), 
 it adds the parent region into AssignmentManager#servers.
 If the balancer then moves the parent region, the parent stays in RIT forever, so 
 balance is never executed.





[jira] [Commented] (HBASE-5615) the master never do balance becauseof balance the parent region

2012-03-22 Thread xufeng (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235432#comment-13235432
 ] 

xufeng commented on HBASE-5615:
---

I am using 0.90.
BTW: I cannot compile the 0.90 branch locally with Maven. Is this a problem?

The error log is:
{noformat}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:2.0.2:compile (default-compile) 
on project hbase: Compilation failure
[ERROR] 
/opt/xufeng/module/hbase/host_java/src/HBASE_ONLINE/src/main/java/org/apache/hadoop/hbase/master/HMaster.java:[1121,22]
 cannot find symbol
[ERROR] symbol  : class ServerName
[ERROR] location: class org.apache.hadoop.hbase.master.HMaster
{noformat}






[jira] [Commented] (HBASE-5615) the master never do balance becauseof balance the parent region

2012-03-22 Thread xufeng (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235541#comment-13235541
 ] 

xufeng commented on HBASE-5615:
---

The log in step 2 is from 158-1-131-48,20020,1331918756600.






[jira] [Commented] (HBASE-4951) master process can not be stopped when it is initializing

2011-12-12 Thread xufeng (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13168051#comment-13168051
 ] 

xufeng commented on HBASE-4951:
---

I tested this patch on 0.90.
It does not work in the following scenario:
1. The master starts up, and one regionserver starts up.
2. waitForRegionServers completes successfully.
3. Run bin/hbase master stop before the root region is assigned.

bin/hbase master stop stops the whole cluster, so the regionserver is killed first.
The root region then has no chance to be assigned successfully, and the master blocks in 
catalogTracker.waitForRoot().
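The scenario above hangs because the wait's only exit is the root region being assigned. A minimal sketch of a stop-aware wait, using a hypothetical `RootWaiter` class (not CatalogTracker's real code): `stop()` sets a flag and wakes the waiter, so `waitForRoot()` returns `null` when the master is shutting down, even if the root region will never be assigned.

```java
/** Toy model of waiting for the root region location; not CatalogTracker code. */
class RootWaiter {
  private String rootLocation; // guarded by this
  private boolean stopped;     // guarded by this

  /** Called when the root region has been assigned. */
  synchronized void setRootLocation(String location) {
    rootLocation = location;
    notifyAll();
  }

  /** Called from the stop path (e.g. after "bin/hbase master stop") to release waiters. */
  synchronized void stop() {
    stopped = true;
    notifyAll();
  }

  /**
   * Blocks until the root location is known or stop() is called.
   * Returns null if the master is stopping before the root region is assigned.
   */
  synchronized String waitForRoot() throws InterruptedException {
    while (rootLocation == null && !stopped) {
      wait(); // woken by setRootLocation() or stop()
    }
    return rootLocation;
  }
}
```

The caller would then treat `null` as "the master is stopping" and abort initialization instead of continuing.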

 master process can not be stopped when it is initializing
 -

 Key: HBASE-4951
 URL: https://issues.apache.org/jira/browse/HBASE-4951
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.3
Reporter: xufeng
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.90.6

 Attachments: HBASE-4951.patch


 It is easy to reproduce with the following steps:
 step1: start the master process (do not start any regionserver process in the cluster).
 The master will wait for regionservers to check in:
 org.apache.hadoop.hbase.master.ServerManager: Waiting on regionserver(s) to checkin
 step2: stop the master with the shell command bin/hbase master stop
 result: the master process never dies, because catalogTracker.waitForRoot() 
 blocks until the root region is assigned.





[jira] [Commented] (HBASE-4951) master process can not be stopped when it is initializing

2011-12-12 Thread xufeng (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13168054#comment-13168054
 ] 

xufeng commented on HBASE-4951:
---

I think this problem also exists in trunk with this patch applied.






[jira] [Commented] (HBASE-4951) master process can not be stopped when it is initializing

2011-12-08 Thread xufeng (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13165916#comment-13165916
 ] 

xufeng commented on HBASE-4951:
---

@ramkrishna
Thanks.
Do you think we should fix it in 0.90?
I will try to create a patch.

 master process can not be stopped when it is initializing
 -

 Key: HBASE-4951
 URL: https://issues.apache.org/jira/browse/HBASE-4951
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.3
Reporter: xufeng
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.92.0, 0.90.5


 It is easy to reproduce with the following steps:
 step1: start the master process (do not start any regionserver process in the cluster).
 The master will wait for regionservers to check in:
 org.apache.hadoop.hbase.master.ServerManager: Waiting on regionserver(s) to checkin
 step2: stop the master with the shell command bin/hbase master stop
 result: the master process never dies, because catalogTracker.waitForRoot() 
 blocks until the root region is assigned.





[jira] [Commented] (HBASE-4773) HBaseAdmin leaks ZooKeeper connections

2011-11-28 Thread xufeng (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13158293#comment-13158293
 ] 

xufeng commented on HBASE-4773:
---

@Ted 
Yes, I have run the TRUNK patch through the unit test suite in my environment.

 HBaseAdmin leaks ZooKeeper connections
 --

 Key: HBASE-4773
 URL: https://issues.apache.org/jira/browse/HBASE-4773
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4
Reporter: gaojinchao
Priority: Critical
 Fix For: 0.90.5

 Attachments: 4773.patch, branches_4773.patch, trunk_4773_patch.patch


 When the master crashes, HBaseAdmin leaks ZooKeeper connections.
 I think we should close the ZK connection when MasterNotRunningException is thrown:
   public HBaseAdmin(Configuration c)
   throws MasterNotRunningException, ZooKeeperConnectionException {
     this.conf = HBaseConfiguration.create(c);
     this.connection = HConnectionManager.getConnection(this.conf);
     this.pause = this.conf.getLong("hbase.client.pause", 1000);
     this.numRetries = this.conf.getInt("hbase.client.retries.number", 10);
     this.retryLongerMultiplier =
       this.conf.getInt("hbase.client.retries.longer.multiplier", 10);
     // we should add this code and close the zk connection
     try {
       this.connection.getMaster();
     } catch (MasterNotRunningException e) {
       HConnectionManager.deleteConnection(conf, false);
       throw e;
     }
   }
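The proposed change follows a general rule: if a constructor acquires a resource and then throws, it must release that resource itself, because the caller never receives an object it could close. The sketch below models this with hypothetical `ZkSessionPool`, `MasterDownException` and `AdminModel` classes (none of them HBase API). With `releaseOnFailure` set to `false`, every failed construction leaves one more session open, which is what repeated retries against a stopped master would accumulate.

```java
/** Hypothetical counter standing in for live ZooKeeper sessions. */
class ZkSessionPool {
  private int open;

  void acquire() { open++; }

  void release() { open--; }

  int openSessions() { return open; }
}

/** Hypothetical checked exception standing in for MasterNotRunningException. */
class MasterDownException extends Exception {
  MasterDownException(String message) {
    super(message);
  }
}

/** Simplified model of an admin client constructor; not the real HBaseAdmin. */
class AdminModel {
  AdminModel(ZkSessionPool pool, boolean masterRunning, boolean releaseOnFailure)
      throws MasterDownException {
    pool.acquire(); // like obtaining the shared connection and its ZooKeeper session
    try {
      if (!masterRunning) {
        throw new MasterDownException("master is not running");
      }
    } catch (MasterDownException e) {
      if (releaseOnFailure) {
        pool.release(); // the proposed fix: give the session back before rethrowing
      }
      throw e;
    }
  }
}
```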





[jira] [Commented] (HBASE-4773) HBaseAdmin leaks ZooKeeper connections

2011-11-25 Thread xufeng (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13157024#comment-13157024
 ] 

xufeng commented on HBASE-4773:
---

Yes, I have tested it in my cluster.

Here is my client test code:
{noformat}
...
  static void initHBase() throws ZooKeeperConnectionException {
    HBaseAdmin hbaseAdmin = null;
    Configuration config = HBaseConfiguration.create();
    config.set("hbase.zookeeper.quorum",
        "158.1.130.31,158.1.130.32,158.1.130.33");
    config.set("hbase.zookeeper.property.clientPort", "2181");

    try {
      hbaseAdmin = new HBaseAdmin(config);
      System.out.println("init success!");
    } catch (MasterNotRunningException e) {
      e.printStackTrace();
      // Retry: each failed attempt leaves one more ZooKeeper connection open
      initHBase();
    } catch (ZooKeeperConnectionException e) {
      e.printStackTrace();
      initHBase();
    }
  }
}
...
{noformat}

In my cluster I did not start any HBase process.

After running the test, the output of the lsof command is:
{noformat}
java  16735   root   72w  REG  253,3   890569 524379 /opt/xf/hadoop.log
java  16735   root   73w  REG  253,3   274338 524376 /opt/xf/HA_hadoop.log
java  16735   root   74r FIFO0,8  0t0  110645029 pipe
java  16735   root   75w FIFO0,8  0t0  110645029 pipe
java  16735   root   76u 0,90 21 anon_inode
java  16735   root   77u IPv6  110645030  0t0 TCP C3S31:35186-C3S33:eforward (ESTABLISHED)
java  16735   root   78u unix 0x8800cba90380  0t0  110645035 socket
java  16735   root   79u sock0,6  0t0  110645032 can't identify protocol
java  16735   root   80r FIFO0,8  0t0  110645037 pipe
java  16735   root   81w FIFO0,8  0t0  110645037 pipe
java  16735   root   82u 0,90 21 anon_inode
java  16735   root   83u IPv6  110645038  0t0 TCP C3S31:53727-C3S31:eforward (ESTABLISHED)
java  16735   root   84r FIFO0,8  0t0  110645043 pipe
java  16735   root   85w FIFO0,8  0t0  110645043 pipe
java  16735   root   86u 0,90 21 anon_inode
java  16735   root   87u IPv6  110645044  0t0 TCP C3S31:53728-C3S31:eforward (ESTABLISHED)
java  16735   root   88r FIFO0,8  0t0  110645047 pipe
java  16735   root   89w FIFO0,8  0t0  110645047 pipe
java  16735   root   90u 0,90 21 anon_inode
java  16735   root   91u IPv6  110645048  0t0 TCP C3S31:47183-C3S32:eforward (ESTABLISHED)
java  16735   root   92r FIFO0,8  0t0  110645050 pipe
java  16735   root   93w FIFO0,8  0t0  110645050 pipe
java  16735   root   94u 0,90 21 anon_inode
java  16735   root   95u IPv6  110645051  0t0 TCP C3S31:53730-C3S31:eforward (ESTABLISHED)
java  16735   root   96r FIFO0,8  0t0  110645135 pipe
java  16735   root   97w FIFO0,8  0t0  110645135 pipe
java  16735   root   98u 0,90 21 anon_inode
java  16735   root   99u IPv6  110645136  0t0 TCP C3S31:49799-C3S31:eforward (ESTABLISHED)
java  16735   root  100r FIFO0,8  0t0  110645143 pipe
java  16735   root  101w FIFO0,8  0t0  110645143 pipe
java  16735   root  102u 0,90 21 anon_inode
java  16735   root  103u IPv6  110645144  0t0 TCP C3S31:38931-C3S32:eforward (ESTABLISHED)
java  16735   root  104r FIFO0,8  0t0  110645148 pipe
java  16735   root  105w FIFO0,8  0t0  110645148 pipe
java  16735   root  106u 0,90 21 anon_inode
java  16735   root  107u IPv6  110645149  0t0 TCP C3S31:59939-C3S33:eforward (ESTABLISHED)
java  16735   root  108r FIFO0,8  0t0  110645507 pipe
java  16735   root  109w FIFO0,8  0t0  110645507 pipe
java  16735   root  110u 0,90 21 anon_inode
java  16735   root  111u IPv6  110645508  0t0 TCP C3S31:59940-C3S33:eforward (ESTABLISHED)
{noformat}

The [eforward] entry is the ZooKeeper client port (2181): lsof shows the service name for that port instead of the number.
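To watch the leak grow between runs, one can count the established TCP connections whose remote port is `eforward`; the output above contains nine of them. A minimal sketch, assuming lines shaped like the lsof output above (running `lsof -P` would print the numeric port 2181 instead of the service name, in which case `"2181"` would be passed instead):

```java
import java.util.List;

/** Counts lsof lines for ESTABLISHED TCP connections to a given remote port (name or number). */
class LsofCounter {
  static long countEstablished(List<String> lsofLines, String remotePort) {
    // The remote endpoint is the last "host:port" before the state, e.g. "C3S33:eforward (ESTABLISHED)".
    String suffix = ":" + remotePort + " (ESTABLISHED)";
    return lsofLines.stream()
        .map(String::trim)
        .filter(line -> line.endsWith(suffix))
        .count();
  }
}
```

Running this after each retry of the client test would show the count rising by one per failed `HBaseAdmin` construction if the leak is present.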