[jira] [Commented] (HBASE-5677) The master never does balance because duplicate openhandled the one region
[ https://issues.apache.org/jira/browse/HBASE-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13258171#comment-13258171 ] xufeng commented on HBASE-5677:
@stack Yes, we should close this issue. I will create a new issue to backport HBASE-5454 to 0.90 and 0.92.2. I will also submit a patch that adds the checkInitialized() check to createTable for trunk and 0.94.
The master never does balance because duplicate openhandled the one region
Key: HBASE-5677
URL: https://issues.apache.org/jira/browse/HBASE-5677
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.6
Environment: 0.90
Reporter: xufeng
Assignee: xufeng
Fix For: 0.90.7, 0.92.2
Attachments: 5677-proposal.txt, 5677-proposal.txt, Backport-HBASE-5454-to-90.patch, Backport-HBASE-5454-to-92.patch, HBASE-5677-90-v1.patch, surefire-report_no_patched_v1.html, surefire-report_patched_v1.html
If a region is assigned while the master is initializing (before processFailover runs), the region's open is handled twice, because the unassigned node in ZooKeeper is handled again in AssignmentManager#processFailover(). This leaves the region in RIT, so the master never runs the balancer.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5677) The master never does balance because duplicate openhandled the one region
[ https://issues.apache.org/jira/browse/HBASE-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256184#comment-13256184 ] xufeng commented on HBASE-5677:
Please review. If there are no problems, can we integrate it into 0.90 and 0.92?
The master never does balance because duplicate openhandled the one region
Key: HBASE-5677
URL: https://issues.apache.org/jira/browse/HBASE-5677
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.6
Environment: 0.90
Reporter: xufeng
Assignee: xufeng
Fix For: 0.90.7, 0.92.2
Attachments: 5677-proposal.txt, 5677-proposal.txt, Backport-HBASE-5454-to-90.patch, Backport-HBASE-5454-to-92.patch, HBASE-5677-90-v1.patch, surefire-report_no_patched_v1.html, surefire-report_patched_v1.html
If a region is assigned while the master is initializing (before processFailover runs), the region's open is handled twice, because the unassigned node in ZooKeeper is handled again in AssignmentManager#processFailover(). This leaves the region in RIT, so the master never runs the balancer.
[jira] [Commented] (HBASE-5677) The master never does balance because duplicate openhandled the one region
[ https://issues.apache.org/jira/browse/HBASE-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253196#comment-13253196 ] xufeng commented on HBASE-5677:
@Lars Sorry, there is something I don't understand. I think this issue can be fixed by HBASE-5454. Why do we need the 5677-proposal.txt patch for it?
The master never does balance because duplicate openhandled the one region
Key: HBASE-5677
URL: https://issues.apache.org/jira/browse/HBASE-5677
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.6
Environment: 0.90
Reporter: xufeng
Assignee: xufeng
Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
Attachments: 5677-proposal.txt, 5677-proposal.txt, 5677-proposal.txt, HBASE-5677-90-v1.patch, surefire-report_no_patched_v1.html, surefire-report_patched_v1.html
If a region is assigned while the master is initializing (before processFailover runs), the region's open is handled twice, because the unassigned node in ZooKeeper is handled again in AssignmentManager#processFailover(). This leaves the region in RIT, so the master never runs the balancer.
[jira] [Commented] (HBASE-5677) The master never does balance because duplicate openhandled the one region
[ https://issues.apache.org/jira/browse/HBASE-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253198#comment-13253198 ] xufeng commented on HBASE-5677:
Should we integrate HBASE-5454 into 0.90? I applied the HBASE-5454 patch to 0.90 in my cluster, and it works.
The master never does balance because duplicate openhandled the one region
Key: HBASE-5677
URL: https://issues.apache.org/jira/browse/HBASE-5677
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.6
Environment: 0.90
Reporter: xufeng
Assignee: xufeng
Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
Attachments: 5677-proposal.txt, 5677-proposal.txt, 5677-proposal.txt, HBASE-5677-90-v1.patch, surefire-report_no_patched_v1.html, surefire-report_patched_v1.html
If a region is assigned while the master is initializing (before processFailover runs), the region's open is handled twice, because the unassigned node in ZooKeeper is handled again in AssignmentManager#processFailover(). This leaves the region in RIT, so the master never runs the balancer.
[jira] [Commented] (HBASE-5454) Refuse operations from Admin before master is initialized
[ https://issues.apache.org/jira/browse/HBASE-5454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253245#comment-13253245 ] xufeng commented on HBASE-5454:
@chunhui Does the check also need to be added in HMaster#createTable?
Refuse operations from Admin before master is initialized
Key: HBASE-5454
URL: https://issues.apache.org/jira/browse/HBASE-5454
Project: HBase
Issue Type: Improvement
Reporter: chunhui shen
Assignee: chunhui shen
Fix For: 0.94.0
Attachments: hbase-5454.patch, hbase-5454v2.patch
In our testing environment, while the master was initializing, we found conflicts between master#assignAllUserRegions and the EnableTable event. Region assignment threw an exception, so the master aborted itself. We think it is better to refuse operations from Admin, such as CreateTable, EnableTable, etc., until the master is initialized. This could reduce errors.
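A minimal sketch of the kind of guard HBASE-5454 describes, assuming a single "initialized" flag that is set once startup (including failover processing) has finished. GuardedMaster and MasterInitializingException are hypothetical stand-ins, not HBase's HMaster or PleaseHoldException; checkInitialized() only borrows the name used in this thread. The point it shows is the question above: createTable must go through the same check as enableTable, or it can still race with startup.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical unchecked stand-in for HBase's PleaseHoldException.
class MasterInitializingException extends IllegalStateException {
    MasterInitializingException(String msg) {
        super(msg);
    }
}

// Hypothetical, stripped-down master: only the initialization guard is modelled.
class GuardedMaster {
    private final AtomicBoolean initialized = new AtomicBoolean(false);

    // Called once startup work such as failover processing has finished.
    void finishInitialization() {
        initialized.set(true);
    }

    // Shared guard: every Admin-facing entry point calls this first.
    private void checkInitialized() {
        if (!initialized.get()) {
            throw new MasterInitializingException("Master is initializing");
        }
    }

    String createTable(String table) {
        checkInitialized(); // the check the comment above asks about
        return "created " + table;
    }

    String enableTable(String table) {
        checkInitialized();
        return "enabled " + table;
    }
}
```

Putting the check in one private method keeps new Admin operations from silently skipping it.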
[jira] [Commented] (HBASE-5677) The master never does balance because duplicate openhandled the one region
[ https://issues.apache.org/jira/browse/HBASE-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253961#comment-13253961 ] xufeng commented on HBASE-5677:
@Lars "in 0.94+ this is fixed, correct?" Yes. "you like to backport HBASE-5454 to 0.90 and 0.92, right?" OK. But I also have a question about HBASE-5454 (why checkInitialized() was not added to HMaster#createTable); I left a comment on HBASE-5454. I am at home now, so I have no environment to test it and create the backport patches for 0.90 and 0.92. I plan to do it on Monday.
The master never does balance because duplicate openhandled the one region
Key: HBASE-5677
URL: https://issues.apache.org/jira/browse/HBASE-5677
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.6
Environment: 0.90
Reporter: xufeng
Assignee: xufeng
Fix For: 0.90.7, 0.92.2
Attachments: 5677-proposal.txt, 5677-proposal.txt, HBASE-5677-90-v1.patch, surefire-report_no_patched_v1.html, surefire-report_patched_v1.html
If a region is assigned while the master is initializing (before processFailover runs), the region's open is handled twice, because the unassigned node in ZooKeeper is handled again in AssignmentManager#processFailover(). This leaves the region in RIT, so the master never runs the balancer.
[jira] [Commented] (HBASE-5677) The master never does balance because duplicate openhandled the one region
[ https://issues.apache.org/jira/browse/HBASE-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252358#comment-13252358 ] xufeng commented on HBASE-5677:
Tested with the trunk version: it is OK. The master does nothing while it has not finished initializing.
The master never does balance because duplicate openhandled the one region
Key: HBASE-5677
URL: https://issues.apache.org/jira/browse/HBASE-5677
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.6
Environment: 0.90
Reporter: xufeng
Assignee: xufeng
Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
Attachments: 5677-proposal.txt, HBASE-5677-90-v1.patch, surefire-report_no_patched_v1.html, surefire-report_patched_v1.html
If a region is assigned while the master is initializing (before processFailover runs), the region's open is handled twice, because the unassigned node in ZooKeeper is handled again in AssignmentManager#processFailover(). This leaves the region in RIT, so the master never runs the balancer.
[jira] [Commented] (HBASE-5677) The master never does balance because duplicate openhandled the one region
[ https://issues.apache.org/jira/browse/HBASE-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252356#comment-13252356 ] xufeng commented on HBASE-5677:
@Ted @stack @Lars I tested it using the trunk version and got this in the shell and in my test case:
{noformat}
12/04/12 19:38:35 INFO client.HBaseAdmin: Started enable of Table02
org.apache.hadoop.hbase.PleaseHoldException: org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
{noformat}
PleaseHoldException was added in HBASE-5454, whose patch has been integrated into trunk and 0.94.
The master never does balance because duplicate openhandled the one region
Key: HBASE-5677
URL: https://issues.apache.org/jira/browse/HBASE-5677
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.6
Environment: 0.90
Reporter: xufeng
Assignee: xufeng
Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
Attachments: 5677-proposal.txt, HBASE-5677-90-v1.patch, surefire-report_no_patched_v1.html, surefire-report_patched_v1.html
If a region is assigned while the master is initializing (before processFailover runs), the region's open is handled twice, because the unassigned node in ZooKeeper is handled again in AssignmentManager#processFailover(). This leaves the region in RIT, so the master never runs the balancer.
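On the client side, a "Master is initializing" rejection is a signal to wait and retry rather than a hard failure. The following is a hypothetical sketch of such a retry loop with a bounded number of attempts and linear backoff; HoldSignal and HoldRetry are illustrative names, not HBase client classes, and HoldSignal stands in for the real PleaseHoldException.

```java
import java.util.function.Supplier;

// Hypothetical unchecked signal meaning "master is initializing, try again later".
class HoldSignal extends RuntimeException {
    HoldSignal(String msg) {
        super(msg);
    }
}

class HoldRetry {
    // Runs op, retrying while it throws HoldSignal. Sleeps baseSleepMs * attempt
    // between tries (linear backoff) and rethrows the last HoldSignal after
    // maxAttempts, so a master that never finishes initializing is still reported.
    static <T> T callWithRetry(Supplier<T> op, int maxAttempts, long baseSleepMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return op.get();
            } catch (HoldSignal e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up: surface the hold to the caller
                }
                try {
                    Thread.sleep(baseSleepMs * attempt);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    throw e;
                }
            }
        }
    }
}
```

A real client would take the attempt count and pause from its configuration; the linear backoff here is only illustrative.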
[jira] [Commented] (HBASE-5677) The master never does balance because duplicate openhandled the one region
[ https://issues.apache.org/jira/browse/HBASE-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253024#comment-13253024 ] xufeng commented on HBASE-5677:
@Lars I did not change anything in trunk.
The master never does balance because duplicate openhandled the one region
Key: HBASE-5677
URL: https://issues.apache.org/jira/browse/HBASE-5677
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.6
Environment: 0.90
Reporter: xufeng
Assignee: xufeng
Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
Attachments: 5677-proposal.txt, 5677-proposal.txt, 5677-proposal.txt, HBASE-5677-90-v1.patch, surefire-report_no_patched_v1.html, surefire-report_patched_v1.html
If a region is assigned while the master is initializing (before processFailover runs), the region's open is handled twice, because the unassigned node in ZooKeeper is handled again in AssignmentManager#processFailover(). This leaves the region in RIT, so the master never runs the balancer.
[jira] [Commented] (HBASE-5677) The master never does balance because duplicate openhandled the one region
[ https://issues.apache.org/jira/browse/HBASE-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251379#comment-13251379 ] xufeng commented on HBASE-5677:
@Lars This issue is caused by the client. I think it is not similar to HBASE-5615, at least in 0.90.
The master never does balance because duplicate openhandled the one region
Key: HBASE-5677
URL: https://issues.apache.org/jira/browse/HBASE-5677
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.6
Environment: 0.90
Reporter: xufeng
Assignee: xufeng
Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
Attachments: HBASE-5677-90-v1.patch, surefire-report_no_patched_v1.html, surefire-report_patched_v1.html
If a region is assigned while the master is initializing (before processFailover runs), the region's open is handled twice, because the unassigned node in ZooKeeper is handled again in AssignmentManager#processFailover(). This leaves the region in RIT, so the master never runs the balancer.
[jira] [Commented] (HBASE-5677) The master never does balance because duplicate openhandled the one region
[ https://issues.apache.org/jira/browse/HBASE-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13249753#comment-13249753 ] xufeng commented on HBASE-5677:
@Ted I tested 0.92 in my cluster using the reproduction steps. Then I ran the hbck tool to check the health of the cluster and found many multiply-assigned errors. I think 0.92 also has this problem.
The master never does balance because duplicate openhandled the one region
Key: HBASE-5677
URL: https://issues.apache.org/jira/browse/HBASE-5677
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.6
Environment: 0.90
Reporter: xufeng
Assignee: xufeng
If a region is assigned while the master is initializing (before processFailover runs), the region's open is handled twice, because the unassigned node in ZooKeeper is handled again in AssignmentManager#processFailover(). This leaves the region in RIT, so the master never runs the balancer.
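The hbck errors mentioned above boil down to one region being open on more than one regionserver. The following is a hypothetical sketch of that core idea only, not hbck's implementation: group the (region, server) pairs that regionservers report and flag any region seen on two or more distinct servers. MultiAssignCheck and Deployment are illustrative names.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class MultiAssignCheck {
    // One (region, server) pair as reported by a regionserver's online regions.
    static final class Deployment {
        final String region;
        final String server;

        Deployment(String region, String server) {
            this.region = region;
            this.server = server;
        }
    }

    // Returns region -> servers for every region open on more than one server.
    static Map<String, Set<String>> findMultiplyAssigned(List<Deployment> deployments) {
        Map<String, Set<String>> serversByRegion = new TreeMap<>();
        for (Deployment d : deployments) {
            serversByRegion.computeIfAbsent(d.region, k -> new TreeSet<>()).add(d.server);
        }
        // Keep only regions seen on two or more distinct servers.
        serversByRegion.values().removeIf(servers -> servers.size() < 2);
        return serversByRegion;
    }
}
```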
[jira] [Commented] (HBASE-5677) The master never does balance because duplicate openhandled the one region
[ https://issues.apache.org/jira/browse/HBASE-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13249754#comment-13249754 ] xufeng commented on HBASE-5677:
I checked out the latest 0.92 version (revision 1311105) from https://svn.apache.org/repos/asf/hbase/branches/0.92 and compiled it.
The master never does balance because duplicate openhandled the one region
Key: HBASE-5677
URL: https://issues.apache.org/jira/browse/HBASE-5677
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.6
Environment: 0.90
Reporter: xufeng
Assignee: xufeng
If a region is assigned while the master is initializing (before processFailover runs), the region's open is handled twice, because the unassigned node in ZooKeeper is handled again in AssignmentManager#processFailover(). This leaves the region in RIT, so the master never runs the balancer.
[jira] [Commented] (HBASE-5677) The master never does balance because duplicate openhandled the one region
[ https://issues.apache.org/jira/browse/HBASE-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243668#comment-13243668 ] xufeng commented on HBASE-5677:
We can reproduce this issue on 0.90 with the following steps:
step1: Start a cluster and create a table that has many regions.
step2: Disable the table created in step1 using the shell.
step3: Kill the active master.
step4: The backup master becomes the active one. While the master is checking in regionservers, enable the table using the shell.
result: The duplicate-handling problem happens.
I think the master should not provide service until it has completed initialization. We can add a method in HMasterInterface like:
{noformat}
// HMasterInterface
public boolean isMasterAvailable();

// Implementation (e.g. in HMaster): the master is running and can provide service
public boolean isMasterAvailable() {
  return !isStopped() && isActiveMaster() && isInitialized();
}
{noformat}
When the client calls getMaster, we can check it. Please give me your suggestions, thanks.
The master never does balance because duplicate openhandled the one region
Key: HBASE-5677
URL: https://issues.apache.org/jira/browse/HBASE-5677
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.6
Environment: 0.90
Reporter: xufeng
Assignee: xufeng
If a region is assigned while the master is initializing (before processFailover runs), the region's open is handled twice, because the unassigned node in ZooKeeper is handled again in AssignmentManager#processFailover(). This leaves the region in RIT, so the master never runs the balancer.
[jira] [Commented] (HBASE-5673) The OOM problem of IPC client call cause all handle block
[ https://issues.apache.org/jira/browse/HBASE-5673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243080#comment-13243080 ] xufeng commented on HBASE-5673:
@Stack @Ted I analyzed the problem with my patch. This is the result: I wrapped all exceptions in IOException, and this IOException cannot be handled in CatalogTracker#getCachedConnection(ServerName sn) (a private method returning HRegionInterface), so the master aborts and the test cases fail. I will submit the next patch together with the test results.
The OOM problem of IPC client call cause all handle block
Key: HBASE-5673
URL: https://issues.apache.org/jira/browse/HBASE-5673
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.6
Environment: 0.90.6
Reporter: xufeng
Assignee: xufeng
Fix For: 0.90.7, 0.92.2, 0.94.1
Attachments: HBASE-5673-90-V2.patch, HBASE-5673-90.patch
If HBaseClient hits an "unable to create new native thread" error, the call never completes, because it is lost in the calls queue.
[jira] [Commented] (HBASE-5677) The master never does balance because duplicate openhandled the one region
[ https://issues.apache.org/jira/browse/HBASE-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242118#comment-13242118 ] xufeng commented on HBASE-5677:
If a region is assigned while the master is initializing (before processFailover runs), the region's open will be handled twice, because the unassigned node in ZooKeeper is handled again in AssignmentManager#processFailover(). I am using version 0.90 and found this issue in my cluster.
1. The system did not run the balancer:
{noformat}
Not running balancer because 2 region(s) in transition: {f4ff609df50e5bc9049fe202bb90f22e=hbase0205test,0038613850505050,1333033465665.f4ff609df50e5bc9049fe202bb90f22e. state=OPEN, ts=1333036748502, febe5bb42ec841f7a9086d3b7bf0637c=hbase0205test,0038613802020202,1333033465474.febe5bb42ec841f7a9086d3b7bf0637c...
{noformat}
2. I chose f4ff609df50e5bc9049fe202bb90f22e as a sample to track.
3. In the master log I found:
logA:
{noformat}
Line 17884: [2012-03-29 15:05:08,082] [DEBUG] [MASTER_OPEN_REGION-158-1-130-18:2-1] [org.apache.hadoop.hbase.master.handler.OpenedRegionHandler 138] The master has opened the region hbase0205test,0038613850505050,1333033465665.f4ff609df50e5bc9049fe202bb90f22e. that was online on serverName=158-1-130-18,20020,1332952904731, load=(requests=, regions=728, usedHeap=141, maxHeap=8165)
{noformat}
logB:
{noformat}
Line 17885: [2012-03-29 15:05:08,082] [DEBUG] [master-158-1-130-18:2] [org.apache.hadoop.hbase.master.handler.OpenedRegionHandler 138] Handling OPENED event for hbase0205test,0038613850505050,1333033465665.f4ff609df50e5bc9049fe202bb90f22e. from serverName=158-1-130-18,20020,1332952904731, load=(requests=245, regions=758, usedHeap=145, maxHeap=8165); deleting unassigned node
Line 17897: [2012-03-29 15:05:08,084] [DEBUG] [master-158-1-130-18:2] [org.apache.hadoop.hbase.zookeeper.ZKAssign 511] master:2-0x236552a09e20353 Deleting existing unassigned node for f4ff609df50e5bc9049fe202bb90f22e that is in expected state RS_ZK_REGION_OPENED
Line 17898: [2012-03-29 15:05:08,092] [WARN ] [master-158-1-130-18:2] [org.apache.hadoop.hbase.master.handler.OpenedRegionHandler 123] The znode of the region hbase0205test,0038613850505050,1333033465665.f4ff609df50e5bc9049fe202bb90f22e. would have already been deleted
Line 17899: [2012-03-29 15:05:08,092] [ERROR] [master-158-1-130-18:2] [org.apache.hadoop.hbase.master.handler.OpenedRegionHandler 97] The znode of region hbase0205test,0038613850505050,1333033465665.f4ff609df50e5bc9049fe202bb90f22e. could not be deleted.
{noformat}
4. logA and logB should not appear at the same time, because they are logged by the same code in the region-open flow. Here they were logged in the same millisecond by two different threads.
5. So I am sure this region was handled twice.
6. These logs explain what I wrote in the description:
Enable the table:
{noformat}
Line 16925: [2012-03-29 15:04:59,875] [DEBUG] [158-1-130-18:2-org.apache.hadoop.hbase.master.handler.EnableTableHandler$BulkEnabler-0] [org.apache.hadoop.hbase.zookeeper.ZKAssign 289] master:2-0x236552a09e20353 Creating (or updating) unassigned node for f4ff609df50e5bc9049fe202bb90f22e with OFFLINE state
{noformat}
Failover:
{noformat}
[2012-03-29 15:05:00,906] [INFO ] [master-158-1-130-18:2] [org.apache.hadoop.hbase.master.AssignmentManager 284] Failed-over master needs to process 66 regions in transition
{noformat}
The master never does balance because duplicate openhandled the one region
Key: HBASE-5677
URL: https://issues.apache.org/jira/browse/HBASE-5677
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.6
Environment: 0.90
Reporter: xufeng
Assignee: xufeng
If a region is assigned while the master is initializing (before processFailover runs), the region's open is handled twice. This leaves the region in RIT, so the master never runs the balancer.
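The thread later settles on refusing Admin operations until the master is initialized (HBASE-5454). As a complementary idea only, here is a hypothetical sketch of making OPENED handling idempotent, so the ZooKeeper-event path and processFailover() cannot both act on the same transition. OpenedEventDedup, claimOpened() and keying on a znode version are assumptions for illustration, not HBase APIs.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class OpenedEventDedup {
    // Keys of OPENED transitions already claimed: "<encodedRegion>@<znodeVersion>".
    private final Set<String> processed = ConcurrentHashMap.newKeySet();

    // Returns true if this caller should process the OPENED transition, false if
    // another path has already claimed it. The add is atomic, so exactly one of
    // two racing callers wins. Keying on the znode version means a later,
    // legitimate re-open of the same region is not mistaken for a duplicate.
    boolean claimOpened(String encodedRegionName, int znodeVersion) {
        return processed.add(encodedRegionName + "@" + znodeVersion);
    }
}
```

This set only grows; a real implementation would drop a region's entries once it leaves RIT.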
[jira] [Commented] (HBASE-5673) The OOM problem of IPC client call cause all handle block
[ https://issues.apache.org/jira/browse/HBASE-5673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242175#comment-13242175 ] xufeng commented on HBASE-5673:
The build failed! Did my patch cause it?
The OOM problem of IPC client call cause all handle block
Key: HBASE-5673
URL: https://issues.apache.org/jira/browse/HBASE-5673
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.6
Environment: 0.90.6
Reporter: xufeng
Assignee: xufeng
Fix For: 0.90.7, 0.92.2, 0.94.1
Attachments: HBASE-5673-90-V2.patch, HBASE-5673-90.patch
If HBaseClient hits an "unable to create new native thread" error, the call never completes, because it is lost in the calls queue.
[jira] [Commented] (HBASE-5673) The OOM problem of IPC client call cause all handle block
[ https://issues.apache.org/jira/browse/HBASE-5673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242914#comment-13242914 ] xufeng commented on HBASE-5673:
@Stack I will check why it happened.
@Ted How do I run a single test case with Maven? I ran the test in 0.94 with the following command line:
{noformat}
mvn clean -Dtest=TestMultiVersionstest test
{noformat}
but I got this result:
{noformat}
Results :
Tests run: 0, Failures: 0, Errors: 0, Skipped: 0
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.12-TRUNK-HBASE-2:test (default-test) on project hbase: No tests were executed! (Set -DfailIfNoTests=false to ignore this error.) -> [Help 1]
{noformat}
The OOM problem of IPC client call cause all handle block
Key: HBASE-5673
URL: https://issues.apache.org/jira/browse/HBASE-5673
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.6
Environment: 0.90.6
Reporter: xufeng
Assignee: xufeng
Fix For: 0.90.7, 0.92.2, 0.94.1
Attachments: HBASE-5673-90-V2.patch, HBASE-5673-90.patch
If HBaseClient hits an "unable to create new native thread" error, the call never completes, because it is lost in the calls queue.
[jira] [Commented] (HBASE-5673) The OOM problem of IPC client call cause all handle block
[ https://issues.apache.org/jira/browse/HBASE-5673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241065#comment-13241065 ] xufeng commented on HBASE-5673:
I found this issue in my cluster.
1. I found that the regionserver's calls to report to the master failed because of a socket timeout:
{noformat}
[2012-03-26 14:48:09,815] [INFO ] [regionserver20020] [org.apache.hadoop.hbase.regionserver.HRegionServer 1469] Attempting connect to Master server at DDB03:2
[2012-03-26 14:49:09,818] [INFO ] [regionserver20020] [org.apache.hadoop.ipc.HbaseRPC 360] Problem connecting to server: DDB03/192.168.28.53:2
[2012-03-26 14:49:09,819] [WARN ] [regionserver20020] [org.apache.hadoop.hbase.regionserver.HRegionServer 1483] Unable to connect to master. Retrying. Error was: java.net.SocketTimeoutException: Call to DDB03/192.168.28.53:2 failed on socket timeout exception: java.net.SocketTimeoutException: 6 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.28.53:59520 remote=DDB03/192.168.28.53:2]
{noformat}
2. From the master's jstack output, I found that one handler is waiting and the others are blocked (in waitForMeta):
{noformat}
"IPC Server handler 90 on 2" daemon prio=10 tid=0x7f219c54 nid=0x4c3f in Object.wait() [0x7f21963a7000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:485)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:757)
    ...
"IPC Server handler 87 on 2" daemon prio=10 tid=0x7f219c53a000 nid=0x4c37 waiting for monitor entry [0x7f21966aa000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:397)
    - waiting to lock <0x000612486960> (a java.util.concurrent.atomic.AtomicBoolean)
    at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnectionDefault(CatalogTracker.java:437)
    ...
{noformat}
3. I also confirmed that the waiting handler causes the others to block: it holds the lock they need while it waits for its call to complete.
4. But when the "unable to create new native thread" error happened, the catch (IOException) block did not catch it, because it is an OutOfMemoryError, not an IOException:
{noformat}
protected synchronized void setupIOstreams() throws IOException {
  ...
  try {
    ...
    start();
  } catch (IOException e) {
    markClosed(e);
    close();
    throw e;
  }
  ...
}
{noformat}
5. Thus the call is lost in the calls queue and never completes:
{noformat}
public Writable call(...) {
  ...
  synchronized (call) {
    while (!call.done) {
      try {
        call.wait();                           // wait for the result
      } catch (InterruptedException ignored) {
        // save the fact that we were interrupted
        interrupted = true;
      }
    }
    ...
  }
  ...
}
{noformat}
The OOM problem of IPC client call cause all handle block
Key: HBASE-5673
URL: https://issues.apache.org/jira/browse/HBASE-5673
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.6
Environment: 0.90.6
Reporter: xufeng
Assignee: xufeng
If HBaseClient hits an "unable to create new native thread" error, the call never completes, because it is lost in the calls queue.
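The hang happens because the only code path that wakes waiting callers runs for IOException, while Thread.start() fails with an Error. The following is a hypothetical sketch, not the committed HBASE-5673 patch. It fails every queued call and then rethrows the original throwable unchanged. That avoids the side effect described in the HBASE-5673 comment about the first patch, which wrapped every exception in IOException and changed what callers such as CatalogTracker saw. PendingCall, ConnectionSketch and the threadStarter hook (standing in for Thread.start()) are illustrative names.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// A call waiting for its response, mirroring the wait loop in HBaseClient#call.
class PendingCall {
    private boolean done;
    private IOException error;

    synchronized void complete() {
        done = true;
        notifyAll();
    }

    synchronized void fail(IOException e) {
        error = e;
        done = true;
        notifyAll(); // wake the caller blocked in await()
    }

    // Blocks until the call is completed or failed; rethrows the failure.
    synchronized void await() throws IOException, InterruptedException {
        while (!done) {
            wait();
        }
        if (error != null) {
            throw error;
        }
    }
}

// Hypothetical connection: threadStarter stands in for the Thread.start() call
// that threw "unable to create new native thread".
class ConnectionSketch {
    private final List<PendingCall> calls = new ArrayList<>();
    private final Runnable threadStarter;

    ConnectionSketch(Runnable threadStarter) {
        this.threadStarter = threadStarter;
    }

    synchronized void addCall(PendingCall call) {
        calls.add(call);
    }

    synchronized void setupIOstreams() {
        try {
            threadStarter.run();
        } catch (RuntimeException | Error t) {
            // Fail every queued call so no caller waits forever, then rethrow the
            // original throwable unchanged instead of wrapping it.
            IOException cause = new IOException("connection setup failed", t);
            for (PendingCall c : calls) {
                c.fail(cause);
            }
            calls.clear();
            throw t;
        }
    }
}
```

Whether a process should keep running after an OutOfMemoryError is a separate question; the sketch only shows that blocked callers get released.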
[jira] [Commented] (HBASE-5673) The OOM problem of IPC client call cause all handle block
[ https://issues.apache.org/jira/browse/HBASE-5673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241069#comment-13241069 ] xufeng commented on HBASE-5673:
Step 4 was missing some log info:
{noformat}
java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:640)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:351)
{noformat}
The OOM problem of IPC client call cause all handle block
Key: HBASE-5673
URL: https://issues.apache.org/jira/browse/HBASE-5673
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.6
Environment: 0.90.6
Reporter: xufeng
Assignee: xufeng
If HBaseClient hits an "unable to create new native thread" error, the call never completes, because it is lost in the calls queue.
[jira] [Commented] (HBASE-5615) the master never does balance because of balancing the parent region
[ https://issues.apache.org/jira/browse/HBASE-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238029#comment-13238029 ] xufeng commented on HBASE-5615: --- Thanks for the help, Ramkrishna, Jinchao and Ted. the master never does balance because of balancing the parent region Key: HBASE-5615 URL: https://issues.apache.org/jira/browse/HBASE-5615 Project: HBase Issue Type: Bug Affects Versions: 0.90.7 Reporter: xufeng Assignee: xufeng Priority: Critical Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0 Attachments: 5615-trunk.txt, HBASE-5615-90.patch, HBASE-5615.patch, NoPatched-surefire-report-5615-90.html, Patched_surefire-report-5615-90.html the master never do balance becauseof when master do rebuildUserRegions(),it will add the parent region into AssignmentManager#servers, if balancer let the parent region to move,the parent will in RIT forever.thus balance will never be executed.
[jira] [Commented] (HBASE-5615) the master never do balance becauseof balance the parent region
[ https://issues.apache.org/jira/browse/HBASE-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13236464#comment-13236464 ] xufeng commented on HBASE-5615: --- Reproduced this issue on 0.90. For this issue, META has to hold the parent region info for a long time, so before testing I commented out this code in the regionserver class (without the compaction, the daughters keep their reference files and the parent row is not cleaned out of META):
{noformat}
public void postOpenDeployTasks(final HRegion r, final CatalogTracker ct, final boolean daughter)
throws KeeperException, IOException {
  // Do checks to see if we need to compact (references or too many files)
  /*if (r.hasReferences() || r.hasTooManyStoreFiles()) {
    getCompactionRequester().requestCompaction(r,
      r.hasReferences() ? "Region has references on open" :
        "Region has too many store files");
  }*/
  ...
}
{noformat}
step1: start a cluster with two master processes and one regionserver process.
step2: create a table and put some data into it.
step3: split the table from the shell.
step4: kill the active master.
step5: after the backup master becomes active, start another regionserver process.
result: the issue happens.
I also tested my patch many times and it works.
the master never do balance becauseof balance the parent region Key: HBASE-5615 URL: https://issues.apache.org/jira/browse/HBASE-5615 Project: HBase Issue Type: Bug Affects Versions: 0.90.7 Reporter: xufeng Assignee: xufeng Priority: Critical Attachments: HBASE-5615.patch the master never do balance becauseof when master do rebuildUserRegions(),it will add the parent region into AssignmentManager#servers, if balancer let the parent region to move,the parent will in RIT forever.thus balance will never be executed.
[jira] [Commented] (HBASE-5615) the master never do balance becauseof balance the parent region
[ https://issues.apache.org/jira/browse/HBASE-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235426#comment-13235426 ] xufeng commented on HBASE-5615: --- I found this issue in my cluster.
1. The balancer is never executed because:
{noformat}
[2012-03-21 14:11:47,226] [DEBUG] [158-1-131-48:2-BalancerChore] [org.apache.hadoop.hbase.master.HMaster 824] Not running balancer because 4 region(s) in transition: {3139250177b9c55fbce6856e2595b272=hbaseTable3,06640#000149,1332230348477.3139250177b9c55fbce6856e2595b272. state=PENDING_CLOSE, ts=1332339058374, 3d7698062c1ffaa288ffa4b0630205dd=hbaseTable,12284#51,1332214163915.3d7698062c1ffaa288ffa4b0630205dd. st...
{noformat}
2. I chose 3139250177b9c55fbce6856e2595b272 as a sample to track and found that it had been split:
{noformat}
[2012-03-20 23:40:36,496] [INFO ] [regionserver20020.compactor] [org.apache.hadoop.hbase.regionserver.HRegion 563] Closed hbaseTable3,06640#000149,1332230348477.3139250177b9c55fbce6856e2595b272.
[2012-03-20 23:40:38,469] [INFO ] [regionserver20020.compactor] [org.apache.hadoop.hbase.catalog.MetaEditor 85] Offlined parent region hbaseTable3,06640#000149,1332230348477.3139250177b9c55fbce6856e2595b272. in META
[2012-03-20 23:40:39,755] [INFO ] [regionserver20020.compactor] [org.apache.hadoop.hbase.regionserver.CompactSplitThread 181] Region split, META updated, and report to master. Parent=hbaseTable3,06640#000149,1332230348477.3139250177b9c55fbce6856e2595b272., new regions: hbaseTable3,06640#000149,1332286834610.bf8baeae598db2a1e87dbd0a234d1539., hbaseTable3,06723#000707,1332286834610.64ccaffa46be50a5dbc41540006afcb6.. Split took 5sec
{noformat}
3. Then the backup master became active, and in its finishInitialization() logs I found:
{noformat}
[2012-03-21 11:41:46,692] [DEBUG] [master-158-1-131-48:2] [org.apache.hadoop.hbase.master.handler.ServerShutdownHandler 348] Daughter hbaseTable3,06640#000149,1332286834610.bf8baeae598db2a1e87dbd0a234d1539. present
{noformat}
4. So I confirmed that the parent region (3139250177b9c55fbce6856e2595b272) is also still in the META table.
5. If 3139250177b9c55fbce6856e2595b272 is in META, it is added to AssignmentManager#regions and AssignmentManager#servers when the master rebuilds the user regions.
6. The balancer works from AssignmentManager#servers, so it decides to move 3139250177b9c55fbce6856e2595b272:
{noformat}
[2012-03-21 11:46:47,699] [INFO ] [158-1-131-48:2-BalancerChore] [org.apache.hadoop.hbase.master.HMaster 849] balance hri=hbaseTable3,06640#000149,1332230348477.3139250177b9c55fbce6856e2595b272., src=158-1-131-48,20020,1331918756600, dest=158-1-130-11,20020,1331918756573
{noformat}
7. The parent then stays in RIT forever in the PENDING_CLOSE state, so the balancer is never executed:
{noformat}
[2012-03-21 13:13:57,201] [WARN ] [PRI IPC Server handler 3 on 20020] [org.apache.hadoop.hbase.regionserver.HRegionServer 2211] Received close for region we are not serving; 3139250177b9c55fbce6856e2595b272
{noformat}
{noformat}
[2012-03-21 11:55:55,638] [INFO ] [158-1-131-48:2.timeoutMonitor] [org.apache.hadoop.hbase.master.AssignmentManager 2327] Regions in transition timed out: hbaseTable3,06640#000149,1332230348477.3139250177b9c55fbce6856e2595b272. state=PENDING_CLOSE, ts=1332330775586
[2012-03-21 11:55:55,639] [INFO ] [158-1-131-48:2.timeoutMonitor] [org.apache.hadoop.hbase.master.AssignmentManager 2363] Region has been PENDING_CLOSE for too long, running forced unassign again on region=hbaseTable3,06640#000149,1332230348477.3139250177b9c55fbce6856e2595b272.
{noformat}
the master never do balance becauseof balance the parent region Key: HBASE-5615 URL: https://issues.apache.org/jira/browse/HBASE-5615 Project: HBase Issue Type: Bug Reporter: xufeng Assignee: xufeng Priority: Critical the master never do balance becauseof when master do rebuildUserRegions(),it will add the parent region into AssignmentManager#servers, if balancer let the parent region to move,the parent will in RIT forever.thus balance will never be executed.
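Steps 5 to 7 form a loop: a split parent that is still in META is loaded into AssignmentManager#servers, the balancer picks it, the regionserver answers "Received close for region we are not serving", and the region stays in RIT, which blocks every later balance run. A minimal sketch of one way to break the loop, assuming split parents can be recognized as regions that are both offline and split (in 0.90, HRegionInfo exposes isOffline() and isSplit()), is to skip them while rebuilding the server-to-regions map. RegionRow and canBalance() are hypothetical, and the static rebuildUserRegions() below is a hypothetical simplification of the master method of the same name, not its real code or the attached patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SplitParentSketch {

    // Hypothetical simplification of a META row: encoded region name, hosting server, flags.
    record RegionRow(String encodedName, String server, boolean offline, boolean split) {
        boolean isSplitParent() {
            return offline && split;
        }
    }

    // Builds server -> regions, skipping split parents so the balancer never sees them.
    static Map<String, List<String>> rebuildUserRegions(List<RegionRow> meta) {
        Map<String, List<String>> servers = new HashMap<>();
        for (RegionRow row : meta) {
            if (row.isSplitParent()) {
                continue; // its daughters now serve the data; a move could never complete
            }
            servers.computeIfAbsent(row.server(), s -> new ArrayList<>()).add(row.encodedName());
        }
        return servers;
    }

    // Mirrors the guard behind "Not running balancer because N region(s) in transition".
    static boolean canBalance(Set<String> regionsInTransition) {
        return regionsInTransition.isEmpty();
    }

    public static void main(String[] args) {
        String rs = "158-1-131-48,20020,1331918756600";
        List<RegionRow> meta = List.of(
                new RegionRow("3139250177b9c55fbce6856e2595b272", rs, true, true),
                new RegionRow("bf8baeae598db2a1e87dbd0a234d1539", rs, false, false),
                new RegionRow("64ccaffa46be50a5dbc41540006afcb6", rs, false, false));
        Map<String, List<String>> servers = rebuildUserRegions(meta);
        System.out.println("regions the balancer may move: " + servers);
        // With the parent excluded, nothing gets stuck in PENDING_CLOSE.
        System.out.println("balance allowed: " + canBalance(new HashSet<>()));
    }
}
```

The design choice is to filter at load time rather than in the balancer, so every consumer of the in-memory map sees the same view; the attached patches may place the check elsewhere.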
[jira] [Commented] (HBASE-5615) the master never do balance becauseof balance the parent region
[ https://issues.apache.org/jira/browse/HBASE-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235432#comment-13235432 ] xufeng commented on HBASE-5615: --- I use 0.90. BTW: I cannot compile the 0.90 branch locally with Maven. Is this a problem? The error log is:
{noformat}
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.0.2:compile (default-compile) on project hbase: Compilation failure
[ERROR] /opt/xufeng/module/hbase/host_java/src/HBASE_ONLINE/src/main/java/org/apache/hadoop/hbase/master/HMaster.java:[1121,22] cannot find symbol
[ERROR] symbol  : class ServerName
[ERROR] location: class org.apache.hadoop.hbase.master.HMaster
{noformat}
the master never do balance becauseof balance the parent region Key: HBASE-5615 URL: https://issues.apache.org/jira/browse/HBASE-5615 Project: HBase Issue Type: Bug Reporter: xufeng Assignee: xufeng Priority: Critical the master never do balance becauseof when master do rebuildUserRegions(),it will add the parent region into AssignmentManager#servers, if balancer let the parent region to move,the parent will in RIT forever.thus balance will never be executed.
[jira] [Commented] (HBASE-5615) the master never do balance becauseof balance the parent region
[ https://issues.apache.org/jira/browse/HBASE-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235541#comment-13235541 ] xufeng commented on HBASE-5615: --- The log in step 2 is from 158-1-131-48,20020,1331918756600. the master never do balance becauseof balance the parent region Key: HBASE-5615 URL: https://issues.apache.org/jira/browse/HBASE-5615 Project: HBase Issue Type: Bug Affects Versions: 0.90.7 Reporter: xufeng Assignee: xufeng Priority: Critical Attachments: HBASE-5615.patch the master never do balance becauseof when master do rebuildUserRegions(),it will add the parent region into AssignmentManager#servers, if balancer let the parent region to move,the parent will in RIT forever.thus balance will never be executed.
[jira] [Commented] (HBASE-4951) master process can not be stopped when it is initializing
[ https://issues.apache.org/jira/browse/HBASE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13168051#comment-13168051 ] xufeng commented on HBASE-4951: --- I tested this patch on 0.90. It does not work in the following scenario:
1. The master starts up, and one regionserver starts up.
2. waitForRegionServers finishes successfully.
3. Run bin/hbase master stop before the root region is assigned.
bin/hbase master stop stops the whole cluster, so the regionserver is killed first. The root region then has no chance to be assigned, and the master blocks in catalogTracker.waitForRoot().
master process can not be stopped when it is initializing - Key: HBASE-4951 URL: https://issues.apache.org/jira/browse/HBASE-4951 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.3 Reporter: xufeng Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.90.6 Attachments: HBASE-4951.patch It is easy to reproduce by following step: step1:start master process.(do not start regionserver process in the cluster). the master will wait the regionserver to check in: org.apache.hadoop.hbase.master.ServerManager: Waiting on regionserver(s) to checkin step2:stop the master by sh command bin/hbase master stop result:the master process will never die because catalogTracker.waitForRoot() method will block unitl the root region assigned.
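In this scenario the master is parked in catalogTracker.waitForRoot() and nothing will ever assign ROOT, because the only regionserver has already been stopped. A common way to make such a wait respond to shutdown is to have stop() set a flag and notify waiters, and to wait in bounded slices so the flag is re-checked even if a notification is missed. The RootWaiter class below is a hypothetical sketch of that pattern, not CatalogTracker's implementation; returning null to mean "stopped" and the 1000 ms slice are arbitrary choices.

```java
import java.util.concurrent.TimeUnit;

public class StoppableWaitSketch {

    // Hypothetical simplification of a tracker that waits for the ROOT location.
    static final class RootWaiter {
        private String rootLocation; // null until ROOT is assigned
        private boolean stopped;

        synchronized void setRootLocation(String location) {
            rootLocation = location;
            notifyAll();
        }

        synchronized void stop() {
            stopped = true;
            notifyAll(); // wake any thread blocked in waitForRoot()
        }

        // Returns the ROOT location, or null if stop() was called before ROOT was assigned.
        synchronized String waitForRoot(long sliceMillis) throws InterruptedException {
            while (rootLocation == null && !stopped) {
                wait(sliceMillis); // bounded wait: the stop flag is re-checked each slice
            }
            return rootLocation;
        }
    }

    public static void main(String[] args) throws Exception {
        RootWaiter waiter = new RootWaiter();
        Thread master = new Thread(() -> {
            try {
                String loc = waiter.waitForRoot(1000);
                System.out.println(loc == null
                        ? "stopped before ROOT was assigned"
                        : "ROOT at " + loc);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        master.start();
        TimeUnit.MILLISECONDS.sleep(200);
        waiter.stop(); // plays the role of "bin/hbase master stop"
        master.join(5000);
        System.out.println("master thread still alive: " + master.isAlive());
    }
}
```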
[jira] [Commented] (HBASE-4951) master process can not be stopped when it is initializing
[ https://issues.apache.org/jira/browse/HBASE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13168054#comment-13168054 ] xufeng commented on HBASE-4951: --- I think this problem also exists in trunk with this patch applied. master process can not be stopped when it is initializing - Key: HBASE-4951 URL: https://issues.apache.org/jira/browse/HBASE-4951 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.3 Reporter: xufeng Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.90.6 Attachments: HBASE-4951.patch It is easy to reproduce by following step: step1:start master process.(do not start regionserver process in the cluster). the master will wait the regionserver to check in: org.apache.hadoop.hbase.master.ServerManager: Waiting on regionserver(s) to checkin step2:stop the master by sh command bin/hbase master stop result:the master process will never die because catalogTracker.waitForRoot() method will block unitl the root region assigned.
[jira] [Commented] (HBASE-4951) master process can not be stopped when it is initializing
[ https://issues.apache.org/jira/browse/HBASE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13165916#comment-13165916 ] xufeng commented on HBASE-4951: --- @ramkrishna thanks. Do you think we should fix it in 0.90? I will try to create a patch. master process can not be stopped when it is initializing - Key: HBASE-4951 URL: https://issues.apache.org/jira/browse/HBASE-4951 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.3 Reporter: xufeng Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.92.0, 0.90.5 It is easy to reproduce by following step: step1:start master process.(do not start regionserver process in the cluster). the master will wait the regionserver to check in: org.apache.hadoop.hbase.master.ServerManager: Waiting on regionserver(s) to checkin step2:stop the master by sh command bin/hbase master stop result:the master process will never die because catalogTracker.waitForRoot() method will block unitl the root region assigned.
[jira] [Commented] (HBASE-4773) HBaseAdmin leaks ZooKeeper connections
[ https://issues.apache.org/jira/browse/HBASE-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13158293#comment-13158293 ] xufeng commented on HBASE-4773: --- @Ted yes, I have run the patch for TRUNK through the unit test suite in my environment. HBaseAdmin leaks ZooKeeper connections -- Key: HBASE-4773 URL: https://issues.apache.org/jira/browse/HBASE-4773 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4 Reporter: gaojinchao Priority: Critical Fix For: 0.90.5 Attachments: 4773.patch, branches_4773.patch, trunk_4773_patch.patch When the master crashes, HBaseAdmin leaks ZooKeeper connections. I think we should close the zk connection when MasterNotRunningException is thrown:
{noformat}
public HBaseAdmin(Configuration c)
throws MasterNotRunningException, ZooKeeperConnectionException {
  this.conf = HBaseConfiguration.create(c);
  this.connection = HConnectionManager.getConnection(this.conf);
  this.pause = this.conf.getLong("hbase.client.pause", 1000);
  this.numRetries = this.conf.getInt("hbase.client.retries.number", 10);
  this.retryLongerMultiplier = this.conf.getInt("hbase.client.retries.longer.multiplier", 10);

  // we should add this code and close the zk connection
  try {
    this.connection.getMaster();
  } catch (MasterNotRunningException e) {
    HConnectionManager.deleteConnection(conf, false);
    throw e;
  }
}
{noformat}
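The change proposed in the description applies a general rule: when a constructor acquires a shared resource and then fails, it has to release that resource before rethrowing, because the caller never gets a reference it could close. The sketch below shows the pattern with a hypothetical reference-counted ConnectionRegistry standing in for HConnectionManager's connection cache and a hypothetical Admin standing in for HBaseAdmin; it mirrors the retry loop of the test client in this issue but is not HBase code.

```java
import java.util.HashMap;
import java.util.Map;

public class ConstructorCleanupSketch {

    // Hypothetical checked exception standing in for HBase's MasterNotRunningException.
    static class MasterNotRunningException extends Exception {
        MasterNotRunningException(String msg) {
            super(msg);
        }
    }

    // Hypothetical reference-counted cache of connections keyed by ZooKeeper quorum.
    static final class ConnectionRegistry {
        private final Map<String, Integer> refCounts = new HashMap<>();

        synchronized void acquire(String quorum) {
            refCounts.merge(quorum, 1, Integer::sum);
        }

        synchronized void release(String quorum) {
            // Decrement, removing the entry once the count reaches zero.
            refCounts.computeIfPresent(quorum, (k, n) -> n > 1 ? n - 1 : null);
        }

        synchronized int openConnections() {
            int total = 0;
            for (int n : refCounts.values()) {
                total += n;
            }
            return total;
        }
    }

    // Hypothetical admin client: acquire first, then verify the master, releasing on failure.
    static final class Admin {
        Admin(ConnectionRegistry registry, String quorum, boolean masterRunning)
                throws MasterNotRunningException {
            registry.acquire(quorum);
            try {
                checkMaster(masterRunning);
            } catch (MasterNotRunningException e) {
                // The caller never receives this object, so it cannot close the
                // connection itself; release it here before rethrowing.
                registry.release(quorum);
                throw e;
            }
        }

        private static void checkMaster(boolean running) throws MasterNotRunningException {
            if (!running) {
                throw new MasterNotRunningException("master is not running");
            }
        }
    }

    public static void main(String[] args) {
        ConnectionRegistry registry = new ConnectionRegistry();
        String quorum = "158.1.130.31,158.1.130.32,158.1.130.33";
        for (int i = 0; i < 5; i++) {
            try {
                new Admin(registry, quorum, false);
            } catch (MasterNotRunningException e) {
                // keep retrying, as the test client in this issue does
            }
        }
        System.out.println("open connections after 5 failed attempts: " + registry.openConnections());
    }
}
```

Because each failed constructor releases its reference, openConnections() stays at zero across retries instead of growing by one per attempt.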
[jira] [Commented] (HBASE-4773) HBaseAdmin leaks ZooKeeper connections
[ https://issues.apache.org/jira/browse/HBASE-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13157024#comment-13157024 ] xufeng commented on HBASE-4773: --- Yes, I have tested it in my cluster. Here is my client test code:
{noformat}
...
static void initHBase() throws ZooKeeperConnectionException {
  HBaseAdmin hbaseAdmin = null;
  Configuration config = HBaseConfiguration.create();
  config.set("hbase.zookeeper.quorum", "158.1.130.31,158.1.130.32,158.1.130.33");
  config.set("hbase.zookeeper.property.clientPort", "2181");
  try {
    hbaseAdmin = new HBaseAdmin(config);
    System.out.println("init success!");
  } catch (MasterNotRunningException e) {
    e.printStackTrace();
    initHBase();
  } catch (ZooKeeperConnectionException e) {
    e.printStackTrace();
    initHBase();
  }
}
}
...
{noformat}
In my cluster I did not start any HBase process. Running the test, the output of the lsof command is:
{noformat}
java 16735 root  72w  REG   253,3   890569    524379 /opt/xf/hadoop.log
java 16735 root  73w  REG   253,3   274338    524376 /opt/xf/HA_hadoop.log
java 16735 root  74r  FIFO  0,8     0t0    110645029 pipe
java 16735 root  75w  FIFO  0,8     0t0    110645029 pipe
java 16735 root  76u        0,9     0             21 anon_inode
java 16735 root  77u  IPv6  110645030  0t0 TCP C3S31:35186->C3S33:eforward (ESTABLISHED)
java 16735 root  78u  unix  0x8800cba90380  0t0 110645035 socket
java 16735 root  79u  sock  0,6     0t0    110645032 can't identify protocol
java 16735 root  80r  FIFO  0,8     0t0    110645037 pipe
java 16735 root  81w  FIFO  0,8     0t0    110645037 pipe
java 16735 root  82u        0,9     0             21 anon_inode
java 16735 root  83u  IPv6  110645038  0t0 TCP C3S31:53727->C3S31:eforward (ESTABLISHED)
java 16735 root  84r  FIFO  0,8     0t0    110645043 pipe
java 16735 root  85w  FIFO  0,8     0t0    110645043 pipe
java 16735 root  86u        0,9     0             21 anon_inode
java 16735 root  87u  IPv6  110645044  0t0 TCP C3S31:53728->C3S31:eforward (ESTABLISHED)
java 16735 root  88r  FIFO  0,8     0t0    110645047 pipe
java 16735 root  89w  FIFO  0,8     0t0    110645047 pipe
java 16735 root  90u        0,9     0             21 anon_inode
java 16735 root  91u  IPv6  110645048  0t0 TCP C3S31:47183->C3S32:eforward (ESTABLISHED)
java 16735 root  92r  FIFO  0,8     0t0    110645050 pipe
java 16735 root  93w  FIFO  0,8     0t0    110645050 pipe
java 16735 root  94u        0,9     0             21 anon_inode
java 16735 root  95u  IPv6  110645051  0t0 TCP C3S31:53730->C3S31:eforward (ESTABLISHED)
java 16735 root  96r  FIFO  0,8     0t0    110645135 pipe
java 16735 root  97w  FIFO  0,8     0t0    110645135 pipe
java 16735 root  98u        0,9     0             21 anon_inode
java 16735 root  99u  IPv6  110645136  0t0 TCP C3S31:49799->C3S31:eforward (ESTABLISHED)
java 16735 root 100r  FIFO  0,8     0t0    110645143 pipe
java 16735 root 101w  FIFO  0,8     0t0    110645143 pipe
java 16735 root 102u        0,9     0             21 anon_inode
java 16735 root 103u  IPv6  110645144  0t0 TCP C3S31:38931->C3S32:eforward (ESTABLISHED)
java 16735 root 104r  FIFO  0,8     0t0    110645148 pipe
java 16735 root 105w  FIFO  0,8     0t0    110645148 pipe
java 16735 root 106u        0,9     0             21 anon_inode
java 16735 root 107u  IPv6  110645149  0t0 TCP C3S31:59939->C3S33:eforward (ESTABLISHED)
java 16735 root 108r  FIFO  0,8     0t0    110645507 pipe
java 16735 root 109w  FIFO  0,8     0t0    110645507 pipe
java 16735 root 110u        0,9     0             21 anon_inode
java 16735 root 111u  IPv6  110645508  0t0 TCP C3S31:59940->C3S33:eforward (ESTABLISHED)
{noformat}
eforward is the service name lsof shows for port 2181, the ZooKeeper client port configured above, so every one of these ESTABLISHED sockets is a leaked ZooKeeper connection.