On Fri, May 22, 2015 at 10:12 PM, Stack <st...@duboce.net> wrote:

> On Fri, May 22, 2015 at 10:17 AM, Bryan Beaudreault <
> bbeaudrea...@hubspot.com> wrote:
>
>> In our system each server has 2 DNS names associated with it: one always
>> points to a private address and the other to public or private depending
>> on the context.
>>
>> This issue did not show up in 0.94.x, but is showing up on my new 1.x
>> cluster. Basically it goes like this:
>>
>> 1. Regionserver starts up and gets its hostname, which returns
>> `hostA.external` due to our /etc/hosts
>> 2. Regionserver registers itself in zookeeper as `hostA.external`
>> 3. Regionserver reports for duty to the HMaster, which re-resolves the DNS
>> and returns `hostA.internal`.
>> 4. HMaster registers server as `hostA.internal`
>> 5. Regionserver receives the RegionServerStartupResponse, which contains
>> `hostA.internal`, and uses that for its RPCs
>> 6. HMaster sees a ZNode with `hostA.external`, so thinks it is a
>> regionserver that hasn't checked in yet, and registers it.
>>
>> So I think the problem is that step #2 happens before step #5. You can
>> clearly see this in the HRegionServer.java run() function.
>>
>>
> Yes. Looks like a regression.
>
> commit 10d336a51d3a5a2694f1898e52afa01dc9dc1798
> Author: rajeshbabu <rajeshbabu@unknown>
> Date: Thu Oct 24 18:26:42 2013 +0000
>
> HBASE-9593 Region server left in online servers list forever if it
> went down after registering to master and before creating ephemeral node
>
> git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1535479
> 13f79535-47bb-0310-9956-ffa450edef68
>
> Regionserver used to use the name given to it by the master, both when
> registering in zk and when heartbeating to the master. We arrived at this
> approach after lots of pain double-registering regionservers because of
> disagreements in naming between cluster nodes. The above commit changed
> the order and seems to have broken this facility.
>
> Will open issue to fix....
>
HBASE-13753.
St.Ack

> St.Ack
>
>
>> In 0.94, the `createMyEphemeralNode` function was called within
>> `handleReportForDutyResponse`. In 1.x, it happens within `run()` BEFORE
>> `handleReportForDutyResponse`.
>>
>>
>> I can work around this by handling /etc/hosts specially for my
>> regionservers. We have our /etc/hosts file set up like this for a reason,
>> but I think I can special case regionservers.
>>
>> However, it seems like a bug that there are mechanisms built in for the
>> HMaster to determine the RegionServer hostname, but that these mechanisms
>> do not account for doubly-registered regionservers due to zookeeper and
>> hmaster mismatch.
>>
>> I tried to create a JIRA for this, but either my username no longer has
>> permissions for creating, or I can't find the place to create them
>> anymore. Any help?
>> https://issues.apache.org/jira/secure/ViewProfile.jspa?name=bbeaudreault
>>
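
For anyone following along, the ordering problem in steps 1-6 can be modeled with a small toy program. This is only a sketch and not HBase code: `StartupOrderSketch`, `ToyMaster`, and every method name below are hypothetical stand-ins, and the master's DNS re-resolution is simply passed in as a second string. It only illustrates why creating the znode before the report-for-duty response arrives can leave two names for one server.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Toy model of the double registration. NOT HBase code; all names are hypothetical. */
public class StartupOrderSketch {

    /** Stand-in for the master's list of online servers. */
    static final class ToyMaster {
        private final Set<String> onlineServers = new LinkedHashSet<>();

        /** Steps 3-4: the master registers the name it resolved and returns it. */
        String reportForDuty(String nameAsResolvedByMaster) {
            onlineServers.add(nameAsResolvedByMaster);
            return nameAsResolvedByMaster;
        }

        /** Step 6: a znode with an unknown name is treated as another server. */
        void onZNodeSeen(String znodeName) {
            onlineServers.add(znodeName);
        }

        Set<String> onlineServers() {
            return onlineServers;
        }
    }

    /** Ordering described for 1.x: znode is created from the local name first. */
    public static Set<String> znodeBeforeReport(String localName, String masterView) {
        ToyMaster master = new ToyMaster();
        String znode = localName;            // step 2: znode uses the local hostname
        master.reportForDuty(masterView);    // steps 3-5: master picks its own name
        master.onZNodeSeen(znode);           // step 6: master notices the other name
        return master.onlineServers();
    }

    /** Ordering described for 0.94: znode is created with the master-assigned name. */
    public static Set<String> znodeAfterReport(String localName, String masterView) {
        ToyMaster master = new ToyMaster();
        String assigned = master.reportForDuty(masterView); // localName is not used
        master.onZNodeSeen(assigned);        // znode matches the registered name
        return master.onlineServers();
    }

    public static void main(String[] args) {
        System.out.println("znode before report: "
                + znodeBeforeReport("hostA.external", "hostA.internal"));
        System.out.println("znode after report:  "
                + znodeAfterReport("hostA.external", "hostA.internal"));
    }
}
```

With `hostA.external` as the local name and `hostA.internal` as the master's view, the first ordering leaves two entries in the toy master's set and the second leaves one, which matches the symptom described above.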