On Fri, May 22, 2015 at 10:12 PM, Stack <st...@duboce.net> wrote:

> On Fri, May 22, 2015 at 10:17 AM, Bryan Beaudreault <
> bbeaudrea...@hubspot.com> wrote:
>
>> In our system each server has 2 DNS names associated with it, one always points
>> to a private address and the other to public or private depending on the
>> context.
>>
>> This issue did not show up in 0.94.x, but is showing up on my new 1.x
>> cluster.  Basically it goes like this:
>>
>> 1. Regionserver starts up, gets its hostname, which returns
>> `hostA.external` due to our /etc/hosts
>> 2. Regionserver registers itself in zookeeper as `hostA.external`
>> 3. Regionserver reports for duty to HMaster, which re-resolves the DNS
>> and returns `hostA.internal`.
>> 4. HMaster registers server as `hostA.internal`
>> 5. Regionserver receives the RegionServerStartupResponse, which contains
>> `hostA.internal` and uses that for its RPCs
>> 6. HMaster sees a ZNode with `hostA.external`, so thinks it is a
>> regionserver that hasn't checked in yet, and registers it.
>>
>> So I think the problem is that step #2 happens before step #5.  You can
>> clearly see this in the HRegionServer.java run() function.
>>
>>
> Yes. Looks like a regression.
>
> commit 10d336a51d3a5a2694f1898e52afa01dc9dc1798
> Author: rajeshbabu <rajeshbabu@unknown>
> Date:   Thu Oct 24 18:26:42 2013 +0000
>
>     HBASE-9593 Region server left in online servers list forever if it
> went down after registering to master and before creating ephemeral node
>
>     git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1535479
> 13f79535-47bb-0310-9956-ffa450edef68
>
> Regionserver used to use the name given it by the master when registering in
> zk and when it heartbeated the master. We arrived at this approach after lots
> of pain double registering regionservers because of disagreements in naming
> between cluster nodes. Above commit changed the order and seems to have
> broken this facility.
>
> Will open issue to fix....
>

HBASE-13753.
St.Ack



> St.Ack
>
>
>> In 0.94, the `createMyEphemeralNode` function was called within
>> `handleReportForDutyResponse`.  In 1.x, it happens within `run()` BEFORE
>> `handleReportForDutyResponse`.
>>
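
To make the ordering issue above concrete, here is a toy sketch. It is not
HBase code: the class `RegistrationOrderSketch`, the `masterResolve` helper,
and the in-memory sets standing in for zookeeper and the master's server list
are all hypothetical, and the ".external" to ".internal" mapping simply mimics
the /etc/hosts setup described in this thread.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class RegistrationOrderSketch {

    // Hypothetical stand-in for the master re-resolving the reporting
    // regionserver's address; here it always prefers the internal name.
    static String masterResolve(String reportedName) {
        return reportedName.replace(".external", ".internal");
    }

    // Returns the server names the master ends up tracking.
    // znodeBeforeReport == true  models the 1.x order described above;
    // znodeBeforeReport == false models the 0.94 order.
    static Set<String> simulate(boolean znodeBeforeReport) {
        Set<String> znodes = new LinkedHashSet<>();        // toy zookeeper
        Set<String> masterServers = new LinkedHashSet<>(); // toy master state
        String localName = "hostA.external";               // from /etc/hosts

        if (znodeBeforeReport) {
            znodes.add(localName); // step 2: znode uses the locally resolved name
        }

        // Steps 3-5: report for duty; the master answers with its own name
        // for this server and registers it under that name.
        String nameFromMaster = masterResolve(localName);
        masterServers.add(nameFromMaster);

        if (!znodeBeforeReport) {
            znodes.add(nameFromMaster); // znode uses the master-provided name
        }

        // Step 6: the master treats any znode it does not already know as
        // another regionserver and registers it too.
        masterServers.addAll(znodes);
        return masterServers;
    }

    public static void main(String[] args) {
        System.out.println("znode first (1.x order):   " + simulate(true));
        System.out.println("report first (0.94 order): " + simulate(false));
    }
}
```

In this model, creating the znode first leaves the master tracking both
`hostA.internal` and `hostA.external`, while reporting first leaves only
`hostA.internal`.
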
>>
>> I can work around this by handling /etc/hosts specially for my
>> regionservers.  We have our /etc/hosts file set up like this for a reason,
>> but I think I can special case regionservers.
>>
>> However, it seems like a bug that there are mechanisms built in for the
>> HMaster to determine the RegionServer hostname, but that these mechanisms
>> do not account for doubly-registered regionservers due to zookeeper and
>> hmaster mismatch.
>>
>> I tried to create a JIRA for this, but either my username no longer has
>> permissions for creating, or I can't find the place to create them
>> anymore.  Any help?
>> https://issues.apache.org/jira/secure/ViewProfile.jspa?name=bbeaudreault
>>
>
>
