That could be done easily when the server checks in by looking at the
given start code. In ServerManager we already do:

    HServerInfo info = new HServerInfo(serverInfo);
    checkIsDead(info.getServerName(), "STARTUP");
    checkAlreadySameHostPort(info);
    recordNewServer(info, false, null);

A new check in there would fit nicely. Can you open a JIRA, Jeff?
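
For illustration only, here is a rough sketch of what such a check might look like. Everything in it is an assumption rather than existing HBase code: the class name ClockSkewCheck, the exception type, the method names, and the idea that the region server reports its current time in milliseconds at check-in are all hypothetical, and the maximum allowed skew would presumably be a configurable value.

```java
// Hypothetical helper, not part of ServerManager or any HBase API.
final class ClockSkewCheck {

    /** Hypothetical exception raised when a region server's clock is too far off. */
    static final class ClockOutOfSyncException extends RuntimeException {
        ClockOutOfSyncException(String message) {
            super(message);
        }
    }

    /** Absolute difference between the two clocks, in milliseconds. */
    static long skewMillis(long serverTimeMs, long masterTimeMs) {
        return Math.abs(masterTimeMs - serverTimeMs);
    }

    /**
     * Rejects the server if its reported time differs from the master's
     * time by more than maxSkewMs (ahead or behind).
     */
    static void checkClockSkew(String serverName, long serverTimeMs,
                               long masterTimeMs, long maxSkewMs) {
        long skew = skewMillis(serverTimeMs, masterTimeMs);
        if (skew > maxSkewMs) {
            throw new ClockOutOfSyncException(
                "Server " + serverName + " rejected; reported time is off by "
                + skew + "ms, max allowed is " + maxSkewMs + "ms");
        }
    }
}
```

On the master side, a call such as ClockSkewCheck.checkClockSkew(name, reportedTime, System.currentTimeMillis(), maxSkew) could sit next to the existing checks at startup, so a server whose clock is 6 hours behind would be refused before any region is assigned to it.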

Thx!

J-D

On Thu, Oct 28, 2010 at 9:56 AM, Jeff Whiting <[email protected]> wrote:
> We recently had a problem where one of our machines in the cluster had a
> time that was 6 hours behind the other ones (ntp was supposed to be set up on
> that machine but wasn't).  We subsequently restarted our cluster and the
> '-ROOT-' table was assigned to that machine.  The problem was that when it
> tried to update the value (info:server) for who was holding the '.META.'
> table, the value wasn't updating and stayed set as the previous machine. I'm
> pretty sure the problem was that the timestamp for the new server was older
> than the timestamp for the previous server, preventing the value from updating
> correctly.  Having the incorrect info:server in the ROOT table basically
> made the cluster unusable.
>
> So my question is, would it make sense to have a sanity time check when a
> region server joins the cluster?  Basically when the region server joins it
> would send its current time and the master would check that time against its
> current time, and if the difference is too large then it would prevent the
> region server from joining.  I know this is basic server configuration stuff
> but because of human error these things happen and seem like they can cause
> major problems for the cluster if the servers' times aren't synchronized.
>
> ~Jeff
>
> --
>
> Jeff Whiting
> Qualtrics Senior Software Engineer
> [email protected]
>
>
