Re: Full stack rolling restart

Bryan Beaudreault Thu, 29 May 2014 08:47:45 -0700

Restarting DataNodes has been tricky for us from an HBase perspective.  It
always causes a bunch of timeouts and thus blips of service interruption.
Perhaps the JIRAs Ted mentioned will help with that, but it's hard to tell.
I think the only truly safe way to do this with CDH4.x is:

1. Move all regions off the RegionServer.
2. Compact all of those regions once they have moved off.
3. Restart the DN and RS.
4. Wait for the DN to register with the NameNode, which is not always
fast.
5. Move everything back.
6. Compact again.
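
The steps above can be sketched as a per-host loop. This is only an outline under stated assumptions: the `NodeOps` class and all of its method names are hypothetical placeholders for whatever tooling you already use to move regions, trigger compactions, restart daemons and check DataNode registration. None of them are HBase or HDFS APIs.

```python
import time
from typing import Callable, Iterable


def wait_until(predicate: Callable[[], bool], timeout_s: float,
               interval_s: float = 5.0,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll predicate until it returns True or timeout_s has elapsed."""
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


class NodeOps:
    """Hypothetical hooks; override each one with calls to your own tooling."""

    def move_regions_off(self, host: str) -> None:
        raise NotImplementedError

    def compact_moved_regions(self, host: str) -> None:
        raise NotImplementedError

    def restart_dn_and_rs(self, host: str) -> None:
        raise NotImplementedError

    def datanode_registered(self, host: str) -> bool:
        raise NotImplementedError

    def move_regions_back(self, host: str) -> None:
        raise NotImplementedError


def rolling_restart(hosts: Iterable[str], ops: NodeOps,
                    register_timeout_s: float = 600.0,
                    poll_interval_s: float = 5.0,
                    sleep: Callable[[float], None] = time.sleep) -> None:
    """Run the six steps on one host at a time, stopping on the first timeout."""
    for host in hosts:
        ops.move_regions_off(host)          # step 1
        ops.compact_moved_regions(host)     # step 2
        ops.restart_dn_and_rs(host)         # step 3
        registered = wait_until(            # step 4
            lambda: ops.datanode_registered(host),
            register_timeout_s, poll_interval_s, sleep)
        if not registered:
            raise TimeoutError(f"DataNode on {host} did not register in time")
        ops.move_regions_back(host)         # step 5
        ops.compact_moved_regions(host)     # step 6
```

Handling one host at a time and aborting on a registration timeout keeps a slow DN from being followed by another restart while the cluster is still short a replica.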

This is not great because it takes a while and adds a bunch of compaction
load to your cluster.  Alternatively, temporarily adding the DN to the
exclude hosts file on the NameNode may keep the RegionServers from trying
to read from it (unconfirmed).
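
A minimal sketch of toggling a host in the exclude file, assuming it is the plain-text file named by `dfs.hosts.exclude` with one hostname per line. The function name and the path in the usage note are made up for illustration. As far as I know the NameNode only rereads the file after `hdfs dfsadmin -refreshNodes`, and excluding a node normally starts decommissioning it, so this remains just as unconfirmed as the idea itself.

```python
from pathlib import Path
from typing import List


def set_excluded(exclude_file: str, host: str, excluded: bool) -> List[str]:
    """Add or remove host in the exclude file and return the resulting host list."""
    path = Path(exclude_file)
    hosts = path.read_text().split() if path.exists() else []
    # Drop any existing entry first so the host never appears twice.
    hosts = [h for h in hosts if h != host]
    if excluded:
        hosts.append(host)
    path.write_text("".join(h + "\n" for h in hosts))
    return hosts
```

For example, call `set_excluded("/path/to/dfs.exclude", "dn1.example.com", True)` before the restart and the same call with `False` afterwards, refreshing the NameNode after each change.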

I'd love to be wrong about the above, so if anyone has found a safe way to
do a rolling restart of DataNodes without any HBase degradation, I'm all
ears.

On Thu, May 29, 2014 at 11:02 AM, Ted Yu <[email protected]> wrote:

> The following JIRAs may be of interest to you:
> HDFS-5535
> HDFS-3867
>
> Cheers
>
>
> On Wed, May 28, 2014 at 11:37 PM, sameerv <[email protected]> wrote:
>
> > I am curious to know what industry folks think of a rolling restart of
> > the full stack. I am envisioning something like this: on each node, stop
> > all the services it runs, apply the new configs, and start all services
> > again. Is it feasible to do this? Can folks who have tried it please
> > share their experiences?
> >
> > Thanks,
> > Sameer
