Multi node maintenance for HDFS?

Stephan Hoermann Wed, 15 Jun 2016 19:30:05 -0700

Hi,

How do people do multi node maintenance for HDFS without data loss?


We want to apply the ideas of immutable infrastructure to how we manage our
machines. We prebuild an OS image with the configuration and roll it out to
our nodes. When we have a patch we build a new image and roll that out
again. It takes us about 10 to 15 minutes to do that.

For our data nodes we want to keep the data on separate partitions/disks
so that when we rebuild a node it rejoins HDFS with its data intact and we
don't start a replication storm.

Now, in order to scale this and roll out upgrades quickly, we can't really
upgrade one node at a time, so we need to be able to take out a percentage
of the nodes at once. Ideally we would like to do this while keeping at
least 2 live replicas of each block (so we can still handle a failure while
we are doing an upgrade) and without starting a replication storm.

Right now it doesn't look like that is really supported. Is anyone else
doing multi-node upgrades, and if so, how do you solve these problems?

We are considering changing the replication strategy so that we divide all
our nodes into 3 evenly sized buckets and, during maintenance, take down a
subset of nodes from only one bucket at a time. Does anyone have experience
with doing something similar?
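
To make the bucket idea concrete, here is a rough Python sketch of the
bookkeeping involved. It does not call any HDFS API. The node names, the
helper functions and the block map are all hypothetical, and it assumes a
placement policy that puts exactly one replica of each block in each
bucket, which is an assumption of this sketch and not what the default
HDFS placement policy does.

```python
def assign_buckets(nodes, num_buckets=3):
    """Round-robin nodes into evenly sized buckets (hypothetical helper)."""
    buckets = [[] for _ in range(num_buckets)]
    for i, node in enumerate(sorted(nodes)):
        buckets[i % num_buckets].append(node)
    return buckets


def maintenance_batches(bucket, batch_size):
    """Yield successive subsets of a single bucket to take down together."""
    for start in range(0, len(bucket), batch_size):
        yield bucket[start:start + batch_size]


def min_live_replicas(block_locations, down_nodes):
    """Smallest live-replica count over all blocks while down_nodes are offline."""
    down = set(down_nodes)
    return min(
        sum(1 for node in replicas if node not in down)
        for replicas in block_locations.values()
    )


# Nine made-up data nodes split into three buckets of three.
nodes = [f"dn{i:02d}" for i in range(1, 10)]
buckets = assign_buckets(nodes)

# Assumed placement: every block has one replica in each bucket.
block_locations = {
    f"blk_{i}": [bucket[i % len(bucket)] for bucket in buckets]
    for i in range(6)
}

# Take down two nodes at a time, all from the first bucket.
for batch in maintenance_batches(buckets[0], batch_size=2):
    live = min_live_replicas(block_locations, batch)
    print(f"down={batch} min live replicas={live}")
```

Under that placement assumption, any batch drawn from a single bucket
removes at most one replica of a block, so every block keeps at least 2
live replicas for the duration of the rebuild.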

Regards,

Stephan
