Hi,

We have an 11-node cluster. Every hour a cron job runs on node1 and uploads one file (~1GB) to Hadoop (a plain hadoop fs -put).
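For reference, the cron entry is essentially this (the paths are made up, but the shape is the same; note that % has to be escaped as \% inside a crontab):

    # runs on node1 at the top of every hour; uploads the hourly export to HDFS
    0 * * * * hadoop fs -put /data/export/hourly.dat /ingest/hourly-$(date +\%Y\%m\%d\%H).dat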
This is filling up node1, because HDFS always places the first replica on the datanode where the command is executed. I run the balancer every day, but that does not seem to be enough. The effect is:

    host1:      4.7TB / 5.3TB
    host[2-10]: 4.1TB / 5.3TB

So I am always out of space on host1.

What I could do is spread the job across all the nodes and execute it on a random host each time (see the sketch below), but I don't really like this solution as it involves NFS mounts, security issues, etc.
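To be concrete, this is the kind of thing I mean (hostnames and paths are hypothetical; it assumes passwordless ssh from the cron host to every node and that GNU shuf is available):

    #!/bin/sh
    # Pick a random node and run the upload there instead of always on node1,
    # so the first replica lands on a different datanode each hour.
    HOSTS="node1 node2 node3 node4 node5 node6 node7 node8 node9 node10 node11"
    TARGET=$(echo $HOSTS | tr ' ' '\n' | shuf -n 1)
    # The source file must be visible on $TARGET, e.g. via an NFS mount.
    ssh "$TARGET" "hadoop fs -put /data/export/hourly.dat /ingest/"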
Is there any better solution?

Thanks in advance,
George