In 1.x, the exclude* configuration lists let you fine-tune which nodes do
processing, storage, or both (process vs. storage nodes).
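
To make that a bit more concrete, here is a rough sketch of generating the two
exclude lists. It assumes the 1.x properties dfs.hosts.exclude (hdfs-site.xml)
and mapred.hosts.exclude (mapred-site.xml), each of which names a plain text
file with one host per line. The host names and the helper functions are made
up for illustration; adapt them to however juju names your units.

```python
# Sketch only: the host names and the split into "storage" and "compute"
# groups are hypothetical, not taken from any real cluster.

def render_host_list(hosts):
    """Return exclude-file text: one unique host name per line, sorted."""
    unique = sorted({h.strip() for h in hosts if h.strip()})
    return "".join(name + "\n" for name in unique)


def build_exclude_files(storage_only, compute_only):
    """Map each exclude property to the file text it should point at.

    dfs.hosts.exclude    -> hosts the NameNode should not accept as DataNodes
    mapred.hosts.exclude -> hosts the JobTracker should not accept as TaskTrackers
    """
    return {
        "dfs.hosts.exclude": render_host_list(compute_only),
        "mapred.hosts.exclude": render_host_list(storage_only),
    }


# Hypothetical example: the permanent nodes keep both roles, so nothing is
# excluded from MapReduce; the temporary nodes are kept out of HDFS.
files = build_exclude_files(
    storage_only=[],
    compute_only=["compute-02.example.internal", "compute-01.example.internal"],
)
print(files["dfs.hosts.exclude"], end="")
```

Write each string to the file its property points at, then, if I remember the
1.x commands right, run "hadoop dfsadmin -refreshNodes" and
"hadoop mradmin -refreshNodes" so the NameNode and JobTracker re-read the
lists. The compute hosts should be in the HDFS exclude list before their
DataNode registers; excluding a node that already holds blocks starts
decommissioning instead.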

This works for "dynamic sizing" of process nodes.
It does not work well for "dynamically sizing" your storage nodes, as you have
probably already discovered or already knew.

Cheers

P.S. Check your EC2 bill. You're going to be reading a lot of data across the
network with your model.




From: Robin Verlangen [mailto:[email protected]]
Sent: Friday, August 17, 2012 2:54 AM
To: [email protected]
Subject: HDFS disable balancing cluster

Hi there,

We currently run an eight-node cluster on Amazon EC2. This is perfect for our 
storage, but we want to add a couple of nodes (let's say 32) for processing a 
big task. We spin them up, run the jobs, and terminate the machines.

Sounds OK to me, however I'm aware of the fact that hadoop tries to replicate 
data blocks to other nodes in favor of balancing the cluster. I don't want 
this, as I will get under-replicated blocks when terminating the machines.

We use juju for easy cluster administration. This implies that adding a new 
hadoop-slave runs both HDFS and MapReduce (mapred) on it.

My main question is, is it possible to disable balancing the cluster, or just 
to disable the datanode service on the new nodes (meant for processing only)?


Best regards,

Robin Verlangen
Software engineer

W http://www.robinverlangen.nl
E [email protected]

