When the DN is added back with its blocks retained, the NN will detect that the
affected files are over-replicated and will delete the excess replicas, while
still adhering to the block placement policy (on rack-aware clusters). Note
that not everything on the re-added DN will necessarily be erased.
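If you want to watch that cleanup happen, one rough way (a sketch, nothing
official; "/" can be narrowed to the path you care about) is to poll the fsck
summary while the NN works:

    # fsck's summary includes "Over-replicated blocks" and
    # "Under-replicated blocks" counts; after the DN rejoins, the
    # over-replicated count should drain back to zero as the NN
    # deletes the excess replicas.
    hadoop fsck / | grep -i 'replicated blocks'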
This is an automatic process and should not worry you in any way, as an
operator.

On Wed, Nov 28, 2012 at 8:52 PM, Mark Kerzner <[email protected]> wrote:
> What happens if I stop the datanode, miss the 10 minute 30 second deadline,
> and restart the datanode, say, 30 minutes later? Will Hadoop re-use the data
> on this datanode, balancing it with HDFS? What happens to those blocks that
> correspond to files that have been updated in the meantime?
>
> Mark
>
> On Wed, Nov 28, 2012 at 6:51 AM, Stephen Fritz <[email protected]>
> wrote:
>>
>> HDFS will not start re-replicating blocks from a dead DN for 10 minutes
>> 30 seconds by default.
>>
>> Right now there isn't a good way to replace a disk out from under a
>> running datanode, so the best way is:
>> - Stop the DN
>> - Replace the disk
>> - Restart the DN
>>
>> On Wed, Nov 28, 2012 at 9:14 AM, Mark Kerzner <[email protected]>
>> wrote:
>>>
>>> Hi,
>>>
>>> Can I remove one hard drive from a slave but tell Hadoop not to
>>> replicate the missing blocks for a few minutes, because I will put it
>>> back? Or will this not work at all, and will Hadoop start replicating
>>> anyway, since I removed blocks, even for a short time?
>>>
>>> Thank you. Sincerely,
>>> Mark

--
Harsh J
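For reference, the "10 minutes 30 seconds" figure quoted above is the
NameNode's default dead-node detection window, and it falls out of two
settings (defaults shown; the exact property names vary a little between
Hadoop versions):

    timeout = 2 * heartbeat.recheck.interval + 10 * dfs.heartbeat.interval
            = 2 * 300 s                      + 10 * 3 s
            = 630 s = 10 min 30 s

A DN that comes back within that window is never declared dead, so no
re-replication starts at all; one that comes back later hits the
over-replication cleanup described at the top of the thread.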
