Yeah, Kai is right. You can read more details at:
http://hadoop.apache.org/docs/stable/hdfs_design.html#Data+Replication

and right from the horse's mouth (pp. 70-75):
http://books.google.com/books?id=drbI_aro20oC&pg=PA51&lpg=PA51&dq=hadoop+replication+factor+1+definitive+guide&source=bl&ots=tZDeyhhZj1&sig=Xq-0WrYhOKnER1SDbnBTmbaEfdk&hl=en&sa=X&ei=Dtu1UdnsCcO_rQG8jICoAw&ved=0CE0Q6AEwBA#v=onepage&q=hadoop%20replication%20factor%201%20definitive%20guide&f=false

On Mon, Jun 10, 2013 at 9:47 AM, Kai Voigt <[email protected]> wrote:

> Hello,
>
> Am 10.06.2013 um 15:36 schrieb Razen Al Harbi <[email protected]>:
>
> > I have deployed Hadoop on a cluster of 20 machines. I set the
> > replication factor to one. When I put a file (larger than the HDFS
> > block size) into HDFS, all the blocks are stored on the machine where
> > the Hadoop put command is invoked.
> >
> > For higher replication factors, I see the same behavior, but the
> > replicated blocks are stored randomly on all the other machines.
> >
> > Is this normal behavior? If not, what would be the cause?
>
> Yes, this is normal behavior. When an HDFS client happens to run on a
> host that is also a DataNode (always the case when a reducer writes its
> output), the first copy of a block is stored on that very node. This
> optimizes latency: it's faster to write to a local disk than across the
> network.
>
> The second copy of the block is stored on a random host in another rack
> (if your cluster is configured to be rack-aware), to increase the
> distribution of the data.
>
> The third copy of the block is stored on another random host in that
> other rack.
>
> So your observations are correct.
>
> Kai
>
> --
> Kai Voigt
> [email protected]
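If you want to watch this happen yourself, here's a rough, untested sketch
against the plain FileSystem API. The path, class name, and the ~200 MB of
dummy data are just placeholders I picked so the file spans several blocks
at the default block size; the write uses a per-file replication factor of 1:

import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockPlacementCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Placeholder path; write ~200 MB so the file spans several blocks.
        Path path = new Path("/tmp/placement-test");
        FSDataOutputStream out = fs.create(path, (short) 1);  // replication = 1
        byte[] chunk = new byte[1024 * 1024];
        for (int i = 0; i < 200; i++) {
            out.write(chunk);
        }
        out.close();

        // Ask the NameNode where each block of the file ended up.
        FileStatus status = fs.getFileStatus(path);
        BlockLocation[] blocks =
            fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.println("offset " + block.getOffset() + " -> "
                + Arrays.toString(block.getHosts()));
        }
    }
}

With replication 1 and the client running on a DataNode, every line should
print the same (local) host. You can get the same information without
writing any code via: hadoop fsck /tmp/placement-test -files -blocks -locations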

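And to reproduce the second observation (copies scattering at higher
replication factors), you can bump the replication of the existing file and
re-run the location loop. Continuing from the sketch above, purely
illustrative:

// Ask the NameNode to add two more copies of every block.
fs.setReplication(path, (short) 3);
// Re-replication happens asynchronously in the background, so re-run the
// getFileBlockLocations() loop after a few seconds to see the extra hosts.

The new replicas follow the placement policy Kai describes, so on a
rack-aware cluster they should land off the local rack.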