Hi,

Yes. If you are writing to HDFS from a regular client (a machine that is not a DataNode), the NameNode selects the target DataNodes at random, within the constraints of the cross-rack write policy if rack awareness is configured in your environment.
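
To make the behaviour discussed in this thread concrete, here is a rough Python sketch of the default placement rules as I understand them: the first replica goes to the writer's node if the writer is a DataNode (otherwise to a random node), the second to a node on a different rack, and the third to another node on the same rack as the second. This is a simplified simulation, not Hadoop's actual code; the function name, the topology dict, and the node/rack names are all made up for illustration, and it ignores load, free space, and node health.

```python
import random


def choose_replicas(topology, writer=None, replication=3, rng=random):
    """Simplified model of default HDFS replica placement.

    topology: dict mapping node name -> rack name (hypothetical names).
    writer:   node name of the client, or None if it is not a DataNode.
    """
    chosen = []

    def pick(preferred):
        # Prefer unused nodes that satisfy the rack rule; if none exist
        # (e.g. a single-rack cluster), fall back to any unused node.
        unused = [n for n in topology if n not in chosen]
        pool = [n for n in unused if preferred(n)] or unused
        return rng.choice(pool) if pool else None

    # 1st replica: local node if the writer is a DataNode, else random.
    if writer in topology:
        chosen.append(writer)
    else:
        chosen.append(rng.choice(list(topology)))

    rules = [
        lambda n: topology[n] != topology[chosen[0]],  # 2nd: different rack
        lambda n: topology[n] == topology[chosen[1]],  # 3rd: same rack as 2nd
    ]
    for i in range(1, replication):
        # Replicas beyond the third are placed on any unused node.
        rule = rules[i - 1] if i - 1 < len(rules) else (lambda n: True)
        node = pick(rule)
        if node is None:  # cluster has fewer nodes than requested replicas
            break
        chosen.append(node)
    return chosen


if __name__ == "__main__":
    cluster = {
        "dn1": "rack1", "dn2": "rack1", "dn3": "rack1",
        "dn4": "rack2", "dn5": "rack2", "dn6": "rack2",
    }
    # Pig job launched from dn1 (a DataNode): first replica stays on dn1.
    print(choose_replicas(cluster, writer="dn1"))
    # Plain client outside the cluster: all three targets are chosen for it.
    print(choose_replicas(cluster, writer=None))
```

The second call is the non-DataNode case from the question above: there is no local node to favour, so only the rack constraints shape the choice.
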
On Wed, Oct 31, 2012 at 6:43 AM, Mohit Anchlia <[email protected]> wrote:
> Thanks and if it is not the datanode then I am guessing namenode decides the
> nodes in replication pipeline?
>
>
> On Tue, Oct 30, 2012 at 5:36 PM, ranjith raghunath
> <[email protected]> wrote:
>>
>> If your client node is a datanode with your cluster then the first copy
>> does get written to that data node.
>>
>> Experts please feel free to correct me here.
>>
>> On Oct 30, 2012 7:11 PM, "Mohit Anchlia" <[email protected]> wrote:
>>>
>>> With respect to replication if I run pig job from one of the nodes within
>>> the Hadoop cluster then do I always end up with writing 1 replica copy to
>>> that client node always and remaining 2 replica copies to other nodes?
>>>
>
>

--
Harsh J
