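You could use the following facts:
1. Files are stored in blocks, so make your block size bigger than the largest file. Each file then occupies a single block.
2. The first replica of each block is stored on the local node, i.e. the node doing the write, if it runs a DataNode. So copy File A into HDFS from Machine A, File B from Machine B, and so on.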
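Here is a minimal Python sketch of those two facts. It assumes Hadoop 2.x property names (on Hadoop 1.x the block-size property is `dfs.block.size`), the default 512-byte `dfs.bytes-per-checksum`, and a 128 MiB default block size. The paths `fileA` and `/user/jeremy/data` are hypothetical. It only builds the `hadoop fs -put` command. You would run that command (for example with `subprocess.run(cmd, check=True)`) on Machine A for File A, on Machine B for File B, and so on.

```python
import shlex

# Assumptions (not from the thread): dfs.bytes-per-checksum is the default
# 512 bytes, and the cluster's default block size is 128 MiB (Hadoop 2.x;
# Hadoop 1.x defaulted to 64 MiB). Adjust both to match your cluster.
BYTES_PER_CHECKSUM = 512
DEFAULT_BLOCK_SIZE = 128 * 1024 * 1024


def single_block_size(file_size: int,
                      default_block: int = DEFAULT_BLOCK_SIZE,
                      chunk: int = BYTES_PER_CHECKSUM) -> int:
    """Return a block size large enough to hold the whole file in one block.

    HDFS rejects block sizes that are not a multiple of the checksum chunk,
    so the file size is rounded up to the next multiple of `chunk`.
    """
    rounded = -(-file_size // chunk) * chunk  # ceiling to a multiple of chunk
    return max(default_block, rounded)


def put_command(local_path: str, hdfs_dir: str, file_size: int) -> list[str]:
    """Build a `hadoop fs -put` command that stores the file as one block.

    Run the command on the machine that should hold the file: the default
    placement policy prefers the writer's own DataNode for the first replica.
    """
    block_size = single_block_size(file_size)
    return ["hadoop", "fs", "-D", f"dfs.blocksize={block_size}",
            "-put", local_path, hdfs_dir]


# Example with hypothetical names: a 300 MiB file that must live on the
# machine running this command.
cmd = put_command("fileA", "/user/jeremy/data", 300 * 1024 * 1024)
print(shlex.join(cmd))
# prints: hadoop fs -D dfs.blocksize=314572800 -put fileA /user/jeremy/data
```

This pins only the initial placement. The default policy merely prefers the local DataNode, so if that node is not a usable target the first replica goes elsewhere. Later, the HDFS balancer or re-replication after a failure can also move it.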
Raj

> ________________________________
> From: jeremy p <[email protected]>
> To: [email protected]
> Sent: Tuesday, April 9, 2013 1:49 PM
> Subject: When copying a file to HDFS, how to control what nodes that file will reside on?
>
> Hey all,
>
> I'm dealing with kind of a bizarre use case where I need to make sure that File A is local to Machine A, File B is local to Machine B, etc. When copying a file to HDFS, is there a way to control which machines that file will reside on? I know that any given file will be replicated across three machines, but I need to be able to say "File A will DEFINITELY exist on Machine A". I don't really care about the other two machines -- they could be any machines on my cluster.
>
> Thank you.
