This goes through the FileLocalizer: all file references are sent through it.
If we are running in MAPREDUCE mode and a file reference starts with
file:, we copy it to a temp file in HDFS before we start the job and use that
temp file as the input or output of the MapReduce job.
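A rough sketch of that localization rule, in case it helps. This is not Pig's actual FileLocalizer code; the class and method names are made up, and a plain temp directory stands in for HDFS so the logic is easy to follow:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration of the rule described above: in MAPREDUCE
// mode, a reference starting with "file:" is copied to a temp location
// (a local staging dir here stands in for HDFS) and the copy is used as
// the job's input; any other reference is assumed to already be in HDFS.
public class LocalizeSketch {
    static String localize(String ref, Path stagingDir) throws IOException {
        if (!ref.startsWith("file:")) {
            return ref; // already an HDFS path, use it as-is
        }
        Path local = Paths.get(ref.substring("file:".length()));
        Path staged = stagingDir.resolve("tmp-" + local.getFileName());
        Files.copy(local, staged, StandardCopyOption.REPLACE_EXISTING);
        return staged.toString(); // the temp copy becomes the job input
    }

    public static void main(String[] args) throws IOException {
        Path staging = Files.createTempDirectory("staging"); // stand-in for HDFS
        Path input = Files.createTempFile("test", ".txt");
        Files.writeString(input, "0\t0\n1\t1\n");

        // A file: reference gets copied into the staging area...
        String jobInput = localize("file:" + input, staging);
        System.out.println(jobInput.startsWith(staging.toString()));

        // ...while a non-file: reference is passed through untouched.
        System.out.println(localize("/user/pig/data.txt", staging));
    }
}
```

So in the test below, 'file:' + tmpFile is enough for the local temp file to be shipped into HDFS before the job runs.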
ben
On Thursday 06 March 2008 04:07:41 pi song wrote:
> Dear pig-dev mailing list,
>
> I just want to understand this bit quickly. Below is the code from
> TestMapReduce.java. As you can see, the temp file is created on the local
> machine, but I don't understand how Hadoop MapReduce picks up the file
> from the local file system rather than HDFS?
>
> PigServer pig = new PigServer(MAPREDUCE);
> File tmpFile = File.createTempFile("test", ".txt");
> PrintStream ps = new PrintStream(new FileOutputStream(tmpFile));
> for (int i = 0; i < 10; i++) {
>     ps.println(i + "\t" + i);
> }
> ps.close();
> String query = "foreach (load 'file:" + tmpFile + "') generate $0,$1;";
> System.out.println(query);
> pig.registerQuery("asdf_id = " + query);
> try {
>     pig.deleteFile("frog");
> } catch (Exception e) {}
>
> Cheers,
> Pi