Hi All,
I am running the following test case on mr1 + hdfs2. The MapReduce job succeeds, but no
output data file ("part-m-00000") is generated. Below are the details of the test case and
my investigation so far. I want to trace this issue, so please share your suggestions, for
example which classes or functions I should pay attention to while debugging. Thanks~
cat $PIG_HOME/bin/test/student
lynn,28,3
ff,22,4
chen,27,5
John,20,4
Mary,25,4
Bill,30,5
Joe,40,4
Start the Pig grunt shell via "$PIG_HOME/bin/pig" and run:
grunt> copyFromLocal $PIG_HOME/bin/test/student /user/pig/student
grunt> A = load 'student' using PigStorage(',') as (name:chararray, age:int,
gpa:float);
grunt> B = foreach A generate name;
grunt> store B into 'result';
The output folder "result" on HDFS is expected to look like the following:
hadoop fs -ls /user/pig/result
Found 3 items
-rw-r--r-- 2 pig pig 0 2013-07-30 00:52 /user/pig/result/_SUCCESS
drwxr-xr-x - pig pig 0 2013-07-30 00:52 /user/pig/result/_logs
-rw-r--r-- 2 pig pig 23 2013-07-30 00:52 /user/pig/result/part-m-00000
But in this test case, no output data file (part-m-00000) is stored on HDFS:
grunt> fs -ls /user/pig/result
Found 2 items
-rw-r--r-- 1 pig pig 0 2013-07-30 01:37 /user/pig/result/_SUCCESS
drwx------ - pig pig 0 2013-07-30 01:37 /user/pig/result/_logs
While the job is running, I can see the output data being written on HDFS at
"/user/pig/result/_temporary/_attempt_201308010000_0008_m_000000_0/part-m-00000".
This "_temporary" directory is deleted at the end of the job, but the file
"part-m-00000" is never saved as "/user/biadmin/tmpuser0/part-m-00000" on HDFS
via the rename operation.
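As a first isolation step, I plan to run a small standalone rename check like the sketch
below (the class name RenameCheck and the two path arguments are just placeholders of
mine), since, as far as I understand, the output committer promotes task output out of
"_temporary" with the same kind of FileSystem.rename() call. If a plain rename between
such paths also fails or returns false on this mr1+hdfs2 cluster, that would point at
HDFS or permissions rather than at Pig itself.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RenameCheck {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // args[0]: source, e.g. a part file under .../result/_temporary/_attempt_*/
        // args[1]: destination, e.g. /user/pig/result/part-m-00000
        Path src = new Path(args[0]);
        Path dst = new Path(args[1]);

        boolean renamed = fs.rename(src, dst);
        System.out.println("rename " + src + " -> " + dst + " returned " + renamed);
    }
}

Any pointers to where this rename happens in the mr1 code path (my guess is the
FileOutputCommitter, or Pig's output committer, during task commit) would also be
appreciated.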