Hi, Mix:

"second map reduce started executing before first one got completed" is interesting. Since you only LOAD evnt_dtl, without DUMPing or STOREing it, Pig shouldn't execute anything for it, especially not before the STORE command completes.
I have the script below and it works fine, so I think the root cause is something else. Unless your data is very big?

a = load 'words_and_numbers' as (f1:chararray, f2:chararray);
b = filter a by f1 is not null;
store (foreach (group b all) generate flatten($1)) into 'multipleload/tmp';
c = load 'multipleload/tmp/part-r-00000' as (f3:chararray, f4:chararray);
dump c;

Johnny

On Wed, Mar 27, 2013 at 4:07 PM, Mix Nin <[email protected]> wrote:

> I guess the second map-reduce job started executing before the first one
> completed. Below is the error log:
>
> 2013-03-27 15:48:08,902 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
> - creating jar file Job4695026384513564120.jar
> 2013-03-27 15:48:13,983 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
> - jar file Job4695026384513564120.jar created
> 2013-03-27 15:48:13,993 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
> - Setting up single store job
> 2013-03-27 15:48:14,052 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 2 map-reduce job(s) waiting for submission.
>
> Failed Jobs:
> JobId  Alias  Feature  Message  Outputs
> N/A  1-18,1-19,FACT_PXP_EVNT_DTL,evnt_dtl  GROUP_BY  Message:
> org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input
> path does not exist: hdfs:///user/lnindrakrishna/exp/part-r-00000
>
> When I run the scripts individually in the grunt shell, one by one, I
> don't see this problem.
>
>
> On Wed, Mar 27, 2013 at 3:45 PM, Mix Nin <[email protected]> wrote:
>
> > Yes, the file exists in HDFS.
> >
> >
> > On Wed, Mar 27, 2013 at 3:16 PM, Johnny Zhang <[email protected]> wrote:
> >
> >> Mix,
> >> 'null' is the failed job ID. From what I can tell, there is only one
> >> STORE command and it actually fails, so MapReduceLauncher tries to stop
> >> all dependent jobs; that's why the message is thrown.
> >> Can you double check if the file exists in HDFS?
> >>
> >> Johnny
> >>
> >>
> >> On Wed, Mar 27, 2013 at 2:58 PM, Mix Nin <[email protected]> wrote:
> >>
> >> > Sorry for posting the same issue multiple times.
> >> >
> >> > I wrote a Pig script as follows and stored it in the file x.pig:
> >> >
> >> > Data = LOAD '/....' as (,,,, )
> >> > NoNullData = FILTER Data by qe is not null;
> >> > STORE (foreach (group NoNullData all) generate flatten($1)) into
> >> > 'exp/$inputDatePig';
> >> >
> >> > evnt_dtl = LOAD 'exp/$inputDatePig/part-r-00000' AS (cust,,,,,)
> >> >
> >> > I executed the command as follows:
> >> >
> >> > pig -f x.pig -param inputDatePig=03272013
> >> >
> >> > And finally it fails on exp/03272013, though the directory exists, as
> >> > it gets created by the STORE command.
> >> >
> >> > What is wrong in this?
> >> >
> >> > This is the error I get:
> >> >
> >> > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> >> > - 32% complete
> >> > 2013-03-27 14:38:35,568 [main] INFO
> >> > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> >> > - 50% complete
> >> > 2013-03-27 14:38:45,731 [main] INFO
> >> > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> >> > - job null has failed! Stop running all dependent jobs
> >> > 2013-03-27 14:38:45,731 [main] INFO
> >> > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> >> > - 100% complete
> >> > 2013-03-27 14:38:45,734 [main] ERROR
> >> > org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to
> >> > recreate exception from backend error:
> >> > org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input
> >> > path does not exist: hdfs://user/lnindrakrishna/exp/03272013/part-r-00000
> >> >
> >> > But when I remove the second LOAD command, everything runs fine.
> >> > Why does it throw "job null has failed! Stop running all dependent
> >> > jobs"?
> >> >
> >>
> >
>
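The behavior in the thread is consistent with Pig compiling the whole script into one plan, so the second job's input path is checked before the first job has written it. One possible workaround is sketched below. The `exec` command and the `-no_multiquery` flag are standard Pig features, but whether they resolve this exact case is an assumption on my part, and the schema here is a hypothetical stand-in for the column lists elided in Mix's script:

```pig
-- Sketch of x.pig with an explicit execution barrier (hypothetical schema).
Data = LOAD '/input/path' AS (qe:chararray, val:chararray);
NoNullData = FILTER Data BY qe IS NOT NULL;
STORE (FOREACH (GROUP NoNullData ALL) GENERATE FLATTEN($1))
    INTO 'exp/$inputDatePig';

-- Assumption: a bare 'exec' runs all statements queued so far, so the
-- STORE's output exists on HDFS before Pig plans the next job.
exec;

evnt_dtl = LOAD 'exp/$inputDatePig/part-r-00000' AS (cust:chararray);
DUMP evnt_dtl;
```

Alternatively, running the unchanged script with multiquery optimization disabled (`pig -no_multiquery -f x.pig -param inputDatePig=03272013`), or splitting it into two scripts run back to back, should have a similar effect of forcing the first job to finish before the second is planned.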
