Hi Mua, I guess you misunderstood me. The pig_****.log file is not a task log.
You should look for task logs on the data nodes where your tasks ran. Here is
some explanation of the various log files in Hadoop and where to find them:
http://blog.cloudera.com/blog/2009/09/apache-hadoop-log-files-where-to-find-them-in-cdh-and-what-info-they-contain/

Thanks,
Cheolsoo

On Fri, Apr 12, 2013 at 10:27 AM, Mua Ban <[email protected]> wrote:

> Thank you very much for your reply.
>
> Below is the stack trace from the pig_****.log file.
>
> Can you please give me some suggestions?
>
> -Mua
> ------------------
> Backend error message
> ---------------------
> Task attempt_201304081613_0048_r_000001_0 failed to report status for 601
> seconds. Killing!
>
> Pig Stack Trace
> ---------------
> ERROR 2997: Unable to recreate exception from backed error: Task
> attempt_201304081613_0048_r_000001_0 failed to report status for 601
> seconds. Killing!
>
> org.apache.pig.backend.executionengine.ExecException: ERROR 2997: Unable to
> recreate exception from backed error: Task
> attempt_201304081613_0048_r_000001_0 failed to report status for 601
> seconds. Killing!
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:217)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:152)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:383)
>         at org.apache.pig.PigServer.launchPlan(PigServer.java:1270)
>         at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1255)
>         at org.apache.pig.PigServer.execute(PigServer.java:1245)
>         at org.apache.pig.PigServer.executeBatch(PigServer.java:362)
>         at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:132)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:193)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
>         at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
>         at org.apache.pig.Main.run(Main.java:555)
>         at org.apache.pig.Main.main(Main.java:111)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
>         at java.lang.reflect.Method.invoke(Method.java:611)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
> ================================================================================
>
> On Fri, Apr 12, 2013 at 11:29 AM, Cheolsoo Park <[email protected]> wrote:
>
> > Did you look at task logs to see why those tasks failed? Since it's a
> > back-end error, the console output doesn't tell you much. Task logs
> > should have a stack trace that shows why it failed, and you can go from
> > there.
> >
> > On Fri, Apr 12, 2013 at 8:18 AM, Mua Ban <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > I am very new to Pig/Hadoop; I just started writing my first Pig
> > > script a couple of days ago.
> > > I ran into this problem.
> > >
> > > My cluster has 9 nodes. I have to join two data sets, big and small,
> > > each collected over 4 weeks. I first take two subsets of my data set
> > > (covering the first week of data); let's call them B1 and S1 for the
> > > big and small data sets of the first week. The entire 4-week data sets
> > > are B4 and S4.
> > >
> > > I ran my script on my cluster to join B1 and S1, and everything was
> > > fine; I got my joined data. However, when I ran my script to join B4
> > > and S4, the script failed. B4 is 39GB, S4 is 300MB. B4 is skewed: some
> > > ids appear more frequently than others. I tried both 'using skewed'
> > > and 'using replicated' modes for the join operation (by appending them
> > > to the end of the join clause below), and they both fail.
> > >
> > > Here is my script, and I think it is very simple:
> > >
> > > big = load 'bigdir/' using PigStorage(',') as (id:chararray,
> > > data:chararray);
> > > small = load 'smalldir/' using PigStorage(',') as
> > > (t1:double,t2:double,data:chararray,id:chararray);
> > > J = JOIN big by id LEFT OUTER, small by id;
> > > store J into 'outputdir' using PigStorage(',');
> > >
> > > On the web UI of the tracker, I see that the job has 40 reducers (I
> > > guess since the total data is about 40GB, and each 1GB needs one
> > > reducer under the default Pig and Hadoop settings, so this is normal).
> > > If I use 'parallel 80' in the join operation above, then I see 80
> > > reducers, and the join operation still fails.
> > >
> > > I checked the file mapred-default.xml and found this:
> > > <name>mapred.reduce.tasks</name>
> > > <value>1</value>
> > >
> > > If I set the value of parallel in the join operation, it should
> > > override this, right?
> > >
> > > On the tracker GUI, I see that for different runs, the number of
> > > completed reducers varies from 4 to 10 (out of 40 total reducers).
> > > The tracker GUI shows the reason for the failed reducers: "Task
> > > attempt_201304081613_0046_r_000006_0 failed to report status for 600
> > > seconds. Killing!"
> > >
> > > Could you please help?
> > > Thank you very much,
> > > -Mua
> > >
> > > --------------------------------------------------------------------------------------------------------------
> > > Here is the error report from the console screen where I ran this script:
> > >
> > > job_201304081613_0032  616  0  230  12  32  0    0  0   big  MAP_ONLY
> > > job_201304081613_0033  705  1  21   6   6   234  2  34  234  SAMPLER
> > >
> > > Failed Jobs:
> > > JobId  Alias  Feature  Message  Outputs
> > > job_201304081613_0034  small  SKEWED_JOIN  Message: Job failed!
> > > Error - # of failed Reduce Tasks exceeded allowed limit. FailedCount: 1.
> > > LastFailedTask: task_201304081613_0034_r_000012
> > >
> > > Input(s):
> > > Successfully read 364285458 records (39528533645 bytes) from:
> > > "hdfs://d0521b01:24990/user/abc/big/"
> > > Failed to read data from "hdfs://d0521b01:24990/user/abc/small/"
> > >
> > > Output(s):
> > >
> > > Counters:
> > > Total records written : 0
> > > Total bytes written : 0
> > > Spillable Memory Manager spill count : 0
> > > Total bags proactively spilled: 0
> > > Total records proactively spilled: 0
> > >
> > > Job DAG:
> > > job_201304081613_0032 -> job_201304081613_0033,
> > > job_201304081613_0033 -> job_201304081613_0034,
> > > job_201304081613_0034 -> null,
> > > null
> > >
> > > 2013-04-10 20:11:23,815 [main] WARN
> > > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> > > - Encountered Warning REDUCER_COUNT_LOW 1 time(s).
> > > 2013-04-10 20:11:23,815 [main] INFO
> > > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> > > - Some jobs have failed! Stop running all dependent jobs
> > > 2013-04-10 20:11:23,815 [main] ERROR org.apache.pig.tools.grunt.GruntParser
> > > - ERROR 2997: Encountered IOException. java.io.IOException: Error
> > > Recovery for block blk_312487981794332936_26563 failed because recovery
> > > from primary datanode 10.6.25.31:54563 failed 6 times. Pipeline was
> > > 10.6.25.31:54563. Aborting...
> > > Details at logfile: /homes/abc/pig-flatten/scripts/pig_1365627648226.log
> > > 2013-04-10 20:11:23,818 [main] ERROR org.apache.pig.tools.grunt.GruntParser
> > > - ERROR 2244: Job failed, hadoop does not return any error message
> > > Details at logfile: /homes/abc/pig-flatten/scripts/pig_1365627648226.log
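For reference, here is a sketch of the script from this thread with the two knobs the discussion touches on: the task timeout behind the "failed to report status for 600 seconds" kill, and an explicit reducer count. The property name `mapred.task.timeout` is the standard Hadoop 1.x setting (default 600000 ms, i.e. the 10 minutes seen in the error), but the 30-minute value and `PARALLEL 80` below are illustrative assumptions, not tested fixes for this particular cluster:

```
-- Give slow reducers more time before the TaskTracker kills them
-- (the Hadoop 1.x default is 600000 ms = 10 minutes).
-- The 30-minute value here is an illustrative guess.
SET mapred.task.timeout '1800000';

big   = LOAD 'bigdir/'   USING PigStorage(',') AS (id:chararray, data:chararray);
small = LOAD 'smalldir/' USING PigStorage(',') AS (t1:double, t2:double, data:chararray, id:chararray);

-- 'skewed' spreads hot join keys over several reducers; PARALLEL
-- overrides mapred.reduce.tasks for this job's reduce phase.
J = JOIN big BY id LEFT OUTER, small BY id USING 'skewed' PARALLEL 80;

STORE J INTO 'outputdir' USING PigStorage(',');
```

Note that raising the timeout only hides the symptom if a reducer is genuinely stuck; the task logs on the data nodes are still the place to find out why it stopped reporting progress.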
