Martin,

It's a bit of a black art at the moment. Every Pig script is broken down
into one or more map-reduce jobs based on the types of operations you've
got in there. JOIN, GROUP, COGROUP, and ORDER all require a reduce
(except in special map-side cases, such as a merge join, that you have
to request explicitly ahead of time). If your script has only one
operation that kicks off a reduce, then that operation has to be the
problem.
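For instance, in a sketch like this (the paths and field names are made
up), the GROUP is the only reduce-forcing operation, so the script
compiles to a single map-reduce job. Running EXPLAIN on an alias dumps
the plan, which shows you exactly which operators land in which job:

```pig
-- hypothetical input; any script whose only reduce-side operator
-- is a single GROUP compiles to one map-reduce job
logs   = LOAD 'input/logs' AS (user:chararray, bytes:long);
grpd   = GROUP logs BY user;        -- this GROUP forces the reduce
totals = FOREACH grpd GENERATE group, SUM(logs.bytes) AS total;
EXPLAIN totals;                     -- dumps the map-reduce plan
STORE totals INTO 'output/totals';
```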

The phase you're looking at (~66%) is where the copy and sort have
finished and you're running the actual reduce portion of your task.
Keep in mind that, in this phase, output data is finally being written
to HDFS. Do you have write permissions where you're trying to write?
What do the namenode logs say? The datanode logs?
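If it helps, here's a rough sketch of how you might sift a TaskTracker
log to see the last reported progress per reduce attempt (the regex is
an assumption based on the lines you pasted; attempts parked at ~0.33
are stuck copying, ~0.66 means the sort finished and it's stuck in the
reduce itself or in writing output):

```python
import re

# Matches TaskTracker progress lines like:
#   attempt_201101261105_0002_r_000009_0 0.6666667% reduce > reduce
PROGRESS = re.compile(r"(attempt_\d+_\d+_r_\d+_\d+)\s+([\d.]+)% reduce")

def last_progress(log_lines):
    """Return {reduce attempt id: last reported progress fraction}."""
    seen = {}
    for line in log_lines:
        m = PROGRESS.search(line)
        if m:
            seen[m.group(1)] = float(m.group(2))
    return seen

sample = [
    "2011-01-26 11:14:32,485 INFO org.apache.hadoop.mapred.TaskTracker: "
    "attempt_201101261105_0002_r_000009_0 0.33333334% reduce > copy",
    "2011-01-26 11:14:35,617 INFO org.apache.hadoop.mapred.TaskTracker: "
    "attempt_201101261105_0002_r_000009_0 0.6666667% reduce > reduce",
]
print(last_progress(sample))
# {'attempt_201101261105_0002_r_000009_0': 0.6666667}
```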


--jacob
 
On Wed, 2011-01-26 at 13:32 -0300, Martin Z wrote:
> 
> Hi Jacob,
> 
> Thanks for your response. These are the last lines of the task log, before 
> killing the process. The machine is 192.168.1.18.
> 
> 2011-01-26 11:14:32,485 INFO org.apache.hadoop.mapred.TaskTracker: 
> attempt_201101261105_0002_r_000009_0 0.33333334% reduce > copy (12 of 12 at 
> 0.00 MB/s)
> 2011-01-26 11:14:32,623 INFO org.apache.hadoop.mapred.TaskTracker: 
> attempt_201101261105_0002_r_000009_0 0.33333334% reduce > copy (12 of 12 at 
> 0.00 MB/s)
> 2011-01-26 11:14:32,670 INFO org.apache.hadoop.mapred.TaskTracker: 
> attempt_201101261105_0002_r_000009_0 0.33333334% reduce > copy (12 of 12 at 
> 0.00 MB/s)
> 2011-01-26 11:14:32,814 INFO org.apache.hadoop.mapred.TaskTracker: 
> attempt_201101261105_0002_r_000006_0 0.6666667% reduce > reduce
> 2011-01-26 11:14:32,971 INFO org.apache.hadoop.mapred.TaskTracker: 
> attempt_201101261105_0002_r_000003_0 0.6666667% reduce > reduce
> 2011-01-26 11:14:33,639 INFO org.apache.hadoop.mapred.TaskTracker: 
> attempt_201101261105_0002_r_000000_0 0.6666667% reduce > reduce
> 2011-01-26 11:14:33,966 INFO org.apache.hadoop.mapred.TaskTracker: Sent out 
> 17393 bytes for reduce: 10 from map: attempt_201101261105_0002_m_000001_0 
> given 17393/17389
> 2011-01-26 11:14:33,966 INFO 
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 192.168.1.18:50060, 
> dest: 192.168.1.19:56250, bytes: 17393, op: MAPRED_SHUFFLE, cliID: 
> attempt_201101261105_0002_m_000001_0, duration: 2439516
> 2011-01-26 11:14:34,065 INFO org.apache.hadoop.mapred.TaskTracker: Sent out 
> 8246 bytes for reduce: 10 from map: attempt_201101261105_0002_m_000004_0 
> given 8246/8242
> 2011-01-26 11:14:34,066 INFO 
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 192.168.1.18:50060, 
> dest: 192.168.1.19:56250, bytes: 8246, op: MAPRED_SHUFFLE, cliID: 
> attempt_201101261105_0002_m_000004_0, duration: 1218536
> 2011-01-26 11:14:34,084 INFO org.apache.hadoop.mapred.TaskTracker: Sent out 
> 11992 bytes for reduce: 10 from map: attempt_201101261105_0002_m_000006_0 
> given 11992/11988
> 2011-01-26 11:14:34,085 INFO 
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 192.168.1.18:50060, 
> dest: 192.168.1.19:56250, bytes: 11992, op: MAPRED_SHUFFLE, cliID: 
> attempt_201101261105_0002_m_000006_0, duration: 1474773
> 2011-01-26 11:14:34,092 INFO org.apache.hadoop.mapred.TaskTracker: Sent out 
> 7195 bytes for reduce: 10 from map: attempt_201101261105_0002_m_000008_0 
> given 7195/7191
> 2011-01-26 11:14:34,092 INFO 
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 192.168.1.18:50060, 
> dest: 192.168.1.19:56250, bytes: 7195, op: MAPRED_SHUFFLE, cliID: 
> attempt_201101261105_0002_m_000008_0, duration: 1679538
> 2011-01-26 11:14:34,113 INFO org.apache.hadoop.mapred.TaskTracker: Sent out 
> 13086 bytes for reduce: 11 from map: attempt_201101261105_0002_m_000001_0 
> given 13086/13082
> 2011-01-26 11:14:34,113 INFO 
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 192.168.1.18:50060, 
> dest: 192.168.1.14:60067, bytes: 13086, op: MAPRED_SHUFFLE, cliID: 
> attempt_201101261105_0002_m_000001_0, duration: 1596605
> 2011-01-26 11:14:34,288 INFO org.apache.hadoop.mapred.TaskTracker: Sent out 
> 15422 bytes for reduce: 11 from map: attempt_201101261105_0002_m_000004_0 
> given 15422/15418
> 2011-01-26 11:14:34,288 INFO 
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 192.168.1.18:50060, 
> dest: 192.168.1.14:60067, bytes: 15422, op: MAPRED_SHUFFLE, cliID: 
> attempt_201101261105_0002_m_000004_0, duration: 1191562
> 2011-01-26 11:14:34,310 INFO org.apache.hadoop.mapred.TaskTracker: Sent out 
> 8648 bytes for reduce: 11 from map: attempt_201101261105_0002_m_000006_0 
> given 8648/8644
> 2011-01-26 11:14:34,311 INFO 
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 192.168.1.18:50060, 
> dest: 192.168.1.14:60067, bytes: 8648, op: MAPRED_SHUFFLE, cliID: 
> attempt_201101261105_0002_m_000006_0, duration: 1486513
> 2011-01-26 11:14:34,351 INFO org.apache.hadoop.mapred.TaskTracker: Sent out 
> 8181 bytes for reduce: 11 from map: attempt_201101261105_0002_m_000008_0 
> given 8181/8177
> 2011-01-26 11:14:34,352 INFO 
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 192.168.1.18:50060, 
> dest: 192.168.1.14:60067, bytes: 8181, op: MAPRED_SHUFFLE, cliID: 
> attempt_201101261105_0002_m_000008_0, duration: 1530920
> 2011-01-26 11:14:35,617 INFO org.apache.hadoop.mapred.TaskTracker: 
> attempt_201101261105_0002_r_000009_0 0.6666667% reduce > reduce
> 2011-01-26 11:14:36,083 INFO org.apache.hadoop.mapred.TaskTracker: 
> attempt_201101261105_0002_r_000006_0 0.6666667% reduce > reduce
> 2011-01-26 11:14:38,625 INFO org.apache.hadoop.mapred.TaskTracker: 
> attempt_201101261105_0002_r_000009_0 0.6666667% reduce > reduce
> 
> 
> It then hangs at 66%.
> 
> About the script: is there a way to tell in which part of it the process 
> is hanging?
> 
> 
> Thanks!
> 
> > Subject: Re: Tips for debugging pig
> > From: jacob.a.perk...@gmail.com
> > To: user@pig.apache.org
> > Date: Wed, 26 Jan 2011 10:08:08 -0600
> > 
> > Martin,
> >     When you look at the task logs for the particular reducer that's stuck,
> > what do you see? What kind of operations do you have going on in the
> > script, possibly a GROUP ALL?
> > 
> > --jacob
> > 
> > On Wed, 2011-01-26 at 12:44 -0300, Martin Z wrote:
> > > Hi all,
> > > 
> > > I'm running a Pig script in local mode, and it finishes successfully. 
> > > When I use the same dataset and script to run Pig in distributed 
> > > mode, it hangs at 90% and the Hadoop processes on the node machines take 
> > > almost all the memory. It always hangs at the reduce task of the last job.
> > > 
> > > The conf/mapred-site.xml is:
> > > 
> > >   <property>
> > >     <name>mapred.child.java.opts</name>
> > >     <value>-Xmx1000m</value>
> > >   </property>
> > >   <property>
> > >     <name>mapred.child.ulimit</name>
> > >     <value>4000000</value>
> > >     <final>true</final>
> > >   </property>
> > > 
> > > Do you know how I can debug the processes to find out where the problem 
> > > is?
> > > 
> > > Thanks!
> > 
> > 