Hi Jacob,
Thanks for your response. These are the last lines of the task log, from just
before I killed the process. The machine is 192.168.1.18.
2011-01-26 11:14:32,485 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201101261105_0002_r_000009_0 0.33333334% reduce > copy (12 of 12 at
0.00 MB/s)
2011-01-26 11:14:32,623 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201101261105_0002_r_000009_0 0.33333334% reduce > copy (12 of 12 at
0.00 MB/s)
2011-01-26 11:14:32,670 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201101261105_0002_r_000009_0 0.33333334% reduce > copy (12 of 12 at
0.00 MB/s)
2011-01-26 11:14:32,814 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201101261105_0002_r_000006_0 0.6666667% reduce > reduce
2011-01-26 11:14:32,971 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201101261105_0002_r_000003_0 0.6666667% reduce > reduce
2011-01-26 11:14:33,639 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201101261105_0002_r_000000_0 0.6666667% reduce > reduce
2011-01-26 11:14:33,966 INFO org.apache.hadoop.mapred.TaskTracker: Sent out
17393 bytes for reduce: 10 from map: attempt_201101261105_0002_m_000001_0 given
17393/17389
2011-01-26 11:14:33,966 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace:
src: 192.168.1.18:50060, dest: 192.168.1.19:56250, bytes: 17393, op:
MAPRED_SHUFFLE, cliID: attempt_201101261105_0002_m_000001_0, duration: 2439516
2011-01-26 11:14:34,065 INFO org.apache.hadoop.mapred.TaskTracker: Sent out
8246 bytes for reduce: 10 from map: attempt_201101261105_0002_m_000004_0 given
8246/8242
2011-01-26 11:14:34,066 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace:
src: 192.168.1.18:50060, dest: 192.168.1.19:56250, bytes: 8246, op:
MAPRED_SHUFFLE, cliID: attempt_201101261105_0002_m_000004_0, duration: 1218536
2011-01-26 11:14:34,084 INFO org.apache.hadoop.mapred.TaskTracker: Sent out
11992 bytes for reduce: 10 from map: attempt_201101261105_0002_m_000006_0 given
11992/11988
2011-01-26 11:14:34,085 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace:
src: 192.168.1.18:50060, dest: 192.168.1.19:56250, bytes: 11992, op:
MAPRED_SHUFFLE, cliID: attempt_201101261105_0002_m_000006_0, duration: 1474773
2011-01-26 11:14:34,092 INFO org.apache.hadoop.mapred.TaskTracker: Sent out
7195 bytes for reduce: 10 from map: attempt_201101261105_0002_m_000008_0 given
7195/7191
2011-01-26 11:14:34,092 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace:
src: 192.168.1.18:50060, dest: 192.168.1.19:56250, bytes: 7195, op:
MAPRED_SHUFFLE, cliID: attempt_201101261105_0002_m_000008_0, duration: 1679538
2011-01-26 11:14:34,113 INFO org.apache.hadoop.mapred.TaskTracker: Sent out
13086 bytes for reduce: 11 from map: attempt_201101261105_0002_m_000001_0 given
13086/13082
2011-01-26 11:14:34,113 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace:
src: 192.168.1.18:50060, dest: 192.168.1.14:60067, bytes: 13086, op:
MAPRED_SHUFFLE, cliID: attempt_201101261105_0002_m_000001_0, duration: 1596605
2011-01-26 11:14:34,288 INFO org.apache.hadoop.mapred.TaskTracker: Sent out
15422 bytes for reduce: 11 from map: attempt_201101261105_0002_m_000004_0 given
15422/15418
2011-01-26 11:14:34,288 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace:
src: 192.168.1.18:50060, dest: 192.168.1.14:60067, bytes: 15422, op:
MAPRED_SHUFFLE, cliID: attempt_201101261105_0002_m_000004_0, duration: 1191562
2011-01-26 11:14:34,310 INFO org.apache.hadoop.mapred.TaskTracker: Sent out
8648 bytes for reduce: 11 from map: attempt_201101261105_0002_m_000006_0 given
8648/8644
2011-01-26 11:14:34,311 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace:
src: 192.168.1.18:50060, dest: 192.168.1.14:60067, bytes: 8648, op:
MAPRED_SHUFFLE, cliID: attempt_201101261105_0002_m_000006_0, duration: 1486513
2011-01-26 11:14:34,351 INFO org.apache.hadoop.mapred.TaskTracker: Sent out
8181 bytes for reduce: 11 from map: attempt_201101261105_0002_m_000008_0 given
8181/8177
2011-01-26 11:14:34,352 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace:
src: 192.168.1.18:50060, dest: 192.168.1.14:60067, bytes: 8181, op:
MAPRED_SHUFFLE, cliID: attempt_201101261105_0002_m_000008_0, duration: 1530920
2011-01-26 11:14:35,617 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201101261105_0002_r_000009_0 0.6666667% reduce > reduce
2011-01-26 11:14:36,083 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201101261105_0002_r_000006_0 0.6666667% reduce > reduce
2011-01-26 11:14:38,625 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201101261105_0002_r_000009_0 0.6666667% reduce > reduce
It then hangs at 66%.
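
To make the log above easier to read, here is a minimal sketch (not part of Pig
or Hadoop) that keeps only the last progress record for each reduce attempt.
The function name last_reduce_status and the regular expression are my own
assumptions, written against the record format shown above. As far as I
understand, Hadoop reports reduce progress in thirds (copy, sort, reduce), and
the value printed before the "%" appears to be a fraction, so an attempt whose
last record is 0.6666667 in "reduce > reduce" would have finished copying and
sorting and be stuck inside the reduce step itself.

```python
import re

# Matches one TaskTracker progress record after wrapped lines are re-joined, e.g.
# "... TaskTracker: attempt_..._r_000009_0 0.6666667% reduce > reduce".
# Assumption: this pattern is derived only from the records pasted above.
PROGRESS = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ INFO \S+TaskTracker:\s+"
    r"(attempt_\d+_\d+_r_\d+_\d+) ([0-9.]+)% reduce > (\w+)"
)


def last_reduce_status(log_text):
    """Return {attempt_id: (timestamp, fraction, phase)} from the last record seen."""
    # Collapse all whitespace so records wrapped across lines match again.
    joined = " ".join(log_text.split())
    status = {}
    for timestamp, attempt, fraction, phase in PROGRESS.findall(joined):
        # Later records overwrite earlier ones, leaving the most recent state.
        status[attempt] = (timestamp, float(fraction), phase)
    return status


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python last_status.py tasktracker.log
    with open(sys.argv[1]) as handle:
        for attempt, state in sorted(last_reduce_status(handle.read()).items()):
            print(attempt, *state)
```

The shuffle records ("Sent out ... bytes" and the clienttrace lines) are
skipped on purpose, since they describe map output being served to other nodes
rather than the progress of a local reducer.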
As for the script, is there a way to tell which part of it the process is
hanging in?
Thanks!
> Subject: Re: Tips for debugging pig
> From: jacob.a.perk...@gmail.com
> To: user@pig.apache.org
> Date: Wed, 26 Jan 2011 10:08:08 -0600
>
> Martin,
> When you look at the task logs for the particular reducer that's stuck,
> what do you see? What kind of operations do you have going on in the
> script, possibly a GROUP ALL?
>
> --jacob
>
> On Wed, 2011-01-26 at 12:44 -0300, Martin Z wrote:
> > Hi all,
> >
> > I'm running a Pig script in local mode, and it finishes successfully. When
> > I use the same dataset and script to run Pig in its distributed mode, it
> > hangs at 90% and the Hadoop processes on the node machines take almost all
> > the memory. It always hangs at the reduce task of the last job.
> >
> > The conf/mapred-site.xml is:
> >
> > <property>
> > <name>mapred.child.java.opts</name>
> > <value>-Xmx1000m</value>
> > </property>
> > <property>
> > <name>mapred.child.ulimit</name>
> > <value>4000000</value>
> > <final>true</final>
> > </property>
> >
> > Do you know how I can debug the processes to find out where the problem is?
> >
> > Thanks!
> >
>
>