Re: failed to report status for 601 seconds

zaki rahaman Thu, 13 May 2010 15:18:49 -0700

Hi Corbin,

The timeout error you're seeing could also indicate that your reducer is
trying to process a very large key/group which may be the reason for the
timeout in the first place. At least this is a behavior I've seen in the
past.


On Thu, May 13, 2010 at 6:10 PM, Corbin Hoenes <[email protected]> wrote:

> Okay so what is the pig way to do this?
>
> Noticed a lot of chatter about UDFs in pig don't call progress and can
> cause your jobs to get killed.  I am using only builtin UDFs like COUNT,
> FLATTEN do they suffer from this same issue (no progress calls?)
>
> On May 12, 2010, at 2:56 AM, Andrey Stepachev wrote:
>
> > You should report progress in a period less then configured (in you case
> > 600sec).
> > Add code like below to you reducer and call ping in you reducer where you
> > process tuples.
> >
> >        final TaskAttemptContext context = <init in costructor>;
> >        long lastTime = System.currentTimeMillis();
> >
> >        public void ping() {
> >            final long currtime = System.currentTimeMillis();
> >            if (currtime - lastTime > 10000) {
> >                context.progress();
> >                lastTime = currtime;
> >            }
> >        }
> >
> >
> > 2010/5/11 Corbin Hoenes <[email protected]>
> >
> >> Not sure I am clean on how I can debug stuff on a cluster.  I currently
> >> have a long running reducer that attempts to run 4 times before finally
> >> giving up
> >>
> >> I get 4 of these: Task attempt_201005101345_0052_r_000012_0 failed to
> >> report status for 601 seconds. Killing!
> >>
> >> before it gives up...on the last try I noticed this in the log:
> >> ERROR: org.apache.hadoop.hdfs.DFSClient - Exception closing file
> >>
> /tmp/temp1925356068/tmp1003826561/_temporary/_attempt_201005101345_0052_r_000012_4/abs/tmp/temp1925356068/tmp-197182389/part-00012
> >> : org.apache.hadoop.ipc.RemoteException: java.io.IOException: Could not
> >> complete write to file
> >>
> /tmp/temp1925356068/tmp1003826561/_temporary/_attempt_201005101345_0052_r_000012_4/abs/tmp/temp1925356068/tmp-197182389/part-00012
> >> by DFSClient_attempt_201005101345_0052_r_000012_4
> >>       at
> >>
> org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:497)
> >>       at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source)
> >>       at
> >>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >>       at java.lang.reflect.Method.invoke(Method.java:597)
> >>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
> >>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:966)
> >>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:962)
> >>       at java.security.AccessController.doPrivileged(Native Method)
> >>       at javax.security.auth.Subject.doAs(Subject.java:396)
> >>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:960)
> >> How do I turn on log4j's DEBUG statements?  Hoping those will help me
> >> pinpoint what is going on here--maybe it's the cluster or maybe the
> script.
> >>
> >>
> >>
>
>


-- 
Zaki Rahaman

Re: failed to report status for 601 seconds

Reply via email to