You should report progress within a period shorter than the configured timeout
(in your case 600 sec).
Add code like the below to your reducer and call ping() wherever you process
tuples.
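That 600 sec limit comes from the mapred.task.timeout property (value in
milliseconds). If you would rather raise the timeout than ping, something like
this in mapred-site.xml should work (the 1200000 value here is just an
illustrative choice; check the default for your version):

```xml
<!-- mapred-site.xml: kill a task that reports no progress for this many
     milliseconds (default is 600000, i.e. 10 minutes) -->
<property>
  <name>mapred.task.timeout</name>
  <value>1200000</value>
</property>
```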
private final TaskAttemptContext context; // initialized in the constructor
private long lastTime = System.currentTimeMillis();

public void ping() {
    final long currTime = System.currentTimeMillis();
    if (currTime - lastTime > 10000) { // report at most once every 10 s
        context.progress();
        lastTime = currTime;
    }
}
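To see the throttling behave, here is a self-contained sketch of the same
pattern. ProgressPinger and Reporter are illustrative names, not Hadoop API;
the clock value is passed in explicitly so the throttle can be exercised
without real waiting:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Standalone sketch of the throttled-progress pattern from the snippet above.
// Reporter stands in for TaskAttemptContext.progress().
public class ProgressPinger {
    public interface Reporter { void progress(); }

    private final Reporter reporter;
    private final long intervalMs;
    private long lastTime;

    public ProgressPinger(Reporter reporter, long intervalMs, long now) {
        this.reporter = reporter;
        this.intervalMs = intervalMs;
        this.lastTime = now;
    }

    // Call this on every tuple; it reports at most once per interval.
    public void ping(long now) {
        if (now - lastTime > intervalMs) {
            reporter.progress();
            lastTime = now;
        }
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        ProgressPinger p = new ProgressPinger(calls::incrementAndGet, 10_000, 0);
        // Simulated clock: one tuple per second for 60 seconds.
        for (long t = 0; t <= 60_000; t += 1_000) {
            p.ping(t);
        }
        System.out.println("progress calls: " + calls.get());
    }
}
```

With a 10 s interval and one tuple per second, only a handful of the 61 pings
actually reach the reporter; the rest are suppressed by the time check.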
2010/5/11 Corbin Hoenes <[email protected]>
> Not sure I am clear on how I can debug stuff on a cluster. I currently
> have a long-running reducer that attempts to run 4 times before finally
> giving up.
>
> I get 4 of these: Task attempt_201005101345_0052_r_000012_0 failed to
> report status for 601 seconds. Killing!
>
> before it gives up...on the last try I noticed this in the log:
> ERROR: org.apache.hadoop.hdfs.DFSClient - Exception closing file
> /tmp/temp1925356068/tmp1003826561/_temporary/_attempt_201005101345_0052_r_000012_4/abs/tmp/temp1925356068/tmp-197182389/part-00012
> : org.apache.hadoop.ipc.RemoteException: java.io.IOException: Could not
> complete write to file
> /tmp/temp1925356068/tmp1003826561/_temporary/_attempt_201005101345_0052_r_000012_4/abs/tmp/temp1925356068/tmp-197182389/part-00012
> by DFSClient_attempt_201005101345_0052_r_000012_4
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:497)
> at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:966)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:962)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:960)
> How do I turn on log4j's DEBUG statements? Hoping those will help me
> pinpoint what is going on here--maybe it's the cluster or maybe the script.
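On the log4j question: in Hadoop builds of that era the level is usually
controlled through conf/log4j.properties (the bin/hadoop script also honors a
HADOOP_ROOT_LOGGER environment variable). A hedged sketch; the exact keys may
differ by version:

```properties
# conf/log4j.properties -- raise Hadoop's root logger to DEBUG
hadoop.root.logger=DEBUG,console
log4j.rootLogger=${hadoop.root.logger}
```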