I think that assessment is correct, as we are no longer seeing those kinds of errors. We reconfigured the cluster a bit.
I am trying: pig -Dmapred.task.timeout=0 ./myscript.pig, but Pig doesn't like properties set this way; it treats the option as invalid and spews the Pig "usage" text at me. Using Pig 0.6 -- is this still supported?

On May 13, 2010, at 6:39 PM, Ashutosh Chauhan wrote:

> Yes, you can set job conf properties like that, but looking at your
> original stack trace, it seems that won't solve your problem. Looks
> like your cluster is in trouble. If you can, try restarting it. If
> not, you need to dig into the tasktracker / datanode logs.
>
> Ashutosh
>
> On Thu, May 13, 2010 at 16:07, Corbin Hoenes <[email protected]> wrote:
>> Zaki,
>>
>> Can pig take command lines like this to set job conf properties?
>>
>> pig -Dmapred.task.timeout=0
>>
>> On May 13, 2010, at 4:18 PM, zaki rahaman wrote:
>>
>>> Hi Corbin,
>>>
>>> The timeout error you're seeing could also indicate that your reducer is
>>> trying to process a very large key/group, which may be the reason for the
>>> timeout in the first place. At least this is a behavior I've seen in the
>>> past.
>>>
>>> On Thu, May 13, 2010 at 6:10 PM, Corbin Hoenes <[email protected]> wrote:
>>>
>>>> Okay, so what is the Pig way to do this?
>>>>
>>>> I noticed a lot of chatter about UDFs in Pig that don't call progress and
>>>> can cause your jobs to get killed. I am using only builtin UDFs like
>>>> COUNT and FLATTEN; do they suffer from this same issue (no progress
>>>> calls)?
>>>>
>>>> On May 12, 2010, at 2:56 AM, Andrey Stepachev wrote:
>>>>
>>>>> You should report progress in a period less than the configured timeout
>>>>> (600 seconds in your case).
>>>>> Add code like the snippet below to your reducer and call ping() where
>>>>> you process tuples.
>>>>>
>>>>> final TaskAttemptContext context = <init in constructor>;
>>>>> long lastTime = System.currentTimeMillis();
>>>>>
>>>>> public void ping() {
>>>>>     final long currtime = System.currentTimeMillis();
>>>>>     if (currtime - lastTime > 10000) {
>>>>>         context.progress();
>>>>>         lastTime = currtime;
>>>>>     }
>>>>> }
>>>>>
>>>>> 2010/5/11 Corbin Hoenes <[email protected]>
>>>>>
>>>>>> I'm not sure I am clear on how I can debug stuff on a cluster. I
>>>>>> currently have a long-running reducer that attempts to run 4 times
>>>>>> before finally giving up.
>>>>>>
>>>>>> I get 4 of these: Task attempt_201005101345_0052_r_000012_0 failed to
>>>>>> report status for 601 seconds. Killing!
>>>>>>
>>>>>> Before it gives up, on the last try I noticed this in the log:
>>>>>>
>>>>>> ERROR: org.apache.hadoop.hdfs.DFSClient - Exception closing file
>>>>>> /tmp/temp1925356068/tmp1003826561/_temporary/_attempt_201005101345_0052_r_000012_4/abs/tmp/temp1925356068/tmp-197182389/part-00012
>>>>>> : org.apache.hadoop.ipc.RemoteException: java.io.IOException: Could not
>>>>>> complete write to file
>>>>>> /tmp/temp1925356068/tmp1003826561/_temporary/_attempt_201005101345_0052_r_000012_4/abs/tmp/temp1925356068/tmp-197182389/part-00012
>>>>>> by DFSClient_attempt_201005101345_0052_r_000012_4
>>>>>>     at org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:497)
>>>>>>     at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source)
>>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>>>>>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:966)
>>>>>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:962)
>>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>>     at javax.security.auth.Subject.doAs(Subject.java:396)
>>>>>>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:960)
>>>>>>
>>>>>> How do I turn on log4j's DEBUG statements? I'm hoping those will help
>>>>>> me pinpoint what is going on here -- maybe it's the cluster or maybe
>>>>>> the script.
>>>
>>> --
>>> Zaki Rahaman
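[Editor's note on the -D question above: one common workaround, sketched here as an assumption rather than something verified against Pig 0.6's argument parsing, is to hand the property to the JVM through the PIG_OPTS environment variable, which the pig wrapper script forwards, instead of putting -D on pig's own command line where it can be rejected as an unknown option:]

```shell
# Hypothetical workaround: pass the Hadoop job conf property via PIG_OPTS
# so it reaches the JVM before Pig parses its own arguments.
# Note: mapred.task.timeout=0 disables the task timeout entirely.
PIG_OPTS="-Dmapred.task.timeout=0" pig ./myscript.pig
```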
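[Editor's note: Andrey's ping() pattern above -- calling progress() at most once per interval from the tuple-processing loop -- can be sketched as a small self-contained class. This is plain Java for illustration; the Runnable stands in for the real context.progress() call, which is an assumption here since the actual reducer would wire in its Hadoop context:]

```java
// Sketch of the throttled progress-ping pattern from the thread.
// The Runnable stands in for context.progress(); a real reducer would
// pass a callback that pokes the Hadoop framework.
public class ThrottledProgress {
    private final Runnable progress;   // e.g. () -> context.progress()
    private final long intervalMillis; // report at most this often
    private long lastTime = 0L;        // 0 forces a report on the first ping

    public ThrottledProgress(Runnable progress, long intervalMillis) {
        this.progress = progress;
        this.intervalMillis = intervalMillis;
    }

    // Call this from the tuple-processing loop; it is cheap when the
    // interval has not elapsed, so calling it per tuple is fine.
    public void ping() {
        long now = System.currentTimeMillis();
        if (now - lastTime > intervalMillis) {
            progress.run();
            lastTime = now;
        }
    }
}
```

[The point of the throttle is that calling progress() per tuple would be wasteful, while calling it every 10 seconds comfortably beats the 600-second kill threshold.]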
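[Editor's note on the log4j question above: Pig reads a log4j configuration file (its usage text lists a -4 / -log4jconf option for pointing at one). A minimal properties file to turn on DEBUG for the relevant classes might look like the following sketch; the logger names chosen here are assumptions, not verified against this cluster:]

```properties
# log4j.properties sketch -- logger names are illustrative assumptions
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d %-5p %c - %m%n

# DEBUG for the HDFS client (the DFSClient error above) and for Pig itself
log4j.logger.org.apache.hadoop.hdfs=DEBUG
log4j.logger.org.apache.pig=DEBUG
```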
