I am not sure of everything that may be causing this, especially because the stack
trace is cut off. Your file lease on the output file has expired. Typically the
client is supposed to keep the file lease up to date, so if there was a very long
hiccup in RPC you may be hitting this problem. It could also be related to the
OutputCommitter in another task deleting the file out from under this task.
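If it is the second case, the most common way a second attempt ends up working
against the same output is speculative execution. Below is a minimal sketch of
turning that off for the reduce side, assuming the new-API
org.apache.hadoop.mapreduce.Job class; it is only an illustration of that
scenario, not a confirmed fix for your job, and the class and job names are
placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class JobSetup {
    // Hedged sketch: disable speculative reduce attempts so a second attempt's
    // commit/cleanup cannot delete the slow attempt's output files.
    public static Job configure(Configuration conf) throws Exception {
        Job job = Job.getInstance(conf, "image-fetch");  // hypothetical job name
        job.setReduceSpeculativeExecution(false);
        return job;
    }
}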

--Bobby

From: David Parks <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Monday, February 11, 2013 12:02 AM
To: "[email protected]" <[email protected]>
Subject: File does not exist on part-r-00000 file after reducer runs

Are there any rules against writing results to Reducer.Context while in the 
cleanup() method?

I’ve got a reducer that is downloading a few tens of millions of images from a
set of URLs fed to it.

To be efficient I run many connections in parallel, but limit the number of
connections per domain and the frequency of connections.

To do that efficiently, I read in many URLs from the reduce() method and add
them to a processing queue; at some point all of the input has been read and
Hadoop calls the cleanup() method, where I block until all worker threads have
finished processing.
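Roughly, the pattern looks like the sketch below. This is a simplified
illustration rather than my actual code; the class name, the pool size, and the
download() helper are placeholders, and the per-domain throttling is omitted.

import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class ImageFetchReducer extends Reducer<Text, Text, Text, Text> {

    private ExecutorService pool;

    @Override
    protected void setup(Context context) {
        pool = Executors.newFixedThreadPool(32);  // many parallel connections
    }

    @Override
    protected void reduce(Text key, Iterable<Text> urls, Context context) {
        for (Text url : urls) {
            final String u = url.toString();
            pool.submit(() -> {
                String image = download(u);  // stand-in for the real fetch
                try {
                    // Context is not thread-safe, so all writes are serialized.
                    synchronized (context) {
                        context.write(new Text(u), new Text(image));
                    }
                } catch (IOException | InterruptedException e) {
                    throw new RuntimeException(e);
                }
            });
        }
    }

    @Override
    protected void cleanup(Context context) throws InterruptedException {
        // Block until every queued download has been written; this is the phase
        // that keeps running for 20-30 minutes after Hadoop reports 100% input.
        pool.shutdown();
        pool.awaitTermination(2, TimeUnit.HOURS);
    }

    private String download(String url) {
        return "";  // stand-in for the real HTTP download
    }
}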

We may continue processing and writing results (in a synchronized manner) for 
20 or 30 minutes after Hadoop reports 100% input records delivered, then at the 
end, my code appears to exit normally and I get this exception immediately 
after:

2013-02-11 05:15:23,606 INFO com.frugg.mapreduce.UrlProcessor (URL Processor
Main Loop): Processing complete, shut down normally
2013-02-11 05:15:23,653 INFO org.apache.hadoop.mapred.TaskLogsTruncater (main):
Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2013-02-11 05:15:23,685 INFO org.apache.hadoop.io.nativeio.NativeIO (main): 
Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
2013-02-11 05:15:23,685 INFO org.apache.hadoop.io.nativeio.NativeIO (main): Got 
UserName hadoop for UID 106 from the native implementation
2013-02-11 05:15:23,687 ERROR org.apache.hadoop.security.UserGroupInformation
(main): PriviledgedActionException as:hadoop
cause:org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on
/frugg/image-cache-stage1/_temporary/_attempt_201302110210_0019_r_000002_0/part-r-00002
File does not exist. Holder DFSClient_attempt_201302110210_0019_r_000002_0
does not have any open files.
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1642)

I have a suspicion that there are some subtle rules of Hadoop’s that I’m
violating here.
