I just wanted to leave this here since it took me way too long to figure
out. This may be an obvious problem to some people, but it wasn't to me,
so I want to make sure anyone else who hits it can find an answer.

I kept getting the following error while running a crawl. It happened
consistently for me, but I couldn't find any similar issues or solutions
on the usual sites. The closest thing I could find was this:
http://www.nosql.se/2011/10/hadoop-tasktracker-java-lang-outofmemoryerror/

Below is the error I was seeing. This is just one of several exceptions
that would occur during the parse; in the end, the parse step would
accumulate too many errors and exceed Nutch's failure limit.

13/11/20 20:14:19 INFO parse.ParseSegment: ParseSegment: segment: test/segments/20131120201240
13/11/20 20:14:20 INFO mapred.FileInputFormat: Total input paths to process : 2
13/11/20 20:14:21 INFO mapred.JobClient: Running job: job_201311202006_0017
13/11/20 20:14:22 INFO mapred.JobClient:  map 0% reduce 0%
13/11/20 20:14:34 INFO mapred.JobClient:  map 40% reduce 0%
13/11/20 20:14:36 INFO mapred.JobClient:  map 50% reduce 0%
13/11/20 20:14:36 INFO mapred.JobClient: Task Id : attempt_201311202006_0017_m_000001_0, Status : FAILED
java.lang.RuntimeException: Error while running command to get file permissions : java.io.IOException: Cannot run program "/bin/ls": java.io.IOException: error=11, Resource temporarily unavailable
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:200)
        at org.apache.hadoop.util.Shell.run(Shell.java:182)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:375)
        at org.apache.hadoop.util.Shell.execCommand(Shell.java:461)
        at org.apache.hadoop.util.Shell.execCommand(Shell.java:444)
        at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:712)
        at org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:448)
        at org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.getOwner(RawLocalFileSystem.java:431)
        at org.apache.hadoop.mapred.TaskLog.obtainLogDirOwner(TaskLog.java:267)
        at org.apache.hadoop.mapred.TaskLogsTruncater.truncateLogs(TaskLogsTruncater.java:124)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:260)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.io.IOException: java.io.IOException: error=11, Resource temporarily unavailable
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
        at java.lang.ProcessImpl.start(ProcessImpl.java:65)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
        ... 15 more

        at org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:473)
        at org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.getOwner(RawLocalFileSystem.java:431)
        at org.apache.hadoop.mapred.TaskLog.obtainLogDirOwner(TaskLog.java:267)
        at org.apache.hadoop.mapred.TaskLogsTruncater.truncateLogs(TaskLogsTruncater.java:124)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:260)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)

13/11/20 20:14:44 INFO mapred.JobClient:  map 75% reduce 16%
13/11/20 20:14:47 INFO mapred.JobClient:  map 100% reduce 16%
13/11/20 20:14:53 INFO mapred.JobClient:  map 100% reduce 72%
13/11/20 20:14:54 INFO mapred.JobClient:  map 100% reduce 100%
13/11/20 20:14:56 INFO mapred.JobClient: Job complete: job_201311202006_0017

For what it's worth, error=11 is EAGAIN ("Resource temporarily
unavailable"), which here means the task JVM couldn't spawn the "/bin/ls"
process because some per-user resource limit had been hit. I can't
remember exactly what caused me to think it might be related to the
number of available file handles, but that is where I found my solution.
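If you want to confirm you're getting close to the limit before changing
anything, one quick check on Linux is to compare a task JVM's open
descriptor count against the limit it was actually started with. The
process name I grep for below is just an assumption; match whatever your
Hadoop/Nutch task processes are called on your box:

# Grab the pid of one Hadoop child/task JVM (name is an assumption)
$ PID=$(pgrep -f org.apache.hadoop.mapred.Child | head -n 1)

# Count the file descriptors it currently has open
$ ls /proc/$PID/fd | wc -l

# Compare against the limit the process inherited at startup
$ grep 'open files' /proc/$PID/limits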

Originally, my system was set for 1024 open files:

$ ulimit -n
1024

Bumping this up to 8096 has fixed my issue for now. How you raise the
limit differs from system to system, so I'm not going to try to cover
each system's solution, but getting this limit increased was critical to
getting my crawl to run through more than one iteration.
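For reference, here is roughly what that looks like on a typical Linux
box using PAM limits. The file, the user name, and the value are
assumptions you'll want to adapt to your own setup:

# Raise the limit for the current shell session only
$ ulimit -n 8096

# To make it stick on many Linux systems, add lines like these to
# /etc/security/limits.conf (replace "hadoop" with whichever user runs
# the TaskTracker/crawl), then log that user out and back in:
#   hadoop  soft  nofile  8096
#   hadoop  hard  nofile  8096

# Verify from a fresh session
$ ulimit -n

Note that daemons started at boot may not pick up limits.conf changes
until they are restarted under the new limits.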

Hope this helps anyone with the same problem.

-- 
Jon Uhal
