I'm indexing 5M pages on a small, cheap cluster. There are some fatal errors
I'm trying to understand.
1. A "No space left on device" error occurs on a slave node even though there
is still 30% free space (>20GB) on the HDFS partition. Is it possible that the
disk requirement surges during Nutch indexing?
2010-08-19 02:34:23,546 INFO mapred.ReduceTask - attempt_201008141418_0034_r_000004_2 Scheduled 1 outputs (1 slow hosts and0 dup hosts)
2010-08-19 02:34:24,191 ERROR mapred.ReduceTask - Task: attempt_201008141418_0034_r_000004_2 - FSError: org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
    at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:192)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:104)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
    at java.io.DataOutputStream.write(DataOutputStream.java:90)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleToDisk(ReduceTask.java:1620)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1416)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
Caused by: java.io.IOException: No space left on device
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:260)
    at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:190)
    ... 8 more
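One thing I notice in the trace: the failing write goes through RawLocalFileSystem inside shuffleToDisk, so the full device may be a local disk used for task working files rather than the HDFS partition itself. A minimal sketch for checking free space on those directories, assuming they are the ones listed in mapred.local.dir on the slave (the helper name and the path used below are placeholders, not taken from my configuration):

```python
import shutil
import tempfile


def free_space_report(paths):
    """Return {path: (free_bytes, free_fraction)} for each directory given."""
    report = {}
    for path in paths:
        usage = shutil.disk_usage(path)
        # Guard against pseudo-filesystems that report a total size of zero.
        fraction = usage.free / usage.total if usage.total else 0.0
        report[path] = (usage.free, fraction)
    return report


if __name__ == "__main__":
    # Placeholder directory; substitute the entries of mapred.local.dir.
    for path, (free, frac) in free_space_report([tempfile.gettempdir()]).items():
        print(f"{path}: {free / 2**30:.1f} GiB free ({frac:.0%})")
```

Running this on the slave while the reduce phase is copying map outputs would show whether the local directories fill up even when the HDFS partition still has room.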
2. Task failure errors. Any idea what might cause them?
Task attempt_201008141418_0034_r_000005_0 failed to report status for 600 seconds. Killing!
Task attempt_201008141418_0034_r_000001_0 failed to report status for 600 seconds. Killing!
Task attempt_201008141418_0034_r_000002_0 failed to report status for 601 seconds. Killing!
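The 600 seconds matches the default of mapred.task.timeout (600000 ms), and the "_r_" in each attempt ID marks a reduce task. To see which attempts are affected across a longer log, a small sketch like this could tally them; the regular expression and function name are my own, and it assumes the messages look exactly like the three above, possibly wrapped across lines:

```python
import re

# Matches "Task attempt_<job>_<m|r>_<task>_<attempt> failed to report status for N seconds".
TIMEOUT_RE = re.compile(
    r"Task attempt_(?P<job>\d+_\d+)_(?P<kind>[mr])_(?P<task>\d+)_(?P<attempt>\d+)"
    r" failed to report status for (?P<secs>\d+) seconds"
)


def timed_out_tasks(log_text):
    """Return (kind, task_number, attempt_number, seconds) for each timeout message."""
    # Collapse all whitespace so messages wrapped across lines still match.
    text = " ".join(log_text.split())
    return [
        (m.group("kind"), int(m.group("task")), int(m.group("attempt")), int(m.group("secs")))
        for m in TIMEOUT_RE.finditer(text)
    ]
```

On the three messages above this yields only reduce tasks on their first attempt (attempt number 0), which fits the failures happening in the reduce phase of the job.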
2010-08-19 00:49:28,309 INFO mapred.TaskTracker - Process Thread Dump: lost task
28 active threads
Thread 12334 (IPC Client (47) connection to vmo-crawl08-dev/10.1.1.60:9001 from jboss):
  State: TIMED_WAITING
  Blocked count: 2498
  Waited count: 2498
  Stack:
    java.lang.Object.wait(Native Method)
    org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:403)
    org.apache.hadoop.ipc.Client$Connection.run(Client.java:445)
Thread 11256 (process reaper):
  State: RUNNABLE
  Blocked count: 0
  Waited count: 0
  Stack:
    java.lang.UNIXProcess.waitForProcessExit(Native Method)
    java.lang.UNIXProcess.access$900(UNIXProcess.java:20)
thanks,
aj