I have been getting this exception during fetching for almost a month. It stops the whole crawl, and it happens intermittently. Any ideas? We are really stuck on this problem.
I am using 3 data nodes and 1 name server.

060223 173809 task_m_b8ibww fetching http://www.heartcenter.com/94fall.pdf
060223 173809 task_m_b8ibww fetching http://www.medinfo.co.uk/conditions/tenosynovitis.html
060223 173809 task_m_b8ibww fetching http://www.boncholesterol.com/whatsnew/index.shtml
060223 173809 task_m_b8ibww fetching http://www.drcranton.com/hrt/promise_of_longevity.htm
060223 173809 task_m_b8ibww fetching http://www.drcranton.com/hrt/promise_of_longevity.htm
060223 173809 task_m_b8ibww Error reading child output
java.io.IOException: Bad file descriptor
    at java.io.FileInputStream.readBytes(Native Method)
    at java.io.FileInputStream.read(FileInputStream.java:194)
    at sun.nio.cs.StreamDecoder$CharsetSD.readBytes(StreamDecoder.java:411)
    at sun.nio.cs.StreamDecoder$CharsetSD.implRead(StreamDecoder.java:453)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:183)
    at java.io.InputStreamReader.read(InputStreamReader.java:167)
    at java.io.BufferedReader.fill(BufferedReader.java:136)
    at java.io.BufferedReader.readLine(BufferedReader.java:299)
    at java.io.BufferedReader.readLine(BufferedReader.java:362)
    at org.apache.hadoop.mapred.TaskRunner.logStream(TaskRunner.java:170)
    at org.apache.hadoop.mapred.TaskRunner.access$100(TaskRunner.java:29)
    at org.apache.hadoop.mapred.TaskRunner$1.run(TaskRunner.java:137)
060223 173809 task_r_3h1pex 0.16666667% reduce > copy >
060223 173809 Server connection on port 50050 from xxxxxx: exiting
060223 173809 Server connection on port 50050 from xxxxxx: exiting
060223 173809 task_m_b8ibww Child Error
java.io.IOException: Task process exit with nonzero status.
    at org.apache.hadoop.mapred.TaskRunner.runChild(TaskRunner.java:144)
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:97)
060223 173812 task_m_b8ibww done; removing files.
