Re: Unable to complete a full fetch, reason Child Error

2006-03-03 Thread Mike Smith
Hi Doug

I did some more testing using the latest svn. Children still die without any
clear log message after a while.

I used two machines with Hadoop; both run a datanode and a tasktracker, and
one also runs the namenode and jobtracker. I started with 2000 seed URLs and it
went fine until the 4th cycle, reaching about 600,000 pages; the next round had
3,000,000 pages to fetch. It failed again with this exception in the middle
of fetching:

060302 232934 task_m_7lbv7e  fetching
http://www.findarticles.com/p/articles/mi_m0KJI/is_9_115/ai_107836357
060302 232934 task_m_7lbv7e  fetching
http://www.wholehealthmd.com/hc/resourceareas_supp/1,1442,544,00.html
060302 232934 task_m_7lbv7e  fetching
http://www.dow.com/haltermann/products/d-petro.htm
060302 232934 task_m_7lbv7e 0.7877368% 700644 pages, 24594 errors,
14.0pages/s, 2254 kb/s,
060302 232934 task_m_7lbv7e  fetching
http://www.findarticles.com/p/articles/mi_hb3594/is_199510/ai_n8541042
060302 232934 task_m_7lbv7e Error reading child output
java.io.IOException: Bad file descriptor
        at java.io.FileInputStream.readBytes(Native Method)
        at java.io.FileInputStream.read(FileInputStream.java:194)
        at sun.nio.cs.StreamDecoder$CharsetSD.readBytes(StreamDecoder.java:411)
        at sun.nio.cs.StreamDecoder$CharsetSD.implRead(StreamDecoder.java:453)
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:183)
        at java.io.InputStreamReader.read(InputStreamReader.java:167)
        at java.io.BufferedReader.fill(BufferedReader.java:136)
        at java.io.BufferedReader.readLine(BufferedReader.java:299)
        at java.io.BufferedReader.readLine(BufferedReader.java:362)
        at org.apache.hadoop.mapred.TaskRunner.logStream(TaskRunner.java:299)
        at org.apache.hadoop.mapred.TaskRunner.access$100(TaskRunner.java:32)
        at org.apache.hadoop.mapred.TaskRunner$1.run(TaskRunner.java:266)
060302 232934 task_m_7lbv7e 0.7877451% 700644 pages, 24594 errors,
14.0pages/s, 2254 kb/s,
060302 232934 task_m_7lbv7e 0.7877451% 700644 pages, 24594 errors,
14.0pages/s, 2254 kb/s,
060302 232934 Server connection on port 50050 from 164.67.195.27: exiting
060302 232934 Server connection on port 50050 from 164.67.195.27: exiting
060302 232934 task_m_7lbv7e Child Error
java.io.IOException: Task process exit with nonzero status.
at org.apache.hadoop.mapred.TaskRunner.runChild(TaskRunner.java:273)
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:145)
060302 232937 task_m_7lbv7e done; removing files.
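For context on the "Bad file descriptor" above: TaskRunner.logStream in the trace is a thread that copies the child process's output into the task log, and if the child dies while that thread is blocked in readLine(), the read can fail exactly like this. A rough sketch of that pattern (simplified, hypothetical names — not the real Hadoop source):

```java
// Rough sketch (hypothetical names, NOT the actual TaskRunner code) of the
// pattern behind TaskRunner.logStream: a thread copies the child process's
// output into the parent's log. If the child dies and its pipe is torn down
// while readLine() is blocked, the read fails with an IOException such as
// "Bad file descriptor".
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

public class ChildLogStreamer implements Runnable {
    private final InputStream childOutput;
    private volatile int linesCopied = 0;

    public ChildLogStreamer(InputStream childOutput) {
        this.childOutput = childOutput;
    }

    @Override
    public void run() {
        try (BufferedReader in =
                 new BufferedReader(new InputStreamReader(childOutput))) {
            String line;
            while ((line = in.readLine()) != null) {
                // The real code writes each line to the task log.
                System.out.println("child: " + line);
                linesCopied++;
            }
        } catch (IOException e) {
            // This is where "Error reading child output" would be logged.
            System.err.println("Error reading child output: " + e);
        }
    }

    public int getLinesCopied() {
        return linesCopied;
    }
}
```

The IOException here is a symptom, not the cause: the interesting question is why the child exited in the first place.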


And this is console output:



060303 010945  map 86%  reduce 0%
060303 012033  map 86%  reduce 6%
060303 012223  map 87%  reduce 6%
060303 014623  map 88%  reduce 6%
060303 021304  map 89%  reduce 6%
060303 022921  map 50%  reduce 0%
060303 022921 SEVERE error, caught Exception in main()
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:310)
at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:366)
at org.apache.nutch.fetcher.Fetcher.doMain(Fetcher.java:400)
at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:411)


This error has shown up in large-scale crawls since a couple of months ago. I
was wondering if anybody else has had the same issue with large-scale crawls.

Thanks, Mike.






On 2/26/06, Gal Nitzan [EMAIL PROTECTED] wrote:

 Still got the same...

 I'm not sure if it is relevant to this issue but the call you added to
 Fetcher.java:

 job.setBoolean("mapred.speculative.execution", false);

 Doesn't work. All task trackers still fetch together though I have only
 3 sites in the fetchlist.

 The task trackers fetch the same pages...

 I have used latest build from hadoop trunk.

 Gal.


 On Fri, 2006-02-24 at 14:15 -0800, Doug Cutting wrote:
  Mike Smith wrote:
   060219 142408 task_m_grycae  Parent died.  Exiting task_m_grycae
 
  This means the child process, executing the task, was unable to ping its
  parent process (the task tracker).
 
   060219 142408 task_m_grycae Child Error
   java.io.IOException: Task process exit with nonzero status.
   at org.apache.hadoop.mapred.TaskRunner.runChild(
 TaskRunner.java:144)
   at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:97)
 
  And this means that the parent was really still alive, and has noticed
  that the child killed itself.
 
  It would be good to know how the child failed to contact its parent.  We
  should probably log a stack trace when this happens.  I just made that
  change in Hadoop and will propagate it to Nutch.
 
  Doug
 





Re: Unable to complete a full fetch, reason Child Error

2006-02-26 Thread Gal Nitzan
Still got the same...

I'm not sure if it is relevant to this issue but the call you added to
Fetcher.java: 

 job.setBoolean("mapred.speculative.execution", false);

Doesn't work. All task trackers still fetch together though I have only
3 sites in the fetchlist.

The task trackers fetch the same pages...

I have used latest build from hadoop trunk.

Gal.


On Fri, 2006-02-24 at 14:15 -0800, Doug Cutting wrote:
 Mike Smith wrote:
  060219 142408 task_m_grycae  Parent died.  Exiting task_m_grycae
 
 This means the child process, executing the task, was unable to ping its 
 parent process (the task tracker).
 
  060219 142408 task_m_grycae Child Error
  java.io.IOException: Task process exit with nonzero status.
  at org.apache.hadoop.mapred.TaskRunner.runChild(TaskRunner.java:144)
  at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:97)
 
 And this means that the parent was really still alive, and has noticed 
 that the child killed itself.
 
 It would be good to know how the child failed to contact its parent.  We 
 should probably log a stack trace when this happens.  I just made that 
 change in Hadoop and will propagate it to Nutch.
 
 Doug
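
The duplicate fetching Gal reports is what speculative execution does by design: the framework may launch a backup attempt of a slow task on another tasktracker and keep whichever finishes first, which is wrong for a side-effecting job like fetching. A toy illustration of why (hypothetical code, not Hadoop's scheduler):

```java
// Toy illustration (hypothetical, not Hadoop code) of why speculative
// execution is wrong for a side-effecting task like fetching: with it
// enabled, the framework may run a second attempt of the same task, and
// both attempts perform the side effect (fetch the same pages).
import java.util.ArrayList;
import java.util.List;

public class SpeculationDemo {
    /** Runs one map task; a backup attempt may execute when speculative is on. */
    static int runTask(List<String> fetched, List<String> urls, boolean speculative) {
        int attempts = speculative ? 2 : 1;  // toy model of a backup attempt
        for (int a = 0; a < attempts; a++) {
            fetched.addAll(urls);            // the side effect: pages fetched
        }
        return attempts;
    }

    public static void main(String[] args) {
        List<String> urls = List.of("http://a/", "http://b/", "http://c/");
        List<String> fetched = new ArrayList<>();
        runTask(fetched, urls, true);        // speculation on: duplicates
        System.out.println(fetched.size());  // every page fetched twice
    }
}
```

This is why the Fetcher tries to set mapred.speculative.execution to false; if duplicate fetching continues regardless, the flag is apparently not reaching the jobtracker.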
 




Re: Unable to complete a full fetch, reason Child Error

2006-02-24 Thread Doug Cutting

Mike Smith wrote:

060219 142408 task_m_grycae  Parent died.  Exiting task_m_grycae


This means the child process, executing the task, was unable to ping its 
parent process (the task tracker).



060219 142408 task_m_grycae Child Error
java.io.IOException: Task process exit with nonzero status.
at org.apache.hadoop.mapred.TaskRunner.runChild(TaskRunner.java:144)
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:97)


And this means that the parent was really still alive, and has noticed 
that the child killed itself.


It would be good to know how the child failed to contact its parent.  We 
should probably log a stack trace when this happens.  I just made that 
change in Hadoop and will propagate it to Nutch.


Doug
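
The two halves described above — the child exits once it can no longer ping the tasktracker, and the parent then sees the nonzero exit status — can be sketched roughly like this (hypothetical names, not the actual TaskRunner source):

```java
// Rough sketch (hypothetical names, not the actual Hadoop TaskRunner) of the
// child-side watchdog: the task child periodically pings its parent (the
// tasktracker) and exits with a nonzero status once the ping fails; the
// parent then reports "Task process exit with nonzero status."
import java.util.function.BooleanSupplier;

public class ParentWatchdog {
    private final BooleanSupplier ping;  // e.g. an RPC ping to the tasktracker
    private boolean parentDead = false;

    public ParentWatchdog(BooleanSupplier ping) {
        this.ping = ping;
    }

    /** One watchdog iteration: true while the parent answers, false once it is gone. */
    public boolean checkOnce() {
        try {
            if (ping.getAsBoolean()) {
                return true;             // parent answered; keep running the task
            }
        } catch (RuntimeException e) {
            // An RPC failure is treated the same as a negative answer.
        }
        parentDead = true;               // where "Parent died.  Exiting ..." is logged
        return false;                    // caller would now System.exit(nonzero)
    }

    public boolean isParentDead() {
        return parentDead;
    }
}
```

So a "Child Error" with nonzero exit status can simply mean the ping path failed (timeout, RPC error, overloaded tracker), which is why logging a stack trace at that point is the useful next step.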


Unable to complete a full fetch, reason Child Error

2006-02-16 Thread Gal Nitzan
During the fetch, all tasktrackers abort with:

task_m_b45ma2 Child Error
java.io.IOException: Task process exit with nonzero status.
        at org.apache.hadoop.mapred.TaskRunner.runChild(TaskRunner.java:144)
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:97)