Hi,

I ran the MapReduce crawl starting with 10 URLs and continued through the sixth cycle, where it fetched about 400K pages, and everything was fine.

060127 001055 TOTAL urls:       1877326
060127 001055 avg score:        1.099
060127 001055 max score:        1666.305
060127 001055 min score:        1.0
060127 001055 retry 0:  1865721
060127 001055 retry 1:  10887
060127 001055 retry 2:  621
060127 001055 retry 3:  92
060127 001055 retry 4:  4
060127 001055 retry 5:  1
060127 001055 status 1 (DB_unfetched):  1477634
060127 001055 status 2 (DB_fetched):    374736
060127 001055 status 3 (DB_gone):       24956
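
As a sanity check on the numbers above, here is a minimal Python sketch that parses the stats lines and confirms that the retry counts and the status counts each add up to the TOTAL urls figure. The helper name parse_stats and the regular expression are my own assumptions for illustration; they are not part of Nutch.

```python
import re

# Stats lines copied verbatim from the log output above.
LOG = """\
060127 001055 TOTAL urls:       1877326
060127 001055 avg score:        1.099
060127 001055 max score:        1666.305
060127 001055 min score:        1.0
060127 001055 retry 0:  1865721
060127 001055 retry 1:  10887
060127 001055 retry 2:  621
060127 001055 retry 3:  92
060127 001055 retry 4:  4
060127 001055 retry 5:  1
060127 001055 status 1 (DB_unfetched):  1477634
060127 001055 status 2 (DB_fetched):    374736
060127 001055 status 3 (DB_gone):       24956
"""


def parse_stats(log):
    """Hypothetical helper: return (total, retries, statuses) from stats lines."""
    total = None
    retries = {}
    statuses = {}
    for line in log.splitlines():
        # Skip the two timestamp fields, then split "key:   value".
        m = re.match(r"\d{6} \d{6} (.+?):\s+(\S+)$", line.strip())
        if not m:
            continue
        key, value = m.groups()
        if key == "TOTAL urls":
            total = int(value)
        elif key.startswith("retry "):
            retries[int(key.split()[1])] = int(value)
        elif key.startswith("status "):
            # "status 2 (DB_fetched)" -> "DB_fetched"
            name = key[key.index("(") + 1:key.index(")")]
            statuses[name] = int(value)
        # Score lines (avg/max/min) are ignored by this check.
    return total, retries, statuses


total, retries, statuses = parse_stats(LOG)
print("retry counts sum to total: ", sum(retries.values()) == total)
print("status counts sum to total:", sum(statuses.values()) == total)
print("fetched share of all urls: ", statuses["DB_fetched"] / total)
```

Both sums match the total for this run, so the stats for the first scenario look internally consistent.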

Then I tried another scenario starting with 80K URLs. The first cycle was OK, but the second cycle, which was supposed to fetch 800K pages, failed after the reduce reached 100%.
I ran it on three machines: 1 namenode and 2 datanodes.

One of my datanodes logged the following exception:
060128 083726 task_r_alfaaq Child Error
java.io.IOException: Task process exit with nonzero status.
       at org.apache.nutch.mapred.TaskRunner.runChild(TaskRunner.java:139)
       at org.apache.nutch.mapred.TaskRunner.run(TaskRunner.java:92)
This appeared more than once.

The other datanode logged the following exception:
060128 142626 Lost connection to JobTracker [server.name/i.i.i.i:50020]. ex=java.lang.reflect.UndeclaredThrowableException Retrying...

Any ideas?

Thanks,
Rafit
