Hi,
I ran MapReduce starting with 10 URLs through the sixth cycle, in which it
fetched 400K pages, and everything was fine. Here are the statistics after that run:
060127 001055 TOTAL urls: 1877326
060127 001055 avg score: 1.099
060127 001055 max score: 1666.305
060127 001055 min score: 1.0
060127 001055 retry 0: 1865721
060127 001055 retry 1: 10887
060127 001055 retry 2: 621
060127 001055 retry 3: 92
060127 001055 retry 4: 4
060127 001055 retry 5: 1
060127 001055 status 1 (DB_unfetched): 1477634
060127 001055 status 2 (DB_fetched): 374736
060127 001055 status 3 (DB_gone): 24956
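As a quick sanity check, the statistics above are internally consistent: both the retry breakdown and the status breakdown add up to the TOTAL. Below is a minimal Python sketch of that check. The regular expressions and the parse_stats helper are my own and hypothetical; they are not part of Nutch. They assume the report lines look exactly like the ones above, with the timestamps stripped.

```python
import re

# Stats lines copied from the report above (timestamps and score lines dropped).
STATS = """\
TOTAL urls: 1877326
retry 0: 1865721
retry 1: 10887
retry 2: 621
retry 3: 92
retry 4: 4
retry 5: 1
status 1 (DB_unfetched): 1477634
status 2 (DB_fetched): 374736
status 3 (DB_gone): 24956
"""

def parse_stats(text):
    """Hypothetical helper: return (total, retry counts, status counts)."""
    total = 0
    retries = {}
    statuses = {}
    for line in text.splitlines():
        if m := re.match(r"TOTAL urls: (\d+)", line):
            total = int(m.group(1))
        elif m := re.match(r"retry (\d+): (\d+)", line):
            retries[int(m.group(1))] = int(m.group(2))
        elif m := re.match(r"status \d+ \((\w+)\): (\d+)", line):
            statuses[m.group(1)] = int(m.group(2))
    return total, retries, statuses

total, retries, statuses = parse_stats(STATS)

# Each breakdown should account for every URL counted in TOTAL.
assert sum(retries.values()) == total
assert sum(statuses.values()) == total

fetched_pct = 100 * statuses["DB_fetched"] / total
print(f"fetched: {fetched_pct:.1f}% of {total} URLs")
```

So after six cycles roughly a fifth of the known URLs had been fetched, and the rest were still unfetched or gone.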
Then I tried another scenario starting with 80K URLs. The first cycle
was OK, but the second cycle, which was supposed to fetch 800K pages, failed
after the reduce phase reached 100%.
I ran it on three machines: 1 namenode and 2 datanodes.
One of my datanodes had the following exception:
060128 083726 task_r_alfaaq Child Error
java.io.IOException: Task process exit with nonzero status.
at org.apache.nutch.mapred.TaskRunner.runChild(TaskRunner.java:139)
at org.apache.nutch.mapred.TaskRunner.run(TaskRunner.java:92)
It appeared more than once.
The other datanode had the following exception:
060128 142626 Lost connection to JobTracker [server.name/i.i.i.i:50020].
ex=java.lang.reflect.UndeclaredThrowableException Retrying...
Any idea?
Thanks,
Rafit