Hi,

I ran a fetch with 2000 URLs in the URL seed list, using 1000 threads and 100 URLs from a single host. The fetching process was very quick (60-70 URLs/s), but in between there are always INFO messages from mapred.LocalJobRunner. During the parsing process the number of those messages increased. I see something like:

2009-01-28 17:02:21,886 INFO mapred.LocalJobRunner - file:/.../segments/20090128132015/content/part-00000/data:67108864+33554432
2009-01-28 17:02:24,890 INFO mapred.LocalJobRunner - file:/.../segments/20090128132015/content/part-00000/data:67108864+33554432
2009-01-28 17:02:27,894 INFO mapred.LocalJobRunner - file:/.../segments/20090128132015/content/part-00000/data:67108864+33554432
2009-01-28 17:02:30,898 INFO mapred.LocalJobRunner - file:/.../segments/20090128132015/content/part-00000/data:67108864+33554432
2009-01-28 17:02:33,902 INFO mapred.LocalJobRunner - file:/.../segments/20090128132015/content/part-00000/data:67108864+33554432
2009-01-28 17:02:33,998 INFO mapred.JobClient - map 74% reduce 0%
2009-01-28 17:02:36,906 INFO mapred.LocalJobRunner - file:/.../segments/20090128132015/content/part-00000/data:67108864+33554432
2009-01-28 17:02:39,910 INFO mapred.LocalJobRunner - file:/.../segments/20090128132015/content/part-00000/data:67108864+33554432
2009-01-28 17:02:42,914 INFO mapred.LocalJobRunner - file:/...

These messages have been appearing for an hour now, and only from time to time is there a message that another URL has been parsed. I would say that 90% of all output comes from LocalJobRunner.

I think these mapred processes slow down my complete generate/fetch/parse cycle. What can I do? Is this normal behavior? Any ideas? What did I do wrong?

We are running two Nutch instances on a single machine.

Thanks in advance,
Nadine
