After testing: selecting the URLs to fetch from the unfetched URLs takes 15 hours
(ouch!), while fetching 1000 URLs only takes a few minutes (same for parsing).

I'm guessing one of these phases is taking a very long time:
2013-01-31 13:46:19,387 INFO  crawl.Generator - Generator: Selecting best-scoring urls due for fetch.
2013-01-31 13:46:19,387 INFO  crawl.Generator - Generator: filtering: true
2013-01-31 13:46:19,387 INFO  crawl.Generator - Generator: normalizing: true

Does anyone know how to log each of these steps, or have any clue about
what is happening?
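
Until finer-grained logging is available, the gaps between consecutive log timestamps should at least show which message comes right before the long wait. Below is a minimal sketch, assuming every log entry starts with the timestamp format shown above; the function name `log_gaps` is hypothetical and not part of Nutch.

```python
import re
from datetime import datetime

# Matches the leading timestamp of a log entry, e.g. "2013-01-31 13:46:19,387".
TS = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")


def log_gaps(lines):
    """Return (seconds_until_next_entry, entry) pairs, largest gap first.

    Hypothetical helper: lines without a leading timestamp (wrapped text,
    stack traces) are skipped.
    """
    stamped = []
    for line in lines:
        m = TS.match(line)
        if m:
            when = datetime.strptime(m.group(0), "%Y-%m-%d %H:%M:%S,%f")
            stamped.append((when, line.rstrip()))
    # Pair each entry with the one that follows it and measure the delay.
    gaps = [
        ((nxt[0] - cur[0]).total_seconds(), cur[1])
        for cur, nxt in zip(stamped, stamped[1:])
    ]
    return sorted(gaps, key=lambda g: g[0], reverse=True)
```

Passing the open log file to `log_gaps` and printing the first few results would list the entries followed by the longest silence. This only narrows the problem down to the step that started before the gap; it does not say what that step was doing.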





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Very-long-time-just-before-fetching-and-just-after-parsing-tp4037673p4037881.html
Sent from the Nutch - User mailing list archive at Nabble.com.
