On 2010-09-28 14:02, Markus Jelsma wrote:
> Hi,
>
> My test setup (local only) now has just over 20 million URLs; I have
> fetched 3M already and the rest still needs to be fetched. Fetching for
> 12 hours at a stretch now wastes less time, because the merge alone
> takes over 5.5 hours!
>
> I've searched but found little information so far. Would now be a good
> time to try running Nutch on a Hadoop cluster (which I don't have), or
> to let Hadoop take advantage of my multiple cores?
Even running Hadoop in pseudo-distributed mode (on a single node but
with real JobTracker/TaskTracker) would be much better. The reason is
that in local mode tasks are NOT executed in parallel but serially.
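For reference, a minimal pseudo-distributed setup on the Hadoop 0.20.x line (current at the time) needs little more than pointing the default filesystem at a local HDFS and setting the JobTracker address to something other than "local". The hostnames and ports below are the conventional defaults, not anything Nutch-specific; adjust to taste:

```xml
<!-- conf/core-site.xml: use a local HDFS instead of the local filesystem -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- conf/mapred-site.xml: run a real JobTracker instead of the local runner -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
```

With mapred.job.tracker set to a host:port rather than "local", tasks run in separate child JVMs under the TaskTracker, so several map/reduce tasks can execute concurrently; mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum control how many per node, and setting them to roughly the core count is a reasonable starting point.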
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com