Hi,

My test setup (local only) now has just over 20 million URLs. I have fetched 
3 million already and the rest still needs to be fetched. It now wastes less 
time to fetch for 12 hours at a stretch, because merging now takes over 5.5 hours!

I've searched but found little information so far. Would now be a good time to 
try running Nutch on a Hadoop cluster (which I don't have) or try to let 
Hadoop take advantage of my multiple cores?

Cheers,

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
