Hi,

My test setup (local only) now has just over 20 million URLs. I have fetched 3 million already, and the rest still needs to be fetched. Fetching in 12-hour runs now wastes less time than shorter runs, because merging alone now takes over 5.5 hours!
I've searched but have found little information so far. Would now be a good time to try running Nutch on a Hadoop cluster (which I don't have), or to try to let Hadoop take advantage of my multiple cores?

Cheers,

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350

