[Nutch-general] Best performance approach for single MP machine?

Doug Cook Wed, 19 Jul 2006 23:35:20 -0700

Hi,

I've recently switched to 0.8 from 0.7, and after some initial fits and
starts, I'm past the "get it working at all" stage to the "get reasonable
performance" stage.

I've got a single machine with 4 CPUs and a lot of memory. URL fetching
works great because it's (mostly) multithreaded. But as soon as I hit the
reduce phase of fetch, it's dog slow. I'm down to running on one CPU, and
the phase can take days, leaving me vulnerable to losing everything should a
process fail.

Wait! you say. That's just what Hadoop is for! I'm all ears. I'd love some
help getting my configuration right. I've seen examples/tutorials of
configurations for multiple machines; should I just "fake" multiple machines
on my single node (will that work?), or is there a cleaner, simpler approach?

Alternatively, I was all excited to get an easy improvement with
-numFetchers by running 4 fetchers simultaneously to use all my CPUs, but it
looks like -numFetchers has gone away. There was a patch for 0.8, but at a
quick glance it doesn't seem to have made it into the mainline source, and I
don't see the value of trying to merge it in if there's a cleaner
Hadoop-based approach.

Many thanks for any help.

Doug

