> Are you running this in a distributed setup, or in "local" mode? Local
> mode is not designed to cope with such large datasets, so it's likely
> that you will be getting OOM errors during sorting ... I can only
> recommend that you use a distributed setup with several machines, and
> adjust RAM consumption with the number of reduce tasks.

Currently we are running in local mode.  We do not have the setup for
distributing. That is why I want to merge these segments.  Would that
not help?  Instead of having potentially tens of thousands of
segments, I want to create several large segments and index those.

Sorry for my ignorance, but I am not really sure how to scale Nutch
correctly.  Do you know of a document, or have some pointers, on how
segment/index data should be stored?

<briggs />

"Conscious decisions by conscious minds are what make reality real"

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
