Briggs wrote:
Are you running this in a distributed setup, or in "local" mode? Local
mode is not designed to cope with such large datasets, so it's likely
that you are getting OOM errors during sorting ... I can only
recommend that you use a distributed setup with several machines, and
adjust RAM consumption by tuning the number of reduce tasks.
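For what it's worth, a minimal sketch of the kind of tuning meant above, assuming the hadoop-site.xml style of configuration Nutch used at the time (the values here are illustrative, and property names varied across Hadoop releases):

  <!-- hadoop-site.xml: illustrative values, not recommendations -->
  <property>
    <name>mapred.reduce.tasks</name>
    <value>8</value>   <!-- more reducers means less data sorted per task -->
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1024m</value>   <!-- heap given to each map/reduce child JVM -->
  </property>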

Currently we are running in local mode.  We do not have the setup for
a distributed deployment.  That is why I want to merge these segments.
Would that not help?  Instead of having potentially tens of thousands
of segments, I want to create several large segments and index those.

Yes, it makes perfect sense, but you are probably hitting the limits of a single machine.

I suggest that you do the merging in several steps: by trial and error, find the maximum number of segments that doesn't blow up SegmentMerger, and do a first pass merging these small segments into larger ones; then, in a second pass, merge those larger ones into the really large ones.
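To make the two-pass idea concrete, here is a rough sketch using the SegmentMerger command line (bin/nutch mergesegs); the directory names and batch globs are made up, so substitute whatever batch size your trial runs show the machine can handle:

  # Pass 1: merge the small segments in batches sized by trial and error.
  bin/nutch mergesegs crawl/merged_pass1/batch1 crawl/segments/200801*
  bin/nutch mergesegs crawl/merged_pass1/batch2 crawl/segments/200802*

  # Pass 2: merge the intermediate segments into the really large ones.
  bin/nutch mergesegs crawl/merged_final crawl/merged_pass1/batch1/* crawl/merged_pass1/batch2/*

The -slice option (e.g. -slice 50000) can also cap the number of URLs per output segment, if a single merged segment would otherwise get too big to index.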

Sorry for my ignorance, but I'm not really sure how to scale Nutch
correctly.  Do you know of a document, or some pointers, on how
segment/index data should be stored?

Most of this information is already available on the Nutch Wiki. All I can say is that there is certainly a limit to what you can do using the "local" mode - if you need to handle large numbers of pages you will need to migrate to the distributed setup.
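For the archives, the core of that migration is pointing Nutch's Hadoop layer at a real filesystem and job tracker instead of the local defaults; a minimal sketch, assuming the hadoop-site.xml configuration style of that era (host names are placeholders):

  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode.example.com:9000</value>  <!-- HDFS instead of the local fs -->
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker.example.com:9001</value>  <!-- run jobs on the cluster, not in-process -->
  </property>

With those set (and DataNodes/TaskTrackers running), the same crawl and merge commands run across the cluster, and segment data lives in HDFS rather than on one disk.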

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

