Re: merging auto-crawls

Ben Halsted Mon, 21 Nov 2005 14:58:20 -0800

Thanks! (sorry about the double post, even a day apart).

One other quick question for you (using the mapred branch):

When I merge this stuff, do I need to merge the segments/* for each crawl
into a single segments directory? Or is there data in the merged index file
that will direct the web component to the correct segment?

--Ben

On 11/21/05, Doug Cutting <[EMAIL PROTECTED]> wrote:
>
> Ben Halsted wrote:
> > I've modified the auto-crawl to always use a pre-existing crawldb. If I run
> > it multiple times I get multiple linkdb, segments, indexes, and index
> > directories.
> >
> > Is it possible to merge the results using the bin/nutch commands?
>
> You should also have it use a single linkdb. Then use 'bin/nutch dedup'
> and 'bin/nutch merge' across both indexes directories to create a new
> index with everything.
>
> Doug
>
