I have a similar question, but I am using Nutch 0.7.1 (non-mapred).
Any suggestions would be very much appreciated!

I want to have a "Live" directory that contains all the indexes that
are ready to be searched.

The first index I want to add to the "Live" directory comes from a
crawl with 10 rounds of fetching, whose db and segments are stored in
the following directories:

/crawlA/db/
/crawlA/segments/

I can merge all of the segments in the segments directory (using
bin/nutch mergesegs), which results in the following (11th) segment
directory:

/crawlA/segments/20051219000754/

I can then index the 11th (i.e. merged) segment.

However, I have the following questions about which files and
directories should be moved to the "Live" directory:

1. If I copy /crawlA/db/ to /Live/db/ and copy
/crawlA/segments/20051219000754/ to /Live/segments/20051219000754/,
then I can start Tomcat from /Live/ and search the index without
problems. However, if I now have a crawlB directory, I can't copy its
db to the "Live" directory because there is already a db directory
there. What are the correct files and directories to copy from each
crawl into the "Live" directory?

2. Am I even taking the correct approach in merging the 10 segments in
the crawlA/segments/ directory before I index, or should I index each
segment first and then merge the 10 indexes? If I were to take the
latter approach, which files from the /crawlA/ directory would I need
to move to the "Live" directory?
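
To make the problem in question 1 concrete, here is a minimal Python
sketch of the copy step I have in mind. It is only an illustration
under my own assumptions: the function name stage_crawl is
hypothetical, the layout (db/ and segments/ under each crawl) is the
one described above, and it says nothing about which files Nutch
actually needs.

```python
import shutil
import tempfile
from pathlib import Path


def stage_crawl(crawl_dir, live_dir):
    """Copy a crawl's db/ and segments/ into live_dir.

    Returns the list of target paths that were skipped because they
    already existed. Hypothetical helper, not part of Nutch.
    """
    crawl_dir, live_dir = Path(crawl_dir), Path(live_dir)
    skipped = []

    live_segments = live_dir / "segments"
    live_segments.mkdir(parents=True, exist_ok=True)

    # Segment directories have timestamp-like names (e.g. 20051219000754),
    # so segments from different crawls can usually sit side by side.
    for segment in sorted((crawl_dir / "segments").iterdir()):
        target = live_segments / segment.name
        if target.exists():
            skipped.append(str(target))
        else:
            shutil.copytree(segment, target)

    # There is only one db/ slot, so a second crawl's db has nowhere to go.
    live_db = live_dir / "db"
    if live_db.exists():
        skipped.append(str(live_db))
    else:
        shutil.copytree(crawl_dir / "db", live_db)

    return skipped


if __name__ == "__main__":
    # Throwaway demo with empty placeholder directories; the crawlB
    # segment name is made up for illustration.
    root = Path(tempfile.mkdtemp())
    for crawl, seg in (("crawlA", "20051219000754"), ("crawlB", "20051220000000")):
        (root / crawl / "db").mkdir(parents=True)
        (root / crawl / "segments" / seg).mkdir(parents=True)
    print(stage_crawl(root / "crawlA", root / "Live"))
    print(stage_crawl(root / "crawlB", root / "Live"))
```

The segments from both crawls end up next to each other, but the
second db is reported as skipped, which is exactly the collision I
don't know how to resolve.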

Thanks ahead of time for any helpful suggestions,
Bryan


On 11/21/05, Doug Cutting <[EMAIL PROTECTED]> wrote:
> Ben Halsted wrote:
> > I was wondering what the required file structure is for the web gui to work
> > properly.
> >
> > Are all of these required?
> > /db/crawldb
> > /db/index
> > /db/indexes
> > /db/segments
> > /db/linkdb
>
> The indexes directory is not used when a merged index is present.
>
> The crawldb and segments/*/crawl_parse directories are not used by the
> web ui.
>
> > Also -- What is the proper way to merge segments and indexes? Can I simply
> > move segments all into one directory then re-index it, or is there a better
> > way?
>
> You should update the linkdb so that it contains links from all
> segments.  Then you can use the dedup and merge commands to create a new
> index.  Ideally you should also re-index after updating the linkdb, but
> this is not required.
>
> Doug
>
