I am using nutch 0.7.1 (non-mapred) and am a little confused about how to
move the contents of several "test" crawls into a single "live" directory.
Any suggestions are very much appreciated!

I want to have a "Live" directory that contains all the indexes that
are ready to be searched.

The first index I want to add to the "Live" directory comes from a
crawl with 10 rounds of fetching, whose db and segments are stored in
the following directories:

/crawlA/db/
/crawlA/segments/

I can merge all of the segments in the segments directory (using
bin/nutch mergesegs), which results in the following (11th) segment
directory:

/crawlA/segments/20051219000754/

I can then index this 11th (i.e. merged) segment.

However, I have the following questions about which files and
directories should be moved to the "Live" directory:

1. If I copy /crawlA/db/ to /Live/db/ and copy
/crawlA/segments/20051219000754/ to /Live/segments/20051219000754/,
then I can start Tomcat from /Live/ and search the index without
problems. However, I'm not sure whether that can be repeated for my
crawlB directory: I can't copy /crawlB/db/ to the "Live" directory
because there is already a db directory there. What are the correct
files and directories to copy from each crawl into the "Live"
directory?

2. On a side note: am I even taking the correct approach in merging
the 10 segments in the /crawlA/segments/ directory before I index, or
should I index each segment first and then merge the 10 indexes? If I
were to take the latter approach (merging indexes instead of
segments), which files from the /crawlA/ directory would I need to
move to the "Live" directory?
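
For reference, the copy step from question 1 can be sketched roughly as
follows. This is only an illustration of what I am doing for crawlA, not
a recommended procedure: publish_crawl is a hypothetical helper name I
made up (it is not a nutch command), and it uses nothing but mkdir and
cp. The check on the db directory shows exactly where the approach
breaks down for a second crawl.

```shell
#!/bin/sh
# Sketch only: copy one crawl's db and its merged segment into a live
# directory. publish_crawl is a hypothetical helper, not part of nutch.
publish_crawl() {
    crawl_dir=$1   # e.g. /crawlA
    segment=$2     # e.g. 20051219000754 (the merged segment)
    live_dir=$3    # e.g. /Live

    mkdir -p "$live_dir/segments" || return 1

    # This is the step that cannot be repeated for crawlB: once the
    # first crawl has been copied, $live_dir/db already exists.
    if [ -e "$live_dir/db" ]; then
        echo "refusing to overwrite $live_dir/db" >&2
        return 1
    fi

    cp -r "$crawl_dir/db" "$live_dir/db" || return 1
    cp -r "$crawl_dir/segments/$segment" "$live_dir/segments/$segment"
}
```

Calling publish_crawl /crawlA 20051219000754 /Live corresponds to the
copy that works for me today; a second call for /crawlB stops at the db
check, which is the situation I am asking about.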

Thanks ahead of time for any helpful suggestions,
