I have a similar question, but I am using Nutch 0.7.1 (non-mapred).
Any suggestions are very much appreciated!

I want to have a "Live" directory that contains all the indexes that
are ready to be searched.

The first index I want to add to the "Live" directory comes from a
crawl with 10 rounds of fetching, whose db and segments are stored in
the following directories:

/crawlA/db/
/crawlA/segments/

I can merge all of the segments in the segments directory (using
bin/nutch mergesegs), which results in the following (11th) segment
directory:

/crawlA/segments/20051219000754/

I can then index the 11th (i.e. merged) segment.

However, I have the following questions about which files and
directories should be moved to the "Live" directory:

1. If I copy /crawlA/db/ to /Live/db/ and copy
/crawlA/segments/20051219000754/ to /Live/segments/20051219000754/,
then I can start Tomcat from /Live/ and search the index without
problems. However, if I now have a crawlB directory, I can't copy its db
to the "Live" directory because there is already a db directory there.
What are the correct files and directories to copy from each crawl
into the "Live" directory?

2. Am I even taking the correct approach in merging the 10 segments in
the /crawlA/segments/ directory before I index, or should I index each
segment first and then merge the 10 indexes? If I were to take the
latter approach, which files from the /crawlA/ directory would I need
to move to the "Live" directory?
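
To make question 1 concrete, here is a rough sketch of the copy steps I
described, run under a scratch directory so that no real crawl data is
touched. ROOT and the placeholder files are made up for illustration; only
the directory names come from my setup above, and no Nutch commands are
run.

```shell
#!/bin/sh
# Sketch of the copy steps from question 1. ROOT and the placeholder files
# are hypothetical stand-ins; the real db and segment contents are produced
# by the crawl and by bin/nutch mergesegs.
set -eu

ROOT=$(mktemp -d)
SEGMENT=20051219000754   # name of the merged (11th) segment

# Stand-ins for the real crawl output.
mkdir -p "$ROOT/crawlA/db" "$ROOT/crawlA/segments/$SEGMENT"
touch "$ROOT/crawlA/db/placeholder" "$ROOT/crawlA/segments/$SEGMENT/placeholder"

# Copy the db and the merged segment into the "Live" directory.
mkdir -p "$ROOT/Live/segments"
cp -R "$ROOT/crawlA/db" "$ROOT/Live/db"
cp -R "$ROOT/crawlA/segments/$SEGMENT" "$ROOT/Live/segments/$SEGMENT"

# A second crawl would collide here: Live/db is already taken by crawlA.
if [ -e "$ROOT/Live/db" ]; then
    echo "Live/db already exists; crawlB/db cannot be copied to the same path"
fi

# Show the resulting layout.
find "$ROOT/Live" -type d | sort
```

The segment copies from different crawls do not collide, since each one
has its own timestamped directory name; only the db path is shared.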

Thanks ahead of time for any helpful suggestions,
Bryan


On 11/21/05, Doug Cutting <[EMAIL PROTECTED]> wrote:
> Ben Halsted wrote:
> > I was wondering what the required file structure is for the web gui to work
> > properly.
> >
> > Are all of these required?
> > /db/crawldb
> > /db/index
> > /db/indexes
> > /db/segments
> > /db/linkdb
>
> The indexes directory is not used when a merged index is present.
>
> The crawldb and segments/*/crawl_parse directories are not used by the
> web ui.
>
> > Also -- What is the proper way to merge segments and indexes? Can I simply
> > move segments all into one directory then re-index it, or is there a better
> > way?
>
> You should update the linkdb so that it contains links from all
> segments.  Then you can use the dedup and merge commands to create a new
> index.  Ideally you should also re-index after updating the linkdb, but
> this is not required.
>
> Doug
>


_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general