I have a similar question, but I am using Nutch 0.7.1 (non-mapred). Any suggestions are very much appreciated!

I want to have a "Live" directory that contains all the indexes that are ready to be searched. The first index I want to add to the "Live" directory comes from a crawl with 10 rounds of fetching, whose db and segments are stored in the following directories:

/crawlA/db/
/crawlA/segments/

I can merge all of the segments in the segments directory (using bin/nutch mergesegs), which results in the following (11th) segment directory:

/crawlA/segments/20051219000754/

I can then index the 11th (i.e. merged) segment. However, I have the following questions about which files and directories should be moved to the "Live" directory:

1. If I copy /crawlA/db/ to /Live/db/ and copy /crawlA/segments/20051219000754/ to /Live/segments/20051219000754/, then I can start Tomcat from /Live/ and search the index fine. However, if I now have a crawlB directory, I can't copy its db to the "Live" directory because there is already a db directory there. What are the correct files and directories to copy from each crawl into the "Live" directory?

2. Am I even taking the correct approach in merging the 10 segments in the crawlA/segments/ directory before I index, or should I index each segment first and then merge the 10 indexes? If I were to take the latter approach, which files from the /crawlA/ directory would I need to move to the "Live" directory?

Thanks ahead of time for any helpful suggestions,
Bryan

On 11/21/05, Doug Cutting <[EMAIL PROTECTED]> wrote:
> Ben Halsted wrote:
> > I was wondering what the required file structure is for the web gui to work
> > properly.
> >
> > Are all of these required?
> > /db/crawldb
> > /db/index
> > /db/indexes
> > /db/segments
> > /db/linkdb
>
> The indexes directory is not used when a merged index is present.
>
> The crawldb and segments/*/crawl_parse directories are not used by the
> web ui.
>
> > Also -- What is the proper way to merge segments and indexes? Can I simply
> > move segments all into one directory then re-index it, or is there a better
> > way?
>
> You should update the linkdb so that it contains links from all
> segments. Then you can use the dedup and merge commands to create a new
> index. Ideally you should also re-index after updating the linkdb, but
> this is not required.
>
> Doug
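
For the mapred layout Doug describes (not 0.7.1), the order of steps in his reply can be sketched as a small Python helper that only builds the command lines and does not run them. The dedup and merge subcommands are named in his reply; the invertlinks and index subcommand names and every argument order shown here are assumptions that should be checked against the usage text bin/nutch prints for your version. The function name plan_index_rebuild and the second segment name are hypothetical.

```python
from pathlib import PurePosixPath


def plan_index_rebuild(crawl_dir, segment_names):
    """Return bin/nutch invocations, in order, for rebuilding a merged index.

    Nothing is executed; each entry is an argv-style list of strings.
    Argument order for every command is an assumption, not verified usage.
    """
    root = PurePosixPath(crawl_dir)
    segments = [str(root / "segments" / name) for name in sorted(segment_names)]
    crawldb = str(root / "crawldb")
    linkdb = str(root / "linkdb")
    indexes = str(root / "indexes")
    index = str(root / "index")
    return [
        # 1. Update the linkdb so that it contains links from all segments.
        ["bin/nutch", "invertlinks", linkdb, *segments],
        # 2. Re-index after updating the linkdb (ideal, but not required).
        ["bin/nutch", "index", indexes, crawldb, linkdb, *segments],
        # 3. Remove duplicate documents from the per-segment indexes.
        ["bin/nutch", "dedup", indexes],
        # 4. Merge the per-segment indexes into a single index.
        ["bin/nutch", "merge", index, indexes],
    ]


if __name__ == "__main__":
    # "20051218000000" is a made-up segment name used only for illustration.
    for argv in plan_index_rebuild("/db", ["20051219000754", "20051218000000"]):
        print(" ".join(argv))
```

Once a merged index exists at /db/index, the /db/indexes directory is no longer consulted by the web ui, per Doug's note above.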
