I am using Nutch 0.7.1 (non-mapred) and am a little confused about how to move the contents of several "test" crawls into a single "Live" directory. Any suggestions are very much appreciated!
I want to have a "Live" directory that contains all the indexes that are ready to be searched. The first index I want to add to the "Live" directory comes from a crawl with 10 rounds of fetching, whose db and segments are stored in the following directories:

/crawlA/db/
/crawlA/segments/

I can merge all of the segments in the segments directory (using bin/nutch mergesegs), which results in the following (11th) segment directory:

/crawlA/segments/20051219000754/

I can then index this 11th (i.e. merged) segment. However, I have the following questions about which files and directories should be moved to the "Live" directory:

1. If I copy /crawlA/db/ to /Live/db/ and copy /crawlA/segments/20051219000754/ to /Live/segments/20051219000754/, then I can start Tomcat from /Live/ and I'm able to search the index fine. However, I'm not sure whether that can be repeated for my crawlB directory. I can't copy /crawlB/db/ to the "Live" directory because there is already a db directory there. What are the correct files and directories to copy from each crawl into the "Live" directory?

2. On a side note: am I even taking the correct approach in merging the 10 segments in the /crawlA/segments/ directory before I index, or should I index each segment first and then merge the 10 indexes? If I were to take the latter approach (merging indexes instead of segments), which files from the /crawlA/ directory would I need to move to the "Live" directory?

Thanks ahead of time for any helpful suggestions,
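
P.S. For reference, the copy step from question 1 that currently works for crawlA looks roughly like the sketch below. The function name stage_crawl and its arguments are my own (hypothetical, not part of Nutch); it only wraps plain mkdir and cp, and it refuses to run when the target already has a db directory, which is exactly where I get stuck with crawlB.

```shell
#!/bin/sh
# Hypothetical helper (not a Nutch command): copy one crawl's db and its
# merged segment into a "Live" directory.
# Usage: stage_crawl <crawl_dir> <segment_name> <live_dir>
stage_crawl() {
    crawl_dir=$1
    segment=$2
    live_dir=$3

    # Refuse to overwrite an existing db; this is the crawlB problem
    # described in question 1.
    if [ -e "$live_dir/db" ]; then
        echo "refusing to overwrite $live_dir/db" >&2
        return 1
    fi

    mkdir -p "$live_dir/segments" || return 1
    cp -r "$crawl_dir/db" "$live_dir/db" || return 1
    cp -r "$crawl_dir/segments/$segment" "$live_dir/segments/$segment" || return 1
}

# Example with the paths from this post:
# stage_crawl /crawlA 20051219000754 /Live
```

Running it a second time for crawlB against the same "Live" directory fails at the db check, so the open question is what should happen instead of that second db copy.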
