I looked through some old discussions and realized that, regarding my second question, bin/nutch merge can be used to merge two indexes. What is the advantage of merging indexes vs. searching multiple indexes? How do you search multiple indexes? If you have already created one index and Tomcat is already running, and you then fetch a new set of URLs and create a new index from them, will Nutch automatically search both indexes?
Regarding the first question I asked in my original e-mail, I would still appreciate any advice anyone has to offer.

Thanks,
Bryan

On 8/23/05, Bryan Woliner <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> I have a number of sites that I want to crawl, then merge their segments
> and create a single index. One of the main reasons I want to do this is that
> I want some of the sites in my index to be crawled on a daily basis, others
> on a weekly basis, etc. Each time I re-crawl a site, I want to add the
> fetched URLs to a single aggregate segment/index. I have a couple of questions
> about doing this:
>
> 1. Is it possible to use a different regex.urlfilter.txt file for each
> site that I am crawling? If so, how would I do this?
>
> 2. If I have a very large segment that is indexed (my aggregate index) and
> I want to add another (much smaller) set of fetched URLs to this index, what
> is the best way to do this? It seems like merging the small and large
> segments and then re-indexing the whole thing would be very time-consuming
> -- especially if I wanted to add new small sets of fetched URLs frequently.
>
>
> Thanks for any suggestions you have to offer,
> Bryan
>
