Hi,

I have a number of sites that I want to crawl, then merge their segments and 
create a single index. One of the main reasons I want to do this is that I 
want some of the sites in my index to be crawled on a daily basis, others on 
a weekly basis, etc. Each time I re-crawl a site, I want to add the fetched 
URLs to a single aggregate segment/index. I have a couple of questions about 
doing this:

1. Is it possible to use a different regex-urlfilter.txt file for each site 
that I am crawling? If so, how would I do this? (A rough sketch of the 
approach I have in mind follows question 2 below.)

2. If I have a very large segment that is already indexed (my aggregate 
index) and I want to add another, much smaller set of fetched URLs to that 
index, what is the best way to do this? Merging the small and large segments 
and then re-indexing the whole thing seems like it would be very time 
consuming, especially if I want to add new small sets of fetched URLs 
frequently. (The incremental alternative I am hoping for is sketched below.)
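
For question 1, the scheme I have been imagining is to keep one conf 
directory per site, identical except for its regex-urlfilter.txt (or a 
per-site value for the urlfilter.regex.file property in nutch-site.xml), and 
point Nutch at the right one before each run. This is only a rough sketch; I 
am assuming the bin/nutch wrapper honors the NUTCH_CONF_DIR environment 
variable, and the conf-siteA and urls/siteA names are just illustrative:

    # conf-siteA is a copy of conf/ whose regex-urlfilter.txt holds
    # only siteA's patterns (directory names here are made up)
    export NUTCH_CONF_DIR=$NUTCH_HOME/conf-siteA
    bin/nutch inject crawl/crawldb urls/siteA
    bin/nutch generate crawl/crawldb crawl/segments
    # ...then fetch and updatedb as usual for this site's segment

Is something along these lines the intended way to do it, or is there a 
cleaner mechanism?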
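
For question 2, the incremental alternative I am hoping exists is to index 
only the newly fetched segment and then merge the resulting small index into 
the big aggregate one, instead of merging segments and re-indexing 
everything. Again just a sketch; I am guessing at the command names and 
argument order, and the indexes/aggregate, indexes/new, and timestamp-style 
segment paths below are made up:

    # index just the new, small segment (paths are illustrative)
    bin/nutch index indexes/new crawl/crawldb crawl/linkdb \
        crawl/segments/20060101120000
    # merge the new index into the existing aggregate index
    # (output index first, then the input indexes)
    bin/nutch merge indexes/merged indexes/aggregate indexes/new

Would something like that be noticeably faster than a segment merge plus a 
full re-index?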


Thanks for any suggestions you have to offer,
Bryan
