Hi, I'm new to Nutch but have played around with it a bit, and so far I really like the tool. I'd like to deep-crawl a couple of sites and also spider-crawl more broadly, so that the end result is an index holding a large portion of several specific sites plus a more organic spider crawl of the rest.
I have tried to do this in a couple of ways. First, I used the crawl command with a depth setting (first snippet below), and that works: I get a valid index and results. Second, I injected the individual URLs of the starting sites into the crawldb and iterated through the generate/fetch/update sequence by hand (second snippet). That covers the wider web, but it doesn't seem to add any additional depth on the starting URLs, which is the problem. And when I try to merge the crawl-command results into the generate/fetch/update results (third snippet), I get errors. Is there any way to do this?

A second question: is there any way to set a priority on certain sites, something like "these need to be re-fetched daily and the rest weekly"? (The last snippet shows my guess at this.)
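For reference, here is roughly the one-shot deep crawl I ran; the depth and topN values, the urls seed directory, and the deepcrawl output directory are just from my local setup:

    bin/nutch crawl urls -dir deepcrawl -depth 10 -topN 1000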
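And this is the generate/fetch/update round I repeated by hand after injecting the seeds (again, the paths are mine):

    # seed the crawldb with the starting URLs
    bin/nutch inject spider/crawldb urls

    # one generate/fetch/update round, repeated as needed
    bin/nutch generate spider/crawldb spider/segments -topN 1000
    segment=$(ls -d spider/segments/* | tail -1)   # newest segment
    bin/nutch fetch "$segment"
    bin/nutch updatedb spider/crawldb "$segment"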
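The merge attempt that errors out looks like the following; mergedb and mergesegs are the commands I used, though I may well be calling them wrong:

    # fold the deep-crawl results into the spider-crawl results
    bin/nutch mergedb merged/crawldb deepcrawl/crawldb spider/crawldb
    bin/nutch mergesegs merged/segments deepcrawl/segments/* spider/segments/*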
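On the scheduling question, I've seen the per-URL nutch.fetchInterval seed metadata key (a tab-separated value in seconds after the URL) and the site-wide db.fetch.interval.default property in nutch-site.xml mentioned in the docs, so I'm guessing at something like the lines below, but I haven't confirmed this is the intended mechanism; the hostnames are placeholders:

    # hypothetical seed entries with per-URL re-fetch intervals
    printf 'http://daily.example.com/\tnutch.fetchInterval=86400\n'   >> urls/seeds.txt    # daily
    printf 'http://weekly.example.com/\tnutch.fetchInterval=604800\n' >> urls/seeds.txt    # weekly
    bin/nutch inject spider/crawldb urls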
Thanks in advance for any help.

-John