Hi, I am new to Nutch but have played around with it a bit. So far I really
like the tool.
I would like to be able to deep-crawl a couple of sites and also
spider-crawl others, so that the end result is an index with a large portion
of several specific sites and a more organic spider crawl of the rest.

I have tried to do this in several ways. I have used the crawl command with
the depth level set, etc., which works: I get a valid index and results.
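
For reference, this is roughly what I ran (the classic Nutch 1.x one-shot
command; urls is my seed directory, and the depth/topN values are just the
ones I happened to pick):

    bin/nutch crawl urls -dir crawl -depth 3 -topN 1000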

I have also injected the individual URLs of the starting sites into the
crawldb and iterated through the generate/fetch/update sequence. In this
case it covers the whole web index, but it doesn't seem to add any
additional depth on the starting URLs, which is an issue.
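
This is roughly the loop I used (one round shown; I repeated it several
times, and depending on the Nutch version a separate parse step may also be
needed):

    bin/nutch inject crawl/crawldb urls
    # one generate/fetch/update round, repeated a few times
    bin/nutch generate crawl/crawldb crawl/segments -topN 1000
    s=`ls -d crawl/segments/* | tail -1`
    bin/nutch fetch $s
    bin/nutch updatedb crawl/crawldb $s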

When I have tried to merge the crawl results into the generate/fetch/update
results, I get errors.
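
This is roughly what I attempted for the merge (assuming mergedb and
mergesegs are the right tools here; crawl1 and crawl2 are just my two
output directories):

    # merge the two crawldbs into one
    bin/nutch mergedb crawl/crawldb_merged crawl1/crawldb crawl2/crawldb
    # merge all segments from both crawls
    bin/nutch mergesegs crawl/segments_merged -dir crawl1/segments -dir crawl2/segments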

Is there any way to do this? Also, is there any way to set a priority on
certain sites, something like "these need to be updated daily and the rest
of these weekly"?
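
To illustrate what I mean, I imagined per-seed settings along these lines
(I have seen nutch.fetchInterval metadata mentioned for the seed list, but
I am not sure it is the right mechanism; intervals are in seconds and the
fields tab-separated):

    # urls/seeds.txt -- hypothetical per-URL refresh intervals
    http://daily-site.example.com/	nutch.fetchInterval=86400
    http://weekly-site.example.com/	nutch.fetchInterval=604800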

Thank you in advance for any help.

-John
