Hi Everybody,
I'm real new to Nutch. I've read through the documentation and many
months of mailing list archives, and I don't think this question has
been answered.
I have two tasks I would like Nutch to handle. First, I would like it
to crawl and index ONLY a specific set of urls. This is a stronger
limitation than confining the crawl to specific sites (so
db.ignore.external.links is insufficient): it should not follow ANY
links on pages in the list of urls.
Secondly, after creating the crawl and index of those specific urls, I
would like to occasionally add SINGLE urls to the index.
Is this possible? If so, is it trivially possible with something like
'-topN 0' (or should that be '-topN 1'?)? Or could I create a single
local web page with all the links on it and run the crawler with
'-depth 1'?
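
To make the local-page idea concrete, here is a rough sketch of what I
mean. It is plain Python and nothing in it is Nutch-specific; the
function name build_link_page and the example urls are made up for
illustration. I don't know whether the crawler would then stop at the
listed pages or also fetch the links found on them, which is really my
question.

```python
from html import escape


def build_link_page(urls):
    """Return a bare HTML page with one link per url.

    Hypothetical helper, not part of Nutch: it only builds the single
    local page that would be handed to the crawler as its starting point.
    """
    items = "\n".join(
        # Escape each url so characters like & stay valid inside the HTML.
        '<li><a href="{0}">{0}</a></li>'.format(escape(u, quote=True))
        for u in urls
    )
    return "<html><body><ul>\n" + items + "\n</ul></body></html>\n"


if __name__ == "__main__":
    # Placeholder urls; the real list would come from my own file.
    example_urls = [
        "http://example.com/a.html",
        "http://example.com/b.html?x=1&y=2",
    ]
    print(build_link_page(example_urls))
```
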
Apologies if this is an overasked or misguided question; if so, I'd
appreciate pointers to appropriate documentation or code so I can
figure it out on my own.
Thanks!
-k7