Kevvin Sevvvin wrote:


Hi Everybody,

I'm real new to Nutch. I've read through the documentation and many months of mailing list archives, and I don't think this question has been answered.

I have two tasks I would like Nutch to handle. I would like it to crawl and
index ONLY a specific set of urls. This is a stronger limitation than
confining to specific sites (so db.ignore.external.links is insufficient): it
should not follow ANY links on pages in the list of urls.

Secondly, after creating the crawl and index of specific sites, I would like
to occasionally add SINGLE urls to the index.

Is this possible? If so, is it trivially possible with something like '-topN 0' (or should that be '-topN 1'?)? Or could I create a single local web page
with all the links on it and run the crawler with '-depth 1'?

Apologies if this is an overasked or misguided question; if so I'd appreciate pointers to appropriate documentation or code so I can figure it out on my own.

Thanks!
-k7

Hi Kevin,

I am a relative newbie to Nutch as well.
I believe you are looking for -depth 1, which should fetch only the URLs in
your list and not follow the links on those pages.
I have used -depth (but not with a value of 1) on version 0.7.2.
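
For what it's worth, here is a rough sketch of how the depth-1 run could be set up. It is untested against a real Nutch install, and several things in it are assumptions: the helper name prepare_depth_one_crawl, the example.com URLs, and the "urls" and "crawl" directory names are all made up for illustration, and it assumes the one-shot "bin/nutch crawl" command with a directory of seed files (as far as I know, 0.7.x takes a flat seed file there instead, so adjust the first argument accordingly).

```python
from pathlib import Path


def prepare_depth_one_crawl(seed_urls, url_dir="urls", crawl_dir="crawl"):
    """Write the seed list and return the argv for a depth-1 Nutch crawl.

    Hypothetical helper: the directory names and seed file name are
    placeholders, not anything Nutch requires.
    """
    seed_dir = Path(url_dir)
    seed_dir.mkdir(parents=True, exist_ok=True)

    # One URL per line; surrounding whitespace and blank entries are dropped.
    cleaned = [u.strip() for u in seed_urls if u.strip()]
    (seed_dir / "seed.txt").write_text("\n".join(cleaned) + "\n")

    # -depth 1 asks for a single generate/fetch round, so only the seed
    # URLs themselves should be fetched in this run.
    return ["bin/nutch", "crawl", str(seed_dir), "-dir", crawl_dir, "-depth", "1"]


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to
    # subprocess.run from the Nutch install directory to actually crawl.
    cmd = prepare_depth_one_crawl(
        ["http://example.com/page-one.html", "http://example.com/page-two.html"]
    )
    print(" ".join(cmd))
```

The helper only writes the seed file and builds the command line, so you can check the arguments before anything is fetched.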


Nitin Borwankar
http://tagschema.com


