Re: Limiting crawl to specific list of URLS

Nitin Borwankar Wed, 29 Nov 2006 15:39:47 -0800

Kevvin Sevvvin wrote:

Hi Everybody,
I'm real new to Nutch. I've read through the documentation and many months of mailing list archives and I don't think this question has been answered.
I have two tasks I would like Nutch to handle. I would like it to crawl and
index ONLY a specific set of urls. This is a stronger limitation than
confining to specific sites (so db.ignore.external.links is insufficient): it
should not follow ANY links on pages in the list of urls.
Secondly, after creating the crawl and index of specific sites, I would like
to occasionally add SINGLE urls to the index.
Is this possible? If so, is it trivially possible with something like '--topN 0' (or should that be '--topN 1'?)? Or could I create a single local web page
with all the links on it and run the crawler with '-depth 1'?
Apologies if this is an overasked or misguided question; if so I'd appreciate pointers to appropriate documentation or code so I can figure it out on my own.
Thanks!
-k7


Hi Kevin,

I am a relative newbie to Nutch as well.
I believe you are looking for '-depth 1', which should fetch only the URLs
in your list and not follow the links found on those pages.
I have used -depth (but not with the value 1) on the 0.7.2 version.
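
A complementary way to enforce the "only these URLs" restriction, independent of crawl depth, is a URL filter that accepts the listed URLs exactly and rejects everything else. The following is only a minimal sketch in Python. It assumes the regex URL filter file format used by Nutch (one rule per line: '+' to accept or '-' to reject, followed by a regular expression, with the first matching rule winning). The helper names and example URLs are hypothetical, and the generated rules would still need to be placed in whichever filter file your Nutch version reads.

```python
# Hypothetical helper: turn a seed list into exact-match URL filter rules.
# Assumes a filter format of "+regex" (accept) / "-regex" (reject) per line.

# Characters that have special meaning in a regular expression.
REGEX_SPECIALS = set(".^$*+?()[]{}|\\")


def escape_regex(text):
    """Backslash-escape regex metacharacters so the text matches literally."""
    return "".join("\\" + ch if ch in REGEX_SPECIALS else ch for ch in text)


def exact_url_rules(urls):
    """Return one anchored accept rule per non-blank URL, then a reject-all rule."""
    rules = ["+^" + escape_regex(u.strip()) + "$" for u in urls if u.strip()]
    rules.append("-.")  # reject any URL not accepted above
    return rules


if __name__ == "__main__":
    # Example seed URLs (placeholders, not from the original post).
    seeds = ["http://example.com/page.html", "http://example.org/docs/?id=7"]
    print("\n".join(exact_url_rules(seeds)))
```

With rules like these, links discovered on the listed pages would be filtered out before they are fetched, and adding a single URL later would mean adding one more accept line above the final reject rule.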


Nitin Borwankar
http://tagschema.com
