Kevvin,
As for occasionally adding a set of new URLs, you can use the inject tool
(search the Nutch archives for "inject tool"). The newly injected URLs
will then be crawled during the next crawl cycle.
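
A minimal sketch of that step, assuming a Nutch 0.8-style layout where the
inject tool is called as "bin/nutch inject <crawldb> <url_dir>" and reads
plain-text files with one URL per line. The file name seed.txt and the helper
names are hypothetical, and nothing is executed unless run=True is passed.

```python
import subprocess
from pathlib import Path


def write_seed_file(url_dir, urls, name="seed.txt"):
    """Write one URL per line into url_dir/name and return the file path."""
    directory = Path(url_dir)
    directory.mkdir(parents=True, exist_ok=True)
    seed = directory / name
    # Drop blanks and duplicates while keeping the original order.
    unique = list(dict.fromkeys(u.strip() for u in urls if u.strip()))
    seed.write_text("\n".join(unique) + "\n", encoding="utf-8")
    return seed


def inject_command(crawldb, url_dir, nutch="bin/nutch"):
    """Build the argument list for the inject tool (assumed usage)."""
    return [nutch, "inject", str(crawldb), str(url_dir)]


def inject(crawldb, url_dir, urls, run=False):
    """Write the seed file, then build (and optionally run) the command."""
    write_seed_file(url_dir, urls)
    cmd = inject_command(crawldb, url_dir)
    if run:
        # Only reached when explicitly requested; needs a Nutch install.
        subprocess.run(cmd, check=True)
    return cmd
```

For example, inject("crawl/crawldb", "new_urls", ["http://example.com/new.html"])
writes new_urls/seed.txt and returns the command to run from the Nutch
directory; the paths here are placeholders for your own crawl layout.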
Regards,
Lukas
On 12/3/06, Fadzi Ushewokunze <[EMAIL PROTECTED]> wrote:
hi k7,
Add the URLs you want to crawl to a folder called /urls.
Then, in conf/regex-urlfilter.txt, add the regular expressions
for the URL patterns you want included or excluded.
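
One way to approximate "only these URLs, follow nothing else" might be to
generate one accept rule per URL plus a final reject-all rule. The sketch
below assumes the filter file format of "+" or "-" followed by a regular
expression, with the first matching rule deciding and unmatched URLs being
dropped. The helper names are hypothetical, and Python's regex escaping is
used as a stand-in for the Java regex syntax Nutch reads, so check the
generated lines before using them.

```python
import re


def build_filter_lines(urls):
    """Return filter lines that accept exactly the given URLs."""
    # Anchor each pattern so only the whole URL matches, not a prefix.
    lines = ["+^" + re.escape(url) + "$" for url in urls]
    lines.append("-.")  # reject everything not accepted above
    return lines


def accepts(lines, url):
    """Apply the rules in order; the first matching rule decides."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign in "+-" and re.search(pattern, url):
            return sign == "+"
    return False  # no rule matched: treat the URL as rejected


if __name__ == "__main__":
    wanted = ["http://example.com/docs/page.html"]
    print("\n".join(build_filter_lines(wanted)))
```

With rules like these, links found on a listed page would be filtered out
unless they are themselves on the list, so the crawl stays on the given set.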
Hope this answers your question.
Fadzi
On Wed, 2006-11-29 at 15:34 -0800, Kevvin Sevvvin wrote:
> Hi Everybody,
>
> I'm real new to Nutch. I've read through the documentation and many
> months of mailing list archives and I don't think this question has
> been answered.
>
> I have two tasks I would like Nutch to handle. I would like it to
> crawl and index ONLY a specific set of urls. This is a stronger
> limitation than confining to specific sites (so
> db.ignore.external.links is insufficient): it should not follow ANY
> links on pages in the list of urls.
>
> Secondly, after creating the crawl and index of specific sites, I
> would like to occasionally add SINGLE urls to the index.
>
> Is this possible? If so, is it trivially possible with something like
> '--topN 0' (or should that be '--topN 1' ??) ? Or could I create a
> single local web page with all the links on it and run the crawler
> with '-depth 1' ?
>
> Apologies if this is an overasked or misguided question; if so I'd
> appreciate pointers to appropriate documentation or code so I can
> figure it out on my own.
>
> Thanks!
> -k7
_______________________________________________
Nutch-general mailing list
Nutch-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-general