Thanks, RemyA. My ignore outlinks setting was true, and most of the URLs in
the feed were outlinks. That was the problem.

When I set it to false, it worked. I also had to remove ? and = from the regex
URL filter so that it allows feed URLs like ?format=rss.
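
For anyone hitting the same thing, here is a rough Python sketch of why the
filter change matters. It assumes the filter applies rules top to bottom and
the first matching rule wins, with a rule like -[?*!@=] followed by a
catch-all +. (as in the stock regex-urlfilter.txt, as far as I know). The
accepts() helper and the example.com URL are made up for illustration; this
is not Nutch code.

```python
import re

def accepts(url, rules):
    """Return True if the first rule whose pattern is found in url is '+'.

    rules is a list of (sign, pattern) pairs; a URL matching no rule is
    rejected. This mimics first-match-wins filtering (assumed behaviour).
    """
    for sign, pattern in rules:
        if re.search(pattern, url):
            return sign == "+"
    return False

# Assumed original rules: skip URLs containing ?, *, !, @ or =, accept the rest.
default_rules = [("-", r"[?*!@=]"), ("+", r".")]

# Same rules with ? and = removed from the character class.
relaxed_rules = [("-", r"[*!@]"), ("+", r".")]

feed_url = "http://example.com/blog?format=rss"  # hypothetical feed URL

print(accepts(feed_url, default_rules))  # False: rejected because of ? and =
print(accepts(feed_url, relaxed_rules))  # True: now reaches the catch-all rule
```

Note that relaxing the rule this way lets every query-string URL through, not
just feeds, so a narrower accept rule placed above the skip rule may be
preferable.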

Thanks for your time.

On Thu, Jun 7, 2012 at 1:10 AM, Rémy Amouroux <[email protected]> wrote:

> The first problem that comes to mind: is your regexp-urlfilter accepting
> those URLs?
>
> You should also run readseg on the crawled segment to see if those URLs
> are listed in the outlinks of the feeds.
>
> Regards
>
> RemyA
>
> On June 6, 2012, at 19:14, Shameema Umer wrote:
>
> > I have added the feed plugin to nutch-site.xml
> >
> > and provided some feed URLs in seed.txt.
> >
> > But Nutch is not crawling the URLs found in the feed. Please help.
>
>
