Re: Using Nutch to Crawl News via RSS

Rendy Bambang Junior Mon, 31 Dec 2012 22:20:24 -0800

Thanks for the response.

Yes, I wish to crawl the URLs referenced by the RSS feed.


In short, I want a database that contains only news articles, and I think
crawling from the feed will be much more efficient than crawling the whole
site. I will also clear the database daily so it keeps only today's news.
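
To make the idea concrete, here is a minimal sketch of the feed-first step,
assuming an RSS 2.0 feed whose <item> elements carry <link> and <pubDate>.
It is plain Python with the standard library only, not Nutch; the function
name extract_todays_links and the sample feed are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import date, timezone
from email.utils import parsedate_to_datetime


def extract_todays_links(rss_xml, today):
    """Return the <link> of every RSS 2.0 <item> published on `today` (UTC)."""
    root = ET.fromstring(rss_xml)
    links = []
    for item in root.iter("item"):
        link = item.findtext("link")
        pub = item.findtext("pubDate")
        if not link or not pub:
            continue  # skip items without a URL or a publication date
        try:
            published = parsedate_to_datetime(pub)  # RFC 2822 date string
        except (TypeError, ValueError):
            continue  # skip items whose date cannot be parsed
        if published.tzinfo is None:
            # Assumption: treat dates without a zone as UTC.
            published = published.replace(tzinfo=timezone.utc)
        if published.astimezone(timezone.utc).date() == today:
            links.append(link.strip())
    return links


# Hypothetical sample feed: one item from "today", one from the day before.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <title>Example News</title>
  <item>
    <title>Today's story</title>
    <link>http://example.com/news/today-story</link>
    <pubDate>Tue, 01 Jan 2013 08:00:00 +0000</pubDate>
  </item>
  <item>
    <title>Yesterday's story</title>
    <link>http://example.com/news/old-story</link>
    <pubDate>Mon, 31 Dec 2012 09:00:00 +0000</pubDate>
  </item>
</channel></rss>"""

links = extract_todays_links(SAMPLE_FEED, date(2013, 1, 1))
print(links)
```

Only the article URLs returned here would be fetched and stored, so the
feed itself never ends up in the index, and older items are dropped by the
date filter.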

Do you think Nutch can do this?


On Tue, Jan 1, 2013 at 11:55 AM, Sourajit Basak <[email protected]> wrote:

> Nutch's default feedparser plugin only indexes the RSS feed and does not
> follow the referenced URLs. Do you wish to crawl the referenced URLs?
>
> On Tue, Jan 1, 2013 at 4:53 AM, Rendy Bambang Junior <[email protected]> wrote:
>
> > Hi guys,
> >
> > Could I use Nutch to crawl a feed, then crawl the news linked from that
> > feed? I have succeeded in crawling the RSS feed itself, but I want my
> > index to contain only news, without the RSS feed. Does anybody know how
> > to do this using Nutch? Or would it be better to use another tool?
> >
> > Thank you, and happy new year!
> >
> > --
> > Regards,
> > Rendy Bambang Junior
> > Informatics Engineering '09
> > Bandung Institute of Technology
> >
>



-- 
Regards,
Rendy Bambang Junior
Informatics Engineering '09
Bandung Institute of Technology
