Thanks for the response. Yes, I wish to crawl the URLs referenced by the RSS feed.
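To make the idea concrete, here is a rough sketch of the step I have in mind: read the feed, pull out the link of each item, and save those links as a plain list of URLs that a crawler could then fetch. This is plain Python with only the standard library, not a Nutch plugin or any Nutch API. The function names, the sample feed, and the `seed.txt` file name are my own assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def extract_item_links(rss_xml):
    """Return the <link> of every <item> in an RSS 2.0 document.

    Order is preserved and duplicate links are dropped. The channel's own
    <link> is ignored because only <item> elements are visited.
    """
    root = ET.fromstring(rss_xml)
    seen = set()
    links = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        if link and link not in seen:
            seen.add(link)
            links.append(link)
    return links


def write_seed_file(links, path):
    """Write one URL per line (hypothetical helper; the path is up to the caller)."""
    with open(path, "w", encoding="utf-8") as fh:
        for link in links:
            fh.write(link + "\n")


# Made-up sample feed, for illustration only.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0"><channel>
<title>Example news</title>
<link>http://news.example.com/</link>
<item><title>A</title><link>http://news.example.com/a.html</link></item>
<item><title>B</title><link> http://news.example.com/b.html </link></item>
<item><title>A again</title><link>http://news.example.com/a.html</link></item>
</channel></rss>"""

if __name__ == "__main__":
    news_links = extract_item_links(SAMPLE_RSS)
    write_seed_file(news_links, "seed.txt")
    print(news_links)
```

If something like this ran before each crawl, the resulting file might serve as the seed list, so that only the news pages (and not the feed itself) end up in the index. Whether this is better than doing it inside Nutch is exactly what I am unsure about.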
In short, I want to have a database which contains news, and I think crawling from the feed will be a lot more efficient than crawling the whole site. Besides, I will clear the database daily to keep only today's news. Do you think Nutch can do this?

On Tue, Jan 1, 2013 at 11:55 AM, Sourajit Basak <[email protected]> wrote:

> Nutch's default feedparser plugin only indexes the rss feed and does not go
> to the referenced urls. Do you wish to crawl the referenced urls ?
>
> On Tue, Jan 1, 2013 at 4:53 AM, Rendy Bambang Junior <[email protected]> wrote:
>
> > Hi guys,
> >
> > could I use nutch to crawl a feed, then crawl news from that feed? I've
> > been succeed in crawling the rss feed itself, but what I want is my index
> > contains only news, without the rss feed. Do anybody know how to do this
> > using nutch? Or, it will be better to use another tools to do this?
> >
> > Thank you, and happy new year!
> >
> > --
> > Regards,
> > Rendy Bambang Junior
> > Informatics Engineering '09
> > Bandung Institute of Technology

--
Regards,
Rendy Bambang Junior
Informatics Engineering '09
Bandung Institute of Technology

