See the attachment. It's a modification of the bundled FeedParser that should meet your needs.
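
In case the attachment does not come through: the core idea is to pull each item's <link> out of the feed so that those URLs get fetched, rather than indexing the feed itself. Below is a minimal standalone sketch of that extraction step only, using the JDK's built-in XML parser. It is not the attached plugin code and does not use any Nutch API. The class and method names (FeedLinkExtractor, extractItemLinks) are made up for illustration, and it assumes a plain RSS 2.0 feed with <item>/<link> elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Nutch: collects the article URLs
// referenced by an RSS 2.0 feed.
class FeedLinkExtractor {

    // Returns the text of the first <link> inside every <item>, in feed order.
    // The channel-level <link> is skipped because it is not inside an <item>.
    static List<String> extractItemLinks(String rssXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Feeds come from untrusted sites, so refuse DOCTYPE declarations.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(rssXml)));

        List<String> links = new ArrayList<>();
        NodeList items = doc.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            NodeList linkNodes = item.getElementsByTagName("link");
            if (linkNodes.getLength() > 0) {
                String url = linkNodes.item(0).getTextContent().trim();
                if (!url.isEmpty()) {
                    links.add(url);
                }
            }
        }
        return links;
    }
}
```

In a real parse plugin the extracted URLs would be handed back to the crawler as outlinks so that the next fetch round picks them up. That wiring depends on the Nutch version and is left out here.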

On Tue, Jan 1, 2013 at 11:49 AM, Rendy Bambang Junior <[email protected]> wrote:

> Thanks for the response.
>
> Yes, I wish to crawl the URLs referenced by the RSS feed.
>
> In short, I want to have a database which contains news, and I think
> crawling from the feed will be a lot more efficient than crawling the whole
> site. Besides, I will clear the database daily to keep only today's news.
>
> Do you think Nutch can do this?
>
>
> On Tue, Jan 1, 2013 at 11:55 AM, Sourajit Basak <[email protected]> wrote:
>
> > Nutch's default feedparser plugin only indexes the RSS feed and does not
> > go to the referenced URLs. Do you wish to crawl the referenced URLs?
> >
> > On Tue, Jan 1, 2013 at 4:53 AM, Rendy Bambang Junior <[email protected]> wrote:
> >
> > > Hi guys,
> > >
> > > Could I use Nutch to crawl a feed, then crawl the news from that feed?
> > > I've succeeded in crawling the RSS feed itself, but what I want is for
> > > my index to contain only news, without the RSS feed. Does anybody know
> > > how to do this using Nutch? Or would it be better to use another tool
> > > for this?
> > >
> > > Thank you, and happy new year!
> > >
> > > --
> > > Regards,
> > > Rendy Bambang Junior
> > > Informatics Engineering '09
> > > Bandung Institute of Technology
>
> --
> Regards,
> Rendy Bambang Junior
> Informatics Engineering '09
> Bandung Institute of Technology

