Wow, thanks a lot! It will really help me understand the Nutch way. Thanks! Sorry, but I don't see the attachment. Did you forget to attach it, or is Gmail not showing it?
Again, thanks!

On Fri, Jan 4, 2013 at 11:30 PM, Sourajit Basak <[email protected]> wrote:

> See attachment. It's a modification of the bundled FeedParser that should
> accomplish your needs.
>
> On Tue, Jan 1, 2013 at 11:49 AM, Rendy Bambang Junior <[email protected]> wrote:
>
>> Thanks for the response.
>>
>> Yes, I wish to crawl the URLs referenced by the RSS feed.
>>
>> In short, I want to have a database which contains news, and I think
>> crawling from the feed will be a lot more efficient than crawling the whole
>> site. Besides, I will clear the database daily to keep only today's news.
>>
>> Do you think Nutch can do this?
>>
>> On Tue, Jan 1, 2013 at 11:55 AM, Sourajit Basak <[email protected]> wrote:
>>
>>> Nutch's default feedparser plugin only indexes the RSS feed and does not
>>> go to the referenced URLs. Do you wish to crawl the referenced URLs?
>>>
>>> On Tue, Jan 1, 2013 at 4:53 AM, Rendy Bambang Junior <[email protected]> wrote:
>>>
>>>> Hi guys,
>>>>
>>>> Could I use Nutch to crawl a feed, then crawl the news from that feed?
>>>> I've succeeded in crawling the RSS feed itself, but what I want is for
>>>> my index to contain only news, without the RSS feed. Does anybody know
>>>> how to do this using Nutch? Or would it be better to use another tool
>>>> for this?
>>>>
>>>> Thank you, and happy new year!
>>>>
>>>> --
>>>> Regards,
>>>> Rendy Bambang Junior
>>>> Informatics Engineering '09
>>>> Bandung Institute of Technology
>>
>> --
>> Regards,
>> Rendy Bambang Junior
>> Informatics Engineering '09
>> Bandung Institute of Technology

--
Regards,
Rendy Bambang Junior
Informatics Engineering '09
Bandung Institute of Technology
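
The idea discussed in this thread, following the URLs that a feed references instead of indexing the feed itself, can be illustrated outside Nutch. The sketch below is not the attached FeedParser modification and does not use any Nutch API. It is a minimal Python example using only the standard library, and it assumes an RSS 2.0 feed whose items carry a `<link>` element. The function name `extract_item_links` and the sample feed are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List


def extract_item_links(rss_xml: str) -> List[str]:
    """Return the <link> text of every <item> in an RSS 2.0 document.

    Assumption: plain RSS 2.0 without namespaces on <item>/<link>.
    The channel-level <link> is skipped because it is not inside an <item>.
    """
    root = ET.fromstring(rss_xml)
    links = []
    for item in root.iter("item"):
        link = item.findtext("link")
        # Skip items that have no link or only whitespace in it.
        if link and link.strip():
            links.append(link.strip())
    return links


# Hypothetical sample feed, used only to exercise the function.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>http://news.example.com/</link>
    <item><title>Story one</title><link>http://news.example.com/story-1</link></item>
    <item><title>Story two</title><link> http://news.example.com/story-2 </link></item>
    <item><title>No link here</title></item>
  </channel>
</rss>"""

if __name__ == "__main__":
    # Print one URL per line.
    for url in extract_item_links(SAMPLE_FEED):
        print(url)
```

The output is one article URL per line, so it could be saved as a plain-text seed list and the articles crawled directly, leaving the feed itself out of the index. Whether that fits a given Nutch setup is an assumption to check against your own configuration.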

