Re: Using Nutch to Crawl News via RSS

Sourajit Basak Fri, 04 Jan 2013 08:31:22 -0800

See attachment. It's a modification of the bundled FeedParser that should
meet your needs.
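
For readers without the attachment, the general idea (pull the article URLs
out of the feed so that they, rather than the feed itself, get fetched and
indexed) can be sketched outside Nutch. The Python below is not the attached
parser and uses no Nutch API. It is a minimal sketch that assumes a plain
RSS 2.0 feed; the function name item_links, the sample feed and the
example.com URLs are made up for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import date
from email.utils import parsedate_to_datetime


def item_links(rss_xml, only_on=None):
    """Return the <link> of each RSS 2.0 <item> in document order.

    only_on: optional datetime.date. When given, keep only items whose
    <pubDate> falls on that date (in the timezone stated by the feed).
    """
    root = ET.fromstring(rss_xml)
    links = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        if not link:
            continue  # item without a usable link
        if only_on is not None:
            pub = item.findtext("pubDate")
            if not pub:
                continue  # no date, so we cannot confirm the day
            try:
                published = parsedate_to_datetime(pub.strip()).date()
            except (TypeError, ValueError):
                continue  # unparseable date
            if published != only_on:
                continue
        links.append(link)
    return links


# Hypothetical feed used only to exercise the function.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Example News</title>
  <link>http://news.example.com/</link>
  <item><title>A</title><link>http://news.example.com/a.html</link>
    <pubDate>Fri, 04 Jan 2013 08:00:00 +0000</pubDate></item>
  <item><title>B</title><link>http://news.example.com/b.html</link>
    <pubDate>Thu, 03 Jan 2013 21:30:00 +0000</pubDate></item>
</channel></rss>"""

if __name__ == "__main__":
    for url in item_links(SAMPLE_FEED, only_on=date(2013, 1, 4)):
        print(url)
```

The resulting URLs could be written one per line to a seed file and injected
into a crawl. The optional date filter corresponds to the stated goal of
keeping only today's news.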



On Tue, Jan 1, 2013 at 11:49 AM, Rendy Bambang Junior <[email protected]> wrote:

> Thanks for the response
>
> Yes, I wish to crawl the URLs referenced by the RSS feed.
>
> In short, I want to have a database which contains news, and I think
> crawling from the feed will be a lot more efficient than crawling the whole
> site. Besides, I will clear the database daily to keep only today's news.
>
> Do you think Nutch can do this?
>
>
> On Tue, Jan 1, 2013 at 11:55 AM, Sourajit Basak <[email protected]> wrote:
>
> > Nutch's default feedparser plugin only indexes the RSS feed and does not
> > go to the referenced URLs. Do you wish to crawl the referenced URLs?
> >
> > On Tue, Jan 1, 2013 at 4:53 AM, Rendy Bambang Junior <[email protected]> wrote:
> >
> > > Hi guys,
> > >
> > > Could I use Nutch to crawl a feed, then crawl the news from that feed? I've
> > > succeeded in crawling the RSS feed itself, but what I want is for my index
> > > to contain only news, without the RSS feed. Does anybody know how to do this
> > > using Nutch? Or would it be better to use another tool for this?
> > >
> > > Thank you, and happy new year!
> > >
> > > --
> > > Regards,
> > > Rendy Bambang Junior
> > > Informatics Engineering '09
> > > Bandung Institute of Technology
> > >
> >
>
>
>
> --
> Regards,
> Rendy Bambang Junior
> Informatics Engineering '09
> Bandung Institute of Technology
>
