Re: Using Nutch to Crawl News via RSS

Rendy Bambang Junior Fri, 04 Jan 2013 16:45:56 -0800

Wow, thanks a lot! It will really help me understand the Nutch way of doing
things. Thanks!
Sorry, but I don't see the attachment. Did you forget to attach it, or is
Gmail not showing it?


Again, thanks!


On Fri, Jan 4, 2013 at 11:30 PM, Sourajit Basak <[email protected]> wrote:

> See attachment. It's a modification of the bundled FeedParser that should
> meet your needs.
>
>
> On Tue, Jan 1, 2013 at 11:49 AM, Rendy Bambang Junior <[email protected]> wrote:
>
>> Thanks for the response.
>>
>> Yes, I wish to crawl the URLs referenced by the RSS feed.
>>
>> In short, I want a database that contains news, and I think crawling
>> from the feed will be a lot more efficient than crawling the whole
>> site. Besides, I will clear the database daily to keep only today's news.
>>
>> Do you think Nutch can do this?
>>
>>
>> On Tue, Jan 1, 2013 at 11:55 AM, Sourajit Basak <[email protected]> wrote:
>>
>> > Nutch's default feedparser plugin only indexes the RSS feed and does not
>> > go to the referenced URLs. Do you wish to crawl the referenced URLs?
>> >
>> > On Tue, Jan 1, 2013 at 4:53 AM, Rendy Bambang Junior <[email protected]> wrote:
>> >
>> > > Hi guys,
>> > >
>> > > Could I use Nutch to crawl a feed and then crawl the news linked from
>> > > that feed? I've succeeded in crawling the RSS feed itself, but I want
>> > > my index to contain only the news, without the RSS feed. Does anybody
>> > > know how to do this with Nutch? Or would it be better to use another
>> > > tool for this?
>> > >
>> > > Thank you, and happy new year!
>> > >
>> > > --
>> > > Regards,
>> > > Rendy Bambang Junior
>> > > Informatics Engineering '09
>> > > Bandung Institute of Technology
>> > >
>> >
>>
>>
>>
>> --
>> Regards,
>> Rendy Bambang Junior
>> Informatics Engineering '09
>> Bandung Institute of Technology
>>
>
>


-- 
Regards,
Rendy Bambang Junior
Informatics Engineering '09
Bandung Institute of Technology
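
The idea discussed in this thread (follow the URLs referenced by the feed rather than index the feed itself) can be illustrated outside Nutch with a minimal sketch. This is not the modified FeedParser mentioned above, which is not included in the thread. It assumes a plain RSS 2.0 feed without XML namespaces. The sample feed, the example.com URLs, and the function name `item_links` are hypothetical and used only for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import date
from email.utils import parsedate_to_datetime

# Hypothetical sample feed; a real feed would be downloaded first.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>http://news.example.com/</link>
    <item>
      <title>Story one</title>
      <link>http://news.example.com/story-1</link>
      <pubDate>Fri, 04 Jan 2013 08:00:00 +0000</pubDate>
    </item>
    <item>
      <title>Story two</title>
      <link>http://news.example.com/story-2</link>
      <pubDate>Thu, 03 Jan 2013 21:30:00 +0000</pubDate>
    </item>
  </channel>
</rss>"""


def item_links(feed_xml, only_on=None):
    """Return the <link> of each <item> in an RSS 2.0 document.

    If only_on is a datetime.date, keep only items whose <pubDate>
    falls on that date; items without a <pubDate> are then skipped.
    """
    root = ET.fromstring(feed_xml)
    links = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if not link:
            continue  # an item without a link cannot be crawled
        if only_on is not None:
            pub = item.findtext("pubDate")
            if pub is None or parsedate_to_datetime(pub).date() != only_on:
                continue
        links.append(link.strip())
    return links


if __name__ == "__main__":
    # Print one URL per line, e.g. for use as a list of seed URLs.
    for url in item_links(SAMPLE_FEED, only_on=date(2013, 1, 4)):
        print(url)
```

Only the item links are collected, so the feed URL itself never enters the list. Printing one URL per line matches the plain-text seed list format that Nutch's inject step reads, so the output could be saved as a seed file. The date filter corresponds to the wish to keep only today's news.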
