On Fri, Jul 10, 2009 at 13:21, Beats<tarun_agrawal...@yahoo.com> wrote:
>
> Hi,
>
> Thanks for the help,
>
> but it is giving a parsing error. Are there other changes to be made?
>
>
> the error is
> fetcher.Fetcher (Fetcher.java:output(796)) - Error parsing:
> http://www.indeed.co.in/rss: failed(2,0)
>

http://www.indeed.co.in/robots.txt

/rss is disallowed in that robots.txt, so Nutch doesn't fetch it.
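
If you want to check this kind of thing yourself before crawling, here is a
rough sketch using Python's standard urllib.robotparser. The robots.txt body
and the agent name "MyNutchAgent" below are made up for illustration (the only
thing taken from the real site is that /rss is disallowed), and this is not how
Nutch itself evaluates robots rules, just a quick way to test a URL against a
set of rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body: an assumption for illustration only.
# The real file at http://www.indeed.co.in/robots.txt may contain more rules.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /rss
"""


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # "MyNutchAgent" is a placeholder for whatever http.agent.name you configured.
    for url in ("http://www.indeed.co.in/rss", "http://www.indeed.co.in/jobs"):
        verdict = "allowed" if is_allowed(SAMPLE_ROBOTS_TXT, "MyNutchAgent", url) else "disallowed"
        print(url, "->", verdict)
```

To test against the live file instead, you could call set_url() and read() on
the parser rather than parse(); that does need network access.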

>
> Doğacan Güney-3 wrote:
>>
>> On Fri, Jul 10, 2009 at 10:01, Beats<tarun_agrawal...@yahoo.com> wrote:
>>>
>>> Hi,
>>>
>>> I'm new to Nutch.
>>> I'm trying to crawl and index an RSS feed using the feed plugin.
>>>
>>> What I want is to parse the RSS page and index each item's content
>>> separately,
>>> so that when a user searches, the content of the individual item is
>>> searched and displayed (not the content of the whole RSS feed page).
>>>
>>
>> Try using the feed plugin. It extracts each item in rss as a different
>> page.
>>
>>> Any suggestions would be appreciated.
>>>
>>>
>>> Thanks in advance
>>>
>>> Beats
>>> --
>>> View this message in context:
>>> http://www.nabble.com/indexing-each-item-in-seperate-page-tp24422674p24422674.html
>>> Sent from the Nutch - User mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>>
>> --
>> Doğacan Güney
>>
>>
>
> --
> View this message in context: 
> http://www.nabble.com/indexing-each-item-in-seperate-page-tp24422674p24424901.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>
>



-- 
Doğacan Güney
