Hi,

I got the following message in the log file while crawling the XML URL.

2008-09-27 16:06:20,920 WARN  parse.ParserFactory - ParserFactory:Plugin:
org.apache.nutch.parse.rss.RSSParser mapped to contentType text/xml via
parse-plugins.xml, but its plugin.xml file does not claim to support
contentType: text/xml
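
My reading of the warning is that it points at the contentType parameter
that the parse-rss plugin declares in its own plugin.xml, which I assume
lives at plugins/parse-rss/plugin.xml. Below is a sketch of how I understand
that declaration would need to look in order to also claim text/xml. The
existing application/rss+xml value and the "|" alternation syntax are my
assumptions, and I have not confirmed that this makes the sub URLs get
crawled:

```xml
<!-- Sketch of the parser declaration inside plugins/parse-rss/plugin.xml. -->
<!-- Assumption: the stock file claims only application/rss+xml. -->
<implementation id="org.apache.nutch.parse.rss.RSSParser"
                class="org.apache.nutch.parse.rss.RSSParser">
  <!-- Assumption: "|" separates alternative content types, so text/xml
       is added here to match the mapping in parse-plugins.xml. -->
  <parameter name="contentType" value="application/rss+xml|text/xml"/>
  <parameter name="pathSuffix" value="rss"/>
</implementation>
```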

Please help me if you have any idea.

-Chetan



Chetan Patel wrote:
> 
> Hi,
> 
> Thanks for the help.
> 
> I have already added this in plugin.includes,
> 
> and I am still getting only the root URL.
> 
> Regards,
> Chetan Patel
> 
> 
> Edward Quick wrote:
>> 
>> 
>> Chetan,
>> 
>> Try adding parse-rss in nutch-site.xml. Here's mine:
>> 
>> <property>
>>   <name>plugin.includes</name>
>>   <value>protocol-httpclient|urlfilter-regex|parse-(text|html|msexcel|msword|mspowerpoint|pdf|zip|swf|rss)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
>>   <description></description>
>> </property>
>> 
>> 
>> Ed.
>> 
>> 
>>> Date: Sat, 27 Sep 2008 01:30:43 -0700
>>> From: [EMAIL PROTECTED]
>>> To: [email protected]
>>> Subject: crawl xml url using nutch-0.9
>>> 
>>> 
>>> Hi All,
>>> 
>>> I tried to crawl an XML URL (http://sports.yahoo.com/nfl/rss.xml)
>>> using depth 2.
>>> 
>>> But only the root URL gets crawled.
>>> 
>>> Please help me crawl the root URL as well as all the URLs linked from it.
>>> 
>>> Thanks in advance.
>>> 
>>> Regards,
>>> Chetan Patel
>>> -- 
>>> View this message in context:
>>> http://www.nabble.com/crawl-xml-url-using-nutch-0.9-tp19700770p19700770.html
>>> Sent from the Nutch - User mailing list archive at Nabble.com.
>>> 
>> 
>> 
> 
> 

