> 
> 
> Hi,
> 
> I got the following message in the log file while crawling an XML URL.
> 
> 2008-09-27 16:06:20,920 WARN  parse.ParserFactory - ParserFactory:Plugin:
> org.apache.nutch.parse.rss.RSSParser mapped to contentType text/xml via
> parse-plugins.xml, but its plugin.xml file does not claim to support
> contentType: text/xml
> 
> Please help me if you have any idea.

This is possibly a problem with the content type. For RSS files I think the
content type is supposed to be application/rss+xml, whereas the log shows the
parser being mapped to text/xml.
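
As a rough sketch of where the mismatch comes from (written from memory of the
Nutch 0.9 layout, so treat the exact entries as assumptions and compare them
with your own conf/parse-plugins.xml and plugins/parse-rss/plugin.xml): the
first fragment routes text/xml to parse-rss, while the second only claims
application/rss+xml, which is what the warning complains about.

```xml
<!-- conf/parse-plugins.xml (assumed entry): text/xml is mapped to parse-rss -->
<mimeType name="text/xml">
  <plugin id="parse-html" />
  <plugin id="parse-rss" />
</mimeType>

<!-- plugins/parse-rss/plugin.xml (assumed entry): only application/rss+xml is claimed -->
<parameter name="contentType" value="application/rss+xml"/>
```

If the feed really is served as text/xml, one thing to try is widening the
contentType value in plugin.xml to something like application/rss+xml|text/xml.
I have not verified that on 0.9, so take it as a suggestion only.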


> 
> -Chetan
> 
> 
> 
> Chetan Patel wrote:
> > 
> > Hi,
> > 
> > Thanks for the help.
> > 
> > I have already added parse-rss to plugin.includes,
> > but I am still getting only the root URL.
> > 
> > Regards,
> > Chetan Patel
> > 
> > 
> > Edward Quick wrote:
> >> 
> >> 
> >> Chetan,
> >> 
> >> Try adding parse-rss in nutch-site.xml. Here's mine:
> >> 
> >> <property>
> >>   <name>plugin.includes</name>
> >>   <value>protocol-httpclient|urlfilter-regex|parse-(text|html|msexcel|msword|mspowerpoint|pdf|zip|swf|rss)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
> >>   <description></description>
> >> </property>
> >> 
> >> 
> >> Ed.
> >> 
> >> 
> >>> Date: Sat, 27 Sep 2008 01:30:43 -0700
> >>> From: [EMAIL PROTECTED]
> >>> To: [email protected]
> >>> Subject: crawl xml url using nutch-0.9
> >>> 
> >>> 
> >>> Hi All,
> >>> 
> >>> I tried to crawl an XML URL (http://sports.yahoo.com/nfl/rss.xml)
> >>> using depth 2.
> >>> 
> >>> But it crawls only the root URL.
> >>> 
> >>> Please help me crawl the root URL as well as all the URLs it links to.
> >>> 
> >>> Thanks in advance.
> >>> 
> >>> Regards,
> >>> Chetan Patel
> >>> -- 
> >>> View this message in context:
> >>> http://www.nabble.com/crawl-xml-url-using-nutch-0.9-tp19700770p19700770.html
> >>> Sent from the Nutch - User mailing list archive at Nabble.com.
> >>> 
> >> 
> >> 
> > 
> > 
> 
> -- 
> View this message in context: 
> http://www.nabble.com/crawl-xml-url-using-nutch-0.9-tp19700770p19701619.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
> 
