> Hi,
>
> I got the following message in the log file while crawling an XML URL:
>
> 2008-09-27 16:06:20,920 WARN parse.ParserFactory - ParserFactory:Plugin:
> org.apache.nutch.parse.rss.RSSParser mapped to contentType text/xml via
> parse-plugins.xml, but its plugin.xml file does not claim to support
> contentType: text/xml
>
> Please help me if you have any idea.

Possibly a problem with the content type. For RSS files I think the content type is supposed to be application/rss+xml.

> -Chetan
>
> Chetan Patel wrote:
> >
> > Hi,
> >
> > Thanks for the help.
> >
> > I have already added this to plugin.includes,
> > and I am still getting only the root URL.
> >
> > Regards,
> > Chetan Patel
> >
> > Edward Quick wrote:
> >>
> >> Chetan,
> >>
> >> Try adding parse-rss in nutch-site.xml. Here's mine:
> >>
> >> <property>
> >>   <name>plugin.includes</name>
> >>   <value>protocol-httpclient|urlfilter-regex|parse-(text|html|msexcel|msword|mspowerpoint|pdf|zip|swf|rss)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
> >>   <description></description>
> >> </property>
> >>
> >> Ed.
> >>
> >>> Date: Sat, 27 Sep 2008 01:30:43 -0700
> >>> From: [EMAIL PROTECTED]
> >>> To: [email protected]
> >>> Subject: crawl xml url using nutch-0.9
> >>>
> >>> Hi All,
> >>>
> >>> I have tried to crawl an XML URL (http://sports.yahoo.com/nfl/rss.xml)
> >>> using depth 2.
> >>>
> >>> But it crawls only the root URL.
> >>>
> >>> Please help me crawl the root URL as well as all the sub-URLs of the
> >>> root URL.
> >>>
> >>> Thanks in advance.
> >>>
> >>> Regards,
> >>> Chetan Patel
> >>> --
> >>> View this message in context:
> >>> http://www.nabble.com/crawl-xml-url-using-nutch-0.9-tp19700770p19700770.html
> >>> Sent from the Nutch - User mailing list archive at Nabble.com.
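
To make the mismatch in the warning concrete: the log says parse-plugins.xml routes text/xml to the RSS parser, while the parser's own plugin.xml does not list that type. Below is a rough sketch of the two fragments involved, assuming a stock Nutch 0.9 layout. The file paths and attribute values are assumptions for illustration and should be checked against your own install.

```xml
<!-- conf/parse-plugins.xml (assumed entry): text/xml is mapped to parse-rss.
     Other plugins may also be listed for this type. -->
<mimeType name="text/xml">
  <plugin id="parse-rss" />
</mimeType>

<!-- plugins/parse-rss/plugin.xml (assumed entry): the parser claims only
     application/rss+xml, which is why the warning mentions text/xml. -->
<implementation id="org.apache.nutch.parse.rss.RSSParser"
                class="org.apache.nutch.parse.rss.RSSParser">
  <parameter name="contentType" value="application/rss+xml" />
</implementation>
```

If the feed really is served as text/xml, the two declarations need to agree, either by having plugin.xml claim text/xml or by getting the feed served as application/rss+xml. Whether a single contentType value can cover several types depends on the Nutch version, so treat this as a starting point only.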
