Chetan,

Try adding parse-rss to the plugin.includes property in nutch-site.xml. Here's mine:

<property>
  <name>plugin.includes</name>
  <value>protocol-httpclient|urlfilter-regex|parse-(text|html|msexcel|msword|mspowerpoint|pdf|zip|swf|rss)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  <description></description>
</property>
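For context, here is a minimal sketch of where that property sits in a complete conf/nutch-site.xml. It assumes your file holds no other overrides. Values set in nutch-site.xml replace the ones in nutch-default.xml rather than adding to them, so the plugin.includes value has to list every plugin the crawl needs, not only parse-rss.

```xml
<?xml version="1.0"?>
<!-- conf/nutch-site.xml: site-specific overrides of nutch-default.xml -->
<configuration>
  <!-- This value replaces the default plugin list entirely, so it must
       name every plugin the crawl needs; parse-rss is the rss entry
       inside parse-(...). -->
  <property>
    <name>plugin.includes</name>
    <value>protocol-httpclient|urlfilter-regex|parse-(text|html|msexcel|msword|mspowerpoint|pdf|zip|swf|rss)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
    <description></description>
  </property>
</configuration>
```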


Ed.


> Date: Sat, 27 Sep 2008 01:30:43 -0700
> From: [EMAIL PROTECTED]
> To: [email protected]
> Subject: crawl xml url using nutch-0.9
> 
> 
> Hi All,
> 
> I tried to crawl an XML URL (http://sports.yahoo.com/nfl/rss.xml) with
> depth 2.
> 
> But it only crawls the root URL.
> 
> Please tell me how to crawl the root URL as well as all the sub-URLs it links to.
> 
> Thanks in advance.
> 
> Regards,
> Chetan Patel
> -- 
> View this message in context: 
> http://www.nabble.com/crawl-xml-url-using-nutch-0.9-tp19700770p19700770.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
> 
