Hi,

Thanks for the help.

I have already added parse-rss to plugin.includes, but I am still getting
only the root URL.
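
One way to rule out a typo in the property is to test the pattern outside Nutch. The sketch below is plain Python, not Nutch code. It assumes that plugin.includes is treated as a regular expression that must match a whole plugin id, and the helper names plugin_includes and is_enabled are made up for illustration. The sample XML is the property quoted below, wrapped in a configuration element as it would be in nutch-site.xml.

```python
import re
import xml.etree.ElementTree as ET

# Sample nutch-site.xml content; the value is copied from the quoted message.
SAMPLE = """<configuration>
<property>
  <name>plugin.includes</name>
  <value>protocol-httpclient|urlfilter-regex|parse-(text|html|msexcel|msword|mspowerpoint|pdf|zip|swf|rss)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  <description></description>
</property>
</configuration>"""


def plugin_includes(xml_text):
    """Return the plugin.includes value from the config text, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == "plugin.includes":
            return (prop.findtext("value") or "").strip()
    return None


def is_enabled(pattern, plugin_id):
    """Assumption: a plugin is enabled when the whole id matches the pattern."""
    return re.fullmatch(pattern, plugin_id) is not None


if __name__ == "__main__":
    pattern = plugin_includes(SAMPLE)
    for plugin_id in ("parse-rss", "parse-html", "parse-js"):
        print(plugin_id, is_enabled(pattern, plugin_id))
```

If parse-rss shows as enabled here, the pattern itself is probably not the cause, and the next things to check would be whether the edited file is the one the crawl actually reads and whether the URL filters or the crawl depth are dropping the feed's links.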

Regards,
Chetan Patel


Edward Quick wrote:
> 
> 
> Chetan,
> 
> Try adding parse-rss in nutch-site.xml. Here's mine:
> 
> <property>
>   <name>plugin.includes</name>
>   <value>protocol-httpclient|urlfilter-regex|parse-(text|html|msexcel|msword|mspowerpoint|pdf|zip|swf|rss)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
>   <description></description>
> </property>
> 
> 
> Ed.
> 
> 
>> Date: Sat, 27 Sep 2008 01:30:43 -0700
>> From: [EMAIL PROTECTED]
>> To: [email protected]
>> Subject: crawl xml url using nutch-0.9
>> 
>> 
>> Hi All,
>> 
>> I have tried to crawl an XML URL (http://sports.yahoo.com/nfl/rss.xml) with
>> depth 2.
>> 
>> But it crawls only the root URL.
>> 
>> Please help me crawl the root URL as well as all the sub-URLs of the root URL.
>> 
>> Thanks in advance.
>> 
>> Regards,
>> Chetan Patel
>> -- 
>> View this message in context:
>> http://www.nabble.com/crawl-xml-url-using-nutch-0.9-tp19700770p19700770.html
>> Sent from the Nutch - User mailing list archive at Nabble.com.
>> 
> 

-- 
View this message in context: 
http://www.nabble.com/crawl-xml-url-using-nutch-0.9-tp19700770p19701249.html
Sent from the Nutch - User mailing list archive at Nabble.com.
