Hi,

Thanks for the help.

I have already added parse-rss to plugin.includes, and I am still getting only the root URL.

Regards,
Chetan Patel


Edward Quick wrote:
>
> Chetan,
>
> Try adding parse-rss in nutch-site.xml. Here's mine:
>
> <property>
>   <name>plugin.includes</name>
>   <value>protocol-httpclient|urlfilter-regex|parse-(text|html|msexcel|msword|mspowerpoint|pdf|zip|swf|rss)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
>   <description></description>
> </property>
>
> Ed.
>
>> Date: Sat, 27 Sep 2008 01:30:43 -0700
>> From: [EMAIL PROTECTED]
>> To: [email protected]
>> Subject: crawl xml url using nutch-0.9
>>
>> Hi All,
>>
>> I have tried to crawl xml url (http://sports.yahoo.com/nfl/rss.xml) using
>> depth 2.
>>
>> But it will crawl only root url.
>>
>> Please help me how to crawl root url as well as all sub url of root url.
>>
>> Thanks in advance.
>>
>> Regards,
>> Chetan Patel
>> --
>> View this message in context:
>> http://www.nabble.com/crawl-xml-url-using-nutch-0.9-tp19700770p19700770.html
>> Sent from the Nutch - User mailing list archive at Nabble.com.

--
View this message in context: http://www.nabble.com/crawl-xml-url-using-nutch-0.9-tp19700770p19701249.html
Sent from the Nutch - User mailing list archive at Nabble.com.
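
A quick sanity check on the plugin.includes value quoted in this thread: as far as I know, Nutch treats that value as a regular expression and enables a plugin only when its whole id matches. Assuming that is right, the rough Python sketch below tests a few plugin ids against the pattern. The helper name is_included is made up for illustration and is not part of Nutch. It only checks the pattern itself, not whether the plugin is installed or whether the links in the feed are actually fetched.

```python
import re

# Pattern copied from the nutch-site.xml quoted in this thread.
PLUGIN_INCLUDES = (
    "protocol-httpclient|urlfilter-regex|"
    "parse-(text|html|msexcel|msword|mspowerpoint|pdf|zip|swf|rss)|"
    "index-basic|query-(basic|site|url)|summary-basic|scoring-opic|"
    "urlnormalizer-(pass|regex|basic)"
)


def is_included(plugin_id: str, includes: str = PLUGIN_INCLUDES) -> bool:
    """Hypothetical helper: True if the whole plugin id matches the pattern.

    Assumption: Nutch requires a full match of the plugin id, which is
    mirrored here with re.fullmatch.
    """
    return re.fullmatch(includes, plugin_id) is not None


if __name__ == "__main__":
    # Report which of a few plugin ids the pattern would enable.
    for plugin_id in ("parse-rss", "parse-html", "parse-js"):
        print(plugin_id, is_included(plugin_id))
```

If parse-rss passes this check, as it should with the value above, the pattern is probably not the cause, and the crawl depth and URL filter settings would be the next things to look at.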
