On Wed, Jul 8, 2009 at 09:24, Saurabh Suman <saurabhsuman...@rediff.com> wrote:
>
> Hi,
> I want to parse a feed URL using Nutch. I tried to use the
> org.apache.nutch.parse.feed.FeedParser class. Its input is XML, so I gave
> it the link below:
> http://timesofindia.indiatimes.com/rssfeedsdefault.cms
> This URL contains all the RSS feeds for the newspaper. When I tried it
> through the Rome feed parser, it gave me all the permalinks, titles,
> dates, etc. But the Nutch parser does not give anything.
> How can I get all the permalinks, titles, and dates from this URL?
>

In conf/parse-plugins.xml:

  <mimeType name="text/xml">
    <plugin id="parse-html" />
    <plugin id="parse-rss" />
    <plugin id="feed" />
  </mimeType>

The URL you mentioned has a text/xml content type. Since you probably
also have parse-html defined in your conf file, and it is listed first
for that type, parse-html tries to parse the feed before the feed plugin
gets a chance. Try moving the "feed" plugin higher, so:

  <mimeType name="text/xml">
    <plugin id="feed" />
    <plugin id="parse-html" />
    <plugin id="parse-rss" />
  </mimeType>

>
> --
> View this message in context:
> http://www.nabble.com/How-to-Parse-Rss-Feed-URL-tp24386051p24386051.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>

--
Doğacan Güney
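
If you would rather script the reordering than edit the file by hand, here is a minimal Python sketch of the idea. It is not part of Nutch: the helper names `prefer_plugin` and `plugin_order` are made up for illustration, and `SAMPLE` is a cut-down stand-in for conf/parse-plugins.xml that assumes a `<parse-plugins>` root element wrapping the `<mimeType>` block shown above.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for conf/parse-plugins.xml (assumed layout, not the real file).
SAMPLE = """<parse-plugins>
  <mimeType name="text/xml">
    <plugin id="parse-html" />
    <plugin id="parse-rss" />
    <plugin id="feed" />
  </mimeType>
</parse-plugins>"""


def prefer_plugin(xml_text, mime_type, plugin_id):
    """Return the parsed tree with plugin_id moved to the front for mime_type."""
    root = ET.fromstring(xml_text)
    for mime in root.iter("mimeType"):
        if mime.get("name") != mime_type:
            continue
        # Find the plugin entry to promote; leave the block alone if it is absent.
        target = next(
            (p for p in mime.findall("plugin") if p.get("id") == plugin_id),
            None,
        )
        if target is None:
            continue
        mime.remove(target)
        mime.insert(0, target)
    return root


def plugin_order(root, mime_type):
    """List the plugin ids for mime_type in the order they appear."""
    for mime in root.iter("mimeType"):
        if mime.get("name") == mime_type:
            return [p.get("id") for p in mime.findall("plugin")]
    return []


if __name__ == "__main__":
    updated = prefer_plugin(SAMPLE, "text/xml", "feed")
    print(plugin_order(updated, "text/xml"))
```

One caveat if you adapt this to rewrite the real file: ElementTree's default parser drops XML comments, so any comments in the file would be lost on write. For a one-off change, editing by hand is probably simpler.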