On Fri, Jul 17, 2009 at 14:15, Saurabh Suman<saurabhsuman...@rediff.com> wrote:
>
> hi
> I am crawling a feed url. http://blog.taragana.com/n/c/india/feed/.
> I have set depth =2.
> I am using FeedParser.java for parsing it.
> For depth 1 in parseData in segments folder Parse Metadata for a url "
> http://blog.taragana.com/n/30-child-labourers-rescued-in-agra-and-firozabad-111417/
> " is like this
> Parse Metadata :author=Ani CharEncodingForConversion=utf-8 tag=Agra
> tag=Firozabad tag=Uttar Pradesh tag=India OriginalCharEncoding=utf-8
> feed=http://blog.taragana.com/n published=1247778368000 .
> As we can see it contains author.
>
> but for depth 2 parsemetadata for same url is like this:
> Parse Metadata: CharEncodingForConversion=utf-8 OriginalCharEncoding=utf-8
>
> when i search i am not getting author. i have following question regarding
> this-
>
> (1)Does Nutch overwrite Parsed metadata of depth 1 with that of depth 2
> for this URL or does it merge the two? If it overwrites, then how can I stop
> it from doing the same as I need the author and other information obtained
> by parsing the RSS feed.
>
Searching for RSS data such as author, etc. is not yet implemented. I hope to implement it before the next release.

>
> --
> View this message in context:
> http://www.nabble.com/Issue-with-Parse-metaData-while-crawling-RSSFeed-URL-tp24532613p24532613.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>

--
Doğacan Güney