Re: Issue with Parse metaData while crawling RSSFeed URL

Doğacan Güney Fri, 17 Jul 2009 04:59:44 -0700

On Fri, Jul 17, 2009 at 14:15, Saurabh Suman<saurabhsuman...@rediff.com> wrote:
>
> Hi,
> I am crawling a feed URL, http://blog.taragana.com/n/c/india/feed/,
> with depth = 2.
> I am using FeedParser.java to parse it.
> At depth 1, the Parse Metadata in parseData in the segments folder for the URL
> http://blog.taragana.com/n/30-child-labourers-rescued-in-agra-and-firozabad-111417/
> looks like this:
> Parse Metadata :author=Ani CharEncodingForConversion=utf-8 tag=Agra
> tag=Firozabad tag=Uttar Pradesh tag=India OriginalCharEncoding=utf-8
> feed=http://blog.taragana.com/n published=1247778368000
> As we can see, it contains the author.
>
> But at depth 2, the Parse Metadata for the same URL looks like this:
> Parse Metadata: CharEncodingForConversion=utf-8 OriginalCharEncoding=utf-8
>
> When I search, I do not get the author. I have the following question
> about this:
>
> (1) Does Nutch overwrite the parse metadata from depth 1 with that from depth 2
> for this URL, or does it merge the two? If it overwrites, how can I stop
> it from doing so? I need the author and the other information obtained
> by parsing the RSS feed.
>
>


Searching for RSS data such as the author is not yet implemented. I
hope to implement it before the next release.
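
To make the overwrite-versus-merge question concrete, here is a minimal sketch in plain Java. It does not use any Nutch classes and makes no claim about what Nutch actually does internally; the class name MetadataMergeSketch and its methods are hypothetical, and the values are the ones quoted above. It only shows how the author field would disappear under an overwrite and survive under a merge.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: plain maps stand in for parse metadata.
public class MetadataMergeSketch {

    // Overwrite: the newer parse replaces the older one entirely.
    static Map<String, List<String>> overwrite(Map<String, List<String>> older,
                                               Map<String, List<String>> newer) {
        return new LinkedHashMap<>(newer);
    }

    // Merge: keep every key from the older parse; where both parses
    // define a key, the newer value wins.
    static Map<String, List<String>> merge(Map<String, List<String>> older,
                                           Map<String, List<String>> newer) {
        Map<String, List<String>> result = new LinkedHashMap<>(older);
        result.putAll(newer);
        return result;
    }

    // Metadata reported for the article URL at depth 1 (from the feed parse).
    static Map<String, List<String>> depth1() {
        Map<String, List<String>> m = new LinkedHashMap<>();
        m.put("author", Arrays.asList("Ani"));
        m.put("CharEncodingForConversion", Arrays.asList("utf-8"));
        m.put("tag", Arrays.asList("Agra", "Firozabad", "Uttar Pradesh", "India"));
        m.put("OriginalCharEncoding", Arrays.asList("utf-8"));
        m.put("feed", Arrays.asList("http://blog.taragana.com/n"));
        m.put("published", Arrays.asList("1247778368000"));
        return m;
    }

    // Metadata reported for the same URL at depth 2 (from the page parse).
    static Map<String, List<String>> depth2() {
        Map<String, List<String>> m = new LinkedHashMap<>();
        m.put("CharEncodingForConversion", Arrays.asList("utf-8"));
        m.put("OriginalCharEncoding", Arrays.asList("utf-8"));
        return m;
    }

    public static void main(String[] args) {
        System.out.println("overwrite keeps author: "
                + overwrite(depth1(), depth2()).containsKey("author"));
        System.out.println("merge keeps author:     "
                + merge(depth1(), depth2()).containsKey("author"));
    }
}
```

Under these assumptions, the depth 2 metadata quoted above matches the overwrite result, since it contains only the two encoding keys.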

>
>
> --
> View this message in context: 
> http://www.nabble.com/Issue-with-Parse-metaData-while-crawling-RSSFeed-URL-tp24532613p24532613.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>
>



-- 
Doğacan Güney
