parse-rss e

ogjunk-nutch Wed, 28 Mar 2007 13:32:00 -0800

Hi,

Chris added the RSS parser plugin a while back.  I never used it, so I'm not 
sure what it is really for.  Can somebody explain?


Normally, fetching and indexing a single web page results in a single Document 
in the index.  What happens when an RSS feed is encountered?  If the feed 
carries full content, do we treat each item as its own page/Document, and if it 
doesn't, do we extract the item links and include those in some future fetchlist?
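
To make the two alternatives in that question concrete, here is a minimal sketch in Python. It is not how parse-rss behaves (that is exactly what is being asked); the function name `split_feed`, the "has a non-empty description" test for a full item, and the sample feed are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

def split_feed(rss_xml):
    """Split an RSS 2.0 string into (documents, outlinks).

    Hypothetical policy, for illustration only: an item with a non-empty
    <description> becomes its own document; an item without one only
    contributes its <link> as a URL to fetch later.
    """
    documents, outlinks = [], []
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        body = (item.findtext("description") or "").strip()
        if body:
            documents.append({
                "url": link,
                "title": (item.findtext("title") or "").strip(),
                "text": body,
            })
        elif link:
            outlinks.append(link)
    return documents, outlinks

# Made-up sample feed: one full item, one link-only item.
SAMPLE = """<rss version="2.0"><channel><title>Example</title>
<item><title>A</title><link>http://example.com/a</link>
<description>Full text of A</description></item>
<item><title>B</title><link>http://example.com/b</link></item>
</channel></rss>"""

docs, links = split_feed(SAMPLE)
```

With the sample above, `docs` holds one entry for item A and `links` holds the URL of item B, which would be the candidate for a later fetchlist.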

How does the link to an RSS feed make it into a fetchlist to begin with?  Does 
one have to include it explicitly, or does some other parser also extract links 
to feeds from the HEAD>LINK element? ( http://issues.apache.org/jira/browse/NUTCH-412 
?)
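
For reference, this is roughly what extracting feed links from LINK elements could look like, sketched in Python with only the standard library. It is an illustration of the general feed-autodiscovery convention (`rel="alternate"` plus a feed MIME type), not a description of any existing Nutch parser or of what NUTCH-412 proposes; the names `FeedLinkFinder` and `find_feeds` are hypothetical, and for brevity the sketch does not check that the LINK element sits inside HEAD.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types commonly used to advertise feeds (assumed list, not exhaustive).
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

class FeedLinkFinder(HTMLParser):
    """Collects absolute URLs of feeds advertised via <link rel="alternate">."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # Attribute values may be None when written without a value.
        a = {name: (value or "") for name, value in attrs}
        rels = a.get("rel", "").lower().split()
        is_feed = a.get("type", "").lower() in FEED_TYPES
        if "alternate" in rels and is_feed and a.get("href"):
            # Resolve relative hrefs against the page URL.
            self.feeds.append(urljoin(self.base_url, a["href"]))

def find_feeds(html, base_url):
    finder = FeedLinkFinder(base_url)
    finder.feed(html)
    return finder.feeds
```

Calling `find_feeds(page_html, page_url)` returns the feed URLs a crawler could add to a later fetchlist; stylesheet and other non-feed LINK elements are ignored.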

Thanks,
Otis

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share
