Hi, Chris added the RSS parser plugin a while back. I've never used it, so I'm not sure what it is really for. Can somebody explain?
Normally, fetching and indexing a single web page results in a single Document in the index. What happens when an RSS feed is encountered? If the feed carries the full content of each item, do we treat each item as its own page/Document, and if it doesn't, do we extract the item links and include those in some future fetchlist?

How does the link to an RSS feed make it into a fetchlist to begin with? Does one have to include it explicitly, or does some other parser also pick up links to feeds from the LINK elements in the HTML HEAD? (http://issues.apache.org/jira/browse/NUTCH-412 ?)

Thanks,
Otis

Simpy -- http://www.simpy.com/ - Tag - Search - Share
