Hi,

It doesn't look like there is much going on in the FeedParser repository:
http://jakarta.apache.org/commons/sandbox/feedparser/changelog-report.html

No activity since February. Kevin is pretty busy with Rojo, I'd guess, so I imagine we won't be seeing FeedParser 2.0 any time soon.

However, FeedParser _does_ look useful in the Nutch context, especially because of its auto-discovery features. Is this the list of dependencies that's the problem?
http://jakarta.apache.org/commons/sandbox/feedparser/dependencies.html

Out of that list, I think the following are the only ones that Nutch doesn't use yet:

- jdom
- jaxen
- xmlrpc
- http client

Doesn't seem too bad to me, and since parse-rss is an optional plugin, these Jars don't go into Nutch's lib directory, but instead into parse-rss/lib.

I took a quick look at other plugins' dependencies:

./parse-pdf/lib/log4j-1.2.9.jar
./parse-pdf/lib/PDFBox-0.7.0.jar
./parse-msword/lib/poi-2.1-20040508.jar
./parse-msword/lib/poi-scratchpad-2.1-20040508.jar
./protocol-ftp/lib/commons-net-1.2.0-dev.jar
./clustering-carrot2/lib/FSA.jar
./clustering-carrot2/lib/carrot2-filter-lingo.jar
./clustering-carrot2/lib/violinstrings-1.0.2.jar
./clustering-carrot2/lib/Jama-1.0.1-patched.jar
./clustering-carrot2/lib/commons-collections-3.0.jar
./clustering-carrot2/lib/carrot2-util-common.jar
./clustering-carrot2/lib/commons-pool-1.1.jar
./clustering-carrot2/lib/log4j-1.2.8.jar
./clustering-carrot2/lib/nekohtml-0.9.2.jar
./clustering-carrot2/lib/carrot2-snowball-stemmers.jar
./clustering-carrot2/lib/carrot2-local-core.jar
./clustering-carrot2/lib/carrot2-util-tokenizer.jar
./ontology/lib/icu4j_2_6_1.jar
./ontology/lib/jena-2.1.jar
./ontology/lib/commons-logging-1.0.3.jar
./parse-html/lib/tagsoup-1.0rc3.jar
./parse-html/lib/nekohtml-0.9.4.jar
./protocol-httpclient/lib/commons-codec.jar
./protocol-httpclient/lib/commons-httpclient-3.0-rc2.jar

It looks like a number of plugins use 2+ Jars already, so parse-rss wouldn't be an exception.

I'm for inclusion of Chris' parse-rss plugin in the repository.
:)

Otis

--- Chris Mattmann <[EMAIL PROTECTED]> wrote:

> Hi Andrzej,
>
> At the time that I was working diligently on this plugin (April/May), I
> had done some thorough research into finding what I felt would be the most
> flexible, reliable way to parse RSS files. The RSS feed parser out of the
> jakarta-commons sandbox was what I found, and I stand by it. I understand
> your concerns, however, about its reliance on several libraries, but it just
> comes with the territory in this case. However, as noted in
> http://issues.apache.org/jira/browse/NUTCH-30 by Kevin Burton, when
> feedparser 2.0 comes out, the reliance on the external libraries will be
> removed, so I think that by adopting the feedparser-based plugin right now,
> we have a clear upgrade path that leads us to the plugin's independence of
> external libraries, without changing (much of) the underlying source code.
>
> That's my two cents.
>
> Thanks!
>
> Cheers,
> Chris Mattmann
>
> On 7/20/05 11:58 PM, "Andrzej Bialecki" <[EMAIL PROTECTED]> wrote:
>
> > [EMAIL PROTECTED] wrote:
> >> Hi,
> >>
> >> Does anyone know why Chris Mattmann's RSS plugin (
> >> http://issues.apache.org/jira/browse/NUTCH-30 ) wasn't put in the
> >> repository, and whether there are plans to revive it and include it?
> >
> > That's probably my fault. I was almost ready to import it, but then
> > during the final review I hesitated - I'm wary of pulling in so many
> > dependencies. Then other things got in the way, and I sort of dropped it
> > for the moment...
> >
> > If there's no way to parse RSS reliably other than using these dozens of
> > libraries, so be it. Is this the case?
>
> ______________________________________________
> Chris A. Mattmann
