Hi,

It doesn't look like there is much going on in the FeedParser
repository:
http://jakarta.apache.org/commons/sandbox/feedparser/changelog-report.html

No activity since February.  Kevin is pretty busy with Rojo, I'd guess,
so I imagine we won't be seeing FeedParser 2.0 any time soon.  However,
FeedParser _does_ look useful in the Nutch context, especially because
of its auto discovery features.

Is this the list of dependencies that's the problem?
http://jakarta.apache.org/commons/sandbox/feedparser/dependencies.html

Out of that list, I think the following are the only ones that Nutch
doesn't use yet:

- jdom
- jaxen
- xmlrpc
- http client

Doesn't seem too bad to me, and since parse-rss is an optional plugin,
these Jars don't go into Nutch's lib directory, but instead in
parse-rss/lib.

I took a quick look at other plugins' dependencies:

./parse-pdf/lib/log4j-1.2.9.jar
./parse-pdf/lib/PDFBox-0.7.0.jar
./parse-msword/lib/poi-2.1-20040508.jar
./parse-msword/lib/poi-scratchpad-2.1-20040508.jar
./protocol-ftp/lib/commons-net-1.2.0-dev.jar
./clustering-carrot2/lib/FSA.jar
./clustering-carrot2/lib/carrot2-filter-lingo.jar
./clustering-carrot2/lib/violinstrings-1.0.2.jar
./clustering-carrot2/lib/Jama-1.0.1-patched.jar
./clustering-carrot2/lib/commons-collections-3.0.jar
./clustering-carrot2/lib/carrot2-util-common.jar
./clustering-carrot2/lib/commons-pool-1.1.jar
./clustering-carrot2/lib/log4j-1.2.8.jar
./clustering-carrot2/lib/nekohtml-0.9.2.jar
./clustering-carrot2/lib/carrot2-snowball-stemmers.jar
./clustering-carrot2/lib/carrot2-local-core.jar
./clustering-carrot2/lib/carrot2-util-tokenizer.jar
./ontology/lib/icu4j_2_6_1.jar
./ontology/lib/jena-2.1.jar
./ontology/lib/commons-logging-1.0.3.jar
./parse-html/lib/tagsoup-1.0rc3.jar
./parse-html/lib/nekohtml-0.9.4.jar
./protocol-httpclient/lib/commons-codec.jar
./protocol-httpclient/lib/commons-httpclient-3.0-rc2.jar


It looks like a number of plugins already use 2+ Jars, so parse-rss
wouldn't be an exception.  I'm for inclusion of Chris' parse-rss plugin
in the repository. :)
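
For anyone who wants to repeat the tally, here is a minimal Python sketch
that groups Jar paths by plugin. It assumes paths shaped like
./<plugin>/lib/<name>.jar, as in the listing above; the function name
jars_per_plugin and the sample list are made up for illustration and are
not part of Nutch.

```python
from collections import defaultdict


def jars_per_plugin(paths):
    """Group jar file names by plugin for paths like ./<plugin>/lib/<jar>."""
    grouped = defaultdict(list)
    for raw in paths:
        # Drop the leading "." and any empty segments from the path.
        parts = [p for p in raw.strip().split("/") if p not in ("", ".")]
        # Keep only entries that match <plugin>/lib/<something>.jar.
        if len(parts) == 3 and parts[1] == "lib" and parts[2].endswith(".jar"):
            grouped[parts[0]].append(parts[2])
    return dict(grouped)


# A few lines copied from the listing above (assumed sample input).
sample = [
    "./parse-pdf/lib/log4j-1.2.9.jar",
    "./parse-pdf/lib/PDFBox-0.7.0.jar",
    "./protocol-ftp/lib/commons-net-1.2.0-dev.jar",
    "./parse-html/lib/tagsoup-1.0rc3.jar",
    "./parse-html/lib/nekohtml-0.9.4.jar",
]

if __name__ == "__main__":
    for plugin, jars in sorted(jars_per_plugin(sample).items()):
        print(plugin, len(jars))
```

Feeding it the full listing instead of the sample would give the per-plugin
Jar counts directly.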

Otis


--- Chris Mattmann <[EMAIL PROTECTED]> wrote:

> Hi Andrzej,
> 
>   At the time that I was working diligently on this plugin (April/May), I
> had done some thorough research into finding what I felt would be the most
> flexible, reliable way to parse RSS files. The RSS feed parser out of the
> jakarta-commons sandbox was what I found, and I stand by it. I understand
> your concerns, however, about its reliance on several libraries, but it just
> comes with the territory in this case. However, as noted in
> http://issues.apache.org/jira/browse/NUTCH-30 by Kevin Burton, when
> feedparser 2.0 comes out, the reliance on the external libraries will be
> removed, so I think that by adopting the feedparser based plugin right now,
> we have a clear upgrade path that leads us to the plugin's independence of
> external libraries, without changing (much of) the underlying source code.
> 
> That's my two cents.
> 
> Thanks!
> 
> Cheers,
>   Chris Mattmann
> 
> 
> 
> On 7/20/05 11:58 PM, "Andrzej Bialecki" <[EMAIL PROTECTED]> wrote:
> 
> > [EMAIL PROTECTED] wrote:
> >> Hi,
> >> 
> >> Does anyone know why Chris Mattmann's RSS plugin (
> >> http://issues.apache.org/jira/browse/NUTCH-30 ) wasn't put in the
> >> repository, and whether there are plans to revive it and include it?
> > 
> > That's probably my fault. I was almost ready to import it, but then
> > during the final review I hesitated - I'm wary of pulling in so many
> > dependencies. Then other things got in the way, and I sort of dropped it
> > for the moment...
> > 
> > If there's no way to parse RSS reliably other than using these dozens of
> > libraries, so be it. Is this the case?
> 
> ______________________________________________
> Chris A. Mattmann




_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
