Don't forget ElementMappingContentHandler, it's useful for others, too :)

Uwe
-----
UWE SCHINDLER
Webserver/Middleware Development
PANGAEA - Publishing Network for Geoscientific and Environmental Data
MARUM - University of Bremen
Room 2500, Leobener Str., D-28359 Bremen
Tel.: +49 421 218 65595
Fax: +49 421 218 65505
http://www.pangaea.de/
E-mail: uschind...@pangaea.de

> -----Original Message-----
> From: Jukka Zitting [mailto:jukka.zitt...@gmail.com]
> Sent: Wednesday, December 17, 2008 2:17 PM
> To: tika-dev@lucene.apache.org
> Subject: Fwd: Proposal: Commons SAX
>
> Hi,
>
> I think the SAX classes that we've come up with in o.a.tika.sax would be
> useful also to other projects that don't otherwise depend on Tika, so
> I've contacted Apache Commons about the possibility of starting a
> "Commons SAX" component to make the code available to a wider
> audience. See below for the proposal.
>
> BR,
>
> Jukka Zitting
>
> ---------- Forwarded message ----------
> From: Jukka Zitting <jukka.zitt...@gmail.com>
> Date: Wed, Dec 17, 2008 at 2:09 PM
> Subject: Proposal: Commons SAX
> To: Jakarta Commons Developers List <d...@commons.apache.org>
>
> Hi,
>
> In the Apache Tika project [1] we use SAX quite a lot, and have
> written a set of quite useful general utility classes for SAX
> handling.
>
> For example, in org.apache.tika.sax [2] we have the following:
>
> * ContentHandlerDecorator - Convenient base class for writing
>   ContentHandler decorators
> * EmbeddedContentHandler - Decorator that blocks startDocument() and
>   endDocument() calls
> * TeeContentHandler - Forwards SAX events to multiple handlers
> * TextContentHandler - Decorator that blocks everything but character
>   events (and start/endDocument)
> * WriteOutContentHandler - Writes the contents of all character events
>   to a Writer
>
> In org.apache.tika.sax.xpath [3] we have a simple XPath subset
> implementation that supports streaming and filtering of SAX events. In
> other words, the implementation doesn't need a DOM tree to evaluate
> XPath statements.
>
> I believe this code would be useful also outside Tika, and I was
> thinking that it might perhaps make sense to create a Commons project
> for this. I also know of some SAX processing classes in Cocoon and
> Jackrabbit that could well be of interest to a wider audience.
>
> Do you think something like this would be interesting as a Commons
> project? Are there other similar efforts that I should know of? I
> looked at XML Commons in xml.apache.org, but it seems pretty dormant.
>
> [1] http://lucene.apache.org/tika/
> [2] http://lucene.apache.org/tika/apidocs/org/apache/tika/sax/package-summary.html
> [3] http://lucene.apache.org/tika/apidocs/org/apache/tika/sax/xpath/package-summary.html
>
> BR,
>
> Jukka Zitting
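For anyone who hasn't used this style of SAX handler, here is a minimal sketch of two ideas from the class list in the proposal: a tee that fans SAX events out to several handlers, and a handler that writes character events to a Writer. It uses only the JDK's org.xml.sax API. The names SaxSketch, TeeHandler, WriteOutHandler and ElementCounter are hypothetical. This is not the Tika code and does not claim to match the actual signatures of TeeContentHandler or WriteOutContentHandler.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical, JDK-only sketch; these are NOT the org.apache.tika.sax classes.
public class SaxSketch {

    // Tee: forwards each SAX2 ContentHandler callback to every wrapped handler, in order.
    public static class TeeHandler implements ContentHandler {
        private final List<ContentHandler> handlers;
        public TeeHandler(ContentHandler... handlers) { this.handlers = Arrays.asList(handlers); }
        public void setDocumentLocator(Locator l) {
            for (ContentHandler h : handlers) h.setDocumentLocator(l);
        }
        public void startDocument() throws SAXException {
            for (ContentHandler h : handlers) h.startDocument();
        }
        public void endDocument() throws SAXException {
            for (ContentHandler h : handlers) h.endDocument();
        }
        public void startPrefixMapping(String prefix, String uri) throws SAXException {
            for (ContentHandler h : handlers) h.startPrefixMapping(prefix, uri);
        }
        public void endPrefixMapping(String prefix) throws SAXException {
            for (ContentHandler h : handlers) h.endPrefixMapping(prefix);
        }
        public void startElement(String uri, String local, String qName, Attributes atts) throws SAXException {
            for (ContentHandler h : handlers) h.startElement(uri, local, qName, atts);
        }
        public void endElement(String uri, String local, String qName) throws SAXException {
            for (ContentHandler h : handlers) h.endElement(uri, local, qName);
        }
        public void characters(char[] ch, int start, int length) throws SAXException {
            for (ContentHandler h : handlers) h.characters(ch, start, length);
        }
        public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException {
            for (ContentHandler h : handlers) h.ignorableWhitespace(ch, start, length);
        }
        public void processingInstruction(String target, String data) throws SAXException {
            for (ContentHandler h : handlers) h.processingInstruction(target, data);
        }
        public void skippedEntity(String name) throws SAXException {
            for (ContentHandler h : handlers) h.skippedEntity(name);
        }
    }

    // Write-out: appends the text of every character event to a Writer.
    public static class WriteOutHandler extends DefaultHandler {
        private final Writer writer;
        public WriteOutHandler(Writer writer) { this.writer = writer; }
        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            try {
                writer.write(ch, start, length);
            } catch (IOException e) {
                throw new SAXException(e);
            }
        }
    }

    // Second consumer for the demo: counts start-element events.
    public static class ElementCounter extends DefaultHandler {
        public int count;
        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) { count++; }
    }

    // Parses an XML string with the JDK's built-in SAX parser, sending events to handler.
    public static void parse(String xml, ContentHandler handler) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
    }

    public static void main(String[] args) {
        StringWriter text = new StringWriter();
        ElementCounter counter = new ElementCounter();
        parse("<doc><p>Hello</p><p>world</p></doc>", new TeeHandler(new WriteOutHandler(text), counter));
        System.out.println(text + " / " + counter.count); // prints "Helloworld / 3"
    }
}
```

Because the tee only depends on the ContentHandler interface, any handler (including another decorator) can sit behind it, which is the kind of reuse outside Tika the proposal is after. Note that in this sketch, if one downstream handler throws, the exception propagates and the remaining handlers do not see that event; a production version would have to decide how to handle that.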