In my opinion, anyone who wants such a specialized parser with custom optimizations could simply write their own parser using NekoHTML and plug it into Tika.
-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Jukka Zitting [mailto:jukka.zitt...@gmail.com]
> Sent: Tuesday, December 16, 2008 12:07 AM
> To: tika-dev@lucene.apache.org
> Subject: Re: Extending existing Parsers - No easy to do right now, could we make it easier?
>
> Hi,
>
> On Tue, Dec 9, 2008 at 1:04 PM, Stephane Bastian
> <stephane.bastian....@gmail.com> wrote:
> > In any case, as you pointed out Tika might not be the best place to do
> > this.
> > However going back to my initial short term issue, which is extending
> > the Html Parser, I would definitely take the solution you proposed
> > earlier if it's still on the table ;)
>
> I thought about this a bit more (see TIKA-182), and I must say that
> I'd rather not apply the patch to Tika. Doing so would create an extra
> binding between client code and the underlying parser library, and
> would make it difficult for us to later replace the parser if we
> wanted to.
>
> BR,
>
> Jukka Zitting