In my opinion, anyone who wants such a specialized parser with their own
optimizations can simply write their own parser on top of NekoHTML and plug
it into Tika.
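
As a rough sketch of the idea only: the Parser interface and ParserRegistry
below are hypothetical stand-ins and not Tika's actual API, and the JDK's
built-in Swing HTML parser stands in for NekoHTML so that the snippet
compiles without extra jars. A real implementation would implement Tika's
own parser interface and delegate to NekoHTML instead.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class CustomHtmlParserSketch {

    // Hypothetical stand-in for a Tika-style parser interface (NOT Tika's real API).
    interface Parser {
        String parse(Reader input) throws IOException;
    }

    // Hypothetical registry: maps a media type to the parser that handles it.
    static class ParserRegistry {
        private final Map<String, Parser> parsers = new HashMap<>();

        void register(String mediaType, Parser parser) {
            parsers.put(mediaType, parser);
        }

        Parser forType(String mediaType) {
            return parsers.get(mediaType);
        }
    }

    // The user's own HTML parser. The JDK Swing parser is used here as a
    // stand-in for NekoHTML; custom optimizations would live in the callback.
    static class MyHtmlParser implements Parser {
        @Override
        public String parse(Reader input) throws IOException {
            StringBuilder text = new StringBuilder();
            HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
                @Override
                public void handleText(char[] data, int pos) {
                    // Collect character data, separating chunks with a space.
                    if (text.length() > 0) {
                        text.append(' ');
                    }
                    text.append(data);
                }
            };
            new ParserDelegator().parse(input, callback, true);
            return text.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        ParserRegistry registry = new ParserRegistry();
        // "Plugging in": the custom parser is registered for text/html.
        registry.register("text/html", new MyHtmlParser());

        Parser parser = registry.forType("text/html");
        String html = "<html><body><p>Hello <b>world</b></p></body></html>";
        System.out.println(parser.parse(new StringReader(html)));
    }
}
```

The point is that the client code only ever sees the parser interface, so
the HTML library underneath can be swapped without touching it.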

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Jukka Zitting [mailto:jukka.zitt...@gmail.com]
> Sent: Tuesday, December 16, 2008 12:07 AM
> To: tika-dev@lucene.apache.org
> Subject: Re: Extending existing Parsers - No easy to do right now, could
> we make it easier?
> 
> Hi,
> 
> On Tue, Dec 9, 2008 at 1:04 PM, Stephane Bastian
> <stephane.bastian....@gmail.com> wrote:
> > In any case, as you pointed out Tika might not be the best place to do
> > this. However going back to my initial short term issue, which is
> > extending the Html Parser, I would definitely take the solution you
> > proposed earlier if it's still on the table ;)
> 
> I thought about this a bit more (see TIKA-182), and I must say that
> I'd rather not apply the patch to Tika. Doing so would create an extra
> binding between client code and the underlying parser library, and
> would make it difficult for us to later replace the parser if we
> wanted to.
> 
> BR,
> 
> Jukka Zitting
