Hi,

A Solr customer of mine is building an OEM search capability and wants to offer 
their users two levels of document parsers - basic level Tika parsers and a 
licensed add-on with INSO/ISYS.

I'm considering suggesting an architecture where Tika is used as the framework, 
and INSO or ISYS is wrapped as a Tika parser inside it, giving one 
single integration point in Solr (and elsewhere).

My plan is to:
* Wrap ISYS's content type detector in the Detector interface
* Wrap ISYS's parser API in a new ISYSParser which does the actual work

I see two options:
A) Instantiate Tika with a custom tika-config when using ISYS, replacing all 
open source parsers with ISYS
B) Let the AutoDetectParser first try the ISYS detector/parser, and fall back to 
the defaults if ISYS is not enabled/licensed.

I think option B would be elegant, and would give open source solutions an easy 
upgrade path without the consuming application ever needing to know about 
anything other than the Tika API.
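
To make option B a bit more concrete, here is a rough, self-contained sketch 
of the fallback idea in plain Java. Note that DocParser, stub() and select() 
are hypothetical names of my own and are NOT Tika's real Parser/Detector API; 
the "isys" and "tika-default" labels are just placeholders. It only 
illustrates the selection logic: try the licensed parser first, and use the 
open source one if the add-on is not enabled.

```java
import java.util.List;
import java.util.Set;

public class ParserFallbackSketch {

    /** Hypothetical stand-in for a parser; not Tika's real Parser interface. */
    interface DocParser {
        String name();
        boolean isEnabled();                 // e.g. a valid add-on license was found
        boolean supports(String mediaType);
    }

    /** Tiny fixed-answer implementation, used only for illustration. */
    static DocParser stub(String name, boolean enabled, Set<String> types) {
        return new DocParser() {
            public String name() { return name; }
            public boolean isEnabled() { return enabled; }
            public boolean supports(String mediaType) { return types.contains(mediaType); }
        };
    }

    /** Returns the first enabled parser in the chain that supports the type. */
    static DocParser select(List<DocParser> chain, String mediaType) {
        for (DocParser p : chain) {
            if (p.isEnabled() && p.supports(mediaType)) {
                return p;
            }
        }
        throw new IllegalArgumentException("No parser for " + mediaType);
    }

    public static void main(String[] args) {
        Set<String> types = Set.of("application/pdf", "application/msword");
        DocParser openSource = stub("tika-default", true, types);

        // Add-on licensed: the commercial parser is first in the chain and wins.
        List<DocParser> licensed = List.of(stub("isys", true, types), openSource);
        System.out.println(select(licensed, "application/pdf").name());

        // Add-on not licensed: the same call falls back to the open source parser.
        List<DocParser> unlicensed = List.of(stub("isys", false, types), openSource);
        System.out.println(select(unlicensed, "application/pdf").name());
    }
}
```

The point of the sketch is that the caller only ever calls select() (the 
stand-in for the single Tika entry point); whether the commercial parser is 
used is decided by the order of the chain and the enabled/licensed check.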

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
