Hi,

A Solr customer of mine is building an OEM search capability and wants to offer their users two levels of document parsers: basic-level Tika parsers and a licensed add-on with INSO/ISYS.
I'm considering suggesting an architecture where Tika is the framework and INSO or ISYS is wrapped as a Tika parser inside it, giving one single integration point in Solr (and elsewhere).

I'm thinking of:
* Wrapping ISYS's content type detector in the Detector interface
* Wrapping ISYS's parser API in a new ISYSParser which does the actual work

I see two options:
A) Instantiate Tika with a custom tika-config when using ISYS, replacing all open source parsers with ISYS
B) Let the AutoDetectParser first try the ISYS detector/parser, and fall back to the defaults if ISYS is not enabled/licensed.

I think option B would be elegant, and would give open source solutions an easy upgrade path without the consuming application ever needing to know about anything other than the Tika API.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
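To make option B concrete, here is a rough, dependency-free sketch of the fallback idea. Everything in it is an assumption for illustration: SimpleParser, LicensedParser, OpenSourceParser and FallbackParser are hypothetical stand-ins, not Tika's real Parser/Detector interfaces and not the ISYS API, and the "licensed" flag simply simulates whether the add-on is enabled. A real integration would implement Tika's own interfaces instead.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical, simplified stand-in for a parser contract (not Tika's Parser interface).
interface SimpleParser {
    boolean isEnabled();                        // e.g. a license check for a commercial parser
    boolean supports(String mimeType);          // whether this parser handles the detected type
    String parse(String mimeType, byte[] content);
}

// Stand-in for a wrapped commercial parser; "licensed" simulates the license state.
final class LicensedParser implements SimpleParser {
    private final boolean licensed;
    LicensedParser(boolean licensed) { this.licensed = licensed; }
    public boolean isEnabled() { return licensed; }
    public boolean supports(String mimeType) { return true; }
    public String parse(String mimeType, byte[] content) {
        return "licensed:" + new String(content, StandardCharsets.UTF_8);
    }
}

// Stand-in for the default open source parsers; here it only handles text/* types.
final class OpenSourceParser implements SimpleParser {
    public boolean isEnabled() { return true; }
    public boolean supports(String mimeType) { return mimeType.startsWith("text/"); }
    public String parse(String mimeType, byte[] content) {
        return "basic:" + new String(content, StandardCharsets.UTF_8);
    }
}

// Tries each delegate in order and uses the first one that is enabled and supports the type.
final class FallbackParser implements SimpleParser {
    private final List<SimpleParser> chain;
    FallbackParser(List<SimpleParser> chain) { this.chain = List.copyOf(chain); }
    public boolean isEnabled() {
        return chain.stream().anyMatch(SimpleParser::isEnabled);
    }
    public boolean supports(String mimeType) {
        return chain.stream().anyMatch(p -> p.isEnabled() && p.supports(mimeType));
    }
    public String parse(String mimeType, byte[] content) {
        for (SimpleParser p : chain) {
            if (p.isEnabled() && p.supports(mimeType)) {
                return p.parse(mimeType, content);
            }
        }
        throw new IllegalArgumentException("No parser available for " + mimeType);
    }
}
```

The consuming application would only ever hold a FallbackParser built as new FallbackParser(List.of(new LicensedParser(licensed), new OpenSourceParser())), so turning the license on or off changes which delegate does the work without any change to the calling code.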
