Tim Allison created SOLR-11721:
----------------------------------

             Summary: Isolate Tika and dependencies into separate jvm
                 Key: SOLR-11721
                 URL: https://issues.apache.org/jira/browse/SOLR-11721
             Project: Solr
          Issue Type: Improvement
      Security Level: Public (Default Security Level. Issues are Public)
            Reporter: Tim Allison


Tika should not be run in the same JVM as Solr.  Ever.

Upgrading Tika by hand and getting all of its dependencies right, while 
hoping to avoid jar hell, is, um, error-prone.  See my recent failure, 
SOLR-11622, for which I apologize profusely.

Running DIH (the DataImportHandler) against Tika's unit test documents has 
been eye-opening. It has revealed other version-conflict and dependency 
failures that should have been caught much earlier.

The fix is non-trivial, but we should work towards it.
I see two options:

1. TIKA-2514 -- Tika's current ForkParser offers a model for a minimal forked 
process + server option.  The current limitation is that all parsers and 
their dependencies must be serializable, which can be a problem for users who 
add their own parsers with deps that were not designed for serializability.  
The proposal there is to rework the ForkParser to load all dependencies from a 
TIKA_HOME directory instead.

2. SOLR-7632 -- use tika-server, but make it seamless and as easy (and secure!) 
to use as the current handlers.
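
For a rough idea of what option 2 looks like from the calling side, here is a 
minimal sketch, not a proposed implementation. It assumes a tika-server 
instance is already running in its own JVM on its default port (9998) and 
uses only the JDK's java.net.http client, so no Tika jars end up on the 
caller's classpath. The class name TikaServerClient, the helper names, and 
the 60-second timeout are illustrative choices of mine, not anything from 
SOLR-7632.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;
import java.time.Duration;

// Hypothetical helper: talks to an out-of-process tika-server over HTTP.
class TikaServerClient {

    // Assumption: tika-server is listening locally on its default port.
    static final URI DEFAULT_ENDPOINT = URI.create("http://localhost:9998/tika");

    // Builds a PUT request asking tika-server for plain-text extraction.
    static HttpRequest buildRequest(URI endpoint, HttpRequest.BodyPublisher body) {
        return HttpRequest.newBuilder(endpoint)
                .timeout(Duration.ofSeconds(60)) // illustrative; a hung parse cannot block the caller forever
                .header("Accept", "text/plain")
                .PUT(body)
                .build();
    }

    // Sends the file to tika-server and returns the extracted text.
    static String extractText(HttpClient client, URI endpoint, Path file) throws Exception {
        HttpRequest request = buildRequest(endpoint, HttpRequest.BodyPublishers.ofFile(file));
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IllegalStateException(
                    "tika-server returned HTTP " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: TikaServerClient <file>");
            return;
        }
        HttpClient client = HttpClient.newHttpClient();
        System.out.println(extractText(client, DEFAULT_ENDPOINT, Path.of(args[0])));
    }
}
```

The point of the sketch is the boundary: an OOM, infinite loop, or jar 
conflict inside a parser stays in the tika-server process, and the caller 
only sees a timeout or a non-200 response. Making that "seamless and secure" 
(starting the server, restarting it, locking down the port) is the part this 
does not attempt.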

Other thoughts, recommendations?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
