[ 
https://issues.apache.org/jira/browse/OAK-7996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Ryan resolved OAK-7996.
----------------------------
    Resolution: Won't Fix

So far there is no strong enough argument against simply doing this via the Tika config.

> Ability to disable automatic text extraction via configuration
> --------------------------------------------------------------
>
>                 Key: OAK-7996
>                 URL: https://issues.apache.org/jira/browse/OAK-7996
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>            Reporter: Matt Ryan
>            Assignee: Matt Ryan
>            Priority: Major
>
> This issue is to discuss allowing a user to disable automatic text extraction 
> of binary data via a configuration file.
> Currently you can save a tika.config file inside an index definition, which 
> overrides the default Tika configuration for that index.  You can use this 
> approach to disable automatic text extraction.
> I'd like to be able to do this at a global level - not per-index - via a 
> configuration file instead.  Then inside the document maker code somewhere, 
> we would check to see whether the candidate for text extraction has been 
> disabled by configuration.
> The value in this approach is that two instances can be identical in terms of 
> index definitions, only differing in local configuration.  Separate index 
> definitions don't have to be maintained.  And if you want to change which 
> files you extract text from, you don't have to refresh an index to make it happen.
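For illustration only, the global check proposed in the description could look roughly like the sketch below. Nothing here is existing Oak code: the class name TextExtractionPolicy, the property key textExtraction.disabledMimeTypes, and the idea of keying the check on MIME type are all assumptions made for this example. The document maker would call isExtractionEnabled() before handing a binary to Tika.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Properties;
import java.util.Set;

/**
 * Hypothetical sketch of a global (not per-index) text extraction switch.
 * Neither this class nor the property key below exists in Oak.
 */
public final class TextExtractionPolicy {

    // Assumed property name: a comma-separated list of MIME types to skip.
    static final String DISABLED_TYPES_KEY = "textExtraction.disabledMimeTypes";

    private final Set<String> disabledTypes;

    private TextExtractionPolicy(Set<String> disabledTypes) {
        this.disabledTypes = Collections.unmodifiableSet(disabledTypes);
    }

    /** Builds the policy from local configuration, e.g. a properties file. */
    public static TextExtractionPolicy fromProperties(Properties props) {
        String raw = props.getProperty(DISABLED_TYPES_KEY, "");
        Set<String> types = new HashSet<>();
        for (String entry : raw.split(",")) {
            String normalized = entry.trim().toLowerCase(Locale.ROOT);
            if (!normalized.isEmpty()) {
                types.add(normalized);
            }
        }
        return new TextExtractionPolicy(types);
    }

    /** Returns false when extraction for this MIME type is disabled locally. */
    public boolean isExtractionEnabled(String mimeType) {
        if (mimeType == null) {
            // Unknown type: fall back to the default behaviour (extract).
            return true;
        }
        String normalized = mimeType.trim().toLowerCase(Locale.ROOT);
        return !disabledTypes.contains(normalized);
    }
}
```

With a check like this, two instances could share identical index definitions and differ only in the local properties file, which is the benefit the description argues for.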



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
