[ 
https://issues.apache.org/jira/browse/OAK-2953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14570497#comment-14570497
 ] 

Chetan Mehrotra commented on OAK-2953:
--------------------------------------

Tika Batch [1] (TIKA-1330) meets our requirement here, but it is more 
elaborate and is meant as a generic batch processor.

For now, I would implement a simple indexer that understands the repository 
structure and performs extraction using a pool.

[1] https://wiki.apache.org/tika/TikaBatchUsage
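
A rough sketch of the "simple indexer with a pool" idea, not the actual 
oak-run implementation. Everything named here is hypothetical: the class 
PooledTextExtractor, the method extractAll, and the in-memory Map that stands 
in for the repository (path -> binary content). The extractor argument is a 
placeholder for whatever does the real text extraction (e.g. a Tika-based 
parser); the pool is assumed to be a plain fixed-size thread pool.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch: names and structure are illustrative assumptions only.
public class PooledTextExtractor {

    // True if path is the root itself or lies below it ("/content" must not match "/content2").
    static boolean isUnder(String path, String root) {
        return path.equals(root) || path.startsWith(root.endsWith("/") ? root : root + "/");
    }

    /**
     * Extracts text from every binary under rootPath using a fixed-size pool.
     * The Map stands in for the repository; the extractor stands in for the real parser.
     */
    public static Map<String, String> extractAll(Map<String, byte[]> binaries,
                                                 String rootPath,
                                                 Function<byte[], String> extractor,
                                                 int poolSize) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            // Submit one extraction task per binary found under rootPath.
            Map<String, Future<String>> pending = new LinkedHashMap<>();
            for (Map.Entry<String, byte[]> e : binaries.entrySet()) {
                if (isUnder(e.getKey(), rootPath)) {
                    byte[] data = e.getValue();
                    Callable<String> task = () -> extractor.apply(data);
                    pending.put(e.getKey(), pool.submit(task));
                }
            }
            // Collect results; get() blocks until the corresponding task is done.
            Map<String, String> extracted = new LinkedHashMap<>();
            for (Map.Entry<String, Future<String>> e : pending.entrySet()) {
                extracted.put(e.getKey(), e.getValue().get());
            }
            return extracted;
        } finally {
            pool.shutdown();
        }
    }
}
```

Where the extracted text ends up ("somewhere" in the issue description) is 
left open here; the sketch just returns it as a path -> text map.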

> Implement text extractor as part of oak-run
> -------------------------------------------
>
>                 Key: OAK-2953
>                 URL: https://issues.apache.org/jira/browse/OAK-2953
>             Project: Jackrabbit Oak
>          Issue Type: Sub-task
>          Components: run
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>             Fix For: 1.3.0
>
>
> Implement a crawler and indexer that finds all binary content in the 
> repository under a certain path, extracts text from it, and stores the text 
> somewhere.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
