[ https://issues.apache.org/jira/browse/OAK-2953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14570497#comment-14570497 ]
Chetan Mehrotra commented on OAK-2953:
--------------------------------------
Tika Batch [1] (TIKA-1330) meets our requirement here, but it is more elaborate
and is meant to be a generic batch processor.
For now I will implement a simple indexer that understands the repository
structure and performs extraction using a pool.
[1] https://wiki.apache.org/tika/TikaBatchUsage
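A rough sketch of the pooled extraction idea, not the actual oak-run code.
It assumes "pool" means a fixed-size thread pool. The class
PooledTextExtractor, the method extractAll, the path-to-bytes map standing in
for a repository crawl, and the extractor function standing in for a
Tika-style parser are all hypothetical names chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch: none of these names come from Oak or Tika.
final class PooledTextExtractor {

    private PooledTextExtractor() {
    }

    /**
     * Runs the extractor over every binary using a fixed-size thread pool.
     *
     * @param binaries  path -> raw bytes; a stand-in for crawling the repository
     * @param extractor stand-in for the real text extraction (for example a parser call)
     * @param poolSize  number of worker threads
     * @return path -> extracted text, in the iteration order of the input map
     */
    static Map<String, String> extractAll(Map<String, byte[]> binaries,
                                          Function<byte[], String> extractor,
                                          int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            // Submit one task per binary so extraction runs concurrently.
            Map<String, Future<String>> pending = new LinkedHashMap<>();
            for (Map.Entry<String, byte[]> entry : binaries.entrySet()) {
                byte[] data = entry.getValue();
                Callable<String> task = () -> extractor.apply(data);
                pending.put(entry.getKey(), pool.submit(task));
            }

            // Collect results; get() blocks until each task has finished.
            Map<String, String> extracted = new LinkedHashMap<>();
            for (Map.Entry<String, Future<String>> entry : pending.entrySet()) {
                try {
                    extracted.put(entry.getKey(), entry.getValue().get());
                } catch (ExecutionException e) {
                    throw new IllegalStateException(
                            "Extraction failed for " + entry.getKey(), e.getCause());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while extracting", e);
                }
            }
            return extracted;
        } finally {
            pool.shutdown();
        }
    }
}
```

A real implementation would stream binaries from the repository rather than
hold them in memory, and would probably skip or log a file that fails to parse
instead of aborting the whole run.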
> Implement text extractor as part of oak-run
> -------------------------------------------
>
> Key: OAK-2953
> URL: https://issues.apache.org/jira/browse/OAK-2953
> Project: Jackrabbit Oak
> Issue Type: Sub-task
> Components: run
> Reporter: Chetan Mehrotra
> Assignee: Chetan Mehrotra
> Fix For: 1.3.0
>
>
> Implement a crawler and indexer which can find all binary content in the
> repository under a certain path, extract text from it, and store the text
> somewhere