LuceneIndexEditor currently extracts binary content via Tika in the same thread that processes the commit. This approach does not make good use of multiprocessor systems, particularly when the index is being built as part of a migration.
Looking at JR2, I see LazyTextExtractorField [1], which I think would help parallelize text extraction. Would it make sense to bring this to Oak? Would it help improve performance?

Chetan Mehrotra

[1] https://github.com/apache/jackrabbit/blob/trunk/jackrabbit-core/src/main/java/org/apache/jackrabbit/core/query/lucene/LazyTextExtractorField.java
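
To make the proposal concrete, here is a rough sketch of the lazy-extraction idea: extraction is handed to a thread pool and the text is only awaited when it is actually needed. This is not the JR2 code and not an Oak API. The names LazyExtractionSketch, LazyText, submitAll and demo are made up for illustration, and the extractor function is a trivial stand-in for a real Tika call.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch: none of these types exist in Oak or Jackrabbit.
final class LazyExtractionSketch {

    /** Holds text that is being extracted on another thread. */
    static final class LazyText {
        private final Future<String> pending;

        LazyText(Future<String> pending) {
            this.pending = pending;
        }

        /** Blocks until extraction finishes; returns "" if it failed. */
        String get() {
            try {
                return pending.get();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return "";
            } catch (ExecutionException e) {
                return "";
            }
        }
    }

    /** Submits one extraction task per binary and returns immediately. */
    static List<LazyText> submitAll(ExecutorService pool,
                                    List<byte[]> binaries,
                                    Function<byte[], String> extractor) {
        List<LazyText> result = new ArrayList<>();
        for (byte[] binary : binaries) {
            Callable<String> task = () -> extractor.apply(binary);
            result.add(new LazyText(pool.submit(task)));
        }
        return result;
    }

    /** Small usage example with a stand-in extractor (assumption: real code would call Tika here). */
    static List<String> demo() {
        ExecutorService pool =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
            List<byte[]> binaries = Arrays.asList(
                "hello".getBytes(StandardCharsets.UTF_8),
                "world".getBytes(StandardCharsets.UTF_8));
            Function<byte[], String> extractor =
                bytes -> new String(bytes, StandardCharsets.UTF_8);

            // The commit thread keeps going after submitAll; it only
            // blocks here, when the extracted text is finally read.
            List<String> texts = new ArrayList<>();
            for (LazyText text : submitAll(pool, binaries, extractor)) {
                texts.add(text.get());
            }
            return texts;
        } finally {
            pool.shutdown();
        }
    }
}
```

Whether this helps in practice would depend on how late the indexer can defer reading the text, and on how extraction failures and timeouts should be handled, which the sketch glosses over by returning an empty string.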
