Parallelize text extraction from binary fields

Chetan Mehrotra Tue, 10 Mar 2015 00:07:57 -0700

LuceneIndexEditor currently extracts binary content via Tika in the
same thread that processes the commit. This approach does not make
good use of multi-processor systems, particularly when the index is
being built as part of a migration.


Looking at JR2, I see LazyTextExtractorField [1], which I think would
help in parallelizing text extraction.

Would it make sense to bring this to Oak? Would it help improve
performance?
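
To make the idea concrete, here is a minimal sketch of moving extraction
off the commit thread with a plain thread pool. It is not the JR2
implementation and not an Oak API: the class ParallelTextExtraction, the
methods extractText and extractAll, and the thread count are all
hypothetical names chosen for illustration, and extractText only decodes
UTF-8 as a stand-in for a real Tika call.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelTextExtraction {

    /** Hypothetical stand-in for Tika-based extraction: decodes the bytes as UTF-8. */
    static String extractText(byte[] binary) {
        return new String(binary, StandardCharsets.UTF_8);
    }

    /**
     * Submits one extraction task per binary to a fixed-size pool and
     * collects the results in the same order as the input.
     */
    static List<String> extractAll(List<byte[]> binaries, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Queue all tasks first so they can run concurrently.
            List<Future<String>> pending = new ArrayList<>();
            for (byte[] binary : binaries) {
                Callable<String> task = () -> extractText(binary);
                pending.add(pool.submit(task));
            }
            // Block on each result only after every task has been submitted.
            List<String> texts = new ArrayList<>();
            for (Future<String> future : pending) {
                texts.add(future.get());
            }
            return texts;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted during text extraction", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Text extraction failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

In this sketch the caller still waits for all results before continuing;
a lazy variant would hand the Future to the indexed field and resolve it
only when the text is actually needed.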

Chetan Mehrotra
[1] https://github.com/apache/jackrabbit/blob/trunk/jackrabbit-core/src/main/java/org/apache/jackrabbit/core/query/lucene/LazyTextExtractorField.java
