Hi,

Is Oak already single instance when it comes to the identification and storage of binaries? Are the existing TextExtractors also single instance? By single instance I mean one copy of the binary and its token stream in the repository, regardless of how many times it's referenced.
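
To make the question concrete, here is a rough sketch of what I mean by single instance. It is purely illustrative: `SingleInstanceStore`, `put` and `textFor` are names made up for this mail, not Oak or Jackrabbit APIs, and the `extractor` function is only a stand-in for something like Tika. The assumption is that a binary is identified by a hash of its content, so both the bytes and the extracted text are keyed by that hash.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical illustration only; none of these names are Oak APIs. */
class SingleInstanceStore {
    // One entry per distinct content hash, however many nodes reference it.
    private final Map<String, byte[]> binaries = new ConcurrentHashMap<>();
    // Extracted text is cached under the same content hash.
    private final Map<String, String> extractedText = new ConcurrentHashMap<>();
    // Stand-in for a real text extractor (assumption, not a Tika call).
    private final Function<byte[], String> extractor;
    private final AtomicInteger extractionCount = new AtomicInteger();

    SingleInstanceStore(Function<byte[], String> extractor) {
        this.extractor = extractor;
    }

    /** Stores the content once and returns its content-derived identifier. */
    String put(byte[] content) {
        String id = sha256Hex(content);
        binaries.putIfAbsent(id, content.clone());
        return id;
    }

    /** Returns the extracted text, running the extractor at most once per id. */
    String textFor(String id) {
        byte[] content = binaries.get(id);
        if (content == null) {
            throw new IllegalArgumentException("unknown id: " + id);
        }
        return extractedText.computeIfAbsent(id, key -> {
            extractionCount.incrementAndGet();
            return extractor.apply(content);
        });
    }

    int binaryCount() {
        return binaries.size();
    }

    int extractionCount() {
        return extractionCount.get();
    }

    /** Hex-encoded SHA-256 digest of the given bytes. */
    static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

With something like this, uploading the same PDF under ten different paths would leave one stored binary and trigger one extraction; the ten references would all resolve to the same identifier.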

Best Regards
Ian

On 10 March 2015 at 07:05, Chetan Mehrotra <[email protected]> wrote:
> LuceneIndexEditor currently extracts the binary contents via Tika in
> the same thread that is used for processing the commit. Such an approach
> does not make good use of multiprocessor systems, specifically when the
> index is being built up as part of a migration process.
>
> Looking at JR2 I see LazyTextExtractor [1], which I think would help to
> parallelize text extraction.
>
> Would it make sense to bring this to Oak? Would that help in improving
> performance?
>
> Chetan Mehrotra
> [1]
> https://github.com/apache/jackrabbit/blob/trunk/jackrabbit-core/src/main/java/org/apache/jackrabbit/core/query/lucene/LazyTextExtractorField.java
