Hi,
Is Oak already single instance when it comes to the identification and
storage of binaries?
Are the existing TextExtractors also single instance?
By single instance I mean one copy of the binary and its token stream in the
repository, regardless of how many times it's referenced.
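To make the question concrete, here is a minimal Java sketch of what I mean. It assumes binaries are keyed by a SHA-256 content digest. The class SingleInstanceStore and its methods are hypothetical and are not Oak's API. Identical bytes map to one stored copy, and the extracted text is cached under the same key, so extraction runs once per distinct binary.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch of single-instance storage (not Oak's API): binaries are
 * keyed by a content digest, so identical content is stored once no matter how
 * many nodes reference it, and extracted text is cached under the same key.
 */
public class SingleInstanceStore {
    private final Map<String, byte[]> blobs = new HashMap<>();
    private final Map<String, String> extractedText = new HashMap<>();
    private int extractions = 0;

    /** Stores the binary only if unseen and returns its content id (SHA-256 hex). */
    public String put(byte[] content) {
        String id = sha256Hex(content);
        blobs.putIfAbsent(id, content.clone());
        return id;
    }

    /** Returns cached text for a stored blob, running the extractor at most once per id. */
    public String text(String id, Function<byte[], String> extractor) {
        return extractedText.computeIfAbsent(id, key -> {
            byte[] content = blobs.get(key);
            if (content == null) {
                throw new IllegalArgumentException("unknown blob " + key);
            }
            extractions++;
            return extractor.apply(content);
        });
    }

    public int blobCount() { return blobs.size(); }

    public int extractionCount() { return extractions; }

    private static String sha256Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    public static void main(String[] args) {
        SingleInstanceStore store = new SingleInstanceStore();
        byte[] pdf = "same bytes".getBytes(StandardCharsets.UTF_8);
        String first = store.put(pdf);
        String second = store.put(pdf.clone()); // a second reference to identical content
        // Stand-in for a real text extractor such as Tika.
        Function<byte[], String> fakeExtractor = b -> new String(b, StandardCharsets.UTF_8);
        store.text(first, fakeExtractor);
        store.text(second, fakeExtractor);
        System.out.println(first.equals(second) + " " + store.blobCount() + " " + store.extractionCount());
    }
}
```

Running main prints `true 1 1`: both puts return the same id, only one blob is stored, and the extractor runs once.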

Best Regards
Ian

On 10 March 2015 at 07:05, Chetan Mehrotra <[email protected]>
wrote:

> LuceneIndexEditor currently extracts the binary contents via Tika in the
> same thread that processes the commit. This approach does not make good
> use of multiprocessor systems, especially when the index is being built
> as part of a migration.
>
> Looking at JR2, I see LazyTextExtractorField [1], which I think would help
> parallelize text extraction.
>
> Would it make sense to bring this to Oak? Would that help improve
> performance?
>
> Chetan Mehrotra
> [1]
> https://github.com/apache/jackrabbit/blob/trunk/jackrabbit-core/src/main/java/org/apache/jackrabbit/core/query/lucene/LazyTextExtractorField.java
>
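A minimal Java sketch of the lazy-extraction idea, assuming extraction can start as soon as a binary is seen and only needs to be awaited when the field value is written. LazyExtractionDemo, LazyTextField and the upper-casing "extractor" are hypothetical stand-ins (there are no Tika calls here). The sketch is loosely inspired by the Jackrabbit class linked above and is not a port of it.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

/**
 * Hypothetical sketch (not Oak's or Jackrabbit's API): text extraction is handed
 * to a worker pool when a binary is seen, and the indexing thread only blocks
 * when it actually reads the extracted value.
 */
public class LazyExtractionDemo {

    /** An extracted-text field whose value is computed in the background. */
    static final class LazyTextField {
        private final String name;
        private final CompletableFuture<String> text;

        LazyTextField(String name, byte[] binary, Function<byte[], String> extractor,
                      ExecutorService pool) {
            this.name = name;
            // Extraction starts immediately on a pool thread, not the commit thread.
            this.text = CompletableFuture.supplyAsync(() -> extractor.apply(binary), pool);
        }

        String name() { return name; }

        /** Blocks only if extraction has not finished yet. */
        String value() { return text.join(); }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        // Stand-in for Tika: "extraction" just decodes and upper-cases the bytes.
        Function<byte[], String> extractor =
                b -> new String(b, StandardCharsets.UTF_8).toUpperCase(Locale.ROOT);
        try {
            List<LazyTextField> fields = new ArrayList<>();
            for (String doc : new String[] {"alpha", "beta", "gamma"}) {
                fields.add(new LazyTextField("/" + doc,
                        doc.getBytes(StandardCharsets.UTF_8), extractor, pool));
            }
            // The indexing thread consumes results in order; extractions may run in parallel.
            for (LazyTextField f : fields) {
                System.out.println(f.name() + " -> " + f.value());
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

The design choice here is that commit processing only pays for extraction it actually waits on: with a pool sized to the available processors, several binaries can be extracted concurrently while the indexing thread is still creating fields for later documents.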
