Hi Davide

Of course we need to measure the speed and improvements. I am not sure
how much time I will have to implement benchmarks for this, but will
try.

I have not yet tried the cached Tika full-text extraction. I'm
curious how much gain it can provide, though. It may make re-indexing
so cheap that we need not worry about it any more.

I hadn't really paid attention to OAK-2749, but it sounds interesting.
Along similar lines, I was pondering the idea of allowing
multithreaded tar2mongo copies. Since DocumentMK supports clustering,
it should be possible to copy different sub-trees in different
threads?!
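
A rough sketch of the sub-tree idea, to make it concrete. It does not
use Oak's NodeStore API: Node, deepCopy and parallelCopy are
hypothetical stand-ins, and the "store" is just an in-memory tree.
Each top-level sub-tree is copied by its own task, and the results are
attached to the target root by the calling thread only, so the tasks
never write to shared state.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical in-memory model; NOT Oak's NodeStore/NodeState API.
class SubtreeCopySketch {

    static final class Node {
        final String name;
        final Map<String, Node> children = new LinkedHashMap<>();

        Node(String name) { this.name = name; }

        // Adds a child and returns this node, so calls can be chained.
        Node add(Node child) { children.put(child.name, child); return this; }

        // Number of nodes in this sub-tree, including this node.
        int size() {
            int n = 1;
            for (Node c : children.values()) n += c.size();
            return n;
        }
    }

    // Sequential recursive copy of one sub-tree.
    static Node deepCopy(Node source) {
        Node copy = new Node(source.name);
        for (Node child : source.children.values()) copy.add(deepCopy(child));
        return copy;
    }

    // Copies each top-level sub-tree in its own task.
    static Node parallelCopy(Node sourceRoot, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Node>> pending = new ArrayList<>();
            for (Node child : sourceRoot.children.values()) {
                Callable<Node> task = () -> deepCopy(child);
                pending.add(pool.submit(task));
            }
            // Only the calling thread touches the target root.
            Node targetRoot = new Node(sourceRoot.name);
            for (Future<Node> f : pending) targetRoot.add(f.get());
            return targetRoot;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Node root = new Node("root")
            .add(new Node("content").add(new Node("a")).add(new Node("b")))
            .add(new Node("apps").add(new Node("c")));
        Node copy = parallelCopy(root, 2);
        System.out.println(copy.size() + " nodes copied");
    }
}
```

In a real migration the interesting part is what this sketch leaves
out: how concurrent writers to the target store are merged, and how to
split the tree so that the threads get roughly equal amounts of work.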

It would indeed be interesting to have a chat some time! But first
I'll be on holidays for two weeks :)

Regards
Julian



On Wed, Aug 5, 2015 at 8:57 PM, Davide Giannella <[email protected]> wrote:
> On 05/08/2015 17:45, Julian Sedding wrote:
>> ...
>>
>> My aim is to reduce the critical path for migrating one NodeStore
>> (incl JR2) to another. Indexing (especially async indexing) is a
>> big part of the time, so if I can move that out of the critical path,
>> it can save a lot of downtime.
>
> Interesting. I know async index can be lengthy but it would be very
> interesting if we could measure what we have now and the improvements
> we're making.
>
> The slowest part of the async index is normally the full-text extraction
> as it runs in a single thread. With
> https://issues.apache.org/jira/browse/OAK-2749 we provided a mechanism
> (not used yet AFAIK) to run different indexers on different threads.
> Maybe it's something you would like to experiment with as well to speed
> up the indexing.
>
> If you want, ping me on chat tomorrow morning (CEST) so we can quickly
> see what we can do here. But I think we should start measuring it first :)
>
> Cheers
> Davide
>
>
>
