Re: Reindexing problems

Chetan Mehrotra Wed, 21 Oct 2015 03:01:43 -0700

> (a) Hardcode (not rely on the Whiteboard or OSGi) the known indexes

That would not work if the implementation makes use of OSGi features
like configuration or DI. For example, the Lucene implementation relies on
OSGi config and also uses OSGi to expose certain extension points.


> (b) Where we can't use hardcoding, use hard service references (Whiteboard / 
> OSGi).

+1. That would be preferable. I think we can go for the approach taken in
OAK-3201, since depending on the setup even a custom implementation might be
required. So just hard references would not help, and we would need to
make the component which registers the repository aware of all its
*required* dependencies.
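
To illustrate the idea, here is a rough sketch of a component that only
reports the repository as registered while every *required* provider is
present. It is not taken from OAK-3201; the class name
RequiredDependencyTracker and all of its methods are hypothetical, and
providers are identified by plain strings to keep it self-contained.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not Oak or OSGi API: tracks required providers by name.
public class RequiredDependencyTracker {
    private final Set<String> required;
    private final Set<String> present = new HashSet<>();
    private boolean registered;

    public RequiredDependencyTracker(Set<String> required) {
        this.required = new HashSet<>(required);
        // With no required providers, the repository can be registered at once.
        this.registered = this.required.isEmpty();
    }

    // Called when a provider becomes available (e.g. a service is bound).
    public synchronized void providerAdded(String name) {
        present.add(name);
        update();
    }

    // Called when a provider goes away (e.g. a service is unbound).
    public synchronized void providerRemoved(String name) {
        present.remove(name);
        update();
    }

    public synchronized boolean isRepositoryRegistered() {
        return registered;
    }

    // The repository counts as registered only while all required providers are present.
    private void update() {
        registered = present.containsAll(required);
    }
}
```

In a real setup the list of required providers would come from configuration,
so that a custom implementation can be declared as required as well.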

>  (c) If we can't do that, block or fail commits if one of the configured 
> indexes is not available, for example for the Solr index (if such an index is 
> configured).

+1. The current approach is problematic. A missing index provider is more of
a setup issue which can be addressed by the system admin, and the repository
should not try to handle that. So failing the commit should be fine.
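
As a rough sketch of what "failing the commit" could look like: compare the
index types that are configured against the types for which a provider is
currently available, and reject the commit if any are missing. The names
MissingProviderCheck, CommitRejectedException and checkProviders are
hypothetical and not part of Oak; the exception is unchecked only to keep the
sketch short.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch, not Oak API: rejects a commit when a provider is missing.
public class MissingProviderCheck {

    // Signals that the commit must not go through.
    public static class CommitRejectedException extends RuntimeException {
        public CommitRejectedException(String message) {
            super(message);
        }
    }

    // Throws if any configured index type has no available provider.
    public static void checkProviders(Set<String> configuredTypes,
                                      Set<String> availableTypes) {
        Set<String> missing = new LinkedHashSet<>(configuredTypes);
        missing.removeAll(availableTypes);
        if (!missing.isEmpty()) {
            throw new CommitRejectedException(
                    "Missing index provider(s) for type(s): " + missing);
        }
    }
}
```

With this, the index is left untouched instead of being reset, and the admin
sees the failure and can fix the setup.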

> Additionally, for "synchronous" indexes (property index and so on), I would 
> like to always create and reindex them asynchronously by default,

That might be tricky for DocumentNodeStore: even if you build them
asynchronously, the final merge might be very expensive because it has to
deal with such a large branch commit. Also, if a critical index like the
uuid/reference index is affected, it would be better if the system does not
get started; otherwise it would trigger a large traversal if no index is
present or the previous revision of the index is not usable (due to some
corruption).

Chetan Mehrotra


On Wed, Oct 21, 2015 at 2:24 PM, Thomas Mueller <[email protected]> wrote:
> Hi,
>
> If an index provider is (temporarily) not available, the 
> MissingIndexProviderStrategy resets the index so it is re-indexed. This is a 
> problem (OAK-2024, OAK-2203, OAK-2429, OAK-3325, OAK-3366, OAK-3505, 
> OAK-3512, OAK-3513), because re-indexing is slow and runs in one transaction. It can
> also cause many threads to concurrently build the index. Currently, 
> synchronous indexes are built in one "transaction", which is anyway a 
> performance problem (for new indexes and reindexing). If an index is not 
> available when running a query, traversal is used, which is also a problem.
>
> What about:
>
> * (a) Hardcode (not rely on the Whiteboard or OSGi) the known indexes for 
> property, reference, nodeType, lucene, counter index. This is for both 
> writing (IndexEditor) and reading (QueryIndex). That way, those indexes are
> always available, and we never get into a situation where they are 
> temporarily not available.
>
> * (b) Where we can't use hardcoding, use hard service references (Whiteboard 
> / OSGi).
>
> * (c) If we can't do that, block or fail commits if one of the configured 
> indexes is not available, for example for the Solr index (if such an index is 
> configured).
>
> Additionally, for "synchronous" indexes (property index and so on), I would 
> like to always create and reindex them asynchronously by default, and only 
> once they are available switch to synchronous mode. I think (but I'm not sure)
> this is OAK-1456.
>
> What do you think?
>
> Regards,
> Thomas
>
