>> I'm not sure how AsyncIndexer, LuceneIndexer type stuff works, but
>> I'm assuming they keep some sort of bookmark to note which revision
>> has already been processed. I guess we can do something similar here
>> too.
>>
>>
> Async updates happen in a single background thread, so there will be
> no concurrent update conflicts because of this.

Oh, I meant that maybe we could asynchronously prune the property index
while async index processing is going on.
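Roughly what I have in mind, as a minimal sketch only: it assumes a
hypothetical in-memory model (not Oak's API) where the commit-time
update just drops the leaf's match flag and leaves empty intermediate
nodes behind, and a journal of removals tagged with a revision number.
`IndexNode`, `Removal`, `AsyncPruner` and the bookmark field are all
invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical index node: leaves carry match=true, other nodes only form the hierarchy.
class IndexNode {
    final Map<String, IndexNode> children = new TreeMap<>();
    boolean match;

    // Returns the named child, creating it if missing (test/setup helper).
    IndexNode child(String name) {
        return children.computeIfAbsent(name, k -> new IndexNode());
    }
}

// Hypothetical journal entry: at revision `rev`, the leaf at `path` lost its match.
final class Removal {
    final long rev;
    final String path;
    Removal(long rev, String path) { this.rev = rev; this.path = path; }
}

class AsyncPruner {
    private long bookmark = 0; // last revision already pruned (hypothetical checkpoint)

    long bookmark() { return bookmark; }

    // Prunes empty intermediate nodes for every removal newer than the bookmark,
    // then advances the bookmark. Meant to run in the single async background
    // thread, so pruning passes never race with each other.
    void run(IndexNode root, List<Removal> journal) {
        long newest = bookmark;
        for (Removal r : journal) {
            if (r.rev <= bookmark) continue; // handled in an earlier pass
            pruneUpwards(root, r.path.split("/"));
            newest = Math.max(newest, r.rev);
        }
        bookmark = newest;
    }

    // Walks down the (relative) path, then removes nodes bottom-up while they
    // have no children and no match flag.
    private static void pruneUpwards(IndexNode root, String[] segments) {
        List<IndexNode> chain = new ArrayList<>();
        IndexNode cur = root;
        chain.add(cur);
        for (String s : segments) {
            cur = cur.children.get(s);
            if (cur == null) return; // already gone
            chain.add(cur);
        }
        for (int i = chain.size() - 1; i > 0; i--) {
            IndexNode n = chain.get(i);
            if (n.match || !n.children.isEmpty()) break;
            chain.get(i - 1).children.remove(segments[i - 1]);
        }
    }
}
```

The point of the bookmark is the same as for the async indexers: each
pass only looks at what changed since the last one, and the commit path
never deletes shared intermediate nodes, so it can't conflict on them.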


>> BTW, do these indexers process independently of each other? Would it
>> make sense to chain such jobs so that each of them can work with just
>> one calculation of the diff?
>>
>>
> I'm pretty sure that's already the case, I believe issues start when you
> have concurrent writes over the same index content from different threads.

Yes, that was a side note: if AsyncIndex and LuceneIndex don't work
over the same diff, then we can optimize that a bit. For the current
subject, we can probably plug in property index pruning here.
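To illustrate "one diff, many consumers", here is a minimal sketch,
assuming flat snapshots (path to value, values non-null) instead of
real node states. `ChangeListener`, `DiffFanOut` and
`RecordingListener` are hypothetical names, not Oak interfaces; a
pruning job would simply be one more listener.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

// Hypothetical callback each indexer (property, Lucene, pruning) would implement.
interface ChangeListener {
    void added(String path, String value);
    void changed(String path, String before, String after);
    void removed(String path, String before);
}

class DiffFanOut {
    // Computes the diff between two snapshots once and delivers every change
    // to all listeners, so no listener recomputes it. Returns the change count.
    static int dispatch(Map<String, String> before, Map<String, String> after,
                        List<ChangeListener> listeners) {
        int changes = 0;
        for (Map.Entry<String, String> e : after.entrySet()) {
            String old = before.get(e.getKey()); // null means "not present before"
            if (old == null) {
                changes++;
                for (ChangeListener l : listeners) l.added(e.getKey(), e.getValue());
            } else if (!Objects.equals(old, e.getValue())) {
                changes++;
                for (ChangeListener l : listeners) l.changed(e.getKey(), old, e.getValue());
            }
        }
        for (Map.Entry<String, String> e : before.entrySet()) {
            if (!after.containsKey(e.getKey())) {
                changes++;
                for (ChangeListener l : listeners) l.removed(e.getKey(), e.getValue());
            }
        }
        return changes;
    }
}

// Example listener that records what it saw ("+" added, "~" changed, "-" removed).
class RecordingListener implements ChangeListener {
    final List<String> events = new ArrayList<>();
    public void added(String p, String v) { events.add("+" + p); }
    public void changed(String p, String b, String a) { events.add("~" + p); }
    public void removed(String p, String b) { events.add("-" + p); }
}
```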


>> There's another idea: does it make sense for a document to assert that
>> it's semantically an 'intermediate' document, created just to form a
>> hierarchy, so that conflicts related to such documents can be handled
>> accordingly? For OAK-2673, we had a heuristic for this: the conflicts
>> were resolved for a document which lay under a hidden tree and had no
>> visible properties. Maybe we can even have a mixin for this, as the
>> hierarchy intent could be very useful even for applications (I've
>> seen a lot of automated tests that need to pre-create a hierarchy
>> just to avoid such a conflict...).
>>
>>
> I didn't follow this topic closely, but if it helps in any way, the
> property index storage (normal properties, not unique ones) already
> marks the leaves with a special flag (match=true); intermediary paths
> don't contain this info. So for the index scenario I think you could
> come up with a way to merge conflicts by choosing the safe route of not
> deleting the intermediary paths. This, combined with a more lenient
> purge strategy, should reduce the pain.
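If I read this right, the flag alone would be enough for a resolver to
decide. A minimal sketch, assuming a hypothetical model where a node
only knows its match flag; `PNode`, `Conflict`, `Resolution` and
`LenientIndexResolver` are local, made-up types, not Oak's conflict
handling API.

```java
// Hypothetical property index node: only leaves carry match=true.
class PNode {
    boolean match;
}

// Conflict kinds named after the add-add / delete-delete wording in this thread.
enum Conflict { ADD_ADD, DELETE_DELETE, DELETE_CHANGE }

enum Resolution { KEEP, DELETE, FAIL }

class LenientIndexResolver {
    // Decides a conflict on an index node using only the storage convention.
    // Safe route: never delete an intermediate path while resolving; empty
    // intermediates are left for a later, lenient purge.
    static Resolution resolve(Conflict type, PNode node) {
        if (node.match) {
            return Resolution.FAIL; // a real index entry: don't guess
        }
        switch (type) {
            case ADD_ADD:       return Resolution.KEEP;   // both sides created the same path
            case DELETE_DELETE: return Resolution.DELETE; // both sides agree it is gone
            case DELETE_CHANGE: return Resolution.KEEP;   // other side added below it
            default: throw new IllegalArgumentException(type.toString());
        }
    }
}
```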

While working on OAK-2673, we didn't want to tie conflict resolution
to the specific content storage of property indices, which is why we
instead chose to resolve such conflicts (add-add, delete-delete) only
for hidden documents with no JCR properties. That was essentially a
hacky heuristic that could work for property indexes and yet not be
tied too strongly to them. OTOH, I think if the document itself could
declare the intent (maybe as a hidden Oak-specific property to begin
with, and then promote it to some sort of mixin), then the decision of
how to deal with such a conflict can become firmer and less hacky.
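To show the difference between the two checks, a minimal sketch,
assuming a document is just a path plus a property map, and that
"hidden" means a name starting with ':'. The marker name
`:intermediate` and the `Doc` / `IntermediateCheck` classes are
hypothetical; no such property exists in Oak today.

```java
import java.util.Map;

// Hypothetical view of a document: its path and its properties.
final class Doc {
    final String path;
    final Map<String, Object> props;
    Doc(String path, Map<String, Object> props) { this.path = path; this.props = props; }
}

class IntermediateCheck {
    // Hypothetical hidden marker property declaring "I only form a hierarchy".
    static final String MARKER = ":intermediate";

    // OAK-2673-style heuristic: the document lies under a hidden tree (some
    // path segment starts with ':') and has no visible (non-hidden) properties.
    static boolean heuristic(Doc d) {
        boolean underHidden = false;
        for (String seg : d.path.split("/")) {
            if (seg.startsWith(":")) { underHidden = true; break; }
        }
        boolean noVisibleProps = d.props.keySet().stream().allMatch(k -> k.startsWith(":"));
        return underHidden && noVisibleProps;
    }

    // Declared intent: independent of where the document lives or which
    // index (or application) created it.
    static boolean declared(Doc d) {
        return Boolean.TRUE.equals(d.props.get(MARKER));
    }
}
```

With the declared variant, application content (e.g. the hierarchy
pre-created by tests) could opt in too, which the path-based heuristic
can never cover.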

BTW, doing something along the lines of OAK-2673 (resolving conflicts
instead of avoiding them) is a little intrusive and risky (e.g. we
found that OAK-2673 leads to the repository corruption mentioned in
OAK-2929). OTOH, delayed pruning feels much safer and non-intrusive
(except, of course, that the cost estimate for an unpruned index might
be inflated a little until it gets pruned). So, does it make sense to
open an issue for async pruning? We can probably open a separate issue
for lenient conflict resolution when a document declares its intent.

Thanks,
Vikas
