>> I'm not sure of how AsyncIndexer, LuceneIndexer type stuff works --
>> but, I'm assuming they'd be keeping some sort of bookmark to note
>> which revision has already been processed. I guess we can do something
>> similar here too.
>>
>>
> async updates happen in a single background thread, there will be no
> concurrent update conflicts because of this.

Oh, I meant that maybe we could asynchronously prune the property index while async index processing is going on.

>> BTW, do these indexers process independent of each other -- would it
>> make sense to chain such jobs so that each of these can work with just
>> one calculation of diff?
>>
>>
> I'm pretty sure that's already the case, I believe issues start when you
> have concurrent writes over the same index content from different threads.

Yes, that was a side note: if it's not the same diff over which AsyncIndex and LuceneIndex work, then we can optimize that a bit. For the current subject, we can probably plug property index pruning in here.

>> There's another idea: does it make sense for a document to assert that
>> it's semantically an 'intermediate' document -- created just to form a
>> hierarchy, hence conflicts related to such documents can be handled
>> accordingly. For OAK-2673, we had a heuristic for this -- the
>> conflicts were resolved for a document which lied under hidden tree
>> and had no visible properties. May be, we can even have a mixin for
>> this -- as the hierarchy intent could be very useful even for
>> applications (I've seen lot of automated tests that need to pre-create
>> hierarchy just to avoid such a conflict... ).
>>
>>
> I didn't follow closely this topic, but if it helps in any way, the
> property index storage (normal properties, not unique ones) already marks
> the leaves with a special flag (match=true), any other intermediary path
> doesn't contain this info. So for the index scenario I think you could come
> up with a way to merge conflicts by choosing a safe route of not deleting
> the intermediary paths. this combined with a more lenient purge strategy
> should reduce the pain.
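
A rough illustration of that suggestion, as a minimal sketch only: removal just clears the leaf flag and never deletes intermediary paths, and a separate, later pass prunes whatever is left empty. `IndexNode` and its methods are hypothetical, a plain in-memory tree rather than Oak's actual index storage or API; only the `match` flag on leaves comes from the description above.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for property index content; not Oak's API. */
final class IndexNode {
    final Map<String, IndexNode> children = new TreeMap<>();
    boolean match; // set only on leaves that represent an indexed path

    /** Adds an entry: creates intermediary nodes as needed and flags the leaf. */
    void add(String path) {
        IndexNode n = this;
        for (String seg : path.split("/")) {
            if (seg.isEmpty()) continue;
            n = n.children.computeIfAbsent(seg, k -> new IndexNode());
        }
        n.match = true;
    }

    /** Removes an entry by clearing the flag only; no node is deleted here. */
    void remove(String path) {
        IndexNode n = this;
        for (String seg : path.split("/")) {
            if (seg.isEmpty()) continue;
            n = n.children.get(seg);
            if (n == null) return; // nothing indexed under this path
        }
        n.match = false;
    }

    /** Delayed prune: drops unflagged nodes without children, bottom-up. */
    int prune() {
        int removed = 0;
        Iterator<IndexNode> it = children.values().iterator();
        while (it.hasNext()) {
            IndexNode child = it.next();
            removed += child.prune();
            if (!child.match && child.children.isEmpty()) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    /** Number of nodes below this one. */
    int size() {
        int s = 0;
        for (IndexNode c : children.values()) s += 1 + c.size();
        return s;
    }
}
```

Because `remove` never deletes a node, two writers removing entries under the same parent would not race on deleting that parent; the trade-off is that the tree stays larger until `prune` runs.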

While working on OAK-2673, we didn't want to tie conflict resolution to the specific content storage for property indices, which is why we instead chose to resolve such conflicts (add-add, delete-delete) only for hidden documents with no JCR properties. That was essentially a hacky heuristic that could work for property indexes without being tied too strongly to them. OTOH, I think if the document itself could declare the intent (maybe as a hidden Oak-specific property to begin with, later promoted to some sort of mixin), then the decision on how to deal with such a conflict can become firmer and less hacky.

BTW, doing something along the lines of OAK-2673 (resolving conflicts instead of avoiding them) is a little intrusive and risky (e.g. we found that OAK-2673 leads to the repository corruption mentioned in OAK-2929). OTOH, delayed pruning feels much safer and non-intrusive (except, of course, that the cost of the unpruned index might be inflated a little until it gets pruned).

So, does it make sense to open an issue for async pruning? We can probably open a separate issue for lenient conflict resolution when a document declares that intent.

Thanks,
Vikas
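
For reference, the two conflict-handling variants discussed above can be sketched roughly as below. This is only an illustration under stated assumptions, not the OAK-2673 code: the class, the `Conflict` enum and the `:intermediate` marker name are all hypothetical, and "hidden" is taken to mean a name starting with `:`.

```java
import java.util.Set;

/** Hypothetical helper, not Oak code: decides if a conflict may be resolved leniently. */
final class IntermediateDocHeuristic {

    enum Conflict { ADD_ADD, DELETE_DELETE, OTHER }

    /** Assumed marker property for the "declared intent" variant; purely illustrative. */
    static final String INTERMEDIATE_MARKER = ":intermediate";

    /** True if any path segment is hidden (assumed here: name starts with ':'). */
    static boolean underHiddenTree(String path) {
        for (String seg : path.split("/")) {
            if (seg.startsWith(":")) return true;
        }
        return false;
    }

    /** True if every property name is hidden, i.e. nothing visible through JCR. */
    static boolean hasNoVisibleProperties(Set<String> propertyNames) {
        for (String name : propertyNames) {
            if (!name.startsWith(":")) return false;
        }
        return true;
    }

    private static boolean benign(Conflict c) {
        return c == Conflict.ADD_ADD || c == Conflict.DELETE_DELETE;
    }

    /** Heuristic variant: hidden document with no visible properties. */
    static boolean resolvableByHeuristic(Conflict c, String path, Set<String> propertyNames) {
        return benign(c) && underHiddenTree(path) && hasNoVisibleProperties(propertyNames);
    }

    /** Declared-intent variant: the document itself carries the marker. */
    static boolean resolvableByDeclaration(Conflict c, Set<String> propertyNames) {
        return benign(c) && propertyNames.contains(INTERMEDIATE_MARKER);
    }
}
```

The second variant does not need to look at the path at all, which is what would let it apply outside hidden index content once the marker is something a document can declare.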
