On 20/06/2014 18:06, Jukka Zitting wrote:
> Hi,
>
> Here's an idea for an index structure (for now somewhat specific to
> SegmentMK) for speeding up node name and property existence queries.
> ...
>
> INDEX UPDATES
>
> The index would be maintained by a normal index editor that for each
> added/removed items would update the respective name counts at each
> level of the index up to the root. These updates can be consolidated
> at each level of the index to keep the number of writes down to a
> minimum. The cost of updating the index would be O(d * k) where d is
> the depth of the changes in a commit and k the number of added/removed
> items. The cost is similar to that of the property index where k is
> the number of added/removed values. Changing the value of an existing
> property would trigger no updates to this index.
>
> Since the index updates work their way recursively up to the root of
> the tree, there is an inherent synchronization bottleneck in updating
> the count values higher up the tree. This (and the way storage is
> optimized) makes this index structure well suited for the SegmentMK,
> but presents a problem for the DocMK. In there a better way to
> implement a similar index might be to leverage the underlying indexing
> capabilities of MongoDB or whichever database is used under the hood.
> Alternatively it might be possible to adjust the index structure to
> avoid this problem, though for now I don't see any good way of doing
> so.

First of all I think it could work out well :)
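
To make the update cost concrete, here is a minimal sketch of how I read the upward propagation described above. It is only an illustration under my own assumptions: the class and method names are hypothetical, the counts live in a plain in-memory map rather than in the SegmentMK, and the per-level consolidation of writes is left out.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Oak code: per-path name counts that are
// propagated from the changed node up to the root.
class NameCountIndexSketch {

    // path -> (item name -> number of items with that name at or below path)
    private final Map<String, Map<String, Long>> counts = new HashMap<>();

    // Record one added (delta = +1) or removed (delta = -1) item called
    // `name` under `parentPath`. Touches every level up to the root, so a
    // single item costs O(d) and k items cost O(d * k).
    void update(String parentPath, String name, long delta) {
        String path = parentPath;
        while (true) {
            Map<String, Long> level =
                    counts.computeIfAbsent(path, p -> new HashMap<>());
            long updated = level.merge(name, delta, Long::sum);
            if (updated == 0) {
                level.remove(name); // do not keep zero counts around
            }
            if (path.equals("/")) {
                break;
            }
            path = parentOf(path);
        }
    }

    // Number of items called `name` at or below `path` (0 if none recorded).
    long count(String path, String name) {
        return counts
                .getOrDefault(path, Collections.<String, Long>emptyMap())
                .getOrDefault(name, 0L);
    }

    // "/a/b" -> "/a", "/a" -> "/"
    static String parentOf(String path) {
        int i = path.lastIndexOf('/');
        return i <= 0 ? "/" : path.substring(0, i);
    }
}
```

Every update ends at the root entry, which is where the synchronization bottleneck mentioned above would show up.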

What concerns me most is the update part. AFAIU doing a node count is
not that cheap, so I guess you were thinking of something like
getCount(MAX) and, if the count == MAX, doing some estimation of what
could be there beyond the MAX limit.

The other bit is that it is very Segment specific. As you already
highlighted, we should come up with something different based on the
underlying persistence in use. It's not nice from a code point of view,
as it will complicate things, but I don't see any other way either as
of now. We could make it so that if the index is not there we fall back
on traversing as usual.

The last point is how/if we want the end user to configure it. Within
the repository, as with a standard property index? I'm asking because
it will have an impact on repository size and on update performance,
and maybe someone would like to say something like: I know property foo
won't be anywhere else than /a/b, and in any case I don't care. I would
like to have only /a/b updated for foo.

Cheers
Davide
