On 20/06/2014 18:06, Jukka Zitting wrote:
> Hi,
>
> Here's an idea for an index structure (for now somewhat specific to
> SegmentMK) for speeding up node name and property existence queries.
> ...
>
> INDEX UPDATES
>
> The index would be maintained by a normal index editor that for each
> added/removed items would update the respective name counts at each
> level of the index up to the root. These updates can be consolidated
> at each level of the index to keep the number of writes down to a
> minimum. The cost of updating the index would be O(d * k) where d is
> the depth of the changes in a commit and k the number of added/removed
> items. The cost is similar to that of the property index where k is
> the number of added/removed values. Changing the value of an existing
> property would trigger no updates to this index.
>
> Since the index updates work their way recursively up to the root of
> the tree, there is an inherent synchronization bottleneck in updating
> the count values higher up the tree. This (and the way storage is
> optimized) makes this index structure well suited for the SegmentMK,
> but presents a problem for the DocMK. In there a better way to
> implement a similar index might be to leverage the underlying indexing
> capabilities of MongoDB or whichever database is used under the hood.
> Alternatively it might be possible to adjust the index structure to
> avoid this problem, though for now I don't see any good way of doing
> so.

First of all I think it could work out well :)

What concerns me most is the update part. AFAIU doing a node count is
not that cheap, so I guess you were thinking of something like
getCount(MAX) and, if the count == MAX, doing some estimation of what
could be there beyond the MAX limit.
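
To make the bounded-count idea concrete, here is a minimal sketch in
Python. The function name and the (count, exact) return shape are my own
assumptions for illustration, not an existing Oak API; it only shows how
counting can stop early and signal that an estimate is needed.

```python
from itertools import islice


def bounded_count(children, max_count):
    """Count items in `children`, but never iterate past max_count + 1.

    Returns (count, exact). When exact is False the real count is larger
    than max_count and the caller has to estimate the remainder.
    Hypothetical helper, not part of any real API.
    """
    # Read one item more than the limit so that "exactly max_count"
    # can be told apart from "more than max_count".
    seen = sum(1 for _ in islice(children, max_count + 1))
    if seen > max_count:
        return max_count, False
    return seen, True
```

The cost is bounded by MAX regardless of how many children exist, which
is the point; how to estimate beyond the limit is left open here.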

The other bit is that it is very Segment-specific. As you already
highlighted, we should come up with something different depending on the
underlying persistence in use. It's not nice from a code point of view, as
it will complicate things, but I don't see any other way either as of
now. We could make it so that, if the index is not there, we fall back on
traversing as usual.
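
A rough sketch of that fallback, in Python and under my own assumptions:
each node may carry a per-subtree name count (the `counts` field, the
`Node` class and both functions are hypothetical names, not Oak code).
When the counts are present they prune the search; when they are absent
the query simply traverses.

```python
from collections import Counter


class Node:
    """Toy tree node; `counts` is None when no index is stored here."""

    def __init__(self, children=None, counts=None):
        self.children = children or {}
        self.counts = counts


def build_counts(node):
    """Store, at every level, how many descendants carry each name."""
    total = Counter()
    for child_name, child in node.children.items():
        total[child_name] += 1
        total.update(build_counts(child))
    node.counts = dict(total)
    return total


def find(node, name, path=""):
    """Return paths of nodes called `name`, pruning only if indexed."""
    if node.counts is not None and node.counts.get(name, 0) == 0:
        return []  # index says nothing below here matches
    hits = []
    for child_name, child in node.children.items():
        child_path = path + "/" + child_name
        if child_name == name:
            hits.append(child_path)
        hits.extend(find(child, name, child_path))
    return hits
```

The result is the same with or without the counts; only the amount of
traversal differs, so a persistence without the index still answers the
query correctly.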

Last point is how/if we want the end user to configure it. Within the
repository, as with a standard property index? I'm asking because it will
have an impact on repository size and on update performance, and maybe
someone would like to say something like: I know property foo won't be
anywhere other than /a/b, and if it is, I don't care. I would like to have
only /a/b updated for foo.
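
Something like the following is what I have in mind for the filter, as a
Python sketch only. The `include_paths` mapping and the function name are
invented for illustration; how it would actually be stored in the
repository is exactly the open question.

```python
def should_index(name, path, include_paths):
    """Decide whether a change to `name` at `path` updates the index.

    include_paths maps a name to the path prefixes it is restricted to.
    Names without an entry are indexed everywhere (assumed default).
    """
    prefixes = include_paths.get(name)
    if prefixes is None:
        return True
    # Match the prefix itself or anything strictly below it, so that
    # /a/b covers /a/b/c but not /a/bc.
    return any(
        path == p or path.startswith(p.rstrip("/") + "/")
        for p in prefixes
    )
```

With `{"foo": ["/a/b"]}`, a foo added under /a/b/c would update the
index, while a foo added under /x would be ignored.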

Cheers
Davide
