Hi,

On Mon, Jun 23, 2014 at 3:07 AM, Davide Giannella
<[email protected]> wrote:
> What concerns me most is the update part. AFAIU doing a node count is
> not that cheap, so I guess you were thinking of something like
> getCount(MAX) and, if the count == max, doing some estimation of what
> could be there over the MAX limit.

Since the counts are already included in the index structure, all we
need to do is iterate over the properties of the matching index node
and sum up the values. If that turns out to be too costly, we can even
keep a pre-calculated total count in an extra property.
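
As a rough sketch of that summation (plain Java, with a Map standing in
for the properties of the matching index node; the class name, the
method name and the ":total" property name are made up for illustration
and are not Oak API):

```java
import java.util.LinkedHashMap;
import java.util.Map;

class IndexCountSketch {

    // Hypothetical name of the optional pre-calculated total property.
    static final String TOTAL = ":total";

    /**
     * Estimates the number of matches from the counts stored as
     * properties on one index node (property name -> count).
     */
    static long estimateCount(Map<String, Long> indexNode) {
        Long total = indexNode.get(TOTAL);
        if (total != null) {
            return total; // pre-calculated total present, no iteration needed
        }
        long sum = 0;
        for (long count : indexNode.values()) {
            sum += count; // one addition per property of the index node
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Long> node = new LinkedHashMap<>();
        node.put("entryA", 3L); // placeholder property names
        node.put("entryB", 5L);
        System.out.println(estimateCount(node));
    }
}
```

The cost of the loop grows with the number of properties on the index
node, not with the number of matching content nodes, which is why no
getCount(MAX) style traversal should be needed.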

> The other bit is that it is very Segment-specific.

Right. The proposed index is designed to leverage some of the benefits
of the SegmentMK model. The DocMK has different tradeoffs and thus can
better accommodate a differently organized index.

BTW, the same also applies to the property index. The default content
mirroring strategy used by the property index is actually unnecessary
overhead on the SegmentMK, where using a single multi-valued property
for each index entry would be more efficient and would also easily
give more exact cost estimates, as the number of matching paths would
be directly available.
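
To illustrate the multi-valued layout (again just a sketch in plain
Java, not the actual property index code; the class and method names
are hypothetical), each indexed value maps to the list of matching
paths, so the cost estimate is simply the size of that list:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MultiValuedIndexSketch {

    // One entry per indexed value; the list plays the role of a
    // multi-valued property holding all matching paths.
    private final Map<String, List<String>> entries = new HashMap<>();

    void add(String value, String path) {
        entries.computeIfAbsent(value, k -> new ArrayList<>()).add(path);
    }

    void remove(String value, String path) {
        List<String> paths = entries.get(value);
        if (paths != null) {
            paths.remove(path);
            if (paths.isEmpty()) {
                entries.remove(value); // drop empty index entries
            }
        }
    }

    /** Exact number of matching paths, read directly from the entry. */
    int cost(String value) {
        List<String> paths = entries.get(value);
        return paths == null ? 0 : paths.size();
    }
}
```

With content mirroring the same number would only be available by
traversing the mirrored subtree below the index entry.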

> Last point is how/if we want the end user configure it. Within the
> repository as a standard property index?

I have two alternatives in mind:

1) Using a normal query index definition in /oak:index, with a custom
index type like "name".

2) Making the SegmentMK itself maintain such statistics in a hidden
subtree like /:segmentmk/names. There would be no need to explicitly
configure the index, and on repositories where that subtree is present
a matching QueryIndex implementation would automatically use it to
speed up affected queries.

I kind of like the latter approach, as it's conceptually similar to
the idea of using something like a MongoDB index to speed up certain
queries when using MongoMK.

> I'm saying because it will have an impact on repository size and performance
> for the update and maybe someone would like to say something like: I know
> property foo won't be anywhere else than /a/b and in case I don't care. I
> would like to have only /a/b updated for foo.

As outlined in the proposal, the extra cost of updating this index is
actually pretty small. And the way the SegmentMK avoids duplicate
storage of names and values makes the size overhead pretty low.

So a non-configurable alternative like 2 might be just fine, or in
alternative 1 it would be possible to explicitly configure some
exclude rules like "I don't care about the 'foo' name, nor the '/bar'
subtree".
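
Such exclude rules could look roughly like this (a sketch only; the
class, its constructor arguments and the isIndexed method are
hypothetical and do not correspond to any existing index definition
format):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class NameIndexExcludesSketch {

    private final Set<String> excludedNames;
    private final List<String> excludedSubtrees;

    NameIndexExcludesSketch(Set<String> excludedNames,
                            List<String> excludedSubtrees) {
        this.excludedNames = excludedNames;
        this.excludedSubtrees = excludedSubtrees;
    }

    /** Returns true if the given name at the given path should be indexed. */
    boolean isIndexed(String path, String name) {
        if (excludedNames.contains(name)) {
            return false; // "I don't care about the 'foo' name"
        }
        for (String root : excludedSubtrees) {
            // match the subtree root itself and everything below it,
            // but not siblings that merely share a prefix (e.g. /barn)
            if (path.equals(root) || path.startsWith(root + "/")) {
                return false; // "nor the '/bar' subtree"
            }
        }
        return true;
    }

    public static void main(String[] args) {
        NameIndexExcludesSketch rules = new NameIndexExcludesSketch(
                new HashSet<>(Arrays.asList("foo")),
                Arrays.asList("/bar"));
        System.out.println(rules.isIndexed("/a/b", "title"));
    }
}
```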

BR,

Jukka Zitting
