Hi,

On Mon, Jun 23, 2014 at 3:07 AM, Davide Giannella wrote:
> What concerns me most is the update part. AFAIU doing a node count is
> not that cheap, so I guess you were thinking something around
> getCount(MAX) and, if the count == max, doing some estimation around what
> could be there over the MAX limit.
Since the counts are already included in the index structure, all we need to do is iterate over the properties of the matching index node and sum up the values. If that turns out to be too costly, we can even keep a pre-calculated total count in an extra property.

> The other bit is that it is very Segment specific.

Right. The proposed index is designed to leverage some of the benefits of the SegmentMK model. The DocMK has different tradeoffs and thus can better accommodate a differently organized index.

BTW, the same applies also to the property index. The default content mirroring strategy used by the property index is actually unnecessary overhead on the SegmentMK. There, using a multi-valued property per index entry would be more efficient and would also easily give more exact cost estimates, as the number of matching paths would be directly available.

> Last point is how/if we want the end user to configure it. Within the
> repository as a standard property index?

I have two alternatives in mind:

1) Using a normal query index definition in /oak:index, with a custom index type like "name".

2) Making the SegmentMK itself maintain such statistics in a hidden subtree like /:segmentmk/names. There would be no need to explicitly configure the index, and on repositories where that subtree is present a matching QueryIndex implementation would automatically use it to speed up affected queries.

I kind of like the latter approach, as it's conceptually similar to the idea of using something like a MongoDB index to speed up certain queries when using MongoMK.

> I'm saying because it will have an impact on repository size and performance
> for the update and maybe someone would like to say something like: I know
> property foo won't be anywhere else than /a/b and in case I don't care. I
> would like to have only /a/b updated for foo.

As outlined in the proposal, the extra cost of updating this index is actually pretty small.
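To make the count lookup from the first reply concrete, here is a minimal Java sketch. It uses a hypothetical in-memory model: the index node for one property name holds one entry per indexed key (assumed here to be a path) whose value is a count, plus an optional pre-calculated total standing in for the "extra property" idea. The class and method names are illustrative only, not Oak APIs, and the real layout in the proposal may differ. Under this model an update touches only the affected entry and the cached total, which is one way the per-commit cost can stay small.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of a single index node for one property name.
 * Not an Oak API: entry keys are assumed to be paths, values are counts.
 */
public class NameCountIndex {

    // Per-entry counts, standing in for the properties of the index node.
    private final Map<String, Long> counts = new HashMap<>();

    // Pre-calculated total, standing in for an optional extra property.
    private long cachedTotal = 0;

    /** Adjusts the count of one entry; only that entry and the total change. */
    public void adjust(String key, long delta) {
        long updated = counts.getOrDefault(key, 0L) + delta;
        if (updated < 0) {
            throw new IllegalStateException("Negative count for " + key);
        }
        if (updated == 0) {
            counts.remove(key); // drop empty entries to keep the node small
        } else {
            counts.put(key, updated);
        }
        cachedTotal += delta;
    }

    /** Straightforward approach: iterate over all entries and sum the values. */
    public long sumCounts() {
        long total = 0;
        for (long c : counts.values()) {
            total += c;
        }
        return total;
    }

    /** Cheaper approach: read the pre-calculated total without iterating. */
    public long cachedTotal() {
        return cachedTotal;
    }

    public static void main(String[] args) {
        NameCountIndex foo = new NameCountIndex();
        foo.adjust("/a/b", 3);
        foo.adjust("/c", 2);
        foo.adjust("/c", -1);
        System.out.println(foo.sumCounts() + " " + foo.cachedTotal()); // prints "4 4"
    }
}
```

The cached total trades a slightly larger write (one extra value per commit) for constant-time reads. That only pays off if summing the entries actually turns out to be too costly.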
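The remark about the property index can also be illustrated. This is a hypothetical sketch, not Oak's actual property index code, of storing one multi-valued property per index entry: each indexed value maps directly to the list of matching paths. Here the exact match count is simply the size of that list. With a content-mirroring layout, getting the same number would instead require walking the mirrored subtree. All names below are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical property index layout: one multi-valued property per index
 * entry, i.e. each indexed value maps directly to its matching paths.
 */
public class MultiValuedPropertyIndex {

    private final Map<String, List<String>> entries = new HashMap<>();

    /** Records that the node at {@code path} has the indexed {@code value}. */
    public void add(String value, String path) {
        entries.computeIfAbsent(value, v -> new ArrayList<>()).add(path);
    }

    /** Removes one path from an entry, dropping the entry once it is empty. */
    public void remove(String value, String path) {
        List<String> paths = entries.get(value);
        if (paths != null && paths.remove(path) && paths.isEmpty()) {
            entries.remove(value);
        }
    }

    /** Exact number of matching paths: just the size of the multi-valued property. */
    public int matchCount(String value) {
        List<String> paths = entries.get(value);
        return paths == null ? 0 : paths.size();
    }

    public static void main(String[] args) {
        MultiValuedPropertyIndex index = new MultiValuedPropertyIndex();
        index.add("draft", "/content/a");
        index.add("draft", "/content/b");
        index.add("published", "/content/c");
        index.remove("draft", "/content/a");
        System.out.println(index.matchCount("draft"));     // prints 1
        System.out.println(index.matchCount("published")); // prints 1
        System.out.println(index.matchCount("missing"));   // prints 0
    }
}
```

A query engine could use `matchCount` directly as the cost estimate for an equality condition on the indexed property.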
In addition, the way the SegmentMK avoids duplicate storage of names and values keeps the size overhead pretty low. So a non-configurable alternative like 2 might be just fine, or with alternative 1 it would be possible to explicitly configure some exclude rules like "I don't care about the 'foo' name, nor the '/bar' subtree".

BR,

Jukka Zitting
