Hi,

A quick side-question related to what Stefan mentioned earlier:

> A stable traversal order at a given revision + node seems like a prerequisite to me.
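The sketch below is only an illustration of why a stable order matters for resumable (re-)indexing. It does not use Oak's real NodeState/ChildNodeEntry API: the `Node` class and `traverse` method are hypothetical stand-ins. It runs a depth-first traversal that, after an interruption, skips everything up to and including a recorded path. That skip is only correct if children are visited in the same order as in the interrupted run.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ResumableTraversal {

    // Hypothetical stand-in for a node state at a fixed revision.
    // LinkedHashMap keeps insertion order, so repeated iterations over
    // the children of the same node return them in the same order.
    static final class Node {
        final Map<String, Node> children = new LinkedHashMap<>();

        Node child(String name) {
            return children.computeIfAbsent(name, n -> new Node());
        }
    }

    // Pre-order depth-first traversal returning the paths to process.
    // If resumeAfter is non-null, all paths up to and including it are
    // skipped; this assumes the same child order as the earlier run.
    static List<String> traverse(Node root, String resumeAfter) {
        List<String> out = new ArrayList<>();
        boolean[] skipping = {resumeAfter != null};
        visit(root, "/", resumeAfter, skipping, out);
        return out;
    }

    private static void visit(Node node, String path, String resumeAfter,
                              boolean[] skipping, List<String> out) {
        if (skipping[0]) {
            if (path.equals(resumeAfter)) {
                skipping[0] = false; // nodes after this path are new work
            }
        } else {
            out.add(path);
        }
        for (Map.Entry<String, Node> e : node.children.entrySet()) {
            String childPath = path.equals("/")
                    ? "/" + e.getKey()
                    : path + "/" + e.getKey();
            visit(e.getValue(), childPath, resumeAfter, skipping, out);
        }
    }
}
```

For a tree with `/content/a`, `/content/b` and `/libs`, resuming after `/content/a` yields only `/content/b` and `/libs`. If the store returned children in a different order on the second run, some nodes would be silently skipped and others processed twice. This naive version still walks the skipped prefix; a real implementation would prune subtrees that sort entirely before the recorded path.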
The Javadoc of NodeState#getChildNodeEntries says: "... Multiple iterations are guaranteed to return the child nodes in the same order, but the specific order used is implementation dependent and may change across different states of the same node." I'm not entirely sure that's completely unambiguous, but AFAIK the current stores (both Tar versions, MongoMK and RDB) do indeed iterate in a stable order at a given revision, even across multiple calls to getChildNodeEntries. I wonder if we should call that out explicitly in the Javadoc too.

Btw, I think we'd still need to create a checkpoint to safeguard against revision GC/compaction.

Also, AFAIU, [Marcel]

> An intermediate commit (OAK-2556) would have to be annotated with the
> current path, while the checkpoint stays the same. For the re-index use case
> this probably also means an indirection for the index data tree is
> necessary.

and [Thomas]

> For example use a "fromPath"
> .. "toPath" range, and only re-index part of the repository at a time

are different ideas. Marcel probably meant an intermediate commit with some property marking the current path, while Thomas is probably saying that we should index sub-trees separately and hence have a kind of sharded index.

I don't think we should take the shard approach, at least for Lucene indices, as merging the per-shard results after the query won't give the right relevance ordering. Also, I don't think we have a good heuristic to determine a shard depth (e.g. if we shard at each child of the root node and most of the content is under /content, then we won't save much).

Thanks,
Vikas
