Re: Supporting "resumable" operations on a large tree

Vikas Saurabh Thu, 23 Feb 2017 11:44:04 -0800

Hi,

A quick side-question related to what Stefan mentioned earlier:
> A stable traversal order at a given revision + node seems like a
> prerequisite to me.


Javadoc of NodeState#getChildNodeEntries says:
".... Multiple iterations are guaranteed to return the child nodes in
the same order, but the specific order used is implementation
dependent and may change across different states of the same node."

I'm not entirely sure whether that wording is unambiguous, but afaik
all current stores (Tar - both versions, MongoMK and RDB) do iterate
in a stable order at a given revision, even across multiple calls to
getChildNodeEntries. I wonder if we should call that out explicitly
too.

Btw, I think we'd still need to make a checkpoint to safeguard against
rev-gc/compaction.
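
To illustrate why the stable order matters for resuming, here is a
minimal sketch. It uses a toy in-memory tree, not Oak's NodeState, and
the names ResumableTraversal, Node and traverse are made up for this
example. The assumption is that the tree is the same on every resume
(in Oak, presumably the state behind the checkpoint) and that children
always iterate in the same order, so the last processed path is enough
to find the resume point.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ResumableTraversal {

    /** Toy node (hypothetical); LinkedHashMap keeps child order stable. */
    static final class Node {
        final Map<String, Node> children = new LinkedHashMap<>();

        /** Returns the named child, creating it if it does not exist. */
        Node child(String name) {
            return children.computeIfAbsent(name, n -> new Node());
        }
    }

    /**
     * Collects up to 'limit' paths in pre-order, starting right after
     * 'lastDone' (pass null to start at the root).
     */
    static List<String> traverse(Node root, String lastDone, int limit) {
        List<String> out = new ArrayList<>();
        boolean[] resumed = { lastDone == null };
        visit(root, "/", lastDone, resumed, limit, out);
        return out;
    }

    private static void visit(Node node, String path, String lastDone,
                              boolean[] resumed, int limit, List<String> out) {
        if (out.size() >= limit) {
            return; // batch is full; the caller persists the last path
        }
        if (resumed[0]) {
            out.add(path);
        } else if (path.equals(lastDone)) {
            resumed[0] = true; // everything after this point is new work
        }
        for (Map.Entry<String, Node> e : node.children.entrySet()) {
            String childPath = path.equals("/")
                    ? "/" + e.getKey()
                    : path + "/" + e.getKey();
            // Before the resume point, only descend towards lastDone.
            // Earlier siblings were fully processed in a previous run,
            // which is only true if the child order has not changed.
            if (!resumed[0] && !isAncestorOrSelf(childPath, lastDone)) {
                continue;
            }
            visit(e.getValue(), childPath, lastDone, resumed, limit, out);
        }
    }

    private static boolean isAncestorOrSelf(String ancestor, String path) {
        return path.equals(ancestor) || path.startsWith(ancestor + "/");
    }
}
```

A caller would store the last path of each batch (e.g. in the
intermediate commit) and pass it back as lastDone on the next run. If
the child order differed between runs, the skip of earlier siblings
would silently miss or repeat nodes.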

Also, afaiu,
[Marcel]
> An intermediate commit (OAK-2556) would have to be annotated with the
> current path, while the checkpoint stays the same. For the re-index use case
> this probably also means an indirection for the index data tree is
> necessary.

and

[Thomas]
> For example use a "fromPath"
> .. "toPath" range, and only re-index part of the repository at a time

are different ideas. Marcel probably meant an intermediate commit with
some property marking the current path, while Thomas is probably
saying that we should index sub-trees separately and hence end up with
shards, of a kind, of the index.

I don't think we should take the shard approach, at least for lucene
indices, as a post-query merge won't give the right relevance ordering.
Also, I don't think we have a good heuristic for choosing a shard depth
(e.g. if we shard at each child of the root node and most of the
content is under /content, then we won't save much).
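
One likely reason for the relevance problem, sketched under the
assumption that scores depend on per-index term statistics (as in
TF-IDF/BM25-style ranking): each shard computes its own document
frequencies, so scores from different shards are not comparable. The
code below is a deliberately simplified toy score, not Lucene's actual
scoring code, and ShardedRelevance, idf and score are hypothetical
names.

```java
final class ShardedRelevance {

    /**
     * BM25-style inverse document frequency for a term that occurs in
     * docFreq out of docCount documents.
     */
    static double idf(long docFreq, long docCount) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    /** Simplified score: term frequency times idf (no length norm). */
    static double score(int termFreq, long docFreq, long docCount) {
        return termFreq * idf(docFreq, docCount);
    }
}
```

Take a term that occurs in 500 of 1000 documents in shard A and in 2 of
1000 documents in shard B. With shard-local statistics,
score(1, 2, 1000) for a shard-B document with one occurrence is higher
than score(3, 500, 1000) for a shard-A document with three. With
combined statistics (502 of 2000), score(3, 502, 2000) is higher than
score(1, 502, 2000), so the order flips. Merging per-shard results by
score would therefore rank the two documents differently than a single
index would.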

Thanks,
Vikas
