Hi Thomas,

On Fri, Feb 24, 2017 at 1:09 PM, Thomas Mueller <[email protected]> wrote:
> 9) Sorting of path is needed, so that the repository can be processed bit
> by bit by bit. For that, the following logic is used, recursively: read at
> most 1000 child nodes. If there are more than 1000, then this subtree is
> never split but processed in one step (so many child nodes can still lead
> to large transactions, unfortunately). If less than 1000 child nodes, then
> the names of all child nodes are read, and processed in sorted order
> (sorted by node name).

This should work! We can implement a "paginated tree traversal" using the
above approach, and a similar approach can be used for Lucene indexes.

It would be good to record this in OAK-2556 (or better, in a new issue). We
can then look into implementing it in the parts that perform such large
transactions (reindexing of async indexes, reindexing of sync indexes,
content migration in sidegrade, etc.).

Chetan Mehrotra
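
To make the traversal logic quoted above concrete, here is a minimal sketch
in Java. It is only an illustration under stated assumptions, not Oak code:
`Node`, `plan` and `limit` are hypothetical names, the tree is a plain
in-memory map rather than a NodeState, and a node with exactly `limit`
children is treated as splittable (the description leaves that boundary
case open). The method returns the ordered list of processing steps instead
of committing anything.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PaginatedTraversal {

    /** Hypothetical in-memory stand-in for a repository node (not the Oak API). */
    public static final class Node {
        final Map<String, Node> children = new LinkedHashMap<>();

        /** Adds a child with the given name and returns it. */
        public Node add(String name) {
            Node child = new Node();
            children.put(name, child);
            return child;
        }
    }

    /**
     * Returns the processing steps in order. "node:/path" means the node is
     * processed on its own and its children are visited separately;
     * "subtree:/path" means the whole subtree is processed in one step.
     */
    public static List<String> plan(Node root, int limit) {
        List<String> steps = new ArrayList<>();
        visit("/", root, limit, steps);
        return steps;
    }

    private static void visit(String path, Node node, int limit, List<String> steps) {
        // Read at most limit + 1 child names: enough to detect "more than limit"
        // without loading every child of a very wide node.
        List<String> names = new ArrayList<>();
        Iterator<String> it = node.children.keySet().iterator();
        while (it.hasNext() && names.size() <= limit) {
            names.add(it.next());
        }

        if (names.size() > limit) {
            // Too many children: never split, process the subtree in one step.
            steps.add("subtree:" + path);
            return;
        }

        // Few enough children: all names are known, so visit them sorted by name.
        steps.add("node:" + path);
        Collections.sort(names);
        for (String name : names) {
            String childPath = path.equals("/") ? "/" + name : path + "/" + name;
            visit(childPath, node.children.get(name), limit, steps);
        }
    }
}
```

Because the children are always visited in sorted order, the path of the
last processed step could serve as a resume point, which is what would let
a reindex or migration commit in smaller batches. The wide-node case still
ends up as one large step, as noted in the quoted text.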
