[ https://issues.apache.org/jira/browse/OAK-10341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Julian Reschke resolved OAK-10341.
----------------------------------
Fix Version/s: 1.70.0
Resolution: Fixed
> Indexing: replace FlatFileStore+PersistedLinkedList with a tree store
> ---------------------------------------------------------------------
>
> Key: OAK-10341
> URL: https://issues.apache.org/jira/browse/OAK-10341
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Reporter: Thomas Mueller
> Assignee: Thomas Mueller
> Priority: Major
> Fix For: 1.70.0
>
>
> Currently, for indexing large repositories with the document store, we first
> read all nodes and write them to a sorted file (sorting and merging when
> needed). Then we index from that sorted file (called "FlatFileStore").
> There are multiple problems with this mechanism:
> * The last merging stage of the flat file store is actually not needed: we
> could index from the un-merged streams. It would save one step where we write
> and read all the data.
> * It requires knowing the aggregation in the index definition, in order to
> have a set of "preferred children". If this is unknown, then indexing might
> take an impractically long time.
> * Even if it is known, indexing might be very slow, especially if there
> are many direct child nodes for some of the nodes that require aggregation.
> * It requires a PersistedLinkedList to avoid running out of memory. This
> persisted linked list uses a key-value store internally. This is an
> additional overhead: we store and read the data again. However, access to
> that storage is still done using just an iterator, and not with a key lookup.
> So performance can still be quite bad.
> * For parallel indexing, we split the flat file. This is possible only if
> we know the aggregation. Sometimes splitting is not possible.
> We want to explore using a tree store that would solve all of the above
> problems.
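
To illustrate the difference between iterator-only access and key lookup described above, here is a minimal sketch, assuming node entries are kept in a structure sorted by path. The class TreeStoreSketch and its methods are hypothetical and are not Oak's API; java.util.TreeMap merely stands in for a persistent tree store. With sorted keys, the direct children of a node can be found with a range scan that starts at the parent's path, with no need to know the aggregation rules in advance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical stand-in for a tree store (not Oak's API):
// maps a node path to its serialized data, sorted by path.
class TreeStoreSketch {

    private final TreeMap<String, String> entries = new TreeMap<>();

    void put(String path, String data) {
        entries.put(path, data);
    }

    // Key lookup: O(log n), no sequential scan over the whole store.
    String get(String path) {
        return entries.get(path);
    }

    // Direct children of a node, found with a range scan starting just
    // after the prefix "<parent>/". All keys sharing a prefix are
    // contiguous in lexicographic order, so the scan can stop at the
    // first key that no longer starts with the prefix.
    List<String> childPaths(String parent) {
        String prefix = parent.equals("/") ? "/" : parent + "/";
        List<String> result = new ArrayList<>();
        for (String path : entries.tailMap(prefix, false).keySet()) {
            if (!path.startsWith(prefix)) {
                break; // left the subtree of this parent
            }
            // Keep direct children only; deeper descendants are skipped.
            if (path.indexOf('/', prefix.length()) < 0) {
                result.add(path);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TreeStoreSketch store = new TreeStoreSketch();
        store.put("/content", "{}");
        store.put("/content/a", "{}");
        store.put("/content/a/x", "{}");
        store.put("/content/b", "{}");
        store.put("/contents", "{}");
        System.out.println(store.childPaths("/content")); // prints [/content/a, /content/b]
    }
}
```

In this simplified key order the scan still passes over deeper descendants before reaching the next sibling, so a real store would likely need a different key layout or additional lookups to keep child listing cheap for nodes with large subtrees.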
--
This message was sent by Atlassian Jira
(v8.20.10#820010)