[jira] [Resolved] (OAK-10341) Indexing: replace FlatFileStore+PersistedLinkedList with a tree store

Julian Reschke (Jira) Tue, 24 Sep 2024 07:25:52 -0700


     [ https://issues.apache.org/jira/browse/OAK-10341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Julian Reschke resolved OAK-10341.
----------------------------------
    Fix Version/s: 1.70.0
       Resolution: Fixed

> Indexing: replace FlatFileStore+PersistedLinkedList with a tree store
> ---------------------------------------------------------------------
>
>                 Key: OAK-10341
>                 URL: https://issues.apache.org/jira/browse/OAK-10341
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>            Reporter: Thomas Mueller
>            Assignee: Thomas Mueller
>            Priority: Major
>             Fix For: 1.70.0
>
>
> Currently, for indexing large repositories with the document store, we first 
> read all nodes and write them to a sorted file (sorting and merging when 
> needed). Then we index from that sorted file (called "FlatFileStore").
> There are multiple problems with this mechanism:
> * The last merging stage of the flat file store is actually not needed: we 
> could index from the un-merged streams. It would save one step where we write 
> and read all the data.
> * It requires knowing the aggregation in the index definition, in order to
> have a set of "preferred children". If the aggregation is unknown, indexing
> can take an extremely long time.
> * Even if it is known, indexing can be very slow, especially if some of the
> nodes that require aggregation have many direct child nodes.
> * It requires a PersistedLinkedList to avoid running out of memory. This
> persisted linked list uses a key-value store internally, which adds
> overhead: we store and read the data once more. However, access to that
> store is still done using just an iterator, and not with a key lookup, so
> performance can still be quite bad.
> * For parallel indexing, we split the flat file. This is only possible if
> we know the aggregation, and sometimes splitting is not possible at all.
> We want to explore using a tree store that would solve all of the above
> problems.
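
As a rough illustration of the iterator-versus-key-lookup point in the description above, the following Python sketch contrasts the two access patterns on a small sorted list of node paths. It is not Oak code: the sample paths and the helpers `find_by_scan` and `find_by_lookup` are hypothetical, and a sorted in-memory list with binary search only stands in for a real tree store.

```python
import bisect

# Hypothetical sample data: node paths kept in sorted order,
# loosely analogous to the entries of a sorted flat file.
paths = sorted(["/a", "/a/b", "/a/b/x1", "/a/b/x2", "/a/b/z", "/a/c", "/d"])


def find_by_scan(sorted_paths, target):
    """Iterator-only access: walk entries until the target is reached or passed.

    Returns (found, number_of_entries_visited).
    """
    steps = 0
    for path in sorted_paths:
        steps += 1
        if path == target:
            return True, steps
        if path > target:
            break  # sorted order: the target cannot appear later
    return False, steps


def find_by_lookup(sorted_paths, target):
    """Key lookup: binary search over the sorted keys, O(log n) comparisons."""
    i = bisect.bisect_left(sorted_paths, target)
    return i < len(sorted_paths) and sorted_paths[i] == target


if __name__ == "__main__":
    print(find_by_scan(paths, "/a/b/z"))
    print(find_by_lookup(paths, "/a/b/z"))
```

With iterator-only access, reaching a particular child means passing over every sibling sorted before it, which is where a node with many direct children becomes expensive. A structure that supports key lookup can go to the wanted entry directly.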



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
