[ 
https://issues.apache.org/jira/browse/OAK-9052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099871#comment-17099871
 ] 

Thomas Mueller commented on OAK-9052:
-------------------------------------

Data structure:
* FlatFileBufferLinkedList is used in the second phase and contains a list of 
NodeStateEntry objects.
* NodeStateEntry.nodeState is a LazyChildrenNodeState for entries in memory, 
but can be a DocumentNodeState when reading from MongoDB (in the first phase).
* NodeStateEntry objects can be (de-)serialized using the NodeStateEntryWriter 
/ NodeStateEntryReader. Serialization is usually needed only in the first phase.
* The temp file is stored in 
temp/flat-file-store/sort-work-dir/sortInBatch...flatfile (by default using 
compression).
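
To illustrate the memory limit mentioned in the issue description, here is a 
minimal sketch of a buffer that tracks an estimated size per entry and refuses 
to grow past a configured limit. It is not the FlatFileBufferLinkedList 
implementation: the names Entry, BoundedEntryBuffer and estimatedBytes are 
hypothetical, and the real classes may account for memory differently.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for a buffered entry: a path plus an estimated size.
final class Entry {
    final String path;
    final long estimatedBytes;

    Entry(String path, long estimatedBytes) {
        this.path = path;
        this.estimatedBytes = estimatedBytes;
    }
}

// Hypothetical memory-bounded FIFO buffer (an assumption for illustration,
// not Oak's actual data structure).
final class BoundedEntryBuffer {
    private final Deque<Entry> entries = new ArrayDeque<>();
    private final long maxBytes;
    private long usedBytes;

    BoundedEntryBuffer(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    // Appends an entry, or fails if the estimated total would exceed the limit.
    void add(Entry e) {
        if (usedBytes + e.estimatedBytes > maxBytes) {
            throw new IllegalStateException(
                    "Buffer limit of " + maxBytes + " bytes exceeded at " + e.path);
        }
        entries.addLast(e);
        usedBytes += e.estimatedBytes;
    }

    // Removes the oldest entry and releases its estimated memory.
    Entry remove() {
        Entry e = entries.removeFirst();
        usedBytes -= e.estimatedBytes;
        return e;
    }

    long usedBytes() {
        return usedBytes;
    }

    int size() {
        return entries.size();
    }
}
```

With such a design, a node with many or large descendants can hit the limit 
while its entries are still needed, which is the situation the issue proposes 
to avoid by spilling to temporary storage.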

> Reindexing using --doc-traversal-mode may need a lot of memory
> --------------------------------------------------------------
>
>                 Key: OAK-9052
>                 URL: https://issues.apache.org/jira/browse/OAK-9052
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: indexing, mongomk
>            Reporter: Thomas Mueller
>            Priority: Major
>
> Indexing using oak-run and --doc-traversal-mode uses the FlatFileStore. For 
> aggregation, there is a limit on memory usage, by default around 100 MB. 
> Depending on the content structure, this limit can be exceeded. 
> It would be good to find a way to avoid a memory limit, for example using a 
> temporary storage (a file, or a persistent key/value store).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
