[
https://issues.apache.org/jira/browse/OAK-9747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julian Reschke updated OAK-9747:
--------------------------------
Fix Version/s: (was: 1.48.0)
> Download resume needs to handle hidden nodes
> --------------------------------------------
>
> Key: OAK-9747
> URL: https://issues.apache.org/jira/browse/OAK-9747
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: indexing
> Reporter: Thomas Mueller
> Assignee: Thomas Mueller
> Priority: Major
>
> We implement download resume for documents fetched from MongoDB during the
> indexing process. It works by saving the download state (the last downloaded
> document's _modified and _id) so that a resume, if needed, can start from
> that point. The documents are first kept in memory and then dumped to a file
> once memory usage reaches a certain threshold. The state is saved after
> every dump.
> However, not every document downloaded from MongoDB reaches this point, i.e.
> being saved to disk. Some of those documents are filtered out, e.g. hidden
> nodes:
> https://github.com/apache/jackrabbit-oak/blob/24c54e500883c512e078275d1f85c2899404997c/oak-run-commons/src/main/java/org/apache/jackrabbit/oak/index/indexer/document/NodeStateEntryTraverser.java#L181
> So if a download thread keeps receiving such hidden nodes one after another,
> that progress is not saved, and if the download fails and a retry happens,
> it will download all those hidden nodes again.
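
One possible way to handle this, sketched under assumptions and not taken from the Oak code base: track the last document read separately from the in-memory batch, and checkpoint after a run of filtered documents as well as after a regular dump. All names below (Doc, Downloader, batchLimit, maxSkippedRun) are hypothetical, and the hidden-node check is a simplified stand-in for the real filter.

```java
import java.util.ArrayList;
import java.util.List;

public class ResumeStateSketch {

    /** Hypothetical stand-in for a downloaded document: only the two resume fields. */
    static final class Doc {
        final long modified;
        final String id;

        Doc(long modified, String id) {
            this.modified = modified;
            this.id = id;
        }

        // Assumption for illustration only: a hidden node has a path segment starting with ':'.
        boolean isHidden() {
            return id.contains("/:");
        }
    }

    /** Hypothetical downloader that also checkpoints when documents are filtered out. */
    static final class Downloader {
        private final int batchLimit;     // stands in for the memory threshold
        private final int maxSkippedRun;  // checkpoint after this many filtered documents
        private final List<Doc> batch = new ArrayList<>();
        private Doc lastSeen;             // last document read, filtered or not
        private int skippedSinceSave;
        Doc savedState;                   // stands in for the persisted (_modified, _id)
        int dumps;                        // number of batches written out

        Downloader(int batchLimit, int maxSkippedRun) {
            this.batchLimit = batchLimit;
            this.maxSkippedRun = maxSkippedRun;
        }

        void accept(Doc doc) {
            lastSeen = doc;
            if (doc.isHidden()) {
                // Filtered documents never reach the dump, so count them and
                // checkpoint once enough of them have gone by.
                if (++skippedSinceSave >= maxSkippedRun) {
                    flush();
                }
                return;
            }
            batch.add(doc);
            if (batch.size() >= batchLimit) {
                flush();
            }
        }

        private void flush() {
            // Dump pending documents first, so the saved state never points
            // past documents that exist only in memory.
            if (!batch.isEmpty()) {
                dumps++; // a real implementation would write the batch to a file here
                batch.clear();
            }
            savedState = lastSeen;
            skippedSinceSave = 0;
        }
    }

    public static void main(String[] args) {
        Downloader d = new Downloader(2, 3);
        d.accept(new Doc(1L, "1:/a"));
        d.accept(new Doc(2L, "2:/a/:h1"));
        d.accept(new Doc(3L, "2:/a/:h2"));
        d.accept(new Doc(4L, "2:/a/:h3"));
        System.out.println("resume from _modified=" + d.savedState.modified
                + ", _id=" + d.savedState.id);
    }
}
```

In this sketch the pending batch is dumped before the state is advanced, so a retry would neither lose documents held in memory nor re-download the hidden nodes already skipped. The trade-off is smaller dump files when long runs of hidden nodes occur.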
--
This message was sent by Atlassian Jira
(v8.20.10#820010)