[ https://issues.apache.org/jira/browse/OAK-9747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julian Reschke updated OAK-9747:
--------------------------------
    Fix Version/s:     (was: 1.48.0)

> Download resume needs to handle hidden nodes
> --------------------------------------------
>
>                 Key: OAK-9747
>                 URL: https://issues.apache.org/jira/browse/OAK-9747
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: indexing
>            Reporter: Thomas Mueller
>            Assignee: Thomas Mueller
>            Priority: Major
>
> We implemented download resume for the indexing process when downloading
> documents from MongoDB. It works by saving the download state (the
> _modified and _id of the last downloaded document) so that, if needed, a
> resumed download can start from that point. Downloaded documents are first
> kept in memory and then dumped to a file once memory usage reaches a
> certain threshold; the state is saved after every dump.
> However, not every document downloaded from MongoDB reaches this point,
> i.e. is saved to disk. Some documents are filtered out, e.g. hidden nodes:
> https://github.com/apache/jackrabbit-oak/blob/24c54e500883c512e078275d1f85c2899404997c/oak-run-commons/src/main/java/org/apache/jackrabbit/oak/index/indexer/document/NodeStateEntryTraverser.java#L181
> So if a download thread keeps receiving such hidden nodes, its progress is
> not saved. If the download then fails and is retried, all of those hidden
> nodes will be downloaded again.
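One possible way to close this gap is to let the saved state also advance over filtered documents, but only while every kept document downloaded so far is already on disk (that is, while the in-memory buffer is empty). The following is a minimal sketch under that assumption, not Oak's implementation: the class, record and parameter names (ResumableDownloadSketch, Doc, ResumeState, dumpThreshold, skipSaveInterval) are hypothetical, a document count stands in for the memory threshold, and the dump file and persisted state are simulated with in-memory fields.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: resumable download that also records progress over filtered documents. */
public class ResumableDownloadSketch {

    /** A downloaded document, reduced to its sort key (_modified, _id) and a filter flag. */
    record Doc(long modified, String id, boolean hidden) {}

    /** Resume position: (_modified, _id) of the last document accounted for. */
    record ResumeState(long modified, String id) {}

    private final int dumpThreshold;           // assumption: count stands in for memory threshold
    private final int skipSaveInterval;        // assumption: save after this many filtered docs
    private final List<Doc> buffer = new ArrayList<>();
    final List<Doc> disk = new ArrayList<>();  // simulated dump file(s)
    ResumeState savedState;                    // simulated persisted download state
    private Doc lastDownloaded;
    private int skippedSinceSave;

    ResumableDownloadSketch(int dumpThreshold, int skipSaveInterval) {
        this.dumpThreshold = dumpThreshold;
        this.skipSaveInterval = skipSaveInterval;
    }

    void accept(Doc doc) {
        lastDownloaded = doc;
        if (doc.hidden()) {
            skippedSinceSave++;
            // With an empty buffer, every kept document seen so far is on disk,
            // so moving the resume point past filtered documents loses nothing.
            if (buffer.isEmpty() && skippedSinceSave >= skipSaveInterval) {
                saveState();
            }
            return;
        }
        buffer.add(doc);
        if (buffer.size() >= dumpThreshold) {
            dump();
        }
    }

    private void dump() {
        disk.addAll(buffer);
        buffer.clear();
        // All kept documents up to lastDownloaded are now persisted.
        saveState();
    }

    private void saveState() {
        savedState = new ResumeState(lastDownloaded.modified(), lastDownloaded.id());
        skippedSinceSave = 0;
    }

    public static void main(String[] args) {
        ResumableDownloadSketch d = new ResumableDownloadSketch(2, 3);
        d.accept(new Doc(1, "a", false));
        d.accept(new Doc(2, "b", false));                // triggers a dump; state -> (2, b)
        for (int i = 1; i <= 5; i++) {
            d.accept(new Doc(2 + i, "h" + i, true));     // a run of hidden nodes
        }
        System.out.println("resume from _modified=" + d.savedState.modified()
                + ", _id=" + d.savedState.id());
    }
}
```

Running main prints "resume from _modified=5, _id=h3": after the dump the state would otherwise stay at (2, b), but the run of hidden nodes moves it forward. Saving only while the buffer is empty keeps the resume point from skipping kept documents that have not been dumped yet; if a long run of hidden nodes arrives while the buffer is non-empty, progress is still recorded only at the next dump (forcing an early dump would be one alternative).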



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
