[ https://issues.apache.org/jira/browse/YARN-9670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jan Filipiak updated YARN-9670:
-------------------------------
Comment: was deleted
(was: It bites us in YARN 2.6. I scanned through master briefly and couldn't
find anything that would fix it.)
> Missing Fsync for localized resources before updating to finalized in
> statestore
> --------------------------------------------------------------------------------
>
> Key: YARN-9670
> URL: https://issues.apache.org/jira/browse/YARN-9670
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 2.6.0
> Reporter: Jan Filipiak
> Priority: Major
>
> A resource that was localized is not properly fsynced before the
> state manager is updated to track this resource as finalized. The download is
> currently considered finished as soon as the target local output stream is
> closed, but closing a stream does not guarantee that the data has reached the
> block device before the statestore is updated. After recovery, containers
> relying on the resource may see only part of it, which usually makes them crash.
>
> Possible fixes:
> - Introduce a new step in the state machine that fsyncs the downloaded path
>   before the statestore is updated.
> - On recovery, compare the on-disk size of the resource with the expected
>   size (archives probably have to be unpacked again).
>
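A minimal sketch of the two proposed fixes, assuming a POSIX filesystem (directory fsync is not supported on Windows). It is written in Python for brevity. The NodeManager itself is Java, and the names `fsync_tree`, `finalize_resource` and `verify_on_recovery`, as well as the dict standing in for the statestore, are hypothetical illustrations, not YARN APIs. The extra "finalize" step fsyncs every file of the downloaded (and possibly unpacked) path plus the directories that hold them, and only then records the resource as finalized together with its size. The recovery check rejects the resource if the path is missing or its size no longer matches.

```python
import os


def fsync_path(path: str) -> None:
    """fsync a single file or directory by path (POSIX only)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def fsync_tree(root: str) -> None:
    """Hypothetical helper: fsync every regular file under root, then each
    directory bottom-up, then root's parent so the top-level entry is durable."""
    if os.path.isdir(root):
        for dirpath, _dirnames, filenames in os.walk(root, topdown=False):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if not os.path.islink(path):  # do not follow links out of the tree
                    fsync_path(path)
            fsync_path(dirpath)
    else:
        fsync_path(root)
    fsync_path(os.path.dirname(os.path.abspath(root)))


def total_size(root: str) -> int:
    """Sum of regular-file sizes under root, or root's own size if it is a file."""
    if not os.path.isdir(root):
        return os.path.getsize(root)
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def finalize_resource(root: str, state_store: dict) -> None:
    """Hypothetical extra state-machine step: make the localized path durable
    BEFORE the (hypothetical, dict-based) statestore marks it as finalized."""
    fsync_tree(root)
    state_store[root] = {"state": "FINALIZED", "size": total_size(root)}


def verify_on_recovery(root: str, state_store: dict) -> bool:
    """Hypothetical recovery check: a missing path or a size mismatch means the
    resource should be localized (and archives unpacked) again."""
    entry = state_store.get(root)
    if entry is None or entry["state"] != "FINALIZED" or not os.path.exists(root):
        return False
    return total_size(root) == entry["size"]
```

The ordering is the point: with the fsync step in place, a crash can no longer leave a "finalized" entry pointing at data that only ever lived in the page cache. A size check catches truncated files but not blocks that have the right length yet were never written (some filesystems expose them as zeros after a crash), so a checksum would be a stronger recovery test.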
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)