Jan Filipiak created YARN-9670:
----------------------------------
Summary: Missing fsync for localized resources before
updating to finalized in statestore
Key: YARN-9670
URL: https://issues.apache.org/jira/browse/YARN-9670
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 2.6.0
Reporter: Jan Filipiak
A localized resource is not fsynced before the state store is updated to
track it as finalized. The download is currently considered finished as soon
as the target local output stream is closed, but closing a stream does not
guarantee that the data has reached the block device. If the node fails after
the state store update but before the data is written out, containers that
rely on the resource may see only part of it after recovery, which usually
causes them to crash.
Possible fixes:
1. Introduce a new step in the state machine that fsyncs the downloaded path
   before calling the state store.
2. On recovery, compare the size of the localized resource with the expected
   size (we would probably have to unpack archives again).
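
To illustrate both ideas, here is a minimal sketch in plain Java NIO. It is
not the NodeManager code: the class and method names (LocalizedResourceSync,
fsyncFile, fsyncDirectory, sizeMatches) are hypothetical, and it assumes the
expected size of the resource is known at recovery time. A real fix would
have to hook into the localization state machine and also handle unpacked
archives.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, for illustration only; not part of Hadoop.
final class LocalizedResourceSync {

  private LocalizedResourceSync() {
  }

  // Force the file's content and metadata to the storage device.
  // Intended to run after the download stream is closed and before
  // the state store is told that the resource is finalized.
  static void fsyncFile(Path file) throws IOException {
    try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
      ch.force(true);
    }
  }

  // Best-effort sync of the parent directory so that the directory
  // entry for the new file is durable as well. Some platforms do not
  // allow opening a directory this way, so failures are ignored here.
  static void fsyncDirectory(Path dir) {
    try (FileChannel ch = FileChannel.open(dir, StandardOpenOption.READ)) {
      ch.force(true);
    } catch (IOException e) {
      // Directory sync is not supported on this platform; skip it.
    }
  }

  // Recovery-time check: true only if the file exists as a regular
  // file and has exactly the expected length in bytes.
  static boolean sizeMatches(Path file, long expectedSize) throws IOException {
    return Files.isRegularFile(file) && Files.size(file) == expectedSize;
  }
}
```

With something like this, the download step would call fsyncFile on the
downloaded file and fsyncDirectory on its parent before the state store
update. A size check on recovery only detects truncated files; it cannot
detect a file of the right length whose content was never written out, so
it is weaker than syncing before the update.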