[
https://issues.apache.org/jira/browse/YARN-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16302123#comment-16302123
]
Miklos Szegedi edited comment on YARN-2185 at 12/23/17 12:08 AM:
-----------------------------------------------------------------
Attaching my suggestion for how to solve this. The code streams the file from
HDFS as standard input to the tar and gzip commands, and it handles Windows as
well. In addition, I create temporary files with permissions 700 instead of
755. I do not create any additional temporary directories for extraction; one
is enough. One difference is that I use the jar command for zips as well, so
that it handles Windows properly. I also added a switch to disable the
modification time check by specifying -1 as the timestamp. Finally, I do a
parallel copy for directory localization to leverage the distributed storage
in HDFS.
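
To illustrate the streaming idea only, here is a minimal Python sketch; it is
not the attached patch, which is Java code in the nodemanager. It assumes a
tar binary on the PATH, and any readable binary file object stands in for the
HDFS input stream. The names extract_stream and roundtrip_demo are
hypothetical.

```python
import io
import os
import shutil
import subprocess
import tarfile
import tempfile


def extract_stream(stream, dest_dir, chunk_size=64 * 1024):
    """Pipe a .tar.gz byte stream into `tar` without storing the archive on disk.

    `stream` is any readable binary file object (an assumption standing in
    for an HDFS input stream).
    """
    # Extraction directory readable only by the owner (700 rather than 755).
    os.makedirs(dest_dir, mode=0o700, exist_ok=True)
    os.chmod(dest_dir, 0o700)

    # "-f -" tells tar to read the archive from standard input.
    proc = subprocess.Popen(
        ["tar", "-xzf", "-", "-C", dest_dir], stdin=subprocess.PIPE
    )
    try:
        shutil.copyfileobj(stream, proc.stdin, chunk_size)
        proc.stdin.close()
    except BrokenPipeError:
        pass  # tar exited early; its exit status is checked below
    if proc.wait() != 0:
        raise RuntimeError("tar exited with status %d" % proc.returncode)


def roundtrip_demo():
    """Build a tiny in-memory .tar.gz, stream it through tar, return the payload."""
    payload = b"hello"
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as archive:
        info = tarfile.TarInfo("a/hello.txt")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    buf.seek(0)

    with tempfile.TemporaryDirectory() as tmp:
        dest = os.path.join(tmp, "out")
        extract_stream(buf, dest)
        with open(os.path.join(dest, "a", "hello.txt"), "rb") as extracted:
            return extracted.read()
```

The archive bytes only pass through the pipe, so the local disk holds the
extracted files and never a copy of the archive itself.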
> Use pipes when localizing archives
> ----------------------------------
>
> Key: YARN-2185
> URL: https://issues.apache.org/jira/browse/YARN-2185
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Affects Versions: 2.4.0
> Reporter: Jason Lowe
> Assignee: Miklos Szegedi
> Attachments: YARN-2185.000.patch
>
>
> Currently the nodemanager downloads an archive to a local file, unpacks it,
> and then removes it. It would be more efficient to stream the data as it's
> being unpacked to avoid both the extra disk space requirements and the
> additional disk activity from storing the archive.