[jira] [Comment Edited] (YARN-2185) Use pipes when localizing archives

2018-01-23 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16337005#comment-16337005
 ] 

Miklos Szegedi edited comment on YARN-2185 at 1/24/18 6:30 AM:
---

I opened YARN-7803 for the unit test error. It is not related to this patch.


was (Author: miklos.szeg...@cloudera.com):
I opened YARN-7803 for the unit test error.

> Use pipes when localizing archives
> --
>
> Key: YARN-2185
> URL: https://issues.apache.org/jira/browse/YARN-2185
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Affects Versions: 2.4.0
> Reporter: Jason Lowe
> Assignee: Miklos Szegedi
> Priority: Major
> Attachments: YARN-2185.000.patch, YARN-2185.001.patch, 
> YARN-2185.002.patch, YARN-2185.003.patch, YARN-2185.004.patch, 
> YARN-2185.005.patch, YARN-2185.006.patch, YARN-2185.007.patch, 
> YARN-2185.008.patch, YARN-2185.009.patch
>
>
> Currently the nodemanager downloads an archive to a local file, unpacks it, 
> and then removes it.  It would be more efficient to stream the data as it's 
> being unpacked to avoid both the extra disk space requirements and the 
> additional disk activity from storing the archive.
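
To illustrate the idea in the description, here is a minimal sketch of unpacking 
an archive directly from an input stream, so the archive itself is never written 
to local disk. It is not the code from the attached patches: it uses the JDK's 
ZipInputStream rather than external commands, and the class and method names 
(StreamingUnzip, unpack) are made up for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of Hadoop.
public class StreamingUnzip {
    /**
     * Unpacks a zip stream into targetDir entry by entry, without first
     * saving the archive to a local file. Returns the number of files written.
     */
    public static int unpack(InputStream archive, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        Files.createDirectories(root);
        int files = 0;
        try (ZipInputStream zip = new ZipInputStream(archive)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                Path out = root.resolve(entry.getName()).normalize();
                // Reject entries such as "../x" that would land outside the target.
                if (!out.startsWith(root)) {
                    throw new IOException("Entry escapes target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    // Reads only up to the end of the current entry.
                    Files.copy(zip, out, StandardCopyOption.REPLACE_EXISTING);
                    files++;
                }
            }
        }
        return files;
    }
}
```

In a real localizer the InputStream would come from the remote file system 
instead of an in-memory buffer; that wiring is left out here.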



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-2185) Use pipes when localizing archives

2017-12-22 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16302123#comment-16302123
 ] 

Miklos Szegedi edited comment on YARN-2185 at 12/23/17 12:10 AM:
-

Attaching my suggestion for how to solve this. The code streams the archive from 
HDFS as standard input to the tar and gzip commands. It handles Windows as well. 
In addition, I create the temporary directory with permissions 700 instead of 
755. I do not create any additional temporary directories for extraction; one is 
enough. One difference is that I use the jar command for zips as well, so that it 
handles Windows properly. I also added a switch that disables the modification 
time check when -1 is specified as the timestamp. Finally, I do a parallel copy 
for directory localization to leverage the distributed storage in HDFS.
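
As a rough sketch of the approach described above (not the patch itself), the 
following shows a private temporary directory created with mode 700 and an input 
stream piped to the standard input of an external command. The class and method 
names (PipedExtract, createPrivateTempDir, pipeTo) are invented for illustration, 
and the permission call assumes a POSIX file system.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.List;

// Hypothetical helper, not part of Hadoop.
public class PipedExtract {
    /** Creates a temporary directory accessible only by its owner (mode 700, POSIX only). */
    public static Path createPrivateTempDir(String prefix) throws IOException {
        return Files.createTempDirectory(prefix,
            PosixFilePermissions.asFileAttribute(
                PosixFilePermissions.fromString("rwx------")));
    }

    /**
     * Runs the command inside workDir and feeds it the source stream on
     * standard input. Returns the exit code of the command.
     */
    public static int pipeTo(InputStream source, List<String> command, Path workDir)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
            .directory(workDir.toFile())
            .redirectOutput(ProcessBuilder.Redirect.INHERIT)
            .redirectError(ProcessBuilder.Redirect.INHERIT)
            .start();
        // Closing the stream signals end of input to the child process.
        try (OutputStream stdin = process.getOutputStream()) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = source.read(buffer)) != -1) {
                stdin.write(buffer, 0, read);
            }
        }
        return process.waitFor();
    }
}
```

For a gzipped tarball, the command could be something like 
Arrays.asList("tar", "-xzf", "-"), which tells tar to read the archive from 
standard input; the exact commands and flags used by the patch may differ.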


was (Author: miklos.szeg...@cloudera.com):
Attaching my suggestion for how to solve this. The code streams the archive from 
HDFS as standard input to the tar and gzip commands. It handles Windows as well. 
In addition, I create temporary files with permissions 700 instead of 755. I do 
not create any additional temporary directories for extraction; one is enough. 
One difference is that I use the jar command for zips as well, so that it 
handles Windows properly. I also added a switch that disables the modification 
time check when -1 is specified as the timestamp. Finally, I do a parallel copy 
for directory localization to leverage the distributed storage in HDFS.

> Use pipes when localizing archives
> --
>
> Key: YARN-2185
> URL: https://issues.apache.org/jira/browse/YARN-2185
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Affects Versions: 2.4.0
> Reporter: Jason Lowe
> Assignee: Miklos Szegedi
> Attachments: YARN-2185.000.patch
>
>
> Currently the nodemanager downloads an archive to a local file, unpacks it, 
> and then removes it.  It would be more efficient to stream the data as it's 
> being unpacked to avoid both the extra disk space requirements and the 
> additional disk activity from storing the archive.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)




[jira] [Comment Edited] (YARN-2185) Use pipes when localizing archives

2017-12-22 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16302123#comment-16302123
 ] 

Miklos Szegedi edited comment on YARN-2185 at 12/23/17 12:08 AM:
-

Attaching my suggestion for how to solve this. The code streams the archive from 
HDFS as standard input to the tar and gzip commands. It handles Windows as well. 
In addition, I create temporary files with permissions 700 instead of 755. I do 
not create any additional temporary directories for extraction; one is enough. 
One difference is that I use the jar command for zips as well, so that it 
handles Windows properly. I also added a switch that disables the modification 
time check when -1 is specified as the timestamp. Finally, I do a parallel copy 
for directory localization to leverage the distributed storage in HDFS.
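
The parallel copy mentioned above could look roughly like the sketch below. It 
is an illustration only: local files accessed through java.nio stand in for 
HDFS, and the class name, method name, and thread-count parameter are all 
assumptions rather than anything taken from the patch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Hadoop.
public class ParallelDirectoryCopy {
    /**
     * Copies every regular file under sourceDir into targetDir, keeping the
     * relative layout, with one task per file on a fixed-size thread pool.
     * Returns the number of files copied.
     */
    public static int copy(Path sourceDir, Path targetDir, int threads)
            throws IOException, InterruptedException, ExecutionException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(sourceDir)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (Path file : files) {
                Path dest = targetDir.resolve(sourceDir.relativize(file).toString());
                pending.add(pool.submit(() -> {
                    Files.createDirectories(dest.getParent());
                    Files.copy(file, dest, StandardCopyOption.REPLACE_EXISTING);
                    return null;
                }));
            }
            // Wait for every copy; a failed copy surfaces as ExecutionException.
            for (Future<?> task : pending) {
                task.get();
            }
        } finally {
            pool.shutdown();
        }
        return files.size();
    }
}
```

With a distributed store, independent per-file reads can be served by different 
nodes, which is the benefit the comment refers to; on a single local disk the 
gain would be much smaller.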


was (Author: miklos.szeg...@cloudera.com):
Attaching my suggestion for how to solve this. The code streams the archive from 
HDFS as standard input to the tar and gzip commands. It handles Windows as well. 
In addition, I create temporary files with permissions 700 instead of 755. I do 
not create any additional temporary directories for extraction; one is enough. 
One difference is that I use the jar command for zips as well, so that it 
handles Windows properly. I also added a switch that disables the modification 
time check when -1 is specified as the timestamp.



