[
https://issues.apache.org/jira/browse/YARN-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16319446#comment-16319446
]
Miklos Szegedi commented on YARN-2185:
--------------------------------------
Thank you for the review, [~jlowe] and [~grepas].
bq. This patch adds parallel copying of directories and ability to ignore
timestamps which IMHO is outside the scope of this patch. I think those should
be handled in separate JIRAs with appropriate unit tests.
I created YARN-7713 and YARN-7712 for these two ideas respectively.
bq. Do we really want a while loop to pull off extensions? That's not how it
behaved before, and it will now do different, potentially unexpected, things
for a ".jar.gz", ".tgz.tgz", ".gz.jar", etc. Granted these are unlikely
filenames to encounter, but if somehow a user had those before and were
working, this will change that behavior.
I agree, let's be more conservative and do not change the existing behavior.
bq. Does it make sense to use an external process to unpack .jar and .zip
files? The original code did this via the JDK directly rather than requiring a
subprocess, and given there's a ZipInputStream and JarInputStream, it seems we
could do the same here with the input stream rather than requiring the overhead
of a subprocess and copying of data between streams.
I was wondering, if it should be better for OS specific archives containing
symlinks for example. Here is what I suggest. I do not like the idea to use
different method for tar and zip. On the other hand we do not have enough
information to know, how many users would actually work better with OS toolset.
I am not in favor of too many config knobs either but in this case I created
one. The administrator can decide whether to rely on OS tools altogether or
java itself. Calling out jar indeed does not make much sense, since it does the
same what we would do in the child process. I think this solution is much
cleaner than arbitrarily use tar in Linux but rely on Java for jars.
bq. This now logs an INFO message for all stderr and stdout for the subprocess
I addressed this concern.
bq. Wouldn't "cd destpath && tar -xf" be better than "cd destpath; tar -xf"?
Otherwise if somehow the change directory fails it will end up unpacking to an
unexpected place.
I just followed the existing pattern. I agree, let’s be more resilient and I
will change to && as you suggested.
bq. Is the "rm -rf" necessary?
It does not hurt. I replaced with a delete that just returns, if the directory
does not exist
bq. Has this been tested on Windows? The zipOrJar path appears to be eligible
for Windows but uses shell syntax that cmd is not going to understand, like
semicolons between commands.
Good catch, the latest code uses the Java path now for Windows. However, I have
limited ability to test there.
bq. I wonder if it would be better to add a method to Shell that takes an input
stream so we could reuse all the subprocess handling code there.
My usage of ProcessBuilder seems very simple I am hesitant to overload the
shell executor with this functionality. This helps with backports as well.
I think I addressed most other comments in the next patch that I will submit
soon.
> Use pipes when localizing archives
> ----------------------------------
>
> Key: YARN-2185
> URL: https://issues.apache.org/jira/browse/YARN-2185
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Affects Versions: 2.4.0
> Reporter: Jason Lowe
> Assignee: Miklos Szegedi
> Attachments: YARN-2185.000.patch, YARN-2185.001.patch,
> YARN-2185.002.patch
>
>
> Currently the nodemanager downloads an archive to a local file, unpacks it,
> and then removes it. It would be more efficient to stream the data as it's
> being unpacked to avoid both the extra disk space requirements and the
> additional disk activity from storing the archive.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]