[ https://issues.apache.org/jira/browse/YARN-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15258526#comment-15258526 ]
Daniel Templeton commented on YARN-4958:
----------------------------------------
Oh, whoops. That was the intended behavior, but it looks like I clobbered it
accidentally. I'll add that to the next patch.
> The file localization process should allow for wildcards to reduce the
> application footprint in the state store
> ---------------------------------------------------------------------------------------------------------------
>
> Key: YARN-4958
> URL: https://issues.apache.org/jira/browse/YARN-4958
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Affects Versions: 2.8.0
> Reporter: Daniel Templeton
> Assignee: Daniel Templeton
> Priority: Critical
> Attachments: YARN-4958.001.patch
>
>
> When using the -libjars option to add classes to the classpath, every library
> so added is explicitly listed in the {{ContainerLaunchContext}}'s local
> resources even though they're all uploaded to the same directory in HDFS.
> When using tools like Crunch without an uber JAR or when trying to take
> advantage of the shared cache, the number of libraries can be quite large.
> We've seen many cases where we had to turn down the max number of
> applications to prevent ZK from running out of heap because of the size of
> the state store entries.
>
> Rather than listing all files independently, this JIRA proposes to have the
> NM allow wildcards in the resource localization paths. Specifically, we
> propose to allow a path to have a final component (name) set to "*", which is
> interpreted by the NM as "download the full directory and link to every file
> in it from the job's working directory." This behavior is the same as the
> current behavior when using -libjars, but avoids explicitly listing every
> file.
>
> This JIRA does not attempt to provide more general purpose wildcards, such as
> "\*.jar" or "file\*", as having multiple entries for a single directory
> presents numerous logistical issues.
>
> This JIRA also does not attempt to integrate with the shared cache. This JIRA
> applies only when a full directory is uploaded, and the shared cache does not
> currently handle directory uploads. That integration is left to a future JIRA.
>
> This JIRA proposes to allow for wildcards both in the internal processing of
> the -libjars switch and in paths added through the {{Job}} and
> {{DistributedCache}} classes.
>
> The proposed approach is to treat a path, "dir/\*", as "dir" for purposes of
> all file verification and localization. In the final step, the NM will query
> the localized directory to get a list of the files in "dir" such that each
> can be linked from the job's working directory. Since $PWD/\* is always
> included on the classpath, all JAR files in "dir" will be in the classpath.
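
For illustration only, the approach proposed in the description can be sketched roughly as follows. This is not the code in the attached patch: it uses plain java.nio on a local filesystem rather than any Hadoop or NM API, and the class and method names (WildcardLocalizationSketch, isWildcard, stripWildcard, linkContents) are hypothetical. It assumes a platform that permits symbolic links.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class WildcardLocalizationSketch {

    // Hypothetical helper: true only when the final path component is exactly "*".
    // "dir/*.jar" and "dir/file*" are deliberately not treated as wildcards.
    // The root case ("/*") is not handled in this sketch.
    static boolean isWildcard(String resource) {
        return resource.endsWith("/*");
    }

    // Hypothetical helper: "dir/*" is handled as "dir" for verification and download.
    static String stripWildcard(String resource) {
        return isWildcard(resource)
            ? resource.substring(0, resource.length() - 2)
            : resource;
    }

    // Final step: list the localized directory and link each entry
    // from the working directory.
    static List<Path> linkContents(Path localizedDir, Path workDir) throws IOException {
        List<Path> links = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(localizedDir)) {
            for (Path entry : entries) {
                Path link = workDir.resolve(entry.getFileName());
                Files.createSymbolicLink(link, entry.toAbsolutePath());
                links.add(link);
            }
        }
        return links;
    }

    public static void main(String[] args) throws IOException {
        // Stand-ins for the localized directory and the job's working directory.
        Path localized = Files.createTempDirectory("localized");
        Path workDir = Files.createTempDirectory("workdir");
        Files.createFile(localized.resolve("a.jar"));
        Files.createFile(localized.resolve("b.jar"));

        String resource = localized + "/*";
        if (isWildcard(resource)) {
            Path dir = Paths.get(stripWildcard(resource));
            for (Path link : linkContents(dir, workDir)) {
                System.out.println(link + " -> " + Files.readSymbolicLink(link));
            }
        }
    }
}
```

The point of the sketch is that only the single "dir/*" entry has to be recorded. The per-file list is derived after localization by listing the directory.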
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)