[
https://issues.apache.org/jira/browse/YARN-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15251911#comment-15251911
]
Daniel Templeton commented on YARN-4958:
----------------------------------------
This JIRA is in the same space as HADOOP-12747. It's solving the same problem
in a completely different way, with different side-effects. I think there is
room for and value in both JIRAs.
> The file localization process should allow for wildcards to reduce the
> application footprint in the state store
> ---------------------------------------------------------------------------------------------------------------
>
> Key: YARN-4958
> URL: https://issues.apache.org/jira/browse/YARN-4958
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Affects Versions: 2.8.0
> Reporter: Daniel Templeton
> Assignee: Daniel Templeton
> Priority: Critical
> Attachments: YARN-4958.001.patch
>
>
> When using the -libjars option to add classes to the classpath, every library
> so added is explicitly listed in the {{ContainerLaunchContext}}'s local
> resources even though they're all uploaded to the same directory in HDFS.
> When using tools like Crunch without an uber JAR or when trying to take
> advantage of the shared cache, the number of libraries can be quite large.
> We've seen many cases where we had to turn down the max number of
> applications to prevent ZK from running out of heap because of the size of
> the state store entries.
> Rather than listing all files independently, this JIRA proposes to have the
> NM allow wildcards in the resource localization paths. Specifically, we
> propose to allow a path to have a final component (name) set to "*", which is
> interpreted by the NM as "download the full directory and link to every file
> in it from the job's working directory." This behavior is the same as the
> current behavior when using -libjars, but avoids explicitly listing every
> file.
> This JIRA does not attempt to provide more general purpose wildcards, such as
> "*.jar" or "file*", as having multiple entries for a single directory
> presents numerous logistical issues.
> This JIRA also does not attempt to integrate with the shared cache. That
> work will be left to a future JIRA.
> This JIRA proposes to allow for wildcards both in the internal processing of
> the -libjars switch and in paths added through the {{Job}} and
> {{DistributedCache}} classes.
> The proposed approach is to treat a path, "dir/*", as "dir" for purposes of
> all file verification. In the final step, the NM will query the localized
> directory to get a list of the files in "dir" such that each can be linked
> from the job's working directory.
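
The approach described above (treat "dir/*" as "dir" for verification, then
link every entry of the localized directory into the working directory) could
be sketched roughly as follows. This is only an illustration using plain
java.nio on a local file system; it is not the code in YARN-4958.001.patch,
and the class and method names (WildcardLinkSketch, isWildcard,
verificationPath, linkAll) are hypothetical, not NodeManager APIs.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; all names here are hypothetical. */
final class WildcardLinkSketch {
    private WildcardLinkSketch() {}

    /** True only when the final path component is exactly "*". */
    static boolean isWildcard(String resourcePath) {
        // "dir/*.jar" or "dir/file*" are deliberately not treated as wildcards.
        return resourcePath.endsWith("/*");
    }

    /** Maps "dir/*" to "dir"; any other path is returned unchanged. */
    static String verificationPath(String resourcePath) {
        if (!isWildcard(resourcePath)) {
            return resourcePath;
        }
        String base = resourcePath.substring(0, resourcePath.length() - 2);
        return base.isEmpty() ? "/" : base;
    }

    /**
     * Final step: list the localized directory and create one symbolic link
     * per entry in the working directory. Returns the links that were created.
     */
    static List<Path> linkAll(Path localizedDir, Path workDir) throws IOException {
        List<Path> links = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(localizedDir)) {
            for (Path target : entries) {
                Path link = workDir.resolve(target.getFileName());
                Files.createSymbolicLink(link, target.toAbsolutePath());
                links.add(link);
            }
        }
        return links;
    }
}
```

With this shape, a single "dir/*" entry stands in for the whole directory, and
the per-file list is only produced at link time by listing the localized
directory, instead of being carried as one entry per file.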