[ https://issues.apache.org/jira/browse/YARN-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Lowe resolved YARN-1791.
------------------------------
Resolution: Invalid
The distributed cache preserves only the basename of each file and symlinks it
into the container's working directory. If two basenames collide, you can use
the URI fragment to give the symlink a different name. For example,
hdfs:/a/b/c#d appears as "d" in the container working directory rather than
"c". If you need the directory structure preserved, specify an archive
(e.g. .tar.gz or .zip) instead; it is expanded when localized, and the paths
inside it are kept.
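As a rough illustration of the naming rule described above, here is a minimal
sketch in plain Java. The class CacheLinkNames and the helper linkName are
hypothetical and are not part of Hadoop; they only mimic "fragment if present,
otherwise basename". The fragment names "abc" and "dec" are likewise made up
for the two colliding files from the report.

```java
import java.net.URI;

public class CacheLinkNames {

    // Hypothetical helper (not a Hadoop API): returns the name the symlink
    // would get under the rule described above, i.e. the URI fragment if one
    // is given, otherwise the basename of the path.
    public static String linkName(URI uri) {
        String fragment = uri.getFragment();
        if (fragment != null && !fragment.isEmpty()) {
            return fragment;
        }
        String path = uri.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        // Two cache files whose basenames collide ("c"), disambiguated
        // with assumed fragment names.
        URI first = URI.create("hdfs:/a/b/c#abc");
        URI second = URI.create("hdfs:/d/e/c#dec");
        // Without a fragment, only the basename survives.
        URI plain = URI.create("hdfs:/a/b/c");

        System.out.println(linkName(first));
        System.out.println(linkName(second));
        System.out.println(linkName(plain));

        // In the job driver, the same URIs would be passed to
        // job.addCacheFile(first) and job.addCacheFile(second); a task could
        // then open the files by link name relative to its working directory,
        // e.g. new java.io.File("abc").
    }
}
```

With distinct fragments, each task can tell the two files apart by link name
instead of inspecting the localized paths.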
In the future please use [mailto:[email protected]] for asking questions.
Apache JIRA is for reporting bugs and tracking features/improvements.
> Distributed cache issue using YARN
> ----------------------------------
>
> Key: YARN-1791
> URL: https://issues.apache.org/jira/browse/YARN-1791
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Ashish Kumar
>
> If I want to have two cache files, a/b/c and d/e/c, for an MR job, is there
> any way to access the path of each file while reading it in a Map or Reduce
> task? I'm using *job.addCacheFile(hdfsPath.toUri());* and then accessing all
> cache file paths using *context.getLocalCacheFiles()*, which returns paths
> such as:
> /yarn/?/?/?/1234/c and /yarn/?/?/?/2345/c
> But these paths don't carry any folder-level information, so I can't identify
> which path represents a/b/c. Is it a bug?
> Please help.
> Thanks,
> Ashish