[jira] [Resolved] (YARN-1791) Distributed cache issue using YARN

Jason Lowe (JIRA) Thu, 06 Mar 2014 06:07:24 -0800

     [ https://issues.apache.org/jira/browse/YARN-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Jason Lowe resolved YARN-1791.
------------------------------

    Resolution: Invalid

The distributed cache preserves only the basename of each file and links it 
into the container's working directory.  If two names collide, you can use the 
URI fragment to give the symlink an alternative name.  For example, 
hdfs:/a/b/c#d will be seen as "d" in the container working directory rather 
than "c".  If you require paths to be preserved, you can instead specify an 
archive (e.g. .tar.gz, .zip, etc.), which is expanded when localized, and the 
paths can exist within it.
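
As a rough illustration of the naming rule above, here is a minimal sketch that 
uses only java.net.URI.  The class CacheLinkNames, the helper linkName, and the 
fragment names "abc" and "dec" are made up for this example; the helper only 
mimics the rule as described here and is not Hadoop's own code.  In a real job 
you would pass the fragment-bearing URIs to job.addCacheFile(...).

```java
import java.net.URI;

// Hypothetical illustration class, not part of Hadoop.
class CacheLinkNames {

    // Name a cache entry is expected to get in the container working
    // directory under the rule described above: the URI fragment if one
    // is given, otherwise the basename of the path.
    public static String linkName(URI uri) {
        String fragment = uri.getFragment();
        if (fragment != null && !fragment.isEmpty()) {
            return fragment;
        }
        String path = uri.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        // Without fragments, both files share the basename "c" and collide.
        URI plainFirst = URI.create("hdfs:/a/b/c");
        URI plainSecond = URI.create("hdfs:/d/e/c");
        System.out.println(linkName(plainFirst) + " vs " + linkName(plainSecond));

        // With fragments (names chosen arbitrarily), each file gets a
        // distinct link name that the task can rely on.
        URI first = URI.create("hdfs:/a/b/c#abc");
        URI second = URI.create("hdfs:/d/e/c#dec");
        System.out.println(linkName(first) + " vs " + linkName(second));

        // In the job driver these URIs would be registered with, e.g.:
        //   job.addCacheFile(first);
        //   job.addCacheFile(second);
    }
}
```

With names chosen this way, a task should be able to open the files by their 
link names (for example new File("abc")) relative to its working directory, 
instead of trying to recover a/b/c from the localized path.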

In the future please use [mailto:[email protected]] for asking questions.  
Apache JIRA is for reporting bugs and tracking features/improvements.

> Distributed cache issue using YARN
> ----------------------------------
>
>                 Key: YARN-1791
>                 URL: https://issues.apache.org/jira/browse/YARN-1791
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Ashish Kumar
>
> If I want to have two cache files, a/b/c and d/e/c, for an MR job, is there 
> any way to access the paths of these files while reading them in a Map or 
> Reduce task?
> I'm using *job.addCacheFile(hdfsPath.toUri());* and then accessing all 
> cache file paths using *context.getLocalCacheFiles()*, which returns the 
> paths given below:
> /yarn/?/?/?/1234/c and /yarn/?/?/?/2345/c
> But these paths don't have any folder-level info, so I'm not able to identify 
> which path represents a/b/c. Is this a bug?
> Please help.
> Thanks,
> Ashish



--
This message was sent by Atlassian JIRA
(v6.2#6252)
