[ 
https://issues.apache.org/jira/browse/YARN-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-3637:
-------------------------------
    Attachment: YARN-3637-trunk.001.patch

Attached is a v01 patch for handling symlink names and fragments as part of the 
shared cache YARN API. The major part of the patch adds a new parameter to the 
use API call. This allows a user to specify a preferred name for a resource 
even if the name of the resource in the shared cache is different. With this 
additional parameter, the user can avoid the naming conflicts that occur when 
using resources from the shared cache. Note that this patch does not solve the 
existing problem in YARN where resource symlinks get clobbered if two resources 
are specified with the same name. Furthermore, this approach assumes that the 
returned path will be used to create a LocalResource, and it relies on the way 
YARN localization uses the fragment portion of a URI.
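
To make the fragment idea concrete, here is a minimal sketch in plain Java 
(java.net.URI only, no Hadoop classes). It is not code from the attached 
patch: the class name, both method names, the "namenode" host, and the 
"/sharedcache" prefix are hypothetical and chosen only for illustration. It 
assumes the convention described above, where the fragment, when present, 
is taken as the symlink name.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for illustration; not part of the patch or of any YARN API.
final class PreferredNameSketch {

    // Attach the caller's preferred name as the URI fragment. The path itself
    // (checksum directory plus original file name) is left unchanged.
    static URI withPreferredName(URI cachedResource, String preferredName) {
        try {
            return new URI(cachedResource.getScheme(),
                           cachedResource.getAuthority(),
                           cachedResource.getPath(),
                           cachedResource.getQuery(),
                           preferredName);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Link name under the assumed convention: the fragment if there is one,
    // otherwise the last segment of the path.
    static String linkName(URI resource) {
        String fragment = resource.getFragment();
        if (fragment != null && !fragment.isEmpty()) {
            return fragment;
        }
        String path = resource.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        // Made-up location; only "checksum1/job.jar" comes from the issue text.
        URI cached = URI.create("hdfs://namenode/sharedcache/checksum1/job.jar");
        URI named = withPreferredName(cached, "foo.jar");
        System.out.println(named);
        System.out.println(linkName(cached));
        System.out.println(linkName(named));
    }
}
```

In this sketch the caller never manipulates the fragment directly; it only 
passes the preferred name, which is the kind of abstraction the new 
parameter is meant to provide.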

I think this makes it slightly easier for developers to implement shared cache 
support in their YARN application by abstracting away symlink/fragment 
management. Thoughts [~sjlee0] or anyone else?

> Handle localization sym-linking correctly at the YARN level
> -----------------------------------------------------------
>
>                 Key: YARN-3637
>                 URL: https://issues.apache.org/jira/browse/YARN-3637
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Chris Trezzo
>            Assignee: Chris Trezzo
>         Attachments: YARN-3637-trunk.001.patch
>
>
> The shared cache needs to handle resource sym-linking at the YARN layer. 
> Currently, we let the application layer (i.e. MapReduce) handle this, but it 
> is probably better for all applications if it is handled transparently.
> Here is the scenario:
> Imagine two separate jars (with unique checksums) that have the same name 
> job.jar.
> They are stored in the shared cache as two separate resources:
> checksum1/job.jar
> checksum2/job.jar
> A new application tries to use both of these resources, but internally refers 
> to them by different names:
> foo.jar maps to checksum1
> bar.jar maps to checksum2
> When the shared cache returns the path to the resources, both resources have 
> the same name (i.e. job.jar). Because of this, when the resources are 
> localized, one of them clobbers the other. This is because both symlinks in 
> the container_id directory have the same name (i.e. job.jar) even though they 
> point to two separate resource directories.
> Originally we tackled this in the MapReduce client by using the fragment 
> portion of the resource URL. This, however, seems like something that should 
> be solved at the YARN layer.
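
The scenario in the description can be reproduced with a small model, given 
here as a sketch in plain Java rather than as actual YARN localization code. 
It assumes the container directory behaves like a map from link name to 
target, so a second link with the same name replaces the first. The class 
and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of a container directory; not YARN localization code.
final class SymlinkClobberSketch {

    // Each resource is {target, preferredName}; preferredName may be null.
    // Returns link name -> target. A later entry with the same link name
    // overwrites the earlier one, which models the clobbering.
    static Map<String, String> localize(String[][] resources) {
        Map<String, String> containerDir = new LinkedHashMap<>();
        for (String[] resource : resources) {
            String target = resource[0];
            String preferred = resource[1];
            String name = (preferred != null)
                    ? preferred
                    : target.substring(target.lastIndexOf('/') + 1);
            containerDir.put(name, target);
        }
        return containerDir;
    }

    public static void main(String[] args) {
        // No preferred names: both links are called job.jar.
        System.out.println(localize(new String[][] {
            {"checksum1/job.jar", null},
            {"checksum2/job.jar", null}}));
        // Preferred names keep the two resources apart.
        System.out.println(localize(new String[][] {
            {"checksum1/job.jar", "foo.jar"},
            {"checksum2/job.jar", "bar.jar"}}));
    }
}
```

Without preferred names the model ends up with a single job.jar link that 
points at checksum2; with foo.jar and bar.jar both resources survive.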



