Chris Trezzo created YARN-3637:
----------------------------------

             Summary: Handle localization sym-linking correctly at the YARN level
                 Key: YARN-3637
                 URL: https://issues.apache.org/jira/browse/YARN-3637
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Chris Trezzo
            Assignee: Chris Trezzo

The shared cache needs to handle resource sym-linking at the YARN layer. Currently, we let the application layer (e.g. MapReduce) handle this, but it would probably be better for all applications if it were handled transparently.

Here is the scenario:

Imagine two separate jars (with unique checksums) that have the same name, job.jar. They are stored in the shared cache as two separate resources:

checksum1/job.jar
checksum2/job.jar

A new application tries to use both of these resources, but internally refers to them by different names:

foo.jar maps to checksum1
bar.jar maps to checksum2

When the shared cache returns the paths to the resources, both resources have the same file name (job.jar). Because of this, when the resources are localized, one of them clobbers the other: both symlinks in the container_id directory have the same name (job.jar), even though they point to two separate resource directories.

Originally we tackled this in the MapReduce client by using the fragment portion of the resource URL. This, however, seems like something that should be solved at the YARN layer.
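To make the collision concrete, here is a minimal, self-contained sketch of the naming problem. It is not YARN code: the class SymlinkNameSketch, the helpers linkName and localize, the host "nn" and the "/sharedcache" path prefix are all hypothetical, and the map merely stands in for the symlinks in the container directory. It assumes the link name is taken from the URL fragment when one is present and from the last path component otherwise, which mirrors the fragment workaround described above.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: models symlink naming with a map, no YARN APIs involved.
class SymlinkNameSketch {

    // Hypothetical rule: use the URL fragment as the link name if present,
    // otherwise fall back to the last component of the path.
    static String linkName(URI resource) {
        String fragment = resource.getFragment();
        if (fragment != null && !fragment.isEmpty()) {
            return fragment;
        }
        String path = resource.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Stand-in for the container directory: link name -> resource path.
    // A second resource with the same link name overwrites the first.
    static Map<String, String> localize(URI... resources) {
        Map<String, String> links = new LinkedHashMap<>();
        for (URI resource : resources) {
            links.put(linkName(resource), resource.getPath());
        }
        return links;
    }

    public static void main(String[] args) {
        // Same file name under two checksums: only one link survives.
        System.out.println(localize(
            URI.create("hdfs://nn/sharedcache/checksum1/job.jar"),
            URI.create("hdfs://nn/sharedcache/checksum2/job.jar")));

        // Fragments carry the application's own names: both links survive.
        System.out.println(localize(
            URI.create("hdfs://nn/sharedcache/checksum1/job.jar#foo.jar"),
            URI.create("hdfs://nn/sharedcache/checksum2/job.jar#bar.jar")));
    }
}
```

In the first call the entry for checksum1 is silently replaced by the one for checksum2, which is the clobbering described above. In the second call the fragments keep foo.jar and bar.jar distinct. Solving this at the YARN layer would mean the application no longer has to supply such names itself.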