[ 
https://issues.apache.org/jira/browse/YARN-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786903#comment-13786903
 ] 

Alejandro Abdelnur commented on YARN-1274:
------------------------------------------

[~vinodkv], I was planning to take a stab at it next week; if you are in a 
rush, go for it. 

As some background:

* My current workaround from the AM side: create a dummy LocalResource against 
file:///etc/hosts as application private. This triggers localization just once 
per app on the node, and because it is a local file I don't incur any 
unnecessary extra latency from HDFS.

* Possible solution 1: always trigger resource localization to force the LCE 
localization and ensure creation of usercache/USER even if there are no 
application/private resources to localize.

* Possible solution 2: the LCE launcher should call mkdirat on usercache/USER 
and do the chmod before launching the container process. If the dir already 
exists because of localization, this is a NOP. The mkdirat happens before doing 
the setuid to launch the container process.

I prefer option 2 because it will avoid triggering the localization thread and 
it will avoid adding extra latency to containers without localization.
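
As a rough illustration of option 2, here is a minimal sketch in C. It is not 
the actual container-executor code: the helper name ensure_user_dir, its 
parameters, and the idea of passing an open descriptor for the usercache 
directory are all assumptions made for the example, and the chown step is an 
extra assumption added because the dir must be owned by the container user. 
The caller would run it while still privileged, before the setuid.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of Hadoop): make sure usercache/USER exists
 * before the launcher drops privileges.
 *
 * usercache_fd: open descriptor for the usercache directory (assumption)
 * user:         directory name to create, i.e. the container user
 * uid, gid:     owner to set on a newly created dir; -1 leaves it unchanged
 * mode:         permission bits chosen by the caller
 *
 * Returns 0 on success, -1 on error with errno set.
 */
int ensure_user_dir(int usercache_fd, const char *user,
                    uid_t uid, gid_t gid, mode_t mode)
{
    if (mkdirat(usercache_fd, user, mode) != 0) {
        struct stat st;

        if (errno != EEXIST)
            return -1;

        /* Already present, e.g. created by localization: this is a NOP.
         * Only check that the existing entry really is a directory. */
        if (fstatat(usercache_fd, user, &st, AT_SYMLINK_NOFOLLOW) != 0)
            return -1;
        if (!S_ISDIR(st.st_mode)) {
            errno = ENOTDIR;
            return -1;
        }
        return 0;
    }

    /* Newly created: hand it to the container user (assumed step). */
    if (fchownat(usercache_fd, user, uid, gid, 0) != 0)
        return -1;

    /* mkdirat applies the umask, so set the exact mode explicitly. */
    if (fchmodat(usercache_fd, user, mode, 0) != 0)
        return -1;

    return 0;
}
```

Because an existing directory is left untouched, containers that did go 
through localization see no change, and containers without localization only 
pay for one mkdirat call instead of a trip through the localization thread.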


> LCE fails to run containers that don't have resources to localize
> -----------------------------------------------------------------
>
>                 Key: YARN-1274
>                 URL: https://issues.apache.org/jira/browse/YARN-1274
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.1.1-beta
>            Reporter: Alejandro Abdelnur
>            Assignee: Alejandro Abdelnur
>            Priority: Blocker
>
> LCE container launch assumes the usercache/USER directory exists and is 
> owned by the user running the container process.
> But the directory is created only by the LCE localization command, when 
> there are resources to localize. If there are no resources to localize, LCE 
> localization never executes and launching fails with exit code 255, and 
> the NM logs show something like:
> {code}
> 2013-10-04 14:07:56,425 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : command 
> provided 1
> 2013-10-04 14:07:56,425 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : user is 
> llama
> 2013-10-04 14:07:56,425 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Can't create 
> directory llama in 
> /yarn/nm/usercache/llama/appcache/application_1380853306301_0004/container_1380853306301_0004_01_000004
>  - Permission denied
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)
