[ https://issues.apache.org/jira/browse/YARN-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786903#comment-13786903 ]
Alejandro Abdelnur commented on YARN-1274:
------------------------------------------
[~vinodkv], I was planning to take a stab at it next week; if you are in a rush,
go for it.
As some background:
* My current workaround, on the AM side, is to create a dummy LocalResource
pointing at file:///etc/hosts as an application-private resource. This triggers
localization just once per application on each node, and because the file is
local I don't incur any unnecessary extra HDFS latency.
* Possible solution 1: always trigger resource localization, forcing the LCE
localization step and ensuring usercache/USER is created even when there are no
application/private resources to localize.
* Possible solution 2: the LCE launcher should call mkdirat on usercache/USER
and do the chmod before launching the container process; if the directory
already exists because of localization, this is a NOP. The mkdirat happens
before the setuid that launches the container process.
I prefer option 2 because it avoids triggering the localization thread and adds
no extra latency to containers that have nothing to localize.
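Possible solution 2 could look roughly like the C sketch below, written in the style of the native setuid container-executor that the LCE invokes. It is an illustration under stated assumptions, not the actual container-executor code: the helper name `ensure_user_dir`, the 0750 mode, and passing an already-open descriptor for the usercache directory are all hypothetical. It has to run while the executor is still privileged, before the setuid to the container user.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed mode for usercache/USER (owner rwx, group r-x); not from the issue. */
#define USER_DIR_MODE 0750

/*
 * Hypothetical helper sketching "possible solution 2": ensure that
 * <usercache_fd>/<user> exists and is owned by uid:gid. Meant to run while
 * the executor is still privileged, i.e. before setuid() to the container
 * user. Returns 0 on success, -1 on failure with errno set.
 */
static int ensure_user_dir(int usercache_fd, const char *user,
                           uid_t uid, gid_t gid) {
  /* The user name must be a single, non-empty path component. */
  assert(user != NULL && user[0] != '\0' && strchr(user, '/') == NULL);

  if (mkdirat(usercache_fd, user, USER_DIR_MODE) != 0) {
    if (errno != EEXIST) {
      return -1;
    }
    /* Already there (e.g. created by localization): a NOP, as long as it
     * really is a directory and not a regular file or a symlink. */
    struct stat st;
    if (fstatat(usercache_fd, user, &st, AT_SYMLINK_NOFOLLOW) != 0) {
      return -1;
    }
    if (!S_ISDIR(st.st_mode)) {
      errno = ENOTDIR;
      return -1;
    }
    return 0;
  }

  /* Freshly created by us (as root in a real executor): hand it to the
   * container user, then set the mode explicitly because mkdirat honours
   * the process umask. */
  if (fchownat(usercache_fd, user, uid, gid, AT_SYMLINK_NOFOLLOW) != 0) {
    return -1;
  }
  if (fchmodat(usercache_fd, user, USER_DIR_MODE, 0) != 0) {
    return -1;
  }
  return 0;
}
```

In a launcher this would be called with a descriptor for the NM local dir's usercache directory and the container user's uid/gid just before dropping privileges. When localization has already created the directory, the call returns immediately, which is what keeps containers without resources to localize free of extra latency.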
> LCE fails to run containers that don't have resources to localize
> -----------------------------------------------------------------
>
> Key: YARN-1274
> URL: https://issues.apache.org/jira/browse/YARN-1274
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 2.1.1-beta
> Reporter: Alejandro Abdelnur
> Assignee: Alejandro Abdelnur
> Priority: Blocker
>
> LCE container launch assumes the usercache/USER directory exists and is
> owned by the user running the container process.
> But the directory is created by the LCE localization command only if there
> are resources to localize. If there are no resources to localize, LCE
> localization never executes, launching fails with exit code 255, and
> the NM logs show something like:
> {code}
> 2013-10-04 14:07:56,425 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : command provided 1
> 2013-10-04 14:07:56,425 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : user is llama
> 2013-10-04 14:07:56,425 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Can't create directory llama in /yarn/nm/usercache/llama/appcache/application_1380853306301_0004/container_1380853306301_0004_01_000004 - Permission denied
> {code}
--
This message was sent by Atlassian JIRA
(v6.1#6144)