[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13775711#comment-13775711 ]
Xuan Gong commented on YARN-1229:
---------------------------------
[~vinodkv], [~bikassaha], [~hitesh], [~sseth], [~jlowe], [~cnauroth]
The bug shows up as an error in launch_container.sh when it tries to export
NM_AUX_SERVICE_mapreduce.shuffle. The problem is that '.' is not a valid
character in a shell environment variable name, so the export fails with
"not a valid identifier". To solve this, we might need to rename the service.
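To make the failure concrete, here is a minimal sketch of the naming rule the shell applies to exported names (a letter or underscore, followed by letters, digits, or underscores). The class EnvNameCheck and the method isValidShellName are hypothetical names used only for illustration; they are not part of Hadoop.

```java
import java.util.regex.Pattern;

public class EnvNameCheck {
    // Shell identifier rule: a letter or underscore first,
    // then any number of letters, digits, or underscores.
    private static final Pattern VALID_NAME =
        Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    // Hypothetical helper: true if the whole name is a valid shell identifier.
    public static boolean isValidShellName(String name) {
        return VALID_NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        // The name from the failing export contains '.', so it is rejected.
        System.out.println(isValidShellName("NM_AUX_SERVICE_mapreduce.shuffle")); // false
        // With '_' in place of '.', the name is accepted.
        System.out.println(isValidShellName("NM_AUX_SERVICE_mapreduce_shuffle")); // true
    }
}
```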
There are three places that need to change (use mapreduce_shuffle instead of
mapreduce.shuffle):
{code}
public static final String MAPREDUCE_SHUFFLE_SERVICEID =
    "mapreduce.shuffle";
{code}
in ShuffleHandler.java.
The other two places are in yarn-site.xml:
{code}
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce.shuffle</value>
  <description>shuffle service that needs to be set for Map Reduce to run
  </description>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
{code}
We can simply replace all three occurrences with mapreduce_shuffle.
Alternatively, we can split the shuffle service out of the aux-services list
and, say, create a new property called mapreduce_shuffle_service. The
ShuffleHandler can then read this property instead of defining
MAPREDUCE_SHUFFLE_SERVICEID itself, and AuxService#init() will need to read
both mapreduce_shuffle_service and yarn.nodemanager.aux-services to do the
initialization.
An alternative is to convert all special characters to "_", with
AuxServiceHelpers becoming the public API to access this data.
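A rough sketch of that conversion, assuming the NM_AUX_SERVICE_ prefix seen in the failing export and treating anything other than letters, digits, and "_" as a special character. The class AuxServiceEnvName and the method toEnvName are hypothetical and do not reflect the actual AuxServiceHelpers code.

```java
public class AuxServiceEnvName {
    // Prefix taken from the variable name in the reported error.
    static final String PREFIX = "NM_AUX_SERVICE_";

    // Hypothetical helper: replace every character that is not a letter,
    // digit, or underscore with '_' and prepend the prefix.
    public static String toEnvName(String serviceName) {
        return PREFIX + serviceName.replaceAll("[^A-Za-z0-9_]", "_");
    }

    public static void main(String[] args) {
        System.out.println(toEnvName("mapreduce.shuffle"));
        // prints NM_AUX_SERVICE_mapreduce_shuffle
    }
}
```

One consequence of this approach is that distinct service names such as "mapreduce.shuffle" and "mapreduce_shuffle" would map to the same variable, so both the writer and the reader of the variable would have to go through the same helper.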
Since we are renaming variables, this can be considered a backward-incompatible
change. I would like to get in touch with folks who are already using it.
> Shell$ExitCodeException could happen if AM fails to start
> ---------------------------------------------------------
>
> Key: YARN-1229
> URL: https://issues.apache.org/jira/browse/YARN-1229
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 2.1.1-beta
> Reporter: Tassapol Athiapinya
> Assignee: Xuan Gong
> Priority: Critical
> Fix For: 2.1.1-beta
>
>
> I ran a sleep job. If the AM fails to start, this exception can occur:
> 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with
> state FAILED due to: Application application_1379673267098_0020 failed 1
> times due to AM Container for appattempt_1379673267098_0020_000001 exited
> with exitCode: 1 due to: Exception from container-launch:
> org.apache.hadoop.util.Shell$ExitCodeException:
> /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_000001/launch_container.sh:
> line 12: export:
> `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=
> ': not a valid identifier
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
> at org.apache.hadoop.util.Shell.run(Shell.java:379)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
> at
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
> at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270)
> at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> .Failing this attempt.. Failing the application.