[ 
https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13775711#comment-13775711
 ] 

Xuan Gong commented on YARN-1229:
---------------------------------

[~vinodkv], [~bikassaha], [~hitesh], [~sseth], [~jlowe], [~cnauroth]

The bug is an error in launch_container.sh when it tries to export 
NM_AUX_SERVICE_mapreduce.shuffle. The problem is that '.' is not a valid 
character in a shell environment variable name. To fix this, we may need to 
rename the service.
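
To illustrate why the export fails, here is a minimal sketch of the naming rule. It assumes the POSIX shell definition of a name (a letter or underscore followed by letters, digits, or underscores). The class and method names are hypothetical and are not part of the NodeManager code.

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only; not part of Hadoop.
class EnvNameCheck {
    // Assumed rule: POSIX shell names start with a letter or underscore,
    // followed by letters, digits, or underscores.
    private static final Pattern SHELL_IDENTIFIER =
        Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    // Returns true if the shell's "export" would accept this variable name.
    static boolean isValidShellIdentifier(String name) {
        return SHELL_IDENTIFIER.matcher(name).matches();
    }

    public static void main(String[] args) {
        // The name from the stack trace contains '.', so it is rejected.
        System.out.println(
            isValidShellIdentifier("NM_AUX_SERVICE_mapreduce.shuffle")); // false
        // With '_' instead of '.', the name is accepted.
        System.out.println(
            isValidShellIdentifier("NM_AUX_SERVICE_mapreduce_shuffle")); // true
    }
}
```
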
Three places need to be changed (use mapreduce_shuffle instead of 
mapreduce.shuffle):
{code}
  public static final String MAPREDUCE_SHUFFLE_SERVICEID =
      "mapreduce.shuffle";
{code}
in ShuffleHandler.java.

The other two places are in yarn-site.xml:
{code}
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce.shuffle</value>
        <description>shuffle service that needs to be set for Map Reduce to run</description>
    </property>
    
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
{code}

We can simply replace mapreduce.shuffle with mapreduce_shuffle in all three 
places. Alternatively, we can split the shuffle service out of the 
aux_services by creating a new property, say mapreduce_shuffle_service. The 
ShuffleHandler would read this property instead of defining 
MAPREDUCE_SHUFFLE_SERVICEID itself, and AuxService#init() would need to read 
both mapreduce_shuffle_service and yarn.nodemanager.aux-services to do the 
initialization.

An alternative is to convert all special characters to "_", with 
AuxServiceHelpers becoming the public API to access this data.
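
A rough sketch of what that conversion could look like is below. The NM_AUX_SERVICE_ prefix comes from the error message in the bug report. The class name, method name, and the exact set of characters replaced are assumptions for illustration, not the actual AuxServiceHelpers code.

```java
// Hypothetical sketch; the real AuxServiceHelpers API may differ.
class AuxServiceEnvName {
    // Prefix as seen in the failing export line of launch_container.sh.
    static final String PREFIX = "NM_AUX_SERVICE_";

    // Builds an environment variable name for a service, replacing every
    // character other than letters, digits, and '_' with '_'.
    static String toEnvName(String serviceName) {
        return PREFIX + serviceName.replaceAll("[^A-Za-z0-9_]", "_");
    }

    public static void main(String[] args) {
        System.out.println(toEnvName("mapreduce.shuffle"));
        // prints NM_AUX_SERVICE_mapreduce_shuffle
    }
}
```

One caveat with this approach is that the mapping is not one-to-one: under this sketch, mapreduce.shuffle and mapreduce_shuffle would both map to the same variable name.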

Since we are renaming variables, this can be considered a 
backward-incompatible change. I would like to get in touch with folks who are 
already using it.
                
> Shell$ExitCodeException could happen if AM fails to start
> ---------------------------------------------------------
>
>                 Key: YARN-1229
>                 URL: https://issues.apache.org/jira/browse/YARN-1229
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.1.1-beta
>            Reporter: Tassapol Athiapinya
>            Assignee: Xuan Gong
>            Priority: Critical
>             Fix For: 2.1.1-beta
>
>
> I run sleep job. If AM fails to start, this exception could occur:
> 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with 
> state FAILED due to: Application application_1379673267098_0020 failed 1 
> times due to AM Container for appattempt_1379673267098_0020_000001 exited 
> with  exitCode: 1 due to: Exception from container-launch:
> org.apache.hadoop.util.Shell$ExitCodeException: 
> /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_000001/launch_container.sh:
>  line 12: export: 
> `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=
> ': not a valid identifier
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
> at org.apache.hadoop.util.Shell.run(Shell.java:379)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> .Failing this attempt.. Failing the application.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
