[ 
https://issues.apache.org/jira/browse/YARN-5714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15713858#comment-15713858
 ] 

Miklos Szegedi commented on YARN-5714:
--------------------------------------

Thank you, for the patch, [~rcatherinot]!
1. "%%A%%" should not match any variable in Windows but the current patch 
returns "A". Double %% is replaced with single %
2. This is a potential double increase of i without checking i < l
{code}
994                 if (c == '\\') {
995                   i++;
996                 }
997                 if (c == '\'') {
998                   break;
999                 }
1000                i++;
{code}
3. There is a typo in this sentence:
{code}
855           // 2 - we need a map implementation that support entries to be
{code}

4. You have an iteration that has a depth of 3. This will run at every 
container launch, so it might cause perf issues. In particular 
{code}copy.containsKey(envDep){code} will be called again and again (6 times?) 
until all dependencies are added, if a set is ordered like this:
{code}
D=$A $B $C
A=*
B=*
C=*
{code}

Have you considered an algorithm, where you scan the dependencies only once? 
You may not even need the hash map for it in this case.
{code}
for (env: envsI)
  stack.push(env)
while(!stack.empty)
  env = stack.pop()
  if (!env.marked)
    env.mark
    if (env.dependencies.empty)
      stack.push(env)
      for(dep : end.dependencies())
        stack.push(dep)
  else
    ordered.putifnotthere(env);
{code}
5. It might be slower (needs to be tested) but have you considered regex for 
parsing? It would make the code shorter and easier to understand.

> ContainerExecutor does not order environment map
> ------------------------------------------------
>
>                 Key: YARN-5714
>                 URL: https://issues.apache.org/jira/browse/YARN-5714
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.4.1, 2.5.2, 2.7.3, 2.6.4, 3.0.0-alpha1
>         Environment: all (linux and windows alike)
>            Reporter: Remi Catherinot
>            Assignee: Remi Catherinot
>            Priority: Trivial
>              Labels: oct16-medium
>         Attachments: YARN-5714.001.patch, YARN-5714.002.patch, 
> YARN-5714.003.patch, YARN-5714.004.patch
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> when dumping the launch container script, environment variables are dumped 
> based on the order internally used by the map implementation (hash based). It 
> does not take into consideration that some env varibales may refer each 
> other, and so that some env variables must be declared before those 
> referencing them.
> In my case, i ended up having LD_LIBRARY_PATH which was depending on 
> HADOOP_COMMON_HOME being dumped before HADOOP_COMMON_HOME. Thus it had a 
> wrong value and so native libraries weren't loaded. jobs were running but not 
> at their best efficiency. This is just a use case falling into that bug, but 
> i'm sure others may happen as well.
> I already have a patch running in my production environment, i just estimate 
> to 5 days for packaging the patch in the right fashion for JIRA + try my best 
> to add tests.
> Note : the patch is not OS aware with a default empty implementation. I will 
> only implement the unix version on a 1st release. I'm not used to windows env 
> variables syntax so it will take me more time/research for it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

Reply via email to