[ https://issues.apache.org/jira/browse/YARN-5714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15713858#comment-15713858 ]
Miklos Szegedi commented on YARN-5714: -------------------------------------- Thank you, for the patch, [~rcatherinot]! 1. "%%A%%" should not match any variable in Windows but the current patch returns "A". Double %% is replaced with single % 2. This is a potential double increase of i without checking i < l {code} 994 if (c == '\\') { 995 i++; 996 } 997 if (c == '\'') { 998 break; 999 } 1000 i++; {code} 3. There is a typo in this sentence: {code} 855 // 2 - we need a map implementation that support entries to be {code} 4. You have an iteration that has a depth of 3. This will run at every container launch, so it might cause perf issues. In particular {code}copy.containsKey(envDep){code} will be called again and again (6 times?) until all dependencies are added, if a set is ordered like this: {code} D=$A $B $C A=* B=* C=* {code} Have you considered an algorithm, where you scan the dependencies only once? You may not even need the hash map for it in this case. {code} for (env: envsI) stack.push(env) while(!stack.empty) env = stack.pop() if (!env.marked) env.mark if (env.dependencies.empty) stack.push(env) for(dep : end.dependencies()) stack.push(dep) else ordered.putifnotthere(env); {code} 5. It might be slower (needs to be tested) but have you considered regex for parsing? It would make the code shorter and easier to understand. > ContainerExecutor does not order environment map > ------------------------------------------------ > > Key: YARN-5714 > URL: https://issues.apache.org/jira/browse/YARN-5714 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager > Affects Versions: 2.4.1, 2.5.2, 2.7.3, 2.6.4, 3.0.0-alpha1 > Environment: all (linux and windows alike) > Reporter: Remi Catherinot > Assignee: Remi Catherinot > Priority: Trivial > Labels: oct16-medium > Attachments: YARN-5714.001.patch, YARN-5714.002.patch, > YARN-5714.003.patch, YARN-5714.004.patch > > Original Estimate: 120h > Remaining Estimate: 120h > > when dumping the launch container script, environment variables are dumped > based on the order internally used by the map implementation (hash based). It > does not take into consideration that some env varibales may refer each > other, and so that some env variables must be declared before those > referencing them. > In my case, i ended up having LD_LIBRARY_PATH which was depending on > HADOOP_COMMON_HOME being dumped before HADOOP_COMMON_HOME. Thus it had a > wrong value and so native libraries weren't loaded. jobs were running but not > at their best efficiency. This is just a use case falling into that bug, but > i'm sure others may happen as well. > I already have a patch running in my production environment, i just estimate > to 5 days for packaging the patch in the right fashion for JIRA + try my best > to add tests. > Note : the patch is not OS aware with a default empty implementation. I will > only implement the unix version on a 1st release. I'm not used to windows env > variables syntax so it will take me more time/research for it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org