[ https://issues.apache.org/jira/browse/YARN-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15959029#comment-15959029 ]

Feng Yuan commented on YARN-4095:
---------------------------------

[~zxu], thanks for your patch for this issue.
Excuse me, but I am not very clear on the goal this patch is meant to achieve. Is it, for example, to avoid a heap memory leak like YARN-6277? I ask because of this code:
{code}
      String newLocalDirs = conf.get(contextCfgItemName);
      if (!newLocalDirs.equals(savedLocalDirs)) {
{code}
this creates a large number of LocalFileSystem objects and caches them.
If your purpose is to fix this heap memory leak, then I think I understand this issue completely.
I also have an idea. Since the issue is caused by the configuration being different in two places, and I notice that ShuffleHandler uses another conf object created by the clone(conf) method, how about letting ShuffleHandler use the same conf?
This would bring several benefits:
1. The ShuffleHandler service would promptly know which disk is over-used (>95%) and would not write data to it, which prevents map output from being written to an overloaded disk and failing with a "no space left..." error.
2. It would avoid the implementation model in your patch; IMHO, simply adding a new name for the local dirs is not very elegant.
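The proposal above can be sketched as a toy model. This is not Hadoop code: a `Map<String, String>` stands in for `Configuration`, `SharedConfDemo`, `excludeDir` and `dirsSeenByShuffle` are hypothetical names, and `/d3` plays the over-used disk. It contrasts what the shuffle side sees when it holds a cloned copy of the configuration versus the very object that the dirs handler updates.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the "share the same conf" proposal; a plain Map
// stands in for org.apache.hadoop.conf.Configuration.
class SharedConfDemo {
  static final String LOCAL_DIRS = "yarn.nodemanager.local-dirs";

  // Stand-in for the dirs handler removing a full disk from the value it
  // keeps in its own configuration object.
  static void excludeDir(Map<String, String> conf, String dir) {
    List<String> kept = new ArrayList<>();
    for (String d : conf.get(LOCAL_DIRS).split(",")) {
      if (!d.equals(dir)) {
        kept.add(d);
      }
    }
    conf.put(LOCAL_DIRS, String.join(",", kept));
  }

  // Returns the local-dirs value the shuffle side sees after /d3 fills up.
  static String dirsSeenByShuffle(boolean shareConf) {
    Map<String, String> nmConf = new HashMap<>();
    nmConf.put(LOCAL_DIRS, "/d1,/d2,/d3");
    // Cloned snapshot (behaviour described in the comment) vs. same object (proposal).
    Map<String, String> shuffleConf = shareConf ? nmConf : new HashMap<>(nmConf);
    excludeDir(nmConf, "/d3"); // assume /d3 crossed the utilization threshold
    return shuffleConf.get(LOCAL_DIRS);
  }

  public static void main(String[] args) {
    System.out.println("cloned conf: " + dirsSeenByShuffle(false));
    System.out.println("shared conf: " + dirsSeenByShuffle(true));
  }
}
```

With the cloned copy the shuffle side still sees `/d1,/d2,/d3`; with the shared object it sees `/d1,/d2`. Because both services then read the same value, the string comparison in the shared per-context state would also stop flipping on every call. Whether sharing one mutable configuration object between the two services is safe under concurrent updates is not addressed by this sketch.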
Thx.

> Avoid sharing AllocatorPerContext object in LocalDirAllocator between 
> ShuffleHandler and LocalDirsHandlerService.
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-4095
>                 URL: https://issues.apache.org/jira/browse/YARN-4095
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>             Fix For: 2.8.0, 3.0.0-alpha1
>
>         Attachments: YARN-4095.000.patch, YARN-4095.001.patch
>
>
> Currently {{ShuffleHandler}} and {{LocalDirsHandlerService}} share an
> {{AllocatorPerContext}} object in {{LocalDirAllocator}} for the configuration
> {{NM_LOCAL_DIRS}}, because {{AllocatorPerContext}} objects are stored in a static
> TreeMap keyed by configuration name:
> {code}
>   private static Map <String, AllocatorPerContext> contexts = 
>                  new TreeMap<String, AllocatorPerContext>();
> {code}
> {{LocalDirsHandlerService}} and {{ShuffleHandler}} both create a
> {{LocalDirAllocator}} using {{NM_LOCAL_DIRS}}. Even though they don't use the same
> {{Configuration}} object, they use the same {{AllocatorPerContext}}
> object. Moreover, {{LocalDirsHandlerService}} may change the {{NM_LOCAL_DIRS}} value
> in its {{Configuration}} object to exclude full and bad local dirs, while
> {{ShuffleHandler}} always uses the original {{NM_LOCAL_DIRS}} value in its
> {{Configuration}} object. So every time {{AllocatorPerContext#confChanged}}
> is called by {{ShuffleHandler}} after {{LocalDirsHandlerService}} (and vice versa),
> {{AllocatorPerContext}} needs to be reinitialized because the {{NM_LOCAL_DIRS}} value
> has changed. This causes some overhead; the reinitialization is triggered by this check:
> {code}
>       String newLocalDirs = conf.get(contextCfgItemName);
>       if (!newLocalDirs.equals(savedLocalDirs)) {
> {code}
> So it would be a good improvement not to share the same
> {{AllocatorPerContext}} instance between {{ShuffleHandler}} and
> {{LocalDirsHandlerService}}.
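The sharing and the resulting reinitialization churn described above can be reproduced with a small simulation. It is a hedged sketch, not Hadoop code: `ToyAllocator` and `ToyPerContext` are hypothetical stand-ins for `LocalDirAllocator` and `AllocatorPerContext`, a `Map<String, String>` stands in for `Configuration`, and `good-local-dirs` is a hypothetical second context name used only to show the effect of giving each service its own key (it is not claimed to be the name the patch uses).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation of the shared-context problem; not Hadoop code.
class SharedContextDemo {
  static final String LOCAL_DIRS = "yarn.nodemanager.local-dirs";
  // Hypothetical second context name, standing in for a separate local-dirs key.
  static final String GOOD_LOCAL_DIRS = "good-local-dirs";

  // Alternates allocations (dirs handler first, then shuffle) for `rounds`
  // rounds and returns the total number of context reinitializations.
  static int totalReinits(String handlerKey, String shuffleKey, int rounds) {
    ToyAllocator.clearContexts();
    Map<String, String> handlerConf = new HashMap<>();
    handlerConf.put(handlerKey, "/d1,/d2");     // full disk /d3 already excluded
    Map<String, String> shuffleConf = new HashMap<>();
    shuffleConf.put(shuffleKey, "/d1,/d2,/d3"); // original, unfiltered value
    ToyAllocator handler = new ToyAllocator(handlerKey);
    ToyAllocator shuffle = new ToyAllocator(shuffleKey);
    for (int i = 0; i < rounds; i++) {
      handler.allocate(handlerConf);
      shuffle.allocate(shuffleConf);
    }
    // Count a shared context only once.
    return handler.sharesContextWith(shuffle)
        ? handler.reinitCount()
        : handler.reinitCount() + shuffle.reinitCount();
  }

  public static void main(String[] args) {
    System.out.println("same key:      " + totalReinits(LOCAL_DIRS, LOCAL_DIRS, 3));
    System.out.println("separate keys: " + totalReinits(GOOD_LOCAL_DIRS, LOCAL_DIRS, 3));
  }
}

// Stand-in for LocalDirAllocator: per-context state lives in a static map
// keyed by configuration name, so equal names share one ToyPerContext.
class ToyAllocator {
  private static final Map<String, ToyPerContext> CONTEXTS = new TreeMap<>();
  private final ToyPerContext context;

  ToyAllocator(String contextCfgItemName) {
    synchronized (CONTEXTS) {
      context = CONTEXTS.computeIfAbsent(contextCfgItemName, ToyPerContext::new);
    }
  }

  // Resets the static registry so independent runs do not interfere.
  static void clearContexts() {
    synchronized (CONTEXTS) {
      CONTEXTS.clear();
    }
  }

  void allocate(Map<String, String> conf) {
    context.confChanged(conf);
  }

  boolean sharesContextWith(ToyAllocator other) {
    return context == other.context;
  }

  int reinitCount() {
    return context.reinitCount;
  }
}

// Stand-in for AllocatorPerContext: remembers the last value it saw for its
// configuration name and counts how often it had to rebuild its state.
class ToyPerContext {
  private final String contextCfgItemName;
  private String savedLocalDirs = "";
  int reinitCount = 0;

  ToyPerContext(String contextCfgItemName) {
    this.contextCfgItemName = contextCfgItemName;
  }

  // Same comparison as the snippet quoted in the issue.
  synchronized void confChanged(Map<String, String> conf) {
    String newLocalDirs = conf.get(contextCfgItemName);
    if (!newLocalDirs.equals(savedLocalDirs)) {
      savedLocalDirs = newLocalDirs;
      reinitCount++; // the real class would rebuild its directory state here
    }
  }
}
```

With three rounds, the same-key run reports 6 reinitializations (one per call, because the saved value flips every time) while the separate-key run reports 2 (one initial setup per context).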



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
