[ 
https://issues.apache.org/jira/browse/YARN-7064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16320814#comment-16320814
 ] 

Haibo Chen commented on YARN-7064:
----------------------------------

Thanks [[email protected]] for updating the patch. The patch looks 
good to me overall. I have a few minor comments/questions.

1) Why are we skipping  NM_MEMORY_RESOURCE_PREFIX configurations in 
TestYarnConfigurationFields now?

2) I am not sure why exisiting NM_MEMORY_RESOURCE* configuration properties 
(and all the other cgroup related ones) are marked as @Private and not included 
in yarn-site.xml. But I think we need to be consistent with other properties. 
Either we remove @Private annotation or we do not include them in yarn-site.xml

3) In ProcfsBasedProcessTree.java, do we also want to print out the clock time 
as well besides the total jiffies? If so, it appears to me that CpuTimeTracker. 
updateElapsedJiffies() is a more appropriate place to log.

4) TestCompareResourceCalculators is more of a functional test. Can we add in 
the class javadoc what its purpose is and why it is ignored by default in the 
code?

5) CominedRsourceCalculator can probably renamed to HybridResourceCalculator? 
Do we want to delegate to procfs.getProcessTreeDump() int getProcessTreeDump()? 
The most inefficient part of ProcfsBasedProcessTree, IIUC, is 
updateProcessTree() in which we read the `stat` file of all processes in the 
system. I believe the intent of CombinedResourceCalculator is to get the speed 
of CgroupsResourceCalculator and the accuracy of ProcfsBasedProcessTree in its 
virtual memory measurement, but given procfs.updateProcessTree() is always 
called in CombinedResourceCalculator.updateProcessTree(),  will 
CombinedResourceCalculator be faster than ProcfsBasedProcessTree in such case?

6) In CGroupsResourceCalculator, processFile() should be static I think. Given 
that exception in readTotalProcessJiffies() is propagated to the caller chain, 
do you think 'Failed to parse' warning message is necessary to be logged there? 
In the constructor `CGroupsResourceCalculator(String pid, String procfsDir, 
CGroupsHandler cGroupsHandler, Clock clock)`, 
`clock==SystemClock.getInstance()` is used to tell if it is called in unit 
test, which is not a reliable indicator. I think we can create three 
constructors instead, `CGroupsResourceCalculator(String, String, 
CGroupsHandler, Clock, long jiffyLength)`, then we create one for production 
that passes in `SysInfoLinux.JIFFY_LENGTH_IN_MILLIS`, and one that passes in 10 
for unit testing. 

7) ContainersMonitorImpl is missing an import and thus does not compile for me. 
The new code in getResourceCalculatorProcessTree() for 
CgroupsResourceCalculator and CombinedResourceCalculator are very similar. How 
about we simplify it this way: Given the two new resource calculators must be 
configured explicitly by users, we can consolidate the new code with 
ResourceCalculatorProcessTree.getResourceCalculatorProcessTree() just as any 
other custom resource calculators. We can add a new method `initialize()` in 
ResourceCalculatorProcessTree with an empty body, and then override it 
CGroupsResourceCalculator and CombinedResourceCalculator so that 
`CGroupsResourceCalculator.isAvailable()` is checked and setCGroupFilePaths() 
is called.  Another problem with existing approach is that, if the user sets up 
CGroupsResourceCalculator and cgroups is not enabled, `pt` will be null and we 
continue to call  
ResourceCalculatorProcessTree.getResourceCalculatorProcessTree() which returns 
a new instance of CGroupsResourceCalculator. CGroups is not enabled, and yet we 
will run NodeManager with CGroupsResourceCalculator.

> Use cgroup to get container resource utilization
> ------------------------------------------------
>
>                 Key: YARN-7064
>                 URL: https://issues.apache.org/jira/browse/YARN-7064
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Miklos Szegedi
>            Assignee: Miklos Szegedi
>         Attachments: YARN-7064.000.patch, YARN-7064.001.patch, 
> YARN-7064.002.patch, YARN-7064.003.patch, YARN-7064.004.patch, 
> YARN-7064.005.patch, YARN-7064.007.patch, YARN-7064.008.patch, 
> YARN-7064.009.patch
>
>
> This is an addendum to YARN-6668. What happens is that that jira always wants 
> to rebase patches against YARN-1011 instead of trunk.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to