[
https://issues.apache.org/jira/browse/YARN-7064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16320814#comment-16320814
]
Haibo Chen commented on YARN-7064:
----------------------------------
Thanks [[email protected]] for updating the patch. The patch looks
good to me overall. I have a few minor comments/questions.
1) Why are we skipping NM_MEMORY_RESOURCE_PREFIX configurations in
TestYarnConfigurationFields now?
2) I am not sure why exisiting NM_MEMORY_RESOURCE* configuration properties
(and all the other cgroup related ones) are marked as @Private and not included
in yarn-site.xml. But I think we need to be consistent with other properties.
Either we remove @Private annotation or we do not include them in yarn-site.xml
3) In ProcfsBasedProcessTree.java, do we also want to print out the clock time
as well besides the total jiffies? If so, it appears to me that CpuTimeTracker.
updateElapsedJiffies() is a more appropriate place to log.
4) TestCompareResourceCalculators is more of a functional test. Can we add in
the class javadoc what its purpose is and why it is ignored by default in the
code?
5) CominedRsourceCalculator can probably renamed to HybridResourceCalculator?
Do we want to delegate to procfs.getProcessTreeDump() int getProcessTreeDump()?
The most inefficient part of ProcfsBasedProcessTree, IIUC, is
updateProcessTree() in which we read the `stat` file of all processes in the
system. I believe the intent of CombinedResourceCalculator is to get the speed
of CgroupsResourceCalculator and the accuracy of ProcfsBasedProcessTree in its
virtual memory measurement, but given procfs.updateProcessTree() is always
called in CombinedResourceCalculator.updateProcessTree(), will
CombinedResourceCalculator be faster than ProcfsBasedProcessTree in such case?
6) In CGroupsResourceCalculator, processFile() should be static I think. Given
that exception in readTotalProcessJiffies() is propagated to the caller chain,
do you think 'Failed to parse' warning message is necessary to be logged there?
In the constructor `CGroupsResourceCalculator(String pid, String procfsDir,
CGroupsHandler cGroupsHandler, Clock clock)`,
`clock==SystemClock.getInstance()` is used to tell if it is called in unit
test, which is not a reliable indicator. I think we can create three
constructors instead, `CGroupsResourceCalculator(String, String,
CGroupsHandler, Clock, long jiffyLength)`, then we create one for production
that passes in `SysInfoLinux.JIFFY_LENGTH_IN_MILLIS`, and one that passes in 10
for unit testing.
7) ContainersMonitorImpl is missing an import and thus does not compile for me.
The new code in getResourceCalculatorProcessTree() for
CgroupsResourceCalculator and CombinedResourceCalculator are very similar. How
about we simplify it this way: Given the two new resource calculators must be
configured explicitly by users, we can consolidate the new code with
ResourceCalculatorProcessTree.getResourceCalculatorProcessTree() just as any
other custom resource calculators. We can add a new method `initialize()` in
ResourceCalculatorProcessTree with an empty body, and then override it
CGroupsResourceCalculator and CombinedResourceCalculator so that
`CGroupsResourceCalculator.isAvailable()` is checked and setCGroupFilePaths()
is called. Another problem with existing approach is that, if the user sets up
CGroupsResourceCalculator and cgroups is not enabled, `pt` will be null and we
continue to call
ResourceCalculatorProcessTree.getResourceCalculatorProcessTree() which returns
a new instance of CGroupsResourceCalculator. CGroups is not enabled, and yet we
will run NodeManager with CGroupsResourceCalculator.
> Use cgroup to get container resource utilization
> ------------------------------------------------
>
> Key: YARN-7064
> URL: https://issues.apache.org/jira/browse/YARN-7064
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Miklos Szegedi
> Assignee: Miklos Szegedi
> Attachments: YARN-7064.000.patch, YARN-7064.001.patch,
> YARN-7064.002.patch, YARN-7064.003.patch, YARN-7064.004.patch,
> YARN-7064.005.patch, YARN-7064.007.patch, YARN-7064.008.patch,
> YARN-7064.009.patch
>
>
> This is an addendum to YARN-6668. What happens is that that jira always wants
> to rebase patches against YARN-1011 instead of trunk.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]