[
https://issues.apache.org/jira/browse/YARN-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sunil G updated YARN-6788:
--------------------------
Attachment: YARN-6788-YARN-3926.014.patch
Thanks [~templedf]
Updating new patch based on your comments. There are some comments which i have
some doubts and clarification provided. Kindly check the same.
bq.Is there a reason to have "known" (or "readOnly") in the variable names?
Doesn't seem to add any useful information.
Returning a map and then iterating on its values or keyset by using iterator is
performance bottleneck. As per my profiling analysis, it contributed around 15%
of performance as these methods are used as hotspot (Resources and
ResourceCalculator api). Hence we keep a readonly DS to return as value for all
these callers and work on same to loop or access on single entry based on index.
bq.ResourceUtils.resourceNameToIndex should be RESOURCE_NAME_TO_INDEX
Jus a doubt here. Why we need this to be in CAPS, is its because its final and
static?
bq.In ResourceUtils.initializeResourcesMap(), it seems to me that {{
knownResourceTypes = Collections.unmodifiableMap(resourceInformationMap);}}
should be moved into ResourceUtils.updateReadOnlyResources()
I think we will add one more argument to
ResourceUtils.updateReadOnlyResources/knownResources to take map and then copy
it. But its better to do as early as possible to avoid an extra argument as
possible.
bq.Why doesn't ResourceUtils.getResourceNamesArray() call getResourceTypes()?
As mentioned in first comment, cost of looping on iterator of map.values() is
costlierlieer. Hence its better to operate on simple array.
bq.The for loop to update resource names in ResourceUtils.getResourceTypes() is
a lovely opportunity for a lambda.
Yes. If possible, i would like take it up separately as its not affecting
functionality.
bq.n the for loop to update resource names in ResourceUtils.getResourceTypes()
memory and CPU aren't handled a priori, and there's no guarantee that the set
iteration will match the map values iteration in updateReadOnlyResources()
knownResourceNamesArray, resourceNameToIndex are in sync and in order. This is
because in updateKnownResources, we update Mem and CPU as first two entries and
other resource in following indices. Then resourceNameToIndex is derived from
knownResourceNamesArray itself. All our getter in Resources and
DominantCalcultor uses these DS. getResourceTypes is not used for normal
computation. It just used for initialization.
With this assumption we can assume resource will be in order. Please correct me
if you feel some issues.
bq.In the FixedValueResource() constructor, the resources array is not created
according to the resource type index, so resource lookups will fail.
In {{FixedValueResource.initResourceMap}} we use
{{ResourceUtils.getResourceTypesArray}} to get the index of resources. then we
created all resources in order.
bq.In Resources.addTo() the variable in the foreach should be named lhsValue
instead of declaring a new variable for it.
I have a loop optimization patch ready in this area. And i ll upload in another
jira. I would like that patch separate and on top of this since its loop
optimization for all apis in Resources and DominantResourceCalculator class.
Hence I will take this comment in that one if its fine.
bq.There's now no class that extends Resource other than BaseResource. Maybe
it's time to merge BaseResource into Resource?
I think its too early for that. Resource is basic public class and BaseResource
is internal class to operate on resource compared to ResourcePBImpl. I think we
can revise this once its merged and used well.
bq.In ResourcePBImpl.initResources(), the copy to readOnlyResources should
probably be moved into initResourcesMap()
Actually resources is created in initResourcesMap, but its populated in main
array. So I placed it there.
Regarding lock object in ResourceUtils, we use a double lock model to handle
data corruption during init. I guess this may be a cleaner approach ,how do you
feel?
I will wait for jenkins to see if any test error, if so i ll update another
patch
> Improve performance of resource profile branch
> ----------------------------------------------
>
> Key: YARN-6788
> URL: https://issues.apache.org/jira/browse/YARN-6788
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager, resourcemanager
> Reporter: Sunil G
> Assignee: Sunil G
> Priority: Blocker
> Attachments: YARN-6788-YARN-3926.001.patch,
> YARN-6788-YARN-3926.002.patch, YARN-6788-YARN-3926.003.patch,
> YARN-6788-YARN-3926.004.patch, YARN-6788-YARN-3926.005.patch,
> YARN-6788-YARN-3926.006.patch, YARN-6788-YARN-3926.007.patch,
> YARN-6788-YARN-3926.008.patch, YARN-6788-YARN-3926.009.patch,
> YARN-6788-YARN-3926.010.patch, YARN-6788-YARN-3926.011.patch,
> YARN-6788-YARN-3926.012.patch, YARN-6788-YARN-3926.013.patch,
> YARN-6788-YARN-3926.014.patch
>
>
> Currently we could see a 15% performance delta with this branch.
> Few performance improvements to improve the same.
> Also this patch will handle
> [comments|https://issues.apache.org/jira/browse/YARN-6761?focusedCommentId=16075418&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16075418]
> from [~leftnoteasy].
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]