[
https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791374#comment-14791374
]
Junping Du commented on YARN-3816:
----------------------------------
Reply to Varun's comments:
bq. In TimelineMetric#accumulateTo, can latestMetric be TIME_SERIES ? If
not(seems to be the case as per current code), is the else part of the
condition if (latestMetric.getType().equals(Type.SINGLE_VALUE)) { required ?
Wont be handling TIME_SERIES then ?
I am not sure if I understand your comments correctly. But it definitely
support TIME_SERIES type for latestMetrics and handle two types separately.
bq. One of the aggregateMetrics method in TimelineCollector is not required if
you do not plan to use it elsewhere. Agree with Naga that a set should suffice
here.
YARN-3817 should use this. Will replace with SET as above comments.
bq. Default value can be mentioned explicity for
yarn.timeline-service.aggregation.accumulation.enabled in config file.
The same comments as above.
bq. REST API code can be removed I guess if not required for POC.
We use this REST API in our previous PoC. Any significant problem with these
REST API code? If not, we can leave it here and enhance it later.
bq. BTW, the aggregate flag appended into column qualifer will be used by
offline aggregator?
Yes. I think it can be used by offline aggregator in YARN-3817. Li, would you
confirm this?
Reply to Li's comments:
bq. TimelineAggregationBasis.java, Shall we differentiate realtime and offline
aggregations? IIUC, APPLICATION type represents realtime aggregation while the
other two represents offline aggregation.
I think we can also add another enum of AggregationType to have {online,
offline} which is orthogonal to aggregation level here. Until this JIRA, we
haven't invovle offline aggregation concept, may be add the concept of offline
aggregation in YARN-3817 sounds better?
bq. setToAggregate(), why do we need a final here?
Theoritically, all parameters in method should be added with final tag to mark
as immutable except we want to change it inside of method. This is also true
for local variable that we don't plan to reassign it after first assignment.
Although our convention doesn't force any rules on this, it should be better
and let's leave it here.
bq. So in conclusion, if t1 is null, we set delta to 0 since it's the first
value to be aggregated. Or else, we aggregate the delta?
That's true. I will update the comments which is slightly confusing people.
bq. I'm fine to keep the else part of the
latestMetric.getType().equals(Type.SINGLE_VALUE) for now. Maybe we'd like to
update this part when implementing TODO.
TODO is for checking metric_id, not type. We do support latestMetric as
TIME_SERIES instead of SINGLE_VALUE, but we pick up the last (latest time)
value in our logic. The TODO is saying we should check the metric id (CPU,
MEMORY, etc.) should be compatible as we don't want to aggregate CPU metric
into Memory metric. Today, this is guranteed by method caller's logic. Someday
later, we should also check inside of method as other callers could make wrong
on this. In addition, we cannot easily check metric_id should be equal as we
need to handle cases that we could do accumulation on different (but
compatible) metric IDs, like: CPU metrics accumulated on CPU_AREA metrics, etc.
bq. l.183, I noticed we're copying the whole list to rebuild the valueList,
then only used one element there? Are we sure the values list is small enough
all the time?
Do you mean the if case for "op.equals(Operation.REP)"? That's good point. Will
fix it in next patch.
bq. TimelineCollector.java, so we put intermediate aggregation status into the
collector as some hash maps. Will there be challenges to rebuild these status?
We may need to face the situations like nm restart (we're an aux service inside
nm).
These hashmaps should be easily rebuild with accessing the raw entity table for
all existing container metrics during NM restart.
bq. We need a method similar to appendAggregatedMetricsToEntities in YARN-3817.
Actually I believe this is a very helpful method in normal aggregation
workflow. It would be great if we can find a more visible place to put it.
Ok. I can make it as static method somewhere to share.
bq. I have one question about doAccumulation: in offline aggregation we're
using a two-staged aggregation: we perform a flow run level aggregation in the
mapper, and then aggregate entities in the same flow again in the reducer for
entities from different mappers. In this case, we need to set the
doAccumulation parameter into false for the aggregation in the reducer, right?
I think so. Your reducer seems to be just sum up flow run metrics (aggregated
and accumulated) together so you should set doAccumulation to false.
Will update patch according to above comments soon.
> [Aggregation] App-level aggregation and accumulation for YARN system metrics
> ----------------------------------------------------------------------------
>
> Key: YARN-3816
> URL: https://issues.apache.org/jira/browse/YARN-3816
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Junping Du
> Assignee: Junping Du
> Attachments: Application Level Aggregation of Timeline Data.pdf,
> YARN-3816-YARN-2928-v1.patch, YARN-3816-YARN-2928-v2.1.patch,
> YARN-3816-YARN-2928-v2.2.patch, YARN-3816-YARN-2928-v2.3.patch,
> YARN-3816-YARN-2928-v2.patch, YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch
>
>
> We need application level aggregation of Timeline data:
> - To present end user aggregated states for each application, include:
> resource (CPU, Memory) consumption across all containers, number of
> containers launched/completed/failed, etc. We need this for apps while they
> are running as well as when they are done.
> - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be
> aggregated to show details of states in framework level.
> - Other level (Flow/User/Queue) aggregation can be more efficient to be based
> on Application-level aggregations rather than raw entity-level data as much
> less raws need to scan (with filter out non-aggregated entities, like:
> events, configurations, etc.).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)