Junping Du commented on YARN-3816:
Thanks [~varun_saxena] for review and comments!
bq. If we use same scheme for long or double, we may end up with 4 ORs' for a
single metric. Maybe we can use cell tags for aggregation.
That's good point! When I was doing poc patch a few weeks ago, YARN-4053
haven't been bring out to discussion so I thought it was a little overkill to
use cell tag for specifying the only boolean value. Now it seems to be a good
way, but I would prefer to defer this decision to YARN-4053 to address while
there are other priority comments to address here so we can move faster. What
do you think?
bq. Maybe in TimelineCollector#aggregateMetrics, we should do aggregation only
if the flag is enabled.
That's true. That's part of reason why aggregation flag is added to metric.
Will add check in next patch.
bq. In TimelineCollector#appendAggregatedMetricsToEntities any reason we are
creating separate TimelineEntity objects for each metric ? Maybe create a
single entity containing a set of metrics.
bq. 3 new maps have been introduced in TimelineCollector and these are used as
base to calculate aggregated value. What if the daemon crashes?
For RM, it could persistent maps to RMStateStore. For NM, it may not be enough
as NM could be lost also. We need a mechanism that if TimelineCollector is
relaunched somewhere else, it will read raw metrics and recover the maps before
start to working. This will be part of failed over JIRAs like: YARN-3115,
bq. In TimelineMetricCalculator some functions have duplicate if conditions for
bq. In TimelineMetricCalculator#sum, to avoid negative values due to overflow,
we can change conditions like below...
Like above comments, the overflow case will be handled in next patch.
bq. In TimelineMetric#aggregateTo, maybe use getValues instead of getValuesJAXB?
I would prefer to use TreeMap because it sort key (timestamp) when accessing
it. aggregateTo() algorithm assume metrics are sorted by timestamp.
bq. Also I was wondering if TimelineMetric#aggregateTo should be moved to some
util class. TimelineMetric is part of object model and exposed to client. And
IIUC aggregateTo wont be called by client.
As Li's mentioning below, it is a bit tricky to have utility class for any
classes in API, because it would mislead user to use it which is not our
intension, at least for now. aggregateTo is not straighfoward and generic
useful like methods in TimelineMetricCalculator, so let's hold on to expose it
as utility class for now. Make it static sounds good though.
bq. What is EntityColumnPrefix#AGGREGATED_METRICS meant for?
It is something developed at poc stage a few weeks ago, and it should be
removed after we moving to ApplicationTable.
> [Aggregation] App-level Aggregation for YARN system metrics
> Key: YARN-3816
> URL: https://issues.apache.org/jira/browse/YARN-3816
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Junping Du
> Assignee: Junping Du
> Attachments: Application Level Aggregation of Timeline Data.pdf,
> YARN-3816-YARN-2928-v1.patch, YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch
> We need application level aggregation of Timeline data:
> - To present end user aggregated states for each application, include:
> resource (CPU, Memory) consumption across all containers, number of
> containers launched/completed/failed, etc. We need this for apps while they
> are running as well as when they are done.
> - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be
> aggregated to show details of states in framework level.
> - Other level (Flow/User/Queue) aggregation can be more efficient to be based
> on Application-level aggregations rather than raw entity-level data as much
> less raws need to scan (with filter out non-aggregated entities, like:
> events, configurations, etc.).
This message was sent by Atlassian JIRA