[
https://issues.apache.org/jira/browse/YARN-4851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254750#comment-15254750
]
Hitesh Shah commented on YARN-4851:
-----------------------------------
Some general comments on usability (I have not reviewed the patch in detail):
- names need a bit of work, e.g. SummaryDataReadTimeNumOps and
SummaryDataReadTimeAvgTime - not sure why NumOps is tied to ReadTime, and the
second "Time" in ReadTimeAvgTime seems redundant.
- would be good to have the unit in the name, i.e. is the time in millis or
seconds?
- updates to the timeline server docs for these metrics seem to be missing.
- what is the difference between CacheRefreshTimeNumOps and CacheRefreshOps?
- Likewise for LogCleanTimeNumOps vs LogsDirsCleaned or PutDomainTimeNumOps
vs PutDomainOps
- are cache eviction rates needed?
- how do we get a count of how many cache refreshes were due to stale data
vs never cached/evicted earlier? do we need this?
- should there be 2 levels of metrics - one group enabled by default and a
second group for more detailed monitoring, to reduce load on the metrics system?
- would be good to have the request count at the ATSv1.5 level itself, to
understand which calls end up going to summary vs cache vs fs-based lookups
(i.e. across all gets).
- at the overall ATS level, an overall avg latency across all requests might
be useful for a general health check.
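
To make the naming and refresh-cause points concrete, here is a minimal sketch in Python. It is not the patch's code. It assumes the metric names come from a rate-style metric that publishes a count and an average for each registered name, which would explain why NumOps and AvgTime end up appended to ReadTime. RateMetric, RefreshCauses and the cause labels are hypothetical names used only for illustration.

```python
class RateMetric:
    """Hypothetical rate-style metric: one registered name, two published values."""

    def __init__(self, name):
        self.name = name
        self.num_ops = 0
        self.total_millis = 0

    def add(self, elapsed_millis):
        """Record one completed operation and how long it took, in milliseconds."""
        self.num_ops += 1
        self.total_millis += elapsed_millis

    def snapshot(self):
        """Return the published values keyed by their derived names."""
        avg = self.total_millis / self.num_ops if self.num_ops else 0.0
        # The suffixes are appended to whatever name was registered, so a
        # metric registered as "...ReadTime" yields "...ReadTimeAvgTime".
        return {
            self.name + "NumOps": self.num_ops,
            self.name + "AvgTime": avg,
        }


class RefreshCauses:
    """Hypothetical counters that split cache refreshes by cause."""

    CAUSES = ("stale", "never_cached", "evicted")

    def __init__(self):
        self.counts = {cause: 0 for cause in self.CAUSES}

    def record(self, cause):
        """Count one cache refresh under the given cause label."""
        if cause not in self.counts:
            raise ValueError(f"unknown refresh cause: {cause}")
        self.counts[cause] += 1


read_time = RateMetric("SummaryDataReadTime")
for elapsed in (10, 30):
    read_time.add(elapsed)
print(read_time.snapshot())
```

Under that assumption, registering the metric as something like SummaryDataReadMillis would give a count and an average whose names carry the unit and avoid the doubled "Time". A separate plain counter such as CacheRefreshOps would then duplicate the NumOps value of the corresponding rate.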
> Metric improvements for ATS v1.5 storage components
> ---------------------------------------------------
>
> Key: YARN-4851
> URL: https://issues.apache.org/jira/browse/YARN-4851
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Li Lu
> Assignee: Li Lu
> Attachments: YARN-4851-trunk.001.patch, YARN-4851-trunk.002.patch
>
>
> We can add more metrics to the ATS v1.5 storage systems, including purging,
> cache hit/misses, read latency, etc.