[ 
https://issues.apache.org/jira/browse/YARN-5751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15588799#comment-15588799
 ] 

Varun Saxena edited comment on YARN-5751 at 10/19/16 5:09 PM:
--------------------------------------------------------------

Thanks [~rohithsharma] for sharing your views.

I do understand that it is not very clear what each metric value entails. For 
instance, when I first looked at the REST output from the timeline service, I 
had to go back to the code to find out whether the MEMORY reported by the NM 
for each container is in bytes, KB or MB.

Let us assume that we add UNIT to TimelineMetric to indicate KB, MB, etc. The 
question is how do we store it then? Currently the metric name is stored as a 
column qualifier and the metric value as the column value, with HBase cell 
timestamps used for the metric timestamps. So where do we store this extra 
information?
It could probably be stored as a suffix to the metric name, but that would 
impact metric filters. Or we could add another column whose qualifier is the 
metric name prefixed with a marker indicating UNIT (say, something like 
u!MEMORY) to store the metric unit, and then either always read it back or 
create the necessary column filters when the metrics to retrieve are 
specified. I would choose the latter if I had to pick one option.
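
To make the second option concrete, here is a minimal sketch of the u!MEMORY 
idea. It is only an illustration: a plain Java map stands in for the 
qualifier-to-value cells of one HBase row, and the class, method names and the 
"u!" prefix constant are hypothetical, not existing ATSv2 code.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: the Map models one row's column qualifier -> value
// cells. None of these names exist in the timeline service code base.
final class MetricUnitColumns {
  // Assumed marker for a column that holds a unit rather than a value.
  static final String UNIT_PREFIX = "u!";

  // Qualifier of the column that would hold the unit of a given metric.
  static String unitQualifier(String metricName) {
    return UNIT_PREFIX + metricName;
  }

  // Store the value under the unchanged metric name, so existing metric
  // filters keep matching, and store the unit (if any) in a second column.
  static void put(Map<String, String> row, String metricName,
      long value, String unit) {
    row.put(metricName, Long.toString(value));
    if (unit != null) {
      row.put(unitQualifier(metricName), unit);
    }
  }

  // Read the unit back; empty when the publisher did not set one.
  static Optional<String> readUnit(Map<String, String> row,
      String metricName) {
    return Optional.ofNullable(row.get(unitQualifier(metricName)));
  }
}
```

With this layout a reader that asks only for MEMORY would also have to request 
u!MEMORY, which is the extra column filter mentioned above.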

But the question is, can't the metric name itself indicate what the unit of 
the metric is? For instance, most of the MapReduce counter names indicate the 
unit too. We can publish MEMORY as MEMORY_BYTES instead.
Or is it even required? Typically the systems publishing to us would know the 
unit of the metric they are writing, and hence would know what they are 
reading back. Apart from admins, it is unlikely that somebody is going to use 
the REST URLs directly. These endpoints will typically be used by a system 
which has another front-end to serve this data. We can probably make metric 
names published from YARN or MapReduce more understandable (i.e. suffixed with 
units) in case somebody has to interpret the REST output directly. Thoughts?
You may say that this argument is based on HBase storage, but that is our 
primary storage implementation for now. So what to store and what not to store 
may depend on a combination of necessity and feasibility.
I am not completely sure the need to store the unit is strong enough to 
justify another column qualifier in the HBase implementation. We can probably 
adopt the approach mentioned above if we have to store it. Do you have any 
other idea regarding how to store it?
Is the concern that one code path may change (say, the publishing side) and 
the other may not (say, UI rendering) if we do not make unit part of our model?
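
For comparison, the naming alternative (publishing MEMORY as MEMORY_BYTES) 
needs no storage change at all. A rough sketch, with a hypothetical helper 
that is not part of any existing YARN or MapReduce API:

```java
import java.util.Locale;

// Hypothetical helper used only on the publishing side; the storage schema
// and the TimelineMetric model would stay untouched.
final class MetricNames {
  // Append the unit to the metric name, e.g. MEMORY + bytes -> MEMORY_BYTES.
  // Metrics without a meaningful unit keep their plain name.
  static String withUnit(String metricName, String unit) {
    if (unit == null || unit.isEmpty()) {
      return metricName;
    }
    return metricName + "_" + unit.toUpperCase(Locale.ROOT);
  }
}
```

The trade-off is that the unit then lives only in a naming convention, so 
readers have to know the convention to interpret it.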

Let us see what others think though.
cc [~sjlee0], [~gtCarrera9]



> Support UNIT for TimelineMetric
> -------------------------------
>
>                 Key: YARN-5751
>                 URL: https://issues.apache.org/jira/browse/YARN-5751
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: ATSv2
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>            Priority: Critical
>
> ATSv2 allows users to write their metrics using TimelineMetric. But there is 
> no field to tell what the UNIT of a published metric is. This makes metrics 
> very difficult to interpret when they are read. 
> I propose to add UNIT to TimelineMetric so that users can use this field 
> to tell what the unit of a published metric is. Maybe this can be optional 
> for the few kinds of metrics where a unit is not required, say CPU. But there 
> should definitely be a way to set units while publishing the entities. 


