[ 
https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710633#comment-14710633
 ] 

Varun Saxena commented on YARN-4053:
------------------------------------

Wanted to discuss so that we can reach a consensus on how to handle YARN-4053.

*Solution 1*: We can add a 1-byte flag as part of the metric value, indicating 
whether we are storing an integral value (0) or a floating point value (1).
*Solution 2*: Another suggestion is to make the type part of the column 
qualifier, for example metric=l, where "l" indicates long.
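
To make the two options concrete, here is a minimal sketch. The class and 
method names are hypothetical and not taken from the patch. It also assumes 
that the flag is the first byte, that values are 8-byte big-endian, and that 
"d" would be the qualifier suffix for double (only "l" is mentioned above).

```java
import java.nio.ByteBuffer;

// Hypothetical helper, for discussion only; not part of any patch.
final class FlaggedMetricValue {
  static final byte INTEGRAL = 0;        // flag for a value stored as long
  static final byte FLOATING_POINT = 1;  // flag for a value stored as double

  private FlaggedMetricValue() { }

  // Solution 1 (assumed layout): 1 flag byte, then the 8-byte value.
  static byte[] encode(Number value) {
    ByteBuffer buf = ByteBuffer.allocate(1 + Long.BYTES);
    if (value instanceof Float || value instanceof Double) {
      buf.put(FLOATING_POINT).putDouble(value.doubleValue());
    } else {
      buf.put(INTEGRAL).putLong(value.longValue());
    }
    return buf.array();
  }

  // Solution 2: the type travels in the column qualifier instead.
  // The "d" suffix is an assumption made for this sketch.
  static String qualifier(String metricName, Number value) {
    boolean floating = value instanceof Float || value instanceof Double;
    return metricName + "=" + (floating ? "d" : "l");
  }
}
```

In both cases the writer has to decide the type at write time, which is where 
the consistency question comes from.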

Another solution is to store everything as double. But would it be fair to 
impose this restriction on the client when it reads data from ATS? What if the 
client expects a long and is unable to handle a double?


The major issue with all of these approaches is what happens if the client does 
not report metric values consistently (i.e. with the same data type for a given 
metric). 

Now let us look at the scenarios where metric values come into the picture.
*1.* While writing an entity to HBase: Here, we need to consider that for the 
same entity, a particular metric can be reported in multiple write calls. 
So it is possible that in one write all values for a particular metric are 
reported as long, and in another write all as float. This can create 
inconsistency in both solutions above (different flags and encodings for the 
same metric in Solution 1, and different column qualifiers for the same metric 
in Solution 2).
We can add a valuetype field to TimelineMetric that indicates whether a set of 
values is long or float, and throw an exception in TimelineMetric when a value 
is added whose type is not consistent. This will at least ensure the same data 
type within a particular write call.
But even then, the client has to make sure that data types are consistent 
across writes. I think reading a row first to find out the column qualifier 
name or the flags attached to the values won't be a viable option. 
So some sort of restriction on the client (that it sends consistent data types 
for the same metric) will have to be imposed whether we adopt Solution 1 or 
Solution 2.
Or is there some HBase API I am not aware of?
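
The valuetype check could look roughly like the sketch below. This is a 
stand-in class, not the real TimelineMetric; the class name, the enum and the 
method signatures are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for TimelineMetric, for discussion only.
final class TypedMetricSketch {
  enum ValueType { LONG, DOUBLE }

  private final String id;
  private ValueType valueType;  // fixed by the first value that is added
  private final Map<Long, Number> values = new TreeMap<>();

  TypedMetricSketch(String id) {
    this.id = id;
  }

  // Rejects a value whose type differs from the values added so far.
  void addValue(long timestamp, Number value) {
    ValueType incoming = (value instanceof Float || value instanceof Double)
        ? ValueType.DOUBLE : ValueType.LONG;
    if (valueType == null) {
      valueType = incoming;
    } else if (valueType != incoming) {
      throw new IllegalArgumentException("Metric " + id + " already holds "
          + valueType + " values, cannot add a " + incoming + " value");
    }
    values.put(timestamp, value);
  }

  String getId() {
    return id;
  }

  ValueType getValueType() {
    return valueType;
  }
}
```

Note that this only guards a single metric object, i.e. a single write call. 
It does nothing for consistency across writes.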

*2.* While reading an entity from HBase in the absence of any HBase filter: In 
this case there should be no issues with either Solution 1 or Solution 2, 
because we read everything as bytes from HBase. We can then do the appropriate 
conversion based on the flag or the column qualifier name.
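
For Solution 1, the read-side conversion might be sketched as follows. It 
assumes the same hypothetical layout as before (flag in the first byte, then 
an 8-byte big-endian value); the class name is made up for this example.

```java
import java.nio.ByteBuffer;

// Hypothetical reader for a flag-prefixed cell value, for discussion only.
final class FlaggedMetricReader {
  private FlaggedMetricReader() { }

  // Converts the raw cell bytes back into a Long or a Double.
  static Number decode(byte[] cell) {
    if (cell == null || cell.length != 1 + Long.BYTES) {
      throw new IllegalArgumentException("Expected 9 bytes: flag + value");
    }
    ByteBuffer buf = ByteBuffer.wrap(cell);
    byte flag = buf.get();
    switch (flag) {
      case 0:
        return buf.getLong();    // integral value
      case 1:
        return buf.getDouble();  // floating point value
      default:
        throw new IllegalArgumentException("Unknown type flag: " + flag);
    }
  }
}
```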

*3.* While reading an entity from HBase in the presence of HBase filters: We 
can have 2 kinds of HBase filters. One kind retrieves specific columns (to 
determine which metrics to return), and the other trims down the rows/entities 
to be returned based on metric value comparison.
The first class of filters, which determine which columns to return, should 
work in both cases (Solution 1 and 2). 
They work even in Solution 2 because we use prefix filters as of now. If we use 
regex matching, though, it might make things more complicated for Solution 2.
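
As a rough illustration of why prefix matching still works with Solution 2, 
the sketch below selects qualifiers by prefix in plain Java. It does not use 
the HBase filter classes, and the qualifier layout ("name=l", "name=d") is the 
assumed one from this discussion, not the actual schema.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, HBase-free illustration of prefix selection.
final class QualifierPrefixSketch {
  private QualifierPrefixSketch() { }

  // Returns the qualifiers that belong to the given metric, whatever the
  // type suffix is. Including "=" in the prefix keeps "cpu" from also
  // matching a different metric such as "cpu_total".
  static List<String> select(List<String> qualifiers, String metricName) {
    String prefix = metricName + "=";
    List<String> matched = new ArrayList<>();
    for (String qualifier : qualifiers) {
      if (qualifier.startsWith(prefix)) {
        matched.add(qualifier);
      }
    }
    return matched;
  }
}
```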

For the second set of filters, we would need to know the data type of the 
metric value in both proposed solutions. For Solution 2, this is because 
SingleColumnValueFilter requires the exact column qualifier name. For 
Solution 1, we also need to know the data type of the metric so that we can 
attach the flag to the value to be compared against (so that BinaryComparator 
can be used).
If we add filters to our data object model, we can probably include the data 
type in the filters as well. But that again depends on the client and on 
whether it sends the correct data type.
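
The sketch below shows why the type matters for value comparison in 
Solution 1. It builds the comparison operand in plain Java and checks byte 
equality, which is what an EQUAL comparison on raw bytes comes down to. The 
flag-first layout and all names are assumptions; the HBase filter classes 
themselves are not used here.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical, HBase-free illustration of a flag-prefixed filter operand.
final class FilterOperandSketch {
  private FilterOperandSketch() { }

  // Builds the bytes to compare against: flag byte, then 8-byte value.
  static byte[] operand(boolean floatingPoint, Number value) {
    ByteBuffer buf = ByteBuffer.allocate(1 + Long.BYTES);
    if (floatingPoint) {
      buf.put((byte) 1).putDouble(value.doubleValue());
    } else {
      buf.put((byte) 0).putLong(value.longValue());
    }
    return buf.array();
  }

  // An equality match on raw bytes succeeds only if flag and value agree.
  static boolean matchesEqual(byte[] stored, byte[] operand) {
    return Arrays.equals(stored, operand);
  }
}
```

If a metric was stored as the long 5 and the filter operand is built as the 
double 5.0, the bytes differ and the row is not matched, even though the 
numbers are equal. So the filter has to be told the type the metric was 
written with.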


As we saw in point 1, we need to impose a restriction on the client that it 
always sends the same data type for a given metric. Frankly, this should be 
easy for the client as well. If a client expects float values for a metric, it 
will most likely use Double or Float.

Thoughts? Or are there other suggestions that would preclude the need for such 
a restriction?

> Change the way metric values are stored in HBase Storage
> --------------------------------------------------------
>
>                 Key: YARN-4053
>                 URL: https://issues.apache.org/jira/browse/YARN-4053
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Varun Saxena
>            Assignee: Varun Saxena
>         Attachments: YARN-4053-YARN-2928.01.patch
>
>
> Currently the HBase implementation uses GenericObjectMapper to convert and 
> store values in the backend HBase storage. This converts everything into a 
> string representation (ASCII/UTF-8 encoded byte array).
> While this is fine in most cases, it does not quite serve our use case for 
> metrics. 
> So we need to decide how we are going to encode and decode metric values and 
> store them in HBase.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
