Varun Saxena commented on YARN-4053:

Vrushali, thanks for your comments.
I would like to work on this. Let me take a stab at this one. Will have the 
I hope that's fine. You can help me with the reviews.

Coming to the points:
I agree that a flag is not good for extensibility. As I said earlier, a flag should 
be fine for now as we have only 2 choices (generic or long) and we can extend 
But eventually we will need different handlers for different types, so why 
not do it now? Hence, let's go with the proposal above.

Moreover, yes, we need proper handling based on the data type or conversion 
mechanism in FlowScanner too. As mentioned in an earlier comment, I was 
thinking we could indicate this in attributes, but I guess your proposal sounds 
better. We can identify the column/column prefix in FlowScanner as well and 
convert based on the converter attached to it.
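
To make the converter-per-column-prefix idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: ValueConverter, LongConverter, StringConverter, SketchColumnPrefix, the "m"/"i" prefixes and the "!" separator are hypothetical names, not the classes in the patch, and the string converter merely stands in for the GenericObjectMapper path.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical converter contract; not an existing YARN interface. */
interface ValueConverter {
    byte[] encode(Object value);
    Object decode(byte[] bytes);
}

/** Stores longs as 8 big-endian bytes instead of their string form. */
final class LongConverter implements ValueConverter {
    @Override
    public byte[] encode(Object value) {
        if (!(value instanceof Long)) {
            throw new IllegalArgumentException("expected Long, got " + value);
        }
        return ByteBuffer.allocate(Long.BYTES).putLong((Long) value).array();
    }

    @Override
    public Object decode(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }
}

/** Stand-in for the generic path: plain UTF-8 string encoding. */
final class StringConverter implements ValueConverter {
    @Override
    public byte[] encode(Object value) {
        return String.valueOf(value).getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public Object decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}

/** Each (assumed) column prefix carries the converter for its values. */
enum SketchColumnPrefix {
    METRIC("m", new LongConverter()),
    INFO("i", new StringConverter());

    // Assumed separator between the prefix and the rest of the qualifier.
    private static final String SEPARATOR = "!";

    private final String prefix;
    private final ValueConverter converter;

    SketchColumnPrefix(String prefix, ValueConverter converter) {
        this.prefix = prefix;
        this.converter = converter;
    }

    ValueConverter converter() {
        return converter;
    }

    /** Finds the prefix a column qualifier belongs to, as a scanner could. */
    static SketchColumnPrefix forQualifier(String qualifier) {
        for (SketchColumnPrefix p : values()) {
            if (qualifier.startsWith(p.prefix + SEPARATOR)) {
                return p;
            }
        }
        throw new IllegalArgumentException("unknown prefix: " + qualifier);
    }
}
```

With something like this, both the write path and a scanner could call forQualifier(...).converter() on a cell's qualifier and would not need a separate flag or attribute to learn the value type.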

bq. it missed one of the places in the current patch for example
Which place? The MIN/MAX handling?

bq. For single value vs time series, we suggest using a column prefix to 
distinguish them
Do we need to differentiate between SINGLE_VALUE and TIME_SERIES if, by 
default, a metric will be read as SINGLE_VALUE? We will be storing multiple 
values even for a metric of type SINGLE_VALUE. Do you mean that on the read side 
only the latest value of a metric is to be returned if it is of type SINGLE_VALUE 
(even if the client asks for TIME_SERIES)? Again, the assumption here is that the 
client will always send the metric type (SINGLE_VALUE or TIME_SERIES).

bq. For the read path, we can assume it is a single value unless specifically 
specified by the client as a time series (as clients would need to intend to 
read time series explicitly).
We can return TIME_SERIES by indicating something like METRICS_TIME_SERIES in 
fields. If we do so, it will have implications for YARN-3862.
Now the question is whether to return values for multiple timestamps even for a 
metric of type SINGLE_VALUE if the client asks for them. What if the client wants 
to see the values of a gauge (which might be considered a SINGLE_VALUE) over a 
period of time, for instance? If yes, do we even need to differentiate between 
the 2 types?
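
One possible reading of the read-path behaviour being discussed, sketched under assumptions: MetricReadSketch and its boolean parameter are hypothetical, and the map of timestamp to value stands in for the cell versions stored for one metric. It returns only the latest value unless the client explicitly asks for the series, regardless of how the metric was typed at write time.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical read-side helper; not part of the patch. */
final class MetricReadSketch {
    private MetricReadSketch() {}

    /**
     * @param cells timestamp -> value for one metric, as stored
     * @param timeSeriesRequested true if the client explicitly asked for
     *        the series (e.g. via a fields value such as METRICS_TIME_SERIES)
     * @return every stored value, or only the most recent one
     */
    static NavigableMap<Long, Long> select(NavigableMap<Long, Long> cells,
                                           boolean timeSeriesRequested) {
        if (timeSeriesRequested || cells.isEmpty()) {
            return new TreeMap<>(cells);
        }
        // Default: treat the metric as a single value and keep the latest.
        Map.Entry<Long, Long> latest = cells.lastEntry();
        NavigableMap<Long, Long> result = new TreeMap<>();
        result.put(latest.getKey(), latest.getValue());
        return result;
    }
}
```

Under this reading the stored metric type would not matter on the read side, which is exactly why the question of whether the two types need to be distinguished comes up.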

bq. We finally concluded that we should start with storing longs only and make 
the code strictly accept longs 
JAX-RS, i.e. the REST API layer, will convert an integral value to Integer 
automatically if it is less than Integer.MAX_VALUE, so I guess we will have to 
handle ints and shorts as well, i.e. if it is an Integer, for instance, we can 
call Integer#longValue to convert it to long.
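
A minimal sketch of that widening step, assuming a hypothetical helper class MetricValueCoercion (not in the patch). It accepts the integral boxed types the REST layer might hand over and rejects everything else, so the storage code can stay strictly long-based.

```java
/** Hypothetical helper; not part of the patch. */
final class MetricValueCoercion {
    private MetricValueCoercion() {}

    /** Widens an integral boxed value to long; rejects any other type. */
    static long toLong(Object value) {
        if (value instanceof Long || value instanceof Integer
                || value instanceof Short || value instanceof Byte) {
            // Number#longValue is a lossless widening for these types.
            return ((Number) value).longValue();
        }
        throw new IllegalArgumentException(
            "Metric values must be integral, got: "
                + (value == null ? "null" : value.getClass().getName()));
    }
}
```

For example, toLong(Integer.valueOf(42)) yields 42L, while a Double or a String is rejected rather than silently truncated or parsed.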

bq. Regarding indicating whether to aggregate or not, we suggest to rely mostly 
on the flow run aggregation. For those use cases that need to access metrics 
off of tables other than the flow run table (e.g. time-based aggregation), we 
need to explore ways to specify this information as input (config, etc.)
I hope Li Lu is fine with this, because I remember him saying on YARN-3816 that 
he will be using it for offline aggregation in YARN-3817. I think rows from the 
application table are being used in the MR job there. Are you suggesting that 
for offline aggregation, based on config, we aggregate either all the application 
metrics (to flow or user) or nothing?
Or that we configure a set of metrics to aggregate in some config?

> Change the way metric values are stored in HBase Storage
> --------------------------------------------------------
>                 Key: YARN-4053
>                 URL: https://issues.apache.org/jira/browse/YARN-4053
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Varun Saxena
>            Assignee: Varun Saxena
>         Attachments: YARN-4053-YARN-2928.01.patch, 
> YARN-4053-YARN-2928.02.patch
> Currently the HBase implementation uses GenericObjectMapper to convert and store 
> values in the backend HBase storage. This converts everything into a string 
> representation (ASCII/UTF-8 encoded byte array).
> While this is fine in most cases, it does not quite serve our use case for 
> metrics. 
> So we need to decide how we are going to encode and decode metric values and 
> store them in HBase.
