[ https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14997121#comment-14997121 ]
Varun Saxena commented on YARN-4053:
------------------------------------

Vrushali, thanks for your comments. I would like to work on this one and will have the bandwidth for it; I hope that's fine. You can help me with the reviews.

Coming to the points: I agree that a flag is not good for extensibility. As I said earlier, a flag would be fine for now, since we have only 2 choices (generic or long) and could extend later. But eventually we will need different handlers for different types, so why not do it now. Hence, let's go with the proposal above.

Moreover, yes, we need proper handling based on data type, or a conversion mechanism, in FlowScanner too. As mentioned in an earlier comment, I was thinking we could indicate this in attributes, but I guess your proposal sounds better. We can identify the column/column prefix in the flow scanner as well and convert based on the converter attached to it.

bq. it missed one of the places in the current patch for example

Which place? MIN/MAX handling?

bq. For single value vs time series, we suggest using a column prefix to distinguish them

Do we need to differentiate between SINGLE_VALUE and TIME_SERIES if by default a metric will be read as SINGLE_VALUE? We will be storing multiple values even for a metric of type SINGLE_VALUE. Do you mean that on the read side, only the latest value of a metric is to be returned if it's of type SINGLE_VALUE (even if the client asks for TIME_SERIES)? Again, the assumption here is that the client will always send the metric type (SINGLE_VALUE or TIME_SERIES) consistently.

bq. For the read path, we can assume it is a single value unless specifically specified by the client as a time series (as clients would need to intend to read time series explicitly).

We can return TIME_SERIES by indicating something like METRICS_TIME_SERIES as fields. If we do so, it will have implications on YARN-3862.
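
To make the "converter attached to a column/column prefix" idea concrete, here is a rough sketch of what I have in mind. It is only an illustration, not the patch: ValueConverter, LongConverter, StringConverter and ConverterRegistry are hypothetical names, the prefix "m" is made up, and StringConverter is a simplified stand-in for the generic (GenericObjectMapper) path. The long converter accepts Integer and Short as well as Long, on the assumption that values may reach the writer in a narrower type.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical converter contract; not the interface from the patch. */
interface ValueConverter {
  byte[] encode(Object value);
  Object decode(byte[] bytes);
}

/** Stores integral values as fixed-width 8-byte longs. */
final class LongConverter implements ValueConverter {
  @Override
  public byte[] encode(Object value) {
    // Accept narrower integral types and widen them to long.
    if (value instanceof Long || value instanceof Integer
        || value instanceof Short) {
      long v = ((Number) value).longValue();
      return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }
    throw new IllegalArgumentException(
        "Expected an integral value, got: " + value);
  }

  @Override
  public Object decode(byte[] bytes) {
    if (bytes == null || bytes.length != Long.BYTES) {
      throw new IllegalArgumentException("Expected exactly 8 bytes");
    }
    return ByteBuffer.wrap(bytes).getLong();
  }
}

/** Simplified stand-in for the generic path: UTF-8 string bytes. */
final class StringConverter implements ValueConverter {
  @Override
  public byte[] encode(Object value) {
    return String.valueOf(value).getBytes(StandardCharsets.UTF_8);
  }

  @Override
  public Object decode(byte[] bytes) {
    return new String(bytes, StandardCharsets.UTF_8);
  }
}

/** Hypothetical lookup from column prefix to converter. */
final class ConverterRegistry {
  private final Map<String, ValueConverter> byPrefix = new HashMap<>();
  private final ValueConverter fallback = new StringConverter();

  void register(String prefix, ValueConverter converter) {
    byPrefix.put(prefix, converter);
  }

  /** Returns the converter for a prefix, or the generic one if none is set. */
  ValueConverter forPrefix(String prefix) {
    return byPrefix.getOrDefault(prefix, fallback);
  }
}
```

With something like this, both the writer and the flow scanner could look up the converter by column prefix instead of checking a flag, and a new type would only need a new converter registration.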

Now the question is whether to return values for multiple timestamps even for a metric of type SINGLE_VALUE if the client asks for them. What if the client wants to see the values of a gauge (which might be considered a SINGLE_VALUE) over a period of time, for instance? If yes, do we even need to differentiate between the 2 types?

bq. We finally concluded that we should start with storing longs only and make the code strictly accept longs

JAX-RS, i.e. the REST API layer, will automatically convert an integral value to Integer if it's less than Integer.MAX_VALUE, so I guess we will have to handle ints and shorts as well. For instance, if the value is an Integer, we can call Integer#longValue to convert it to long.

bq. Regarding indicating whether to aggregate or not, we suggest to rely mostly on the flow run aggregation. For those use cases that need to access metrics off of tables other than the flow run table (e.g. time-based aggregation), we need to explore ways to specify this information as input (config, etc.)

I hope Li Lu is fine with this, because I remember him saying on YARN-3816 that he will be using it for offline aggregation in YARN-3817. I think rows from the application table are being used in the MR job there. Are you suggesting that for offline aggregation, based on config, we aggregate either all the application metrics (to flow or user) or nothing? Or that we configure a set of metrics to aggregate in some config?

> Change the way metric values are stored in HBase Storage
> --------------------------------------------------------
>
>                 Key: YARN-4053
>                 URL: https://issues.apache.org/jira/browse/YARN-4053
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Varun Saxena
>            Assignee: Varun Saxena
>         Attachments: YARN-4053-YARN-2928.01.patch,
> YARN-4053-YARN-2928.02.patch
>
>
> Currently the HBase implementation uses GenericObjectMapper to convert and store
> values in the backend HBase storage. This converts everything into a string
> representation (ASCII/UTF-8 encoded byte array).
> While this is fine in most cases, it does not quite serve our use case for
> metrics.
> So we need to decide how we are going to encode and decode metric values and
> store them in HBase.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)