[ https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710633#comment-14710633 ]
Varun Saxena commented on YARN-4053:
------------------------------------

I wanted to discuss this so that we can reach a consensus on how to handle YARN-4053.

*Solution 1*: We can add a 1-byte flag as part of the metric value, indicating whether we are storing an integral value (0) or a floating point value (1).

*Solution 2*: Another suggested solution is to make the type part of the column qualifier, something like metric=l, where "l" indicates long.

Another solution is to store everything as double. But would it be fair to impose this restriction on the client when it reads data from ATS? What if the client is expecting a long and is unable to handle a double?

The major issue with all of these approaches is what happens if the client does not report metric values consistently (that is, with the same data type for the same metric).

Now let us look at the scenarios where metric values come into the picture.

*1.* While writing an entity to HBase: Here we need to consider that, for the same entity, a particular metric can be reported in multiple write calls. So it is possible that in one write all values for a particular metric are reported as long, and in another write all are reported as float. This can create inconsistency in both of the solutions above (different flags and encodings for the same metric in Solution 1, and different column qualifiers for the same metric in Solution 2).

We can add a valuetype field in TimelineMetric which indicates whether a set of values is long or float, and throw an exception in TimelineMetric when a value is added if the types are not consistent. This will at least ensure the same data type within a particular write call. But even then, the client has to make sure that data types are consistent across writes.

I think reading the row first to find out the column qualifier name or the flags attached to the values won't be a viable option. So some sort of restriction on the client (that it sends consistent data types for the same metric) will have to be placed whether we adopt Solution 1 or Solution 2.
Is there some HBase API I am not aware of?

*2.* While reading an entity from HBase in the absence of any HBase filter: In this case there should be no issues with either Solution 1 or Solution 2, because we read everything as bytes from HBase. We can then do the appropriate conversion based on the flag or the column qualifier name.

*3.* While reading an entity from HBase in the presence of HBase filters: We can have 2 kinds of HBase filters. One kind retrieves specific columns (to determine which metrics to return), and the other trims down the rows/entities to be returned based on a metric value comparison.

The first class of filters, which determine which columns to return, should work in both cases (Solution 1 and 2). This holds even for Solution 2, because we use prefix filters as of now. If we use regex matching, though, it might make things more complicated in the case of Solution 2.

For the second set of filters, we would need to know the data type of the metric value in both of the proposed solutions. For Solution 2, this is because SingleColumnValueFilter requires the exact column qualifier name. For Solution 1, we also need to know the data type of the metric so that we can prepend the flag to the value being compared against (so that BinaryComparator can be used).

If we add filters to our data object model, we can probably include the data type in the filters as well. But that again depends on the client, and on whether it sends the correct data type or not.

As we saw in point 1, we need to impose a restriction on the client that it sends the same data type for every metric. Frankly, it should be easy for the client as well. If the client expects float values for a metric, it will most likely use Double or Float.

Thoughts? Or are there other suggestions that would remove the need for such a restriction?
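To make Solution 1 concrete, here is a minimal sketch of how a flag-prefixed value could be encoded and decoded. It is only an illustration under stated assumptions: the class MetricValueCodec and its methods are hypothetical names, not part of any attached patch, and the fixed 9-byte layout (1 flag byte followed by 8 value bytes) is one possible choice. It uses only java.nio so it can be run without HBase.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration only; not part of any YARN-4053 patch.
final class MetricValueCodec {
    static final byte FLAG_LONG = 0;    // integral value, as in Solution 1
    static final byte FLAG_DOUBLE = 1;  // floating point value, as in Solution 1

    // Encodes a metric value as 1 flag byte followed by 8 value bytes.
    public static byte[] encode(Number value) {
        ByteBuffer buf = ByteBuffer.allocate(1 + Long.BYTES);
        if (value instanceof Float || value instanceof Double) {
            buf.put(FLAG_DOUBLE).putDouble(value.doubleValue());
        } else {
            // Assumption: every non-floating-point Number is stored as a long.
            buf.put(FLAG_LONG).putLong(value.longValue());
        }
        return buf.array();
    }

    // Decodes bytes written by encode() back into a Long or a Double.
    public static Number decode(byte[] bytes) {
        if (bytes == null || bytes.length != 1 + Long.BYTES) {
            throw new IllegalArgumentException("Expected 9 bytes: flag + value");
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        byte flag = buf.get();
        switch (flag) {
            case FLAG_LONG:
                return Long.valueOf(buf.getLong());
            case FLAG_DOUBLE:
                return Double.valueOf(buf.getDouble());
            default:
                throw new IllegalArgumentException("Unknown type flag: " + flag);
        }
    }
}
```

For the value-comparison filters in scenario 3, the byte array returned by encode() for the filter operand is what would be handed to a byte-wise comparator. Note that a byte-wise comparison of this layout is reliable for equality checks; ordered comparisons (greater than, less than) on the raw bytes would need extra care for negative values, and the operand must carry the same flag as the stored value.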
> Change the way metric values are stored in HBase Storage
> --------------------------------------------------------
>
>                 Key: YARN-4053
>                 URL: https://issues.apache.org/jira/browse/YARN-4053
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Varun Saxena
>            Assignee: Varun Saxena
>         Attachments: YARN-4053-YARN-2928.01.patch
>
>
> Currently HBase implementation uses GenericObjectMapper to convert and store
> values in backend HBase storage. This converts everything into a string
> representation(ASCII/UTF-8 encoded byte array).
> While this is fine in most cases, it does not quite serve our use case for
> metrics.
> So we need to decide how are we going to encode and decode metric values and
> store them in HBase.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)