[ https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735384#comment-14735384 ]
Sangjin Lee commented on YARN-3901: ----------------------------------- Thanks for the updated patch [~vrushalic]! I went over the new patch, and the following is the quick feedback. I'll also apply it with YARN-4074, and test it a little more. (HBaseTimelineWriterImpl.java) - l.141-155: the whole thing could be inside {{if (isApplication)...}} - l.264: this null check is not needed (FlowRunCoprocessor.java) - l.52: Is the {{TimestampGenerator}} class going to be used outside {{FlowRunCoprocessor}}? If not, I would argue that we should make it an inner class of {{FlowRunCoprocessor}}. At least we should make it non-public to keep it within the package. If it would see general use outside this class, then it might be better to make it a true public class in the common package. I suspect a non-public class might be what we want here. - l.52: let's make it final - l.54: style nit: I think the common style is to place the static variables before instance variables - Also, overall it seems we're using both the diamond operator (<>) and the old style generic declaration. It might be good to stick with one style (in which case the diamond operator might be better). - l.144: This would mean that some cell timestamps would have the unit of the milliseconds and others would be in nanoseconds. I'm a little bit concerned if we ever interpret these timestamps incorrectly. Could there be a more explicit way of clearly differentiating them? I don't have good suggestions at the moment. (FlowScanner.java) - variable ordering - l.210: Strictly speaking, {{GenericObjectMapper}} will return an integer if the value fits within an integer; so it's not exactly a concern for min/max (timestamps) but for caution we might want to stay with {{Number}} instead of long. (TimestampGenerator.java) - l.29: make it final - variable ordering - see above for the public/non-public comment (FlowActivityRowKey.java) - It's up to you, but you could leave the row key improvement to YARN-4074. That might be easier to manage the changes between yours and mine. I'm restructuring all *RowKey classes uniformly. (TestHBaseTimelineWriterImplFlowRun.java) - it might be good to have short comments on what each method is testing > Populate flow run data in the flow_run & flow activity tables > ------------------------------------------------------------- > > Key: YARN-3901 > URL: https://issues.apache.org/jira/browse/YARN-3901 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver > Reporter: Vrushali C > Assignee: Vrushali C > Attachments: YARN-3901-YARN-2928.1.patch, > YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, > YARN-3901-YARN-2928.4.patch > > > As per the schema proposed in YARN-3815 in > https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf > filing jira to track creation and population of data in the flow run table. > Some points that are being considered: > - Stores per flow run information aggregated across applications, flow version > RM’s collector writes to on app creation and app completion > - Per App collector writes to it for metric updates at a slower frequency > than the metric updates to application table > primary key: cluster ! user ! flow ! flow run id > - Only the latest version of flow-level aggregated metrics will be kept, even > if the entity and application level keep a timeseries. > - The running_apps column will be incremented on app creation, and > decremented on app completion. > - For min_start_time the RM writer will simply write a value with the tag for > the applicationId. A coprocessor will return the min value of all written > values. - > - Upon flush and compactions, the min value between all the cells of this > column will be written to the cell without any tag (empty tag) and all the > other cells will be discarded. > - Ditto for the max_end_time, but then the max will be kept. > - Tags are represented as #type:value. The type can be not set (0), or can > indicate running (1) or complete (2). In those cases (for metrics) only > complete app metrics are collapsed on compaction. > - The m! values are aggregated (summed) upon read. Only when applications are > completed (indicated by tag type 2) can the values be collapsed. > - The application ids that have completed and been aggregated into the flow > numbers are retained in a separate column for historical tracking: we don’t > want to re-aggregate for those upon replay > -- This message was sent by Atlassian JIRA (v6.3.4#6332)