[ https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645348#comment-14645348 ]
Li Lu commented on YARN-3904:
-----------------------------

Hi [~sjlee0], thanks so much for the review! Some quick comments:

bq. Regarding PhoenixOfflineAggregatorWriterImpl, does it have to implement the TimelineWriter interface? It is no longer plugged into the real-time write path, and as such, implementing TimelineWriter seems unnecessary.

That's exactly what I've been debating with myself! The more I work on the offline aggregator, the more I feel that it is not really beneficial to implement our offline storage as a {{TimelineWriter}}. However, the offline writer *is* actually a timeline writer. The natural distinction between the Phoenix writer and the HBase writer is whether a writer works in the realtime or the offline workflow. Maybe we'd like to have something like {{TimelineRealTimeWriters}} and {{TimelineOfflineWriters}} (or {{TimelineOfflineStorage}}, to accommodate both read and write code paths)? Realtime writers should focus on writing raw entity data with full context info, as well as performing realtime aggregations. Offline writers can focus on offline aggregation storage. Thoughts?

bq. If we envision using this in a separate mechanism such as mapreduce, I think we ought to come up with a new interface for aggregation.

Yes. If we separate realtime and offline writers, we have more freedom to design aggregation-specific writer interfaces.

bq. Also, the actual work of reading the HBase tables (eventually the flow run table) and invoking the offline aggregator is not captured here.

I'm planning to include the HBase aggregation table reader as part of YARN-3817, if that POC patch is not too big (so far I don't believe that's the case). Invoking the offline aggregator will probably come separately, since we may need some further changes in the RM to post active flows. Does this plan work?
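To make the proposed split concrete, here is a minimal Java sketch, not the patch itself. Every type and method name below ({{SketchContext}}, {{RealTimeWriterSketch}}, {{OfflineAggregationWriterSketch}}, and the in-memory implementations) is hypothetical and exists only for illustration; none of them is part of the YARN-2928 branch, and in-memory maps stand in for the HBase and Phoenix storage. The idea it shows is that the realtime writer receives raw entity data with full context, while the offline writer only receives already-aggregated rows.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// All names here are hypothetical; they only illustrate the realtime/offline split.

/** Context a realtime collector knows; an offline job may lack most of it. */
final class SketchContext {
  final String cluster;
  final String user;
  final String flow;

  SketchContext(String cluster, String user, String flow) {
    this.cluster = cluster;
    this.user = user;
    this.flow = flow;
  }
}

/** Realtime path: raw entity data written with full context. */
interface RealTimeWriterSketch {
  void write(SketchContext context, String entityId, Map<String, Long> metrics);
}

/** Offline path: only pre-aggregated rows, keyed by an aggregation key. */
interface OfflineAggregationWriterSketch {
  void writeAggregated(String aggregationKey, Map<String, Long> aggregatedMetrics);
}

/** In-memory stand-in for the raw entity storage (assumption, not HBase). */
final class InMemoryRealTimeWriter implements RealTimeWriterSketch {
  // flow name -> metric maps of every entity written for that flow
  final Map<String, List<Map<String, Long>>> rawByFlow = new HashMap<>();

  @Override
  public void write(SketchContext context, String entityId, Map<String, Long> metrics) {
    // entityId is ignored in this sketch; a real store would key rows by it.
    rawByFlow.computeIfAbsent(context.flow, k -> new ArrayList<>())
        .add(new HashMap<>(metrics));
  }
}

/** In-memory stand-in for the aggregation storage (assumption, not Phoenix). */
final class InMemoryOfflineWriter implements OfflineAggregationWriterSketch {
  final Map<String, Map<String, Long>> rows = new HashMap<>();

  @Override
  public void writeAggregated(String aggregationKey, Map<String, Long> aggregatedMetrics) {
    rows.put(aggregationKey, new HashMap<>(aggregatedMetrics));
  }
}

final class WriterSplitSketch {
  /** Sums each metric per flow and hands the totals to the offline writer. */
  static void aggregate(InMemoryRealTimeWriter source, OfflineAggregationWriterSketch sink) {
    for (Map.Entry<String, List<Map<String, Long>>> flow : source.rawByFlow.entrySet()) {
      Map<String, Long> totals = new HashMap<>();
      for (Map<String, Long> metrics : flow.getValue()) {
        for (Map.Entry<String, Long> metric : metrics.entrySet()) {
          totals.merge(metric.getKey(), metric.getValue(), Long::sum);
        }
      }
      sink.writeAggregated(flow.getKey(), totals);
    }
  }

  /** Writes two entities for one flow, aggregates, and returns the flow's cpu total. */
  static long demo() {
    InMemoryRealTimeWriter realTime = new InMemoryRealTimeWriter();
    InMemoryOfflineWriter offline = new InMemoryOfflineWriter();
    SketchContext context = new SketchContext("cluster1", "alice", "flowA");
    realTime.write(context, "app_1", Collections.singletonMap("cpu", 10L));
    realTime.write(context, "app_2", Collections.singletonMap("cpu", 32L));
    aggregate(realTime, offline);
    return offline.rows.get("flowA").get("cpu");
  }
}
```

In this sketch the offline writer never sees a {{SketchContext}}, which mirrors the point that offline aggregation writers have less contextual information; whether the real interfaces should share a common parent is left open.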
> Refactor timelineservice.storage to add support to online and offline aggregation writers
> -----------------------------------------------------------------------------------------
>
>                 Key: YARN-3904
>                 URL: https://issues.apache.org/jira/browse/YARN-3904
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Li Lu
>            Assignee: Li Lu
>         Attachments: YARN-3904-YARN-2928.001.patch, YARN-3904-YARN-2928.002.patch, YARN-3904-YARN-2928.003.patch, YARN-3904-YARN-2928.004.patch
>
>
> Now that we have finished the design for time-based aggregation, we can adopt our existing Phoenix storage as the storage for the aggregated data. In this JIRA, I'm proposing to refactor writers to add support for aggregation writers. Offline aggregation writers typically have less contextual information. We can distinguish these writers by special naming. We can also use CollectorContexts to model all contextual information and use it in our writer interfaces.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)