[ 
https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645348#comment-14645348
 ] 

Li Lu commented on YARN-3904:
-----------------------------

Hi [~sjlee0], thanks so much for the review! Some quick comments:

bq. Regarding PhoenixOfflineAggregatorWriterImpl, does it have to implement the 
TimelineWriter interface?
It is no longer plugged into the real-time write path, and as such, 
implementing TimelineWriter seems unnecessary.
That's exactly what I've been debating with myself! The more I work on the 
offline aggregator, the more I feel that implementing our offline storage as a 
{{TimelineWriter}} is not really beneficial. However, the offline writer *is* 
still a timeline writer. The natural distinction between the Phoenix writer and 
the HBase writer is whether a writer works in the realtime or the offline 
workflow. Maybe we'd like to have something like {{TimelineRealTimeWriters}} 
and {{TimelineOfflineWriters}} (or {{TimelineOfflineStorage}}, to accommodate 
both read and write code paths)? Realtime writers should focus on writing raw 
entity data with full context info as well as performing realtime aggregations. 
Offline writers can focus on offline aggregation storage. Thoughts?
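
A minimal sketch of how the realtime/offline split proposed above might look. 
Every type and method name here is hypothetical and chosen for illustration 
only; none of it is taken from the YARN-2928 branch or from the real 
{{TimelineWriter}} signature.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a timeline entity (not the real YARN class). */
final class Entity {
  final String id;
  final long metricValue;

  Entity(String id, long metricValue) {
    this.id = id;
    this.metricValue = metricValue;
  }
}

/** Hypothetical realtime path: raw entities written with full context. */
interface RealTimeWriter {
  void write(String clusterId, String userId, String flowName,
      long flowRunId, String appId, Entity entity);
}

/** Hypothetical offline path: aggregated data with reduced context. */
interface OfflineWriter {
  void writeAggregated(String clusterId, String userId, String flowName,
      Entity aggregated);
}

/** In-memory stand-in for a realtime store such as the HBase writer. */
final class InMemoryRealTimeWriter implements RealTimeWriter {
  final List<String> rows = new ArrayList<>();

  @Override
  public void write(String clusterId, String userId, String flowName,
      long flowRunId, String appId, Entity entity) {
    // The row key carries the full context down to the application.
    rows.add(String.join("!", clusterId, userId, flowName,
        Long.toString(flowRunId), appId, entity.id));
  }
}

/** In-memory stand-in for an offline store such as the Phoenix writer. */
final class InMemoryOfflineWriter implements OfflineWriter {
  final List<String> rows = new ArrayList<>();

  @Override
  public void writeAggregated(String clusterId, String userId,
      String flowName, Entity aggregated) {
    // The row key stops at the flow; no run or application id is needed.
    rows.add(String.join("!", clusterId, userId, flowName, aggregated.id));
  }
}
```

With two separate interfaces, an offline implementation is not forced to 
accept run-level or application-level arguments that it would never use.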

bq. If we envision using this in a separate mechanism such as mapreduce, I 
think we ought to come up with a new interface for aggregation.
Yes. If we're separating realtime and offline writers, we have more freedom to 
design aggregation-specific writer interfaces. 
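
One shape an aggregation-specific writer interface could take is sketched 
below, purely as an assumption. The context class is loosely modeled on the 
CollectorContexts idea from the issue description; {{AggregationContext}}, 
{{AggregationWriter}}, and the in-memory table are hypothetical names, not 
existing YARN types.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical context: offline aggregation knows only cluster, user, flow. */
final class AggregationContext {
  final String clusterId;
  final String userId;
  final String flowName;

  AggregationContext(String clusterId, String userId, String flowName) {
    this.clusterId = clusterId;
    this.userId = userId;
    this.flowName = flowName;
  }

  /** Builds an illustrative row key from the available context fields. */
  String rowKey() {
    return clusterId + "!" + userId + "!" + flowName;
  }
}

/** Hypothetical aggregation-specific writer interface. */
interface AggregationWriter {
  void writeAggregatedMetric(AggregationContext context, String metric,
      long value);
}

/** In-memory stand-in for an aggregation table; sums values per key. */
final class InMemoryAggregationWriter implements AggregationWriter {
  final Map<String, Long> table = new HashMap<>();

  @Override
  public void writeAggregatedMetric(AggregationContext context, String metric,
      long value) {
    // Accumulate into the existing cell, or create it on first write.
    table.merge(context.rowKey() + "#" + metric, value, Long::sum);
  }
}
```

Passing a single context object keeps the method signature stable if more 
context fields are added later.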

bq. Also, the actual work of reading the HBase tables (eventually the flow run 
table) and invoking the offline aggregator is not captured here. 
I'm planning to include the HBase aggregation table reader as part of 
YARN-3817, provided that POC patch does not get too big (so far I don't believe 
it will). Invoking the offline aggregator will probably come separately, since 
we may need some further changes in the RM to post active flows. Does this plan 
work? 

> Refactor timelineservice.storage to add support to online and offline 
> aggregation writers
> -----------------------------------------------------------------------------------------
>
>                 Key: YARN-3904
>                 URL: https://issues.apache.org/jira/browse/YARN-3904
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Li Lu
>            Assignee: Li Lu
>         Attachments: YARN-3904-YARN-2928.001.patch, 
> YARN-3904-YARN-2928.002.patch, YARN-3904-YARN-2928.003.patch, 
> YARN-3904-YARN-2928.004.patch
>
>
> Now that we have finished the design for time-based aggregation, we can adopt 
> our existing Phoenix storage as the storage for the aggregated data. In this 
> JIRA, I'm proposing to refactor the writers to add support for aggregation 
> writers. Offline aggregation writers typically have less contextual 
> information. We can distinguish these writers by special naming. We can also 
> use CollectorContexts to model all contextual information and use it in our 
> writer interfaces. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
