[ https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652280#comment-14652280 ]
Sangjin Lee commented on YARN-3904: ----------------------------------- Sorry [~gtCarrera9] it took me a while to catch up with this. Thanks for the updated patch! (OfflineAggregationWriter.java) - Actually I'd like to ask whether this needs to be a service. :) Note that it is possible (or likely) that the writer will be executed in a mapreduce task. If this is executed within a mapper task, what does it mean for it be a service? It would not add much value, and at minimum the initialization/start/stop should be possbile outside the service framework, right? (PhoenixOfflineAggregationWriterImpl.java) - Should the primary key user and then cluster, or cluster and user? I think it might be better if it is cluster and user although it is different than the entity table. [~vrushalic]? - For the user aggregation tables, I believe the cluster needs to be included in the row key. Note that multiple clusters may write to the same HBase cluster... - As for createTables(), I'm also of the opinion that it might be better if we moved it to a dedicated creator class. Again, in a context of mapreduce job, it would mean that multiple mappers would compete to create this table. The concurrency will be sorted out by phoenix, but it doesn't seem very necessary. Not only the first time, but every time the aggregation job runs, it would do the "create if not exists..." work unnecessarily. Also, note that dropTables() can only be accessed by a separate java main process anyway. So it might be good to have a separate class that can let you create or drop tables explicitly. I don't know if it should be part of the existing schema creator or a separate one, and I can see pros and cons of either option. - l.156: My JDBC knowledge is bit outdated, but do you want to prepare the statement every time write is done? Don't you want to prepare it once and reuse it? That optimization will follow later? (OfflineAggregationInfo.java) - l.52-54: I would enforce the notion that this is a read-only object by making the members final > Refactor timelineservice.storage to add support to online and offline > aggregation writers > ----------------------------------------------------------------------------------------- > > Key: YARN-3904 > URL: https://issues.apache.org/jira/browse/YARN-3904 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver > Reporter: Li Lu > Assignee: Li Lu > Attachments: YARN-3904-YARN-2928.001.patch, > YARN-3904-YARN-2928.002.patch, YARN-3904-YARN-2928.003.patch, > YARN-3904-YARN-2928.004.patch, YARN-3904-YARN-2928.005.patch, > YARN-3904-YARN-2928.006.patch > > > After we finished the design for time-based aggregation, we can adopt our > existing Phoenix storage into the storage of the aggregated data. In this > JIRA, I'm proposing to refactor writers to add support to aggregation > writers. Offline aggregation writers typically has less contextual > information. We can distinguish these writers by special naming. We can also > use CollectorContexts to model all contextual information and use it in our > writer interfaces. -- This message was sent by Atlassian JIRA (v6.3.4#6332)