[ 
https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652280#comment-14652280
 ] 

Sangjin Lee commented on YARN-3904:
-----------------------------------

Sorry [~gtCarrera9] it took me a while to catch up with this. Thanks for the 
updated patch!

(OfflineAggregationWriter.java)
- Actually I'd like to ask whether this needs to be a service. :) Note that it 
is possible (or likely) that the writer will be executed in a mapreduce task. 
If this is executed within a mapper task, what does it mean for it to be a 
service? It would not add much value, and at a minimum the 
initialization/start/stop should be possible outside the service framework, 
right?
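To illustrate what I mean by usable outside the service framework (the class 
and method names here are hypothetical, not from the patch): a writer whose 
lifecycle is just plain public methods can be driven directly from a mapper's 
setup/cleanup, with no Service machinery involved.

```java
// Hypothetical sketch: a writer with a plain lifecycle, callable directly
// from a mapper task without the YARN Service framework.
public class StandaloneAggregationWriter {

  public enum State { CREATED, STARTED, STOPPED }

  private State state = State.CREATED;

  /** One-shot init + start; safe to call directly from Mapper.setup(). */
  public void start() {
    if (state != State.CREATED) {
      throw new IllegalStateException("already started");
    }
    // ... open the JDBC/Phoenix connection here ...
    state = State.STARTED;
  }

  /** Release resources; call from Mapper.cleanup(). */
  public void stop() {
    if (state == State.STARTED) {
      // ... close the connection here ...
      state = State.STOPPED;
    }
  }

  public State getState() {
    return state;
  }
}
```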

(PhoenixOfflineAggregationWriterImpl.java)
- Should the primary key be user and then cluster, or cluster and then user? I 
think it might be better if it is cluster and then user, although that is 
different from the entity table. [~vrushalic]?
- For the user aggregation tables, I believe the cluster needs to be included 
in the row key. Note that multiple clusters may write to the same HBase 
cluster...
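As a rough sketch of both points above (the table and column names are made 
up for illustration, not taken from the patch): leading the primary key with 
cluster, and carrying the cluster in the user aggregation table's row key, 
would look something like this as Phoenix DDL.

```java
// Hypothetical DDL sketch: cluster leads the primary key, and the user
// aggregation table also carries the cluster in its row key, so multiple
// clusters writing to the same HBase cluster cannot collide.
public final class AggregationSchemaSketch {

  public static final String USER_AGGREGATION_DDL =
      "CREATE TABLE IF NOT EXISTS user_aggregation ("
          + "cluster VARCHAR NOT NULL, "   // cluster first in the row key
          + "user VARCHAR NOT NULL, "      // then user
          + "metrics VARBINARY "
          + "CONSTRAINT pk PRIMARY KEY (cluster, user))";

  private AggregationSchemaSketch() { }
}
```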
- As for createTables(), I'm also of the opinion that it might be better if we 
moved it to a dedicated creator class. Again, in a context of mapreduce job, it 
would mean that multiple mappers would compete to create this table. The 
concurrency will be sorted out by phoenix, but it doesn't seem very necessary. 
Not only the first time, but every time the aggregation job runs, it would do 
the "create if not exists..." work unnecessarily. Also, note that dropTables() 
can only be accessed by a separate java main process anyway. So it might be 
good to have a separate class that can let you create or drop tables 
explicitly. I don't know if it should be part of the existing schema creator or 
a separate one, and I can see pros and cons of either option.
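One hedged sketch of the kind of dedicated creator class I have in mind (the 
class name, table name, and columns are invented for illustration): a small 
standalone main class that creates or drops the aggregation tables 
explicitly, so the aggregation job itself never has to issue the "create if 
not exists..." DDL, and dropTables() gets a natural home.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Hypothetical sketch of a standalone schema tool: create or drop the
// offline aggregation tables explicitly, outside the aggregation job.
public class OfflineAggregationSchemaTool {

  static final String CREATE_SQL =
      "CREATE TABLE IF NOT EXISTS flow_aggregation ("
          + "cluster VARCHAR NOT NULL, flow VARCHAR NOT NULL, "
          + "metrics VARBINARY "
          + "CONSTRAINT pk PRIMARY KEY (cluster, flow))";

  static final String DROP_SQL = "DROP TABLE IF EXISTS flow_aggregation";

  /** Usage: OfflineAggregationSchemaTool &lt;jdbc-url&gt; create|drop */
  public static void main(String[] args) throws Exception {
    String sql = "drop".equals(args[1]) ? DROP_SQL : CREATE_SQL;
    try (Connection conn = DriverManager.getConnection(args[0]);
         Statement stmt = conn.createStatement()) {
      stmt.executeUpdate(sql);
    }
  }
}
```

Whether this lives in the existing schema creator or a new class, the point is 
that the DDL runs from one explicit admin entry point rather than from every 
mapper.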
- l.156: My JDBC knowledge is a bit outdated, but do you want to prepare the 
statement every time a write is done? Don't you want to prepare it once and 
reuse it? Or will that optimization follow later?
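To spell out the reuse I'm suggesting (the SQL and names are illustrative 
only, not the actual code at l.156): prepare the UPSERT once when the writer 
is set up, then only rebind parameters on each write, instead of calling 
prepareStatement per write.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical sketch: prepare the UPSERT once and rebind per write,
// rather than preparing a new statement on every write call.
public class ReusableUpsertWriter {

  static final String UPSERT_SQL =
      "UPSERT INTO flow_aggregation (cluster, flow, metrics) VALUES (?, ?, ?)";

  private final PreparedStatement stmt;  // prepared once, reused

  public ReusableUpsertWriter(Connection conn) throws SQLException {
    this.stmt = conn.prepareStatement(UPSERT_SQL);
  }

  public void write(String cluster, String flow, byte[] metrics)
      throws SQLException {
    stmt.setString(1, cluster);   // only rebind parameters per write
    stmt.setString(2, flow);
    stmt.setBytes(3, metrics);
    stmt.executeUpdate();
  }

  public void close() throws SQLException {
    stmt.close();
  }
}
```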

(OfflineAggregationInfo.java)
- l.52-54: I would enforce the notion that this is a read-only object by making 
the members final
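Concretely, what I mean by enforcing the read-only notion (field names below 
are placeholders, not the actual members on l.52-54): make the fields final 
and assign them only in the constructor, so the compiler rejects any later 
mutation; defensive copies keep array members read-only too.

```java
// Hypothetical sketch: final fields assigned once in the constructor make
// the read-only intent explicit and compiler-enforced.
public final class AggregationInfoSketch {

  private final String tableName;   // final: assigned once, never mutated
  private final String[] columns;

  public AggregationInfoSketch(String tableName, String[] columns) {
    this.tableName = tableName;
    this.columns = columns.clone(); // defensive copy keeps it truly read-only
  }

  public String getTableName() {
    return tableName;
  }

  public String[] getColumns() {
    return columns.clone();         // callers get a copy, not the internals
  }
}
```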

> Refactor timelineservice.storage to add support to online and offline 
> aggregation writers
> -----------------------------------------------------------------------------------------
>
>                 Key: YARN-3904
>                 URL: https://issues.apache.org/jira/browse/YARN-3904
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Li Lu
>            Assignee: Li Lu
>         Attachments: YARN-3904-YARN-2928.001.patch, 
> YARN-3904-YARN-2928.002.patch, YARN-3904-YARN-2928.003.patch, 
> YARN-3904-YARN-2928.004.patch, YARN-3904-YARN-2928.005.patch, 
> YARN-3904-YARN-2928.006.patch
>
>
> After we finished the design for time-based aggregation, we can adopt our 
> existing Phoenix storage into the storage of the aggregated data. In this 
> JIRA, I'm proposing to refactor writers to add support for aggregation 
> writers. Offline aggregation writers typically have less contextual 
> information. We can distinguish these writers by special naming. We can also 
> use CollectorContexts to model all contextual information and use it in our 
> writer interfaces. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
