[ 
https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577885#comment-14577885
 ] 

Li Lu commented on YARN-2928:
-----------------------------

Hi [~jamestaylor], thank you very much for your great help! Here are some 
clarifications on my questions:

bq. For your configuration/metric key-value pair, how are they named? Do you 
know the possible set of key values in advance? Or are they known more-or-less 
on-the-fly? 

For our use case they're determined completely on the fly. For each timeline 
entity, we plan to store each of its configurations/metrics in its own dynamic 
column. Different entities may have completely different sets of 
configs/metrics; for example, a MapReduce job may have a completely different 
set of configs from a Tez job. Therefore, we need to generate all columns for 
configs/metrics dynamically. I'm wondering: when adding dynamic columns into a 
view, do I still need to declare those dynamic columns explicitly (I assume 
yes, but would like to double-check)?
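To make the question concrete, here is a minimal sketch of how we imagine generating the dynamic-column statements. The table name {{timeline_entity}}, the {{entity_id}} key column, and the config keys are hypothetical placeholders; the {{(column TYPE)}} dynamic-column syntax in UPSERT and in the FROM clause is Phoenix's documented form. This only builds the SQL strings (which would then go through a Phoenix JDBC connection); it is not meant as a definitive implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class DynamicColumnSql {

    // Builds e.g.:
    // UPSERT INTO timeline_entity (entity_id, "cfg.key" VARCHAR) VALUES (?, ?)
    // Dynamic columns are declared inline with their types, per Phoenix syntax.
    static String buildUpsert(String table, Map<String, String> dynCols) {
        String colDecls = dynCols.entrySet().stream()
                .map(e -> "\"" + e.getKey() + "\" " + e.getValue())
                .collect(Collectors.joining(", "));
        String params = dynCols.keySet().stream()
                .map(k -> "?")
                .collect(Collectors.joining(", "));
        return "UPSERT INTO " + table + " (entity_id, " + colDecls
                + ") VALUES (?, " + params + ")";
    }

    // Builds e.g.:
    // SELECT entity_id, "cfg.key" FROM timeline_entity ("cfg.key" VARCHAR)
    // At query time, dynamic columns must again be declared in the FROM clause.
    static String buildSelect(String table, Map<String, String> dynCols) {
        String quoted = dynCols.keySet().stream()
                .map(k -> "\"" + k + "\"")
                .collect(Collectors.joining(", "));
        String colDecls = dynCols.entrySet().stream()
                .map(e -> "\"" + e.getKey() + "\" " + e.getValue())
                .collect(Collectors.joining(", "));
        return "SELECT entity_id, " + quoted + " FROM " + table
                + " (" + colDecls + ")";
    }
}
```

Since the key set differs per entity, each entity would produce its own statement this way, with values bound as JDBC parameters.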

bq. Are you thinking to have a secondary table that's a rollup aggregation of 
more raw data? Is that required, or is it more of a convenience for the user? 
If the raw data is Phoenix-queryable, then I think you have a lot of options. 
Can you point me to some more info on your design?

Yes, we are considering having multiple levels of aggregation tables, each 
with a different granularity. For example, we're currently planning to do the 
first level (application-level) aggregation from an HBase table into a Phoenix 
table. Then we can aggregate flow-level information on top of our 
application-level aggregation (since each application belongs to exactly one 
flow). In this way, we can work around the write-throughput limitation of 
Phoenix for now, while still supporting SQL queries on the aggregated data. If 
the Phoenix PDataTypes are stable, would it be possible for us to do the 
following two things?
# Use the HBase API and PDataTypes to read a Phoenix table, reading dynamic 
columns iteratively. 
# Use the HBase API and PDataTypes to write to a Phoenix table, writing 
dynamic columns iteratively. 
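To illustrate what we have in mind for the two items above: interoperating with Phoenix from the raw HBase API comes down to agreeing byte-for-byte with the PDataType encodings. The sketch below is a dependency-free illustration of the sort-order-preserving formats we understand Phoenix to use (VARCHAR as UTF-8 bytes, BIGINT as big-endian with the sign bit flipped so negative values sort first); real code would of course call Phoenix's own PDataType implementations (e.g. {{PLong.INSTANCE.toBytes}}) rather than reimplement them, which is exactly why their stability matters to us.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class PhoenixStyleCodec {

    // BIGINT-style encoding: 8 big-endian bytes with the sign bit flipped,
    // so unsigned byte-wise comparison matches numeric order (HBase sorts
    // cells by unsigned byte comparison).
    static byte[] encodeLong(long v) {
        return ByteBuffer.allocate(8).putLong(v ^ Long.MIN_VALUE).array();
    }

    static long decodeLong(byte[] b) {
        return ByteBuffer.wrap(b).getLong() ^ Long.MIN_VALUE;
    }

    // VARCHAR-style encoding: plain UTF-8 bytes.
    static byte[] encodeVarchar(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String decodeVarchar(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }
}
```

A raw HBase {{Put}}/{{Get}} that reads or writes cells with these encodings (and the column qualifiers Phoenix expects) could then iterate over dynamic columns directly, as long as the PDataType formats don't change underneath us.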

> YARN Timeline Service: Next generation
> --------------------------------------
>
>                 Key: YARN-2928
>                 URL: https://issues.apache.org/jira/browse/YARN-2928
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>            Priority: Critical
>         Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal 
> v1.pdf, Timeline Service Next Gen - Planning - ppt.pptx, 
> TimelineServiceStoragePerformanceTestSummaryYARN-2928.pdf
>
>
> We have the application timeline server implemented in yarn per YARN-1530 and 
> YARN-321. Although it is a great feature, we have recognized several critical 
> issues and features that need to be addressed.
> This JIRA proposes the design and implementation changes to address those. 
> This is phase 1 of this effort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)