[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575490#comment-14575490 ]

James Taylor commented on YARN-2928:
------------------------------------

Happy to help, [~gtCarrera9]. Thanks for the information.

bq. If I understand this correctly, in this case, Phoenix will inherit 
pre-split settings from HBase? Will this alter the existing HBase table, 
including its schema and/or data inside? In general, if one runs CREATE TABLE 
IF NOT EXISTS or simply CREATE TABLE commands over a pre-split existing HBase 
table, will Phoenix simply accept the existing table as-is?
If you create a table in Phoenix and the table already exists in HBase, Phoenix 
will accept the existing table as-is, adding any metadata it needs (i.e. its 
coprocessors). If the table has existing data, then Phoenix will add an empty 
KeyValue to each row in the first column family referenced in the CREATE TABLE 
statement (or the default column family if no column families are referenced). 
Phoenix needs this empty KeyValue for a variety of reasons. The onus is on the 
user to ensure that the types declared in the CREATE TABLE statement match the 
way the data was actually serialized.
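
As a sketch of that first point, mapping a Phoenix table onto an existing HBase 
table over JDBC might look like the following (the table name METRICS, the 
column family M, and the column types are hypothetical; the declared types must 
match however the existing rows were serialized):

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class MapExistingHBaseTable {
    public static void main(String[] args) throws Exception {
        // "localhost" stands in for the HBase/ZooKeeper quorum
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
             Statement stmt = conn.createStatement()) {
            // If METRICS already exists in HBase, Phoenix accepts it as-is,
            // installs its coprocessors, and adds the empty KeyValue to the
            // first referenced column family (M) for any existing rows.
            stmt.execute(
                "CREATE TABLE IF NOT EXISTS METRICS (" +
                "  ID VARCHAR NOT NULL PRIMARY KEY, " +
                "  M.VAL UNSIGNED_LONG)");
        }
    }
}
{code}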

For your configuration/metric key-value pairs, how are they named? Do you know 
the possible set of keys in advance, or are they only discovered more or less 
on the fly? One way you could model this with views is to dynamically add a 
column to the view when you need it. Adding a column to a view is a very 
lightweight operation, corresponding to a few Puts to the SYSTEM.CATALOG 
table. Then you'd have a way of looping through all metrics for a given view 
using the metadata APIs (see the sketch below). Think of a view as a set of 
explicitly named dynamic columns. You'd still need to generate the SQL 
statement, though.
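
Here's a rough sketch of that pattern, with a hypothetical APP1_METRICS view 
and CPU_USAGE metric column:

{code:java}
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DynamicViewColumns {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost")) {
            try (Statement stmt = conn.createStatement()) {
                // Lightweight: just a few Puts to the SYSTEM.CATALOG table
                stmt.execute("ALTER VIEW APP1_METRICS ADD CPU_USAGE UNSIGNED_LONG");
            }
            // Enumerate all metrics (columns) currently defined on the view
            DatabaseMetaData md = conn.getMetaData();
            try (ResultSet rs = md.getColumns(null, null, "APP1_METRICS", null)) {
                while (rs.next()) {
                    System.out.println(rs.getString("COLUMN_NAME"));
                }
            }
        }
    }
}
{code}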

bq. One potential solution is to use HBase coprocessors to aggregate 
application data from the HBase storage, and then store them in a Phoenix 
aggregation table.
I'm not following. Are you thinking of having a secondary table that's a rollup 
aggregation of more raw data? Is that required, or is it more of a convenience 
for the user? If the raw data is Phoenix-queryable, then I think you have a lot 
of options. Can you point me to some more info on your design?

The stable APIs for Phoenix are the ones we expose publicly: JDBC and our 
various integration modules (e.g. MapReduce, Pig). I'd say that the 
serialization format produced by PDataType is stable (it needs to be for us to 
meet our backwards-compatibility guarantees), and the PDataType APIs are more 
stable than most others. Also, we're looking to integrate with Apache Calcite, 
so down the road we may have some other APIs that could be hooked into as well.
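
For instance, round-tripping a value through that serialization looks roughly 
like this (assuming the Phoenix 4.x package layout for the PDataType classes):

{code:java}
import org.apache.phoenix.schema.types.PLong;

public class PDataTypeRoundTrip {
    public static void main(String[] args) {
        // Serialize with Phoenix's stable PDataType format, then read it back
        byte[] bytes = PLong.INSTANCE.toBytes(42L);
        Long value = (Long) PLong.INSTANCE.toObject(bytes);
        System.out.println(value); // prints 42
    }
}
{code}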


> YARN Timeline Service: Next generation
> --------------------------------------
>
>                 Key: YARN-2928
>                 URL: https://issues.apache.org/jira/browse/YARN-2928
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>            Priority: Critical
>         Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal 
> v1.pdf, Timeline Service Next Gen - Planning - ppt.pptx, 
> TimelineServiceStoragePerformanceTestSummaryYARN-2928.pdf
>
>
> We have the application timeline server implemented in yarn per YARN-1530 and 
> YARN-321. Although it is a great feature, we have recognized several critical 
> issues and features that need to be addressed.
> This JIRA proposes the design and implementation changes to address those. 
> This is phase 1 of this effort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
