James Taylor commented on YARN-2928:
Happy to help, [~gtCarrera9]. Thanks for the information.
bq. If I understand this correctly, in this case, Phoenix will inherit
pre-split settings from HBase? Will this alter the existing HBase table,
including its schema and/or data inside? In general, if one runs CREATE TABLE
IF NOT EXISTS or simply CREATE TABLE commands over a pre-split existing HBase
table, will Phoenix simply accept the existing table as-is?
If you create a table in Phoenix and the table already exists in HBase, Phoenix
will accept the existing table as-is, adding any metadata it needs (i.e. its
coprocessors). If the table has existing data, Phoenix will add an empty
KeyValue to each row in the first column family referenced in the CREATE TABLE
statement (or in the default column family if no column families are
referenced). Phoenix needs this empty KeyValue for a variety of reasons. The
onus is on the user to ensure that the types in the CREATE TABLE statement
match the way the data was actually serialized.
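As a rough sketch of that mapping (the table name "metrics", the row key column, the column family "cf", and the column types below are all hypothetical, not from the thread):

```sql
-- Map Phoenix onto a pre-existing, pre-split HBase table named "metrics".
-- Phoenix accepts the table as-is and installs its coprocessors; if rows
-- already exist, it adds an empty KeyValue to each row in "cf", the first
-- column family referenced here.
CREATE TABLE IF NOT EXISTS "metrics" (
    "id" VARCHAR PRIMARY KEY,     -- must match how row keys were serialized
    "cf"."value" UNSIGNED_LONG    -- must match the actual byte serialization
);
```

The quoted identifiers matter: they preserve the case of the underlying HBase table and column-family names rather than being upper-cased by Phoenix.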
For your configuration/metric key-value pairs, how are they named? Do you know
the possible set of keys in advance, or do they become known more or less on
the fly? One way you could model this with views is to add a column to the
view dynamically when you need it. Adding a column to a view is a very
lightweight operation, corresponding to a few Puts to the SYSTEM.CATALOG
table. You'd then have a way of looping through all metrics for a given view
using the metadata APIs. Think of a view as a set of explicitly named dynamic
columns. You'd still need to generate the SQL statement, though.
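A sketch of that pattern, assuming the base table and view names from the previous example (the view name "app_metrics" and metric column "gc_time_ms" are illustrative):

```sql
-- A view over the raw table; columns can be added to it on the fly.
CREATE VIEW "app_metrics" AS SELECT * FROM "metrics";

-- Adding a newly-discovered metric is just a few Puts to SYSTEM.CATALOG.
ALTER VIEW "app_metrics" ADD "cf"."gc_time_ms" UNSIGNED_LONG;
```

Once added, the column is discoverable through the standard JDBC metadata APIs, so a client can enumerate all known metrics for the view without tracking them separately.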
bq. One potential solution is to use HBase coprocessors to aggregate
application data from the HBase storage, and then store them in a Phoenix
table.
I'm not following. Are you thinking of having a secondary table that's a
rollup aggregation of the raw data? Is that required, or is it more of a
convenience for the user? If the raw data is Phoenix-queryable, then I think
you have a lot of options. Can you point me to some more info on your design?
The stable APIs for Phoenix are the ones we expose publicly: JDBC and our
various integration modules (e.g. MapReduce, Pig). I'd say that the
serialization format produced by PDataType is stable (it needs to be for us to
meet our backwards-compatibility guarantees), and the PDataType APIs are more
stable than others. Also, we're looking to integrate with Apache Calcite, so
down the road we may have some other APIs that could be hooked into as well.
> YARN Timeline Service: Next generation
> Key: YARN-2928
> URL: https://issues.apache.org/jira/browse/YARN-2928
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: timelineserver
> Reporter: Sangjin Lee
> Assignee: Sangjin Lee
> Priority: Critical
> Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal
> v1.pdf, Timeline Service Next Gen - Planning - ppt.pptx,
> We have the application timeline server implemented in yarn per YARN-1530 and
> YARN-321. Although it is a great feature, we have recognized several critical
> issues and features that need to be addressed.
> This JIRA proposes the design and implementation changes to address those.
> This is phase 1 of this effort.
This message was sent by Atlassian JIRA