Li Lu commented on YARN-3134:

Hi [~vrushalic] and [~zjshen]! Thanks for the comments! 

About [~vrushalic]'s questions: for this POC patch I'm not adding metrics info 
yet, but that's my next step. I'm storing configs in the entity table under a 
separate column family, CONFIG_COLUMN_FAMILY. Each config item C(k, v) for a 
primary key PK will be stored in column CONFIG_COLUMN_FAMILY.k of row PK, with 
value v. 
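To make that layout concrete, here is a rough sketch of how one config pair could translate into a Phoenix upsert using a dynamic column. The table name, row-key column, and family name (ENTITY, entity_id, "c") are placeholders for illustration, not the names used in the patch:

```java
// Sketch: mapping a config pair C(k, v) onto a Phoenix UPSERT with a
// dynamic column. The dynamic column is declared inline in the statement
// and lands under the given column family in the underlying HBase row.
// Table/column names here are hypothetical.
public class ConfigUpsertBuilder {
  static String buildConfigUpsert(String table, String family, String configKey) {
    return "UPSERT INTO " + table
        + " (entity_id, \"" + family + "\".\"" + configKey + "\" VARCHAR)"
        + " VALUES (?, ?)";
  }

  public static void main(String[] args) {
    // Bind values via JDBC: setString(1, pk); setString(2, v);
    System.out.println(buildConfigUpsert("ENTITY", "c", "yarn.app.priority"));
  }
}
```

Since the config keys are open-ended, dynamic columns let us avoid declaring every key in the schema up front.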

bq. One entity may need multiple sql sentences to complete one entity write. Do 
we need to use transaction?
That's a very good question, and I'm not sure about the answer right now. 
Currently we write one entity (with a PK) using two writes: one with only 
static columns (C_s) and the other with only dynamic columns (C_d). HBase 
guarantees row-level atomicity for each write, so I assume the result after 
the two calls will be (PK, C_s), (PK, C_d), or (PK, C_s, C_d). The last one 
is, of course, the best case. 
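A toy model of that two-write pattern, just to spell out the failure mode (the column names are made up):

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the two-write pattern: each write is atomic at the row
// level (as HBase guarantees), but no transaction spans both writes, so
// a failure between them leaves only one column set in the row.
public class TwoWriteModel {
  static Map<String, String> row = new HashMap<>();

  // Models one atomic row mutation.
  static void atomicWrite(Map<String, String> cols) {
    row.putAll(cols);
  }

  public static void main(String[] args) {
    atomicWrite(Map.of("cf:created_time", "1428960000")); // write 1: C_s only
    atomicWrite(Map.of("c:yarn.app.priority", "1"));      // write 2: C_d only
    // Both writes succeeded, so the row holds the union (PK, C_s, C_d).
    // Had write 2 failed, the row would hold only (PK, C_s) -- hence the
    // question about transactions.
    System.out.println(row.size());
  }
}
```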

bq. In this case, is it better to write the entity one-by-one (including 
config, info), assuming the records are in sequence by PK?
I think you're right. Will look into this improvement. 

About deployment: for end users we can either ship a predefined version of 
Phoenix+HBase for simpler deployment, or allow users to specify the classpath 
for the Phoenix JDBC driver and choose a version of Phoenix+HBase themselves. 
The latter offers more freedom but unavoidably introduces some deployment 
difficulties. For now, I think our short-term focus is to wrap miniclusters 
to allow UTs to pass in our branch (to be prepared for a branch 

About posting metrics, I was wondering whether we could allow users to send 
just the delta to storage, and use some information in the timeline entity to 
infer whether the entity itself is already in the entity table. If that's 
possible, then we can have a shortcut (not touching the entity table) for 
faster metrics updates, which may constitute the majority of our storage 
traffic. 
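Roughly what I have in mind for that fast path, as a sketch only; the writer-side cache used for the inference, the counters, and all names are hypothetical, and how we actually detect "entity already written" is exactly the open question:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the proposed metrics fast path: if we can infer that an
// entity row already exists, a metric-delta update can skip the entity
// table entirely. The per-writer PK cache here is a stand-in for
// whatever inference mechanism we end up with.
public class MetricFastPath {
  private final Set<String> knownEntities = new HashSet<>();
  int entityWrites = 0;
  int metricWrites = 0;

  void writeMetricDelta(String entityPk, String metric, long delta) {
    if (knownEntities.add(entityPk)) {
      // First time this writer sees the PK: write the entity row too.
      entityWrites++;
    }
    // The metric delta itself is always written; these updates are
    // expected to dominate storage traffic.
    metricWrites++;
  }
}
```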

> [Storage implementation] Exploiting the option of using Phoenix to access 
> HBase backend
> ---------------------------------------------------------------------------------------
>                 Key: YARN-3134
>                 URL: https://issues.apache.org/jira/browse/YARN-3134
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Zhijie Shen
>            Assignee: Li Lu
>         Attachments: YARN-3134-040915_poc.patch, YARN-3134-041015_poc.patch, 
> YARN-3134DataSchema.pdf
> Quote the introduction on Phoenix web page:
> {code}
> Apache Phoenix is a relational database layer over HBase delivered as a 
> client-embedded JDBC driver targeting low latency queries over HBase data. 
> Apache Phoenix takes your SQL query, compiles it into a series of HBase 
> scans, and orchestrates the running of those scans to produce regular JDBC 
> result sets. The table metadata is stored in an HBase table and versioned, 
> such that snapshot queries over prior versions will automatically use the 
> correct schema. Direct use of the HBase API, along with coprocessors and 
> custom filters, results in performance on the order of milliseconds for small 
> queries, or seconds for tens of millions of rows.
> {code}
> It may simplify how our implementation reads/writes data from/to HBase, and 
> make it easy to build indexes and compose complex queries.

This message was sent by Atlassian JIRA
