Zhijie Shen commented on YARN-3134:

Li, thanks for uploading the POC patch. Here are some of my thoughts:

1. One entity may need multiple SQL statements to complete a single write. Do 
we need to use a transaction? If not (or if Phoenix doesn't support it), how do 
we handle the case where the first statement completes but the second doesn't? 
We'd be left with partial data. On the other hand, if we do use a transaction, 
will it significantly degrade write throughput?
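To make the atomicity concern concrete, here is a minimal sketch using Python's sqlite3 as a stand-in for a Phoenix JDBC connection; the `entity` and `entity_config` tables and the `write_entity` helper are hypothetical, not the POC schema:

```python
import sqlite3

# Stand-in store; in the real patch this would be a Phoenix/HBase connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entity (id TEXT PRIMARY KEY, created INTEGER)")
conn.execute("CREATE TABLE entity_config (entity_id TEXT, key TEXT, value TEXT)")

def write_entity(conn, entity_id, created, config):
    """Write one entity with multiple statements as a single transaction."""
    with conn:  # commits on success, rolls back on exception
        conn.execute("INSERT INTO entity VALUES (?, ?)", (entity_id, created))
        for k, v in config.items():
            # Without the surrounding transaction, a failure here would
            # leave partial data: the entity row present, config rows missing.
            conn.execute("INSERT INTO entity_config VALUES (?, ?, ?)",
                         (entity_id, k, v))

write_entity(conn, "app_001", 1428000000, {"queue": "default"})
```

If a later statement fails, the rollback removes the earlier ones too, so the store never exposes a half-written entity; the open question is what this costs in write throughput under Phoenix.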

2. In YARN-3448, Jonathan suggested a performance improvement for the LevelDB 
implementation: sequential writes may be faster than random writes, so we can 
reorder the records to be persisted to make them as sequential as possible. In 
this case, is it better to write entities one by one (including config and 
info), assuming the records are ordered by PK?
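The reordering idea can be sketched as follows; the record layout `(entity_id, record_type, key)` is a hypothetical PK, not the actual schema:

```python
# Pending writes arrive in arbitrary order.
records = [
    ("app_003", "config", "queue"),
    ("app_001", "metric", "cpu"),
    ("app_002", "info", "user"),
    ("app_001", "config", "queue"),
]

# Sorting by PK groups each entity's rows together, so the backend
# sees (mostly) sequential writes and each entity is persisted whole.
ordered = sorted(records)

for row in ordered:
    pass  # persist(row) -- sequential by PK instead of arrival order
```

The trade-off is the cost of sorting (or buffering until sort order is known) against the backend's preference for sequential I/O.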

3. To answer the offline question about writing multiple metrics, my thoughts 
are:
a) sync write: whether the user wraps a single metric or multiple metrics, we 
synchronously write them to the backend.
b) async write: the server can respond to the client immediately, buffer the 
entity in a queue, and later merge these metrics together (not limited to 
metrics, but even whole entities) and asynchronously write them to the backend.
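The async path in (b) can be sketched like this; the `AsyncWriter` class and its `put`/`flush` methods are hypothetical names for illustration, not the POC API:

```python
from collections import defaultdict

class AsyncWriter:
    """Buffer and merge metrics per entity; flush to the backend later."""

    def __init__(self):
        self.buffer = defaultdict(dict)  # entity_id -> merged metrics

    def put(self, entity_id, metrics):
        # Ack the client immediately; just merge into the in-memory buffer.
        self.buffer[entity_id].update(metrics)
        return "accepted"

    def flush(self, backend_write):
        # Later: one backend write per entity instead of one per metric batch.
        for entity_id, metrics in self.buffer.items():
            backend_write(entity_id, metrics)
        self.buffer.clear()

writer = AsyncWriter()
writer.put("app_001", {"cpu": 50})
writer.put("app_001", {"mem": 1024, "cpu": 55})  # cpu merged to latest value

flushed = {}
writer.flush(lambda eid, m: flushed.update({eid: m}))
```

The merge step is what recovers write throughput: repeated puts for the same entity collapse into a single backend write, at the cost of a window where acked data is only in memory.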

4. Do we have a simple deployment mode, meaning that by default the backend 
starts an HBase instance on the local FS with the Phoenix lib installed? It's 
related to the question of what the default backend of the timeline service 
should be. If it were a single-node HBase on the local FS, we should make sure 
it is automatically deployed with default configs. Thoughts?

> [Storage implementation] Exploiting the option of using Phoenix to access 
> HBase backend
> ---------------------------------------------------------------------------------------
>                 Key: YARN-3134
>                 URL: https://issues.apache.org/jira/browse/YARN-3134
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Zhijie Shen
>            Assignee: Li Lu
>         Attachments: YARN-3134-040915_poc.patch, YARN-3134-041015_poc.patch, 
> YARN-3134DataSchema.pdf
> Quote the introduction on Phoenix web page:
> {code}
> Apache Phoenix is a relational database layer over HBase delivered as a 
> client-embedded JDBC driver targeting low latency queries over HBase data. 
> Apache Phoenix takes your SQL query, compiles it into a series of HBase 
> scans, and orchestrates the running of those scans to produce regular JDBC 
> result sets. The table metadata is stored in an HBase table and versioned, 
> such that snapshot queries over prior versions will automatically use the 
> correct schema. Direct use of the HBase API, along with coprocessors and 
> custom filters, results in performance on the order of milliseconds for small 
> queries, or seconds for tens of millions of rows.
> {code}
> It may simplify our implementation's reads/writes from/to HBase, and make it 
> easy to build indexes and compose complex queries.

This message was sent by Atlassian JIRA
