Zhijie Shen commented on YARN-3134:

Some thoughts about backend POC, not just limited to Phoenix writer, but HBase 
writer too.

1. At the current stage, I suggest we focus on logical correctness and 
performance tuning. We may need multiple iterations of improving and testing.

2. At the beginning, we may not implement storing everything in a timeline 
entity (such as relationships), but we should at least make sure that what the 
Phoenix writer and the HBase writer implement is identical in terms of the data 
they store.

3. It would be good to have rich test suites like TimelineStoreTestUtils to 
ensure the robustness of the writers. Moreover, since these are black-box 
tests, we can use them to check whether the Phoenix writer and the HBase writer 
behave the same.
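A hypothetical sketch of such a black-box parity check; the EntityWriter interface, the in-memory writers, and the entity fields below are illustrative stand-ins, not the actual YARN TimelineWriter API:

```java
import java.util.*;

// Hypothetical stand-in for the real writer API; only illustrates the
// black-box comparison idea, not the actual YARN interfaces.
interface EntityWriter {
  void write(String entityId, Map<String, Object> info);
  Map<String, Object> read(String entityId);
}

// Two toy in-memory "backends" with different internal storage,
// playing the roles of the Phoenix and HBase writers.
class InMemoryWriterA implements EntityWriter {
  private final Map<String, Map<String, Object>> store = new HashMap<>();
  public void write(String id, Map<String, Object> info) { store.put(id, new HashMap<>(info)); }
  public Map<String, Object> read(String id) { return store.get(id); }
}

class InMemoryWriterB implements EntityWriter {
  private final Map<String, Map<String, Object>> store = new TreeMap<>();
  public void write(String id, Map<String, Object> info) { store.put(id, new TreeMap<>(info)); }
  public Map<String, Object> read(String id) { return store.get(id); }
}

public class WriterParityCheck {
  // Feed an identical entity to both writers, then compare only what each
  // reads back -- no knowledge of either backend's internals is needed.
  static boolean behaveTheSame(EntityWriter a, EntityWriter b) {
    Map<String, Object> info = new HashMap<>();
    info.put("appId", "application_1423600000000_0001");  // illustrative data
    info.put("startTime", 1423600000000L);
    a.write("entity_1", info);
    b.write("entity_1", info);
    // Map.equals compares mappings regardless of implementation class.
    return Objects.equals(a.read("entity_1"), b.read("entity_1"));
  }

  public static void main(String[] args) {
    System.out.println(behaveTheSame(new InMemoryWriterA(), new InMemoryWriterB()));  // true
  }
}
```

The same driver could be pointed at the two real writers, so any divergence in stored data shows up as a test failure rather than a production surprise.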

/cc [~vrushalic]

For Phoenix implementation only:

I used the Phoenix writer in a real deployment, and I could see that the 
implementation is not thread safe: a ConcurrentModificationException is thrown 
upon committing the statements.

> [Storage implementation] Exploiting the option of using Phoenix to access 
> HBase backend
> ---------------------------------------------------------------------------------------
>                 Key: YARN-3134
>                 URL: https://issues.apache.org/jira/browse/YARN-3134
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Zhijie Shen
>            Assignee: Li Lu
>         Attachments: YARN-3134-040915_poc.patch, YARN-3134-041015_poc.patch, 
> YARN-3134-041415_poc.patch, YARN-3134DataSchema.pdf
> Quote the introduction on Phoenix web page:
> {code}
> Apache Phoenix is a relational database layer over HBase delivered as a 
> client-embedded JDBC driver targeting low latency queries over HBase data. 
> Apache Phoenix takes your SQL query, compiles it into a series of HBase 
> scans, and orchestrates the running of those scans to produce regular JDBC 
> result sets. The table metadata is stored in an HBase table and versioned, 
> such that snapshot queries over prior versions will automatically use the 
> correct schema. Direct use of the HBase API, along with coprocessors and 
> custom filters, results in performance on the order of milliseconds for small 
> queries, or seconds for tens of millions of rows.
> {code}
> It may simplify how our implementation reads/writes data from/to HBase, and 
> make it easy to build indexes and compose complex queries.
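To make the quoted description concrete: writing through Phoenix is plain JDBC with Phoenix's UPSERT syntax. A minimal sketch; the table name, columns, and ZooKeeper quorum are placeholders, not the POC schema, and running it requires a live HBase/Phoenix cluster with the Phoenix client jar on the classpath:

```java
import java.sql.*;

public class PhoenixWriteSketch {
  // Builds a parameterized Phoenix UPSERT for the given table and columns.
  static String buildUpsert(String table, String... columns) {
    StringBuilder sql = new StringBuilder("UPSERT INTO " + table + " (");
    StringBuilder params = new StringBuilder();
    for (int i = 0; i < columns.length; i++) {
      if (i > 0) { sql.append(", "); params.append(", "); }
      sql.append(columns[i]);
      params.append("?");
    }
    return sql.append(") VALUES (").append(params).append(")").toString();
  }

  public static void main(String[] args) throws SQLException {
    // "localhost" is a placeholder ZooKeeper quorum.
    try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
         PreparedStatement ps = conn.prepareStatement(
             buildUpsert("TIMELINE_ENTITY", "ENTITY_ID", "ENTITY_TYPE", "CREATED_TIME"))) {
      ps.setString(1, "entity_1");
      ps.setString(2, "YARN_APPLICATION");
      ps.setLong(3, System.currentTimeMillis());
      ps.executeUpdate();
      conn.commit();  // Phoenix batches mutations until commit
    }
  }
}
```

Phoenix translates the UPSERT into HBase puts under the hood, which is what lets the writer stay at the SQL/JDBC level instead of coding against the HBase client API directly.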
