[jira] [Comment Edited] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

Zhijie Shen (JIRA) Tue, 19 May 2015 12:01:14 -0700

    [ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551018#comment-14551018 ]


Zhijie Shen edited comment on YARN-3411 at 5/19/15 7:00 PM:
------------------------------------------------------------

bq. A flow can be uniquely identified with the flow name and run id (and of 
course cluster and user id).

I think in the Phoenix impl, we have treated the flow version as part of the 
identifier of a unique flow.
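To make the identity question concrete, here is a minimal Java sketch of a composite flow-run key. The class name `FlowRunKey`, the field order, and the `!` separator are hypothetical assumptions for illustration, not the schema from any of the attached patches. The only point is the difference between a key that omits the flow version and one that, as in the Phoenix impl, includes it.

```java
import java.util.Objects;

/**
 * Hypothetical composite identifier for a flow run. The field order and
 * the "!" separator are illustrative assumptions, not a proposed schema.
 */
final class FlowRunKey {
    private static final String SEP = "!";

    private final String cluster;
    private final String user;
    private final String flowName;
    private final String flowVersion;
    private final long runId;

    FlowRunKey(String cluster, String user, String flowName,
               String flowVersion, long runId) {
        this.cluster = Objects.requireNonNull(cluster);
        this.user = Objects.requireNonNull(user);
        this.flowName = Objects.requireNonNull(flowName);
        this.flowVersion = Objects.requireNonNull(flowVersion);
        this.runId = runId;
    }

    /** Treats (cluster, user, flow name, run id) as the unique identity. */
    String keyWithoutVersion() {
        return String.join(SEP, cluster, user, flowName, Long.toString(runId));
    }

    /** Also treats the flow version as part of the identity. */
    String keyWithVersion() {
        return String.join(SEP, cluster, user, flowName, flowVersion,
                Long.toString(runId));
    }
}
```

With the version in the key, two runs that share cluster, user, flow name, and run id but differ in version map to different keys; without it, they collide.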

bq. Hmm, so schema creation happens more or less once in the lifetime of the 
HBase cluster, like during cluster setup (or perhaps if we decide to drop and 
recreate it, which is rare in production). I believe writers will come to life 
and cease to exist with each YARN application lifecycle, but the cluster is more 
or less eternal, so adding this step to the lifecycle of a Writer Impl object 
seems somewhat out of place to me.

Fair point. And this is another place where we differ from the Phoenix impl, 
which creates tables if they don't exist. My perspective is more about 
automation: it's better to leave users fewer steps to set up the service. 
Perhaps we can find somewhere other than the multiple, distributed writers to 
invoke the table initialization once, when the service is set up for a YARN 
cluster with HBase/Phoenix as the backend.
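A minimal Java sketch of the alternative being discussed: an idempotent, check-then-create initializer that runs once at setup rather than inside every writer's lifecycle. `SchemaAdmin`, `InMemorySchemaAdmin`, `TimelineSchemaCreator`, and the table names are all hypothetical; a real version would delegate to the HBase or Phoenix admin API, and the in-memory stand-in exists only so the sketch runs without a cluster.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical abstraction over a backend admin API (HBase or Phoenix). */
interface SchemaAdmin {
    boolean tableExists(String table);
    void createTable(String table);
}

/** In-memory stand-in so the sketch runs without a cluster. */
final class InMemorySchemaAdmin implements SchemaAdmin {
    private final Set<String> tables = new HashSet<>();
    int createCalls = 0;

    @Override
    public boolean tableExists(String table) {
        return tables.contains(table);
    }

    @Override
    public void createTable(String table) {
        tables.add(table);
        createCalls++;
    }
}

/** One-time schema setup, invoked from a setup step, not from writers. */
final class TimelineSchemaCreator {
    // Hypothetical table names, not taken from any patch.
    static final List<String> TABLES = List.of("entity", "flow", "application");

    /** Creates only the missing tables; returns how many were created. */
    static int createMissingTables(SchemaAdmin admin) {
        int created = 0;
        for (String table : TABLES) {
            if (!admin.tableExists(table)) {
                admin.createTable(table);
                created++;
            }
        }
        return created;
    }
}
```

Running it a second time creates nothing, so it is safe to rerun after a partial setup. Note that check-then-create is not atomic: if many distributed writers ran it concurrently they could race, which is one more argument for invoking it from a single setup step.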



> [Storage implementation] explore the native HBase write schema for storage
> --------------------------------------------------------------------------
>
>                 Key: YARN-3411
>                 URL: https://issues.apache.org/jira/browse/YARN-3411
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Vrushali C
>            Priority: Critical
>         Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
> YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
> YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
> YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, 
> YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, 
> YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, 
> YARN-3411.poc.txt
>
>
> There is work that's in progress to implement the storage based on a Phoenix 
> schema (YARN-3134).
> In parallel, we would like to explore an implementation based on a native 
> HBase schema for the write path. Such a schema does not exclude using 
> Phoenix, especially for reads and offline queries.
> Once we have basic implementations of both options, we could evaluate them in 
> terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
