[
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551018#comment-14551018
]
Zhijie Shen edited comment on YARN-3411 at 5/19/15 7:00 PM:
------------------------------------------------------------
bq. A flow can be uniquely identified with the flow name and run id (and of
course cluster and user id).
I think in the Phoenix impl, we have treated the version as part of the
identifier of a unique flow.
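Here is a minimal Java sketch of a flow run key built from cluster, user, flow
name and run id, with the flow version optionally included. The class name,
the `!` separator and the component order are hypothetical. This is not the
row key of either the Phoenix or the native HBase implementation. It only
shows that if the version is left out, two runs that differ only by version
map to the same key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical illustration only: not the row key format of either the
// Phoenix or the native HBase implementation.
public final class FlowKeySketch {
    // Assumed separator; a real key would need to escape it in names.
    private static final String SEP = "!";

    // Builds a key from the components that identify a flow run. When
    // flowVersion is present it becomes part of the identity, as in the
    // Phoenix impl described above.
    static String flowRunKey(String cluster, String user, String flowName,
                             Optional<String> flowVersion, long runId) {
        List<String> parts = new ArrayList<>();
        parts.add(cluster);
        parts.add(user);
        parts.add(flowName);
        flowVersion.ifPresent(parts::add);
        parts.add(Long.toString(runId));
        return String.join(SEP, parts);
    }
}
```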
bq. Hmm, so schema creation happens more or less once in the lifetime of the
HBase cluster like during cluster setup (or perhaps if we decide to drop and
recreate it, which is rare in production). I believe writers will come to life
and cease to exist with each YARN application lifecycle but cluster is more or
less eternal, so adding this step to the lifecycle of a Writer Impl object
seems somewhat out of place to me.
Fair point. This is another place where we differ from the Phoenix impl, which
creates the tables if they don't exist. My perspective is more about
automation: it's better to leave users fewer steps to set up the service.
Perhaps, instead of the multiple, distributed writers, we can find some other
place to invoke the table initialization once, when the service is set up for a
YARN cluster with HBase/Phoenix as the backend.
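Here is a minimal sketch of that alternative, assuming table creation moves
into a single, idempotent setup step and writers only verify that the tables
exist. `SchemaAdmin`, `InMemoryAdmin` and the table names are hypothetical
stand-ins so the sketch runs without a cluster. A real version would go
through the HBase or Phoenix admin APIs.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: table creation lives in a one-time setup step rather
// than in every writer's startup path. SchemaAdmin is a stand-in, not an
// HBase or Phoenix API.
public final class SchemaSetupSketch {

    interface SchemaAdmin {
        boolean tableExists(String table);
        void createTable(String table);
    }

    // In-memory stand-in (not thread-safe) so the sketch runs without a cluster.
    static final class InMemoryAdmin implements SchemaAdmin {
        final Set<String> tables = new HashSet<>();
        int creates = 0;

        public boolean tableExists(String table) { return tables.contains(table); }

        public void createTable(String table) {
            tables.add(table);
            creates++;
        }
    }

    // Run once at cluster setup, e.g. from an automated deployment step.
    // Only missing tables are created, so re-running it is harmless.
    static void createSchemaIfMissing(SchemaAdmin admin, String... tables) {
        for (String t : tables) {
            if (!admin.tableExists(t)) {
                admin.createTable(t);
            }
        }
    }

    // Writers, which come and go with each application, only verify the schema.
    static void checkSchemaForWriter(SchemaAdmin admin, String... tables) {
        for (String t : tables) {
            if (!admin.tableExists(t)) {
                throw new IllegalStateException(
                    "Missing table " + t + "; run schema setup first");
            }
        }
    }
}
```

The exists-then-create check is not atomic, so running it from many distributed
writers could race. Invoking it once from one place avoids that race and still
spares users a manual step.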
> [Storage implementation] explore the native HBase write schema for storage
> --------------------------------------------------------------------------
>
> Key: YARN-3411
> URL: https://issues.apache.org/jira/browse/YARN-3411
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Sangjin Lee
> Assignee: Vrushali C
> Priority: Critical
> Attachments: ATSv2BackendHBaseSchemaproposal.pdf,
> YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch,
> YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch,
> YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch,
> YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt,
> YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt,
> YARN-3411.poc.txt
>
>
> There is work that's in progress to implement the storage based on a Phoenix
> schema (YARN-3134).
> In parallel, we would like to explore an implementation based on a native
> HBase schema for the write path. Such a schema does not exclude using
> Phoenix, especially for reads and offline queries.
> Once we have basic implementations of both options, we could evaluate them in
> terms of performance, scalability, usability, etc. and make a call.