Vrushali C commented on YARN-3411:

Hi [~zjshen]

Thanks for the review! 
bq I saw in HBase implementation flow version is not included as part of row 
key. This is a bit different from primary key design of Phoenix implementation. 
Would you mind elaborating your rationale a bit?

Yes, I think the flow version need not be part of the primary key. A flow can 
be uniquely identified with the flow name and run id (and of course cluster and 
user id). Given a run id, we can determine the version. For production jobs, 
the version does not change, so we would be repeating it across runs. I haven’t 
looked into the Phoenix schema to understand why it is needed on the Phoenix 
side. cc [~gtCarrera9]

bq Shall we make the constants in TimelineEntitySchemaConstants follow Hadoop 
convention? We can keep them in this class now. Once we decide to move on with 
HBase impl, we should move (some of) them into YarnConfiguration as API.

Yes, I did not add them to YarnConfiguration as API since I figured it may be 
cleaner to keep this code contained within timelineservice.storage to help 
remove it if needed.  But will rename them as per Hadoop convention.

bq. In fact, you can leave these classes not annotated.
I see, I had added the annotations for these classes after some of the review 
suggestions, I think from @sjlee. 

bq. According to TimelineSchemaCreator, we need to run command line to create 
the table when we setup the backend, right? Can we include creating the table 
into the lifecycle of HBaseTimelineWriterImpl?

Hmm, so schema creation happens more or less once in the lifetime of the hbase 
cluster like during cluster setup (or perhaps if we decide to drop and recreate 
it, which is rare in production). I believe writers will come to life and cease 
to exist with each yarn application lifecycle but cluster is more or less 
eternal, so adding to this step to the lifecycle of a Writer Impl object seems 
somewhat out of place to me.

Appreciate the review!

> [Storage implementation] explore the native HBase write schema for storage
> --------------------------------------------------------------------------
>                 Key: YARN-3411
>                 URL: https://issues.apache.org/jira/browse/YARN-3411
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Vrushali C
>            Priority: Critical
>         Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
> YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
> YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
> YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, 
> YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, 
> YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, 
> YARN-3411.poc.txt
> There is work that's in progress to implement the storage based on a Phoenix 
> schema (YARN-3134).
> In parallel, we would like to explore an implementation based on a native 
> HBase schema for the write path. Such a schema does not exclude using 
> Phoenix, especially for reads and offline queries.
> Once we have basic implementations of both options, we could evaluate them in 
> terms of performance, scalability, usability, etc. and make a call.

This message was sent by Atlassian JIRA

Reply via email to