[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554595#comment-14554595
 ] 

Joep Rottinghuis commented on YARN-3411:
----------------------------------------

I think it is reasonable that two implementations can differ in their backing 
schema as long as they both can write the data and retrieve the data with the 
same key information. Phoenix may need to add some things to the rowkey in order 
to work properly, and ditto for the raw HBase implementation; some additional 
secondary lookups may be needed, etc. How that plays out is part of what the 
performance comparison should show.

[~djp] with respect to adding the flow version in the key, I think the problem 
with that is that you now require the caller to know what the version is in 
order to query back. I don't think that is a natural requirement. I know that I 
ran the "ComputeUniqueUsers" flow on the cluster, so I have the user, cluster, 
and flow name, but I shouldn't need to know the version just to query the last 
few runs, right? If you do have the version (say for reducer estimation, when 
you want the last runs of the same flow back) then it should be possible to 
query by flow _and_ by version, but I don't think it should be mandatory.
Therefore I don't think that flow version must per se be part of the rowkey in 
all implementations.
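
To make the point concrete, here is a rough sketch of the idea, not the schema 
from the attached proposal or patches. It assumes a hypothetical rowkey of 
user!cluster!flow!inverted-run-timestamp, with the flow version kept as a 
column value rather than in the key. The separator, the FlowRunTable class, 
and the in-memory sorted list standing in for an HBase prefix scan are all 
made up for illustration.

```python
from bisect import bisect_left, insort

SEP = "!"            # hypothetical key separator
MAX_TS = 2**63 - 1   # used to invert timestamps so newer runs sort first


def row_key(user, cluster, flow, run_ts):
    """Build a rowkey that does NOT contain the flow version."""
    inverted = MAX_TS - run_ts
    # Zero-pad so lexicographic order matches numeric order.
    return SEP.join([user, cluster, flow, f"{inverted:019d}"])


class FlowRunTable:
    """In-memory stand-in for a table with lexicographically sorted rowkeys."""

    def __init__(self):
        self._keys = []   # sorted rowkeys
        self._rows = {}   # rowkey -> column values

    def put(self, user, cluster, flow, run_ts, version):
        key = row_key(user, cluster, flow, run_ts)
        if key not in self._rows:
            insort(self._keys, key)
        # The version is stored as a column, not as part of the key.
        self._rows[key] = {"run_ts": run_ts, "version": version}

    def latest_runs(self, user, cluster, flow, limit, version=None):
        """Prefix scan on user/cluster/flow; the version filter is optional."""
        prefix = SEP.join([user, cluster, flow]) + SEP
        start = bisect_left(self._keys, prefix)
        runs = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break  # left the prefix range, scan is done
            row = self._rows[key]
            if version is None or row["version"] == version:
                runs.append(row)
                if len(runs) == limit:
                    break
        return runs
```

With this layout the caller can ask for the last few runs knowing only user, 
cluster, and flow name, and can pass a version when it has one. The cost is 
that the version filter is applied during the scan instead of being encoded 
in the key, which is exactly the kind of trade-off the comparison between 
implementations should measure.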

I think we'll find that with certain schema choices some things will be more 
performant while others will be somewhat slower. It will be a matter of finding 
those schema choices that will give good enough write performance to handle 
scale and give good read performance for the most common use cases, while 
maintaining reasonable performance for other queries.

> [Storage implementation] explore the native HBase write schema for storage
> --------------------------------------------------------------------------
>
>                 Key: YARN-3411
>                 URL: https://issues.apache.org/jira/browse/YARN-3411
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Vrushali C
>            Priority: Critical
>         Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
> YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
> YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
> YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, 
> YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, 
> YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, 
> YARN-3411.poc.7.txt, YARN-3411.poc.txt
>
>
> There is work that's in progress to implement the storage based on a Phoenix 
> schema (YARN-3134).
> In parallel, we would like to explore an implementation based on a native 
> HBase schema for the write path. Such a schema does not exclude using 
> Phoenix, especially for reads and offline queries.
> Once we have basic implementations of both options, we could evaluate them in 
> terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)