[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554595#comment-14554595 ]

Joep Rottinghuis commented on YARN-3411:
----------------------------------------

I think it is reasonable that two implementations can differ in their backing schema as long as they both can write the data and retrieve the data with the same key information. Phoenix may need to add some things to the rowkey in order to work properly, and ditto for the raw HBase implementation; some additional secondary lookups may be needed, etc. That is part of the performance comparison to see.

[~djp] with respect to adding the flow version in the key, I think the problem with that is that you now require the caller to know what the version is in order to query back. I don't think that is a natural requirement. I know that I ran the "ComputeUniqueUsers" flow on the cluster, so I have user, cluster, and flow name, but I don't need to know the version to just query the last few runs, right?
If you do have the version (for reducer estimation, where you want the last runs of the same flow back) then it should be possible to query by flow _and_ by version, but I don't think it should be mandatory. Therefore I don't think that flow version must per se be part of the rowkey in all implementations.

I think we'll find that with certain schema choices some things will be more performant while others will be somewhat slower. It will be a matter of finding those schema choices that will give good enough write performance to handle scale and give good read performance for the most common use cases, while maintaining reasonable performance for other queries.

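
To make the point about the flow version concrete, here is a minimal sketch of a rowkey that leaves the version out and treats it as an optional filter. It is not the schema from any of the attached patches: the key layout (user, cluster, flow name, inverted run id), the "!" separator, and the in-memory FlowRunTable stand-in for a sorted table are all assumptions made for illustration.

```python
from bisect import bisect_left, insort

SEP = "!"              # hypothetical key separator
MAX_LONG = 2**63 - 1   # used to invert run ids so the newest run sorts first


def row_key(user, cluster, flow, run_id):
    """Build a row key WITHOUT the flow version (illustrative layout only)."""
    inverted = MAX_LONG - run_id
    return SEP.join([user, cluster, flow, f"{inverted:019d}"])


class FlowRunTable:
    """In-memory stand-in for a table whose rows are kept sorted by row key."""

    def __init__(self):
        self._keys = []   # sorted list of row keys
        self._rows = {}   # row key -> column values

    def put(self, user, cluster, flow, run_id, version):
        key = row_key(user, cluster, flow, run_id)
        if key not in self._rows:
            insort(self._keys, key)
        # The version is stored as a column value, not as part of the key.
        self._rows[key] = {"run_id": run_id, "version": version}

    def latest_runs(self, user, cluster, flow, limit, version=None):
        """Prefix scan over one flow; the version is an optional filter."""
        prefix = SEP.join([user, cluster, flow]) + SEP
        start = bisect_left(self._keys, prefix)
        runs = []
        for key in self._keys[start:]:
            if len(runs) >= limit or not key.startswith(prefix):
                break
            row = self._rows[key]
            if version is None or row["version"] == version:
                runs.append(row["run_id"])
        return runs


table = FlowRunTable()
for run_id, version in [(1000, "v1"), (2000, "v1"), (3000, "v2"), (4000, "v2")]:
    table.put("joep", "cluster1", "ComputeUniqueUsers", run_id, version)
table.put("joep", "cluster1", "OtherFlow", 5000, "v1")

# Caller knows only user, cluster, and flow name:
print(table.latest_runs("joep", "cluster1", "ComputeUniqueUsers", limit=2))
# -> [4000, 3000]

# Caller also knows the version (e.g. for reducer estimation):
print(table.latest_runs("joep", "cluster1", "ComputeUniqueUsers", limit=2, version="v1"))
# -> [2000, 1000]
```

In this sketch the version-filtered query has to skip over rows of other versions, which is the kind of "somewhat slower" trade-off mentioned above; whether that cost is acceptable in a real backend is exactly what the performance comparison would need to show.
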
> [Storage implementation] explore the native HBase write schema for storage
> --------------------------------------------------------------------------
>
>                 Key: YARN-3411
>                 URL: https://issues.apache.org/jira/browse/YARN-3411
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Vrushali C
>            Priority: Critical
>         Attachments: ATSv2BackendHBaseSchemaproposal.pdf,
> YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch,
> YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch,
> YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch,
> YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt,
> YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt,
> YARN-3411.poc.7.txt, YARN-3411.poc.txt
>
>
> There is work that's in progress to implement the storage based on a Phoenix
> schema (YARN-3134).
> In parallel, we would like to explore an implementation based on a native
> HBase schema for the write path. Such a schema does not exclude using
> Phoenix, especially for reads and offline queries.
> Once we have basic implementations of both options, we could evaluate them in
> terms of performance, scalability, usability, etc. and make a call.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)