Junping Du commented on YARN-3411:

Hi [~vrushalic] and all, there is one question I have had for a long time:
It looks like we are storing time series metrics data as versions of a metrics
cell instead of inserting a row per data point (with time serving as a column).
Do we have an estimate of the performance cost of aggregating metrics data over
cell versions vs. rows? The original use case for cell versions, which comes
from BigTable, is storing web pages, where only a limited number of versions
needs to be kept (versions that are too stale can be dropped). In our case, the
metrics data could be updated much more frequently, and we cannot drop the
earliest data. I see we are setting max versions to 200, in
It sounds like that may not be enough for cases like long-running services,
container reuse, etc. It even sounds challenging for a normal container
(assuming appCollector doesn't cache metrics data locally, the interval default is: 1
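To make the concern concrete, here is a minimal in-memory sketch, not HBase code, contrasting the two layouts. `VersionedCell`, `RowPerTimestamp` and `retained_window_seconds` are hypothetical names. The cell model keeps at most `max_versions` values and drops the oldest. This is a simplification: real HBase caps what reads return at the column family's max versions and removes excess versions physically during compactions. The comment's default interval is not given in full here, so `INTERVAL = 1` second is purely an assumption.

```python
from collections import deque


class VersionedCell:
    """Hypothetical model: one metric stored as versions of a single cell.

    Keeps at most max_versions (timestamp, value) pairs, oldest first;
    appending to a full deque silently discards the oldest entry.
    """

    def __init__(self, max_versions: int) -> None:
        self.versions = deque(maxlen=max_versions)

    def put(self, ts: int, value: float) -> None:
        # Assumes timestamps arrive in increasing order.
        self.versions.append((ts, value))

    def total(self) -> float:
        # Aggregation only sees the versions that survived the cap.
        return sum(v for _, v in self.versions)


class RowPerTimestamp:
    """Hypothetical alternative: one row (or column) per data point, never pruned."""

    def __init__(self) -> None:
        self.rows = {}  # timestamp -> value

    def put(self, ts: int, value: float) -> None:
        self.rows[ts] = value

    def total(self) -> float:
        return sum(self.rows.values())


def retained_window_seconds(max_versions: int, interval_seconds: float) -> float:
    """How far back a version-capped cell reaches at a fixed publish interval."""
    return max_versions * interval_seconds


if __name__ == "__main__":
    MAX_VERSIONS = 200  # the max-versions setting mentioned above
    INTERVAL = 1        # assumption: one metric update per second
    cell, rows = VersionedCell(MAX_VERSIONS), RowPerTimestamp()
    for t in range(1000):  # 1000 updates, each with value 1.0
        cell.put(t, 1.0)
        rows.put(t, 1.0)
    print(len(cell.versions), len(rows.rows))               # 200 1000
    print(cell.total(), rows.total())                       # 200.0 1000.0
    print(retained_window_seconds(MAX_VERSIONS, INTERVAL))  # 200
```

Under these assumptions, 200 versions cover only the last 200 seconds of updates, so a sum over the versioned cell undercounts after the cap is reached. The row-per-timestamp layout keeps every point at the cost of more rows to scan during aggregation.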

> [Storage implementation] explore the native HBase write schema for storage
> --------------------------------------------------------------------------
>                 Key: YARN-3411
>                 URL: https://issues.apache.org/jira/browse/YARN-3411
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Vrushali C
>            Priority: Critical
>         Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
> YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
> YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, 
> YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, 
> YARN-3411.poc.txt
> There is work that's in progress to implement the storage based on a Phoenix 
> schema (YARN-3134).
> In parallel, we would like to explore an implementation based on a native 
> HBase schema for the write path. Such a schema does not exclude using 
> Phoenix, especially for reads and offline queries.
> Once we have basic implementations of both options, we could evaluate them in 
> terms of performance, scalability, usability, etc. and make a call.

This message was sent by Atlassian JIRA
