[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14517347#comment-14517347
 ] 

Vrushali C commented on YARN-3411:
----------------------------------

Hi [~gtCarrera9]

Thanks for the feedback! 
bq. About null checks: so far we do not have a fixed standard on if and where 
we need to do null checks. I noticed you assumed info, config, event, and other 
similar fields are not null. Maybe we'd like to explicitly decide when all 
those fields can be null or empty.
Yes, I think it is a very good idea to always do null checks in the writer. Let 
me update my patch with null checks, thanks!
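
As a rough illustration of what those null checks could look like, here is a 
minimal sketch. The class and method names (NullSafeWriterSketch, orEmpty, 
countCells) are hypothetical and are not taken from the patch; the idea is only 
that the writer normalizes null fields to empty ones before iterating over them.

```java
import java.util.Collections;
import java.util.Map;

/** Hypothetical sketch; none of these names come from the YARN-3411 patch. */
class NullSafeWriterSketch {

  /** Treats a null map as an empty one so callers can iterate safely. */
  static <K, V> Map<K, V> orEmpty(Map<K, V> m) {
    return m == null ? Collections.<K, V>emptyMap() : m;
  }

  /** Counts the cells a write would produce; null fields contribute nothing. */
  static int countCells(Map<String, Object> info, Map<String, String> config) {
    return orEmpty(info).size() + orEmpty(config).size();
  }
}
```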

bq. Maybe we'd like to change TimelineWriterUtils to default access modifier? I 
think it would be sufficient to make it visible in package?
Okay, will change the modifier in the next patch. 

bq. One thing I'd like to open a discussion is on deciding the way to store and 
process metrics. Currently, in the hbase patch, startTime and endTime are not 
used. In the Phoenix patch, I store time series as a flattened, non-queryable 
strings. I think this part also requires some hint from the time-based 
aggregation
Hmm, I was actually confused about the startTime and endTime members in that 
class. Let me see if I can catch you offline to understand what they stand 
for; I would like to ensure we have all the information represented in the 
backend.


bq. Another thing I'd like to discuss here is if and how we'd like to set up a 
separate "fast path" for metric only updates. On the storage layer, I'd 
strongly +1 for a separate fast path such that we can only touch the 
(frequently updated) metrics table. Any proposals everyone?
Yes, for the fast path, how about this (I think we discussed it briefly but 
haven't written it down): the caller creates a small timeline entity object 
with all fields other than metrics left null/empty. In that case, no writes 
other than the metrics update need to be executed. Also, the metrics object 
will be lightweight in the sense that it carries only the latest timestamp and 
value. How does that sound? 
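
To make the proposal concrete, here is a minimal sketch of the fast path. All 
names in it (FastPathSketch, Entity, Metric, Writer, and the "entity" and 
"metrics" table labels) are hypothetical stand-ins rather than the classes or 
tables in the patch, and the writer only records which tables it would touch 
instead of talking to HBase.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the fast path; names are not from the patch. */
class FastPathSketch {

  /** Lightweight metric: only the latest timestamp and value. */
  static class Metric {
    final long timestamp;
    final long value;

    Metric(long timestamp, long value) {
      this.timestamp = timestamp;
      this.value = value;
    }
  }

  /** Stand-in for a timeline entity; non-metric fields may be null/empty. */
  static class Entity {
    final String id;
    final String type;
    Map<String, Object> info;     // left null/empty on the fast path
    Map<String, String> config;   // left null/empty on the fast path
    List<String> events;          // left null/empty on the fast path
    Map<String, Metric> metrics;  // the only populated field on the fast path

    Entity(String id, String type) {
      this.id = id;
      this.type = type;
    }
  }

  /** Records which (assumed) tables a write would touch. */
  static class Writer {
    final List<String> tablesTouched = new ArrayList<>();

    void write(Entity e) {
      if (e == null) {
        return;
      }
      // Skip the entity table entirely when there is nothing to store in it.
      if (!isEmpty(e.info) || !isEmpty(e.config) || !isEmpty(e.events)) {
        tablesTouched.add("entity");
      }
      if (!isEmpty(e.metrics)) {
        tablesTouched.add("metrics");
      }
    }
  }

  private static boolean isEmpty(Map<?, ?> m) {
    return m == null || m.isEmpty();
  }

  private static boolean isEmpty(Collection<?> c) {
    return c == null || c.isEmpty();
  }
}
```

With this shape, a caller that sets only the metrics map on an otherwise empty 
entity triggers a single write to the metrics table, while a full entity goes 
through both writes.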


> [Storage implementation] explore the native HBase write schema for storage
> --------------------------------------------------------------------------
>
>                 Key: YARN-3411
>                 URL: https://issues.apache.org/jira/browse/YARN-3411
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Vrushali C
>            Priority: Critical
>         Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
> YARN-3411.poc.2.txt, YARN-3411.poc.txt
>
>
> There is work that's in progress to implement the storage based on a Phoenix 
> schema (YARN-3134).
> In parallel, we would like to explore an implementation based on a native 
> HBase schema for the write path. Such a schema does not exclude using 
> Phoenix, especially for reads and offline queries.
> Once we have basic implementations of both options, we could evaluate them in 
> terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
