Sangjin Lee commented on YARN-4074:

In TimelineEntityReader#readMetrics it seems safe to assume that if we have 
more than one value, then this is a TimelineMetric.Type.TIME_SERIES.
The converse doesn't have to be true though, right? I guess we'll just assume 
that a time series would never have just one value? I can't quite foresee the 
impact of incorrectly assuming TimelineMetric.Type.SINGLE_VALUE if only one 
value has been written to HBase so far.

That's right. We discussed this some time ago, and we think it'd be safer if 
the metric type (single value vs. time series) were stored/persisted. But there 
are other dimensions of metrics we may need to store (e.g. long vs. float, 
whether to aggregate, etc.). There is also the question of what to do if users 
write inconsistent data. So, at that time we went with the simple approach 
that is currently in place (the code you see in {{TimelineEntityReader}} was 
refactored out of {{HBaseTimelineReaderImpl}}, so it's not new code).

We should come to a conclusion on how to store/encode various dimensions of 
metrics, but not as part of this JIRA.
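
To make the ambiguity concrete, here is a minimal sketch of the kind of inference being discussed: the type is derived only from the number of values read back. The class, enum, and method names are hypothetical stand-ins for illustration and are not the actual {{TimelineEntityReader}} or {{TimelineMetric}} code.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-ins for illustration; not the real TimelineMetric API.
class MetricTypeInference {
    enum Type { SINGLE_VALUE, TIME_SERIES }

    // Infers the type purely from how many (timestamp -> value) cells were read.
    static Type inferType(Map<Long, Number> values) {
        return values.size() > 1 ? Type.TIME_SERIES : Type.SINGLE_VALUE;
    }

    public static void main(String[] args) {
        Map<Long, Number> series = new TreeMap<>();
        series.put(1442351767756L, 10);
        // A time series with only its first point written so far cannot be
        // told apart from a single value under this heuristic.
        System.out.println(inferType(series));
        series.put(1442351768756L, 12);
        System.out.println(inferType(series));
    }
}
```

With one value the sketch reports SINGLE_VALUE, and only after a second value arrives does it report TIME_SERIES, which is why persisting the type explicitly would be safer.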

Wrt. ApplicationRowKey: at some point (perhaps not this jira) we should 
consider making the app_id a compound object that is stored with a ? separator. 
The prefix (in most cases in YARN right now this would be "application_") would be 
separate, and the RM start time and the final numeric part would each be stored as a 
numerical value with a separate Bytes.to... conversion.

Otherwise we'll end up with incorrectly ordered rowkeys when the application 
id rolls over to 10K, and at each power of ten after that. For example, lexically 
application_1442351767756_10000 < application_1442351767756_9999

If we just access the application by specific key this doesn't matter, but if 
we do a row-scan and count on ordering to set an appropriate stop on the scan, 
we'll break things.
This happens on all rowkeys with the app_id in it.

That's a good point. We need to fix this, or queries will return incorrectly 
ordered or incorrect results. This affects every place where we rely on the 
ordering of app ids as strings. I'll file a separate JIRA to address this issue.
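
The ordering problem above can be reproduced in a few lines. This is only a sketch under stated assumptions: it drops the "application_" prefix and encodes the two numeric parts as fixed-width big-endian longs, using java.nio.ByteBuffer as a stand-in so the example has no HBase dependency. The class and method names are hypothetical, and the actual encoding is still to be decided in the separate JIRA.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch; names are not taken from the YARN code base.
class AppIdOrdering {
    // Encodes "application_<clusterTimestamp>_<sequence>" as two
    // fixed-width big-endian longs (assumes both parts are non-negative).
    static byte[] encode(String appId) {
        String[] parts = appId.split("_");
        long clusterTs = Long.parseLong(parts[1]);
        long seq = Long.parseLong(parts[2]);
        return ByteBuffer.allocate(16).putLong(clusterTs).putLong(seq).array();
    }

    // Unsigned lexicographic byte comparison, the way row keys are ordered.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        String a = "application_1442351767756_9999";
        String b = "application_1442351767756_10000";
        // As strings, 9999 sorts after 10000.
        System.out.println(a.compareTo(b) > 0);
        // As fixed-width numeric bytes, 9999 sorts before 10000.
        System.out.println(compareBytes(encode(a), encode(b)) < 0);
    }
}
```

Both comparisons print true: the string form orders the ids incorrectly, while the fixed-width numeric form preserves numeric order, so a scan with a stop row would behave as expected.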

> [timeline reader] implement support for querying for flows and flow runs
> ------------------------------------------------------------------------
>                 Key: YARN-4074
>                 URL: https://issues.apache.org/jira/browse/YARN-4074
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>         Attachments: YARN-4074-YARN-2928.007.patch, 
> YARN-4074-YARN-2928.POC.001.patch, YARN-4074-YARN-2928.POC.002.patch, 
> YARN-4074-YARN-2928.POC.003.patch, YARN-4074-YARN-2928.POC.004.patch, 
> YARN-4074-YARN-2928.POC.005.patch, YARN-4074-YARN-2928.POC.006.patch
> Implement support for querying for flows and flow runs.
> We should be able to query for the most recent N flows, etc.
> This includes changes to the {{TimelineReader}} API if necessary, as well as 
> implementation of the API.
