[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735887#comment-14735887
 ] 

Joep Rottinghuis commented on YARN-3901:
----------------------------------------

Thanks [~vrushalic]. I'm going to dig through the details on the latest patch.
Separately, [~sjlee0] and I further discussed the challenges of taking the
timestamp in the coprocessor, buffering writes, app restarts, timestamp
collisions, and the ordering of the various writes that come in.

1) Given that our timestamps are in milliseconds, multiplying by 1,000 should
suffice: that leaves room for 1,000 distinct values per millisecond, and it is
unlikely that we'd have more than 1M writes per second for one column in one
region server for one flow. If we multiply by 1M instead, we get close to the
total date range that can fit in a long (still years to come, but still).
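To put point 1 in numbers: Long.MAX_VALUE is about 9.22 × 10^18. The small Java sketch below (class and method names are illustrative, not from any patch) computes how many years past 1970 a millisecond timestamp can reach before millis * factor overflows, using 365-day years.

```java
/**
 * Sketch: how many years after 1970 a millisecond timestamp can reach
 * before (millis * factor) overflows a signed 64-bit long.
 */
final class TimestampHeadroom {
    // 365-day years; leap days are ignored, so results are approximate.
    static final long MILLIS_PER_YEAR = 365L * 24 * 60 * 60 * 1000;

    private TimestampHeadroom() {}

    /** Largest epoch-millis value that can be multiplied by factor without overflow. */
    static long maxMillis(long factor) {
        return Long.MAX_VALUE / factor;
    }

    /** Approximate number of whole years after 1970 representable at this factor. */
    static long yearsAfterEpoch(long factor) {
        return maxMillis(factor) / MILLIS_PER_YEAR;
    }

    public static void main(String[] args) {
        System.out.println("x1,000:     ~" + yearsAfterEpoch(1_000L) + " years after 1970");
        System.out.println("x1,000,000: ~" + yearsAfterEpoch(1_000_000L) + " years after 1970");
    }
}
```

With x1,000,000 the limit lands about 292 years after the epoch (roughly the year 2262); with x1,000 it is about 292,000 years.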

2) If we do any shifting of time, we should do the same everywhere to keep
things consistent, and to keep the ability to ask what a particular row
(roughly) looked like at any particular time (for example, what the state of
this entire row was at midnight last night).

3) We think that in the column helper, if the ATS client supplies a timestamp,
we should multiply it by 1,000. Whenever we read a timestamp from HBase, we'll
divide it by 1,000.
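A minimal sketch of points 2 and 3, assuming a hypothetical static helper (TimestampScaler is not an existing ATS class): every write path scales by the same factor and every read path divides by it, so an "as of" question such as "what did this row look like at midnight" can still be mapped back onto cell timestamps.

```java
/**
 * Hypothetical helper (not the actual ATS column helper): applies the same
 * x1,000 shift on every write and undoes it on every read, leaving the low
 * three decimal digits of each cell timestamp free for a uniqueness suffix.
 */
final class TimestampScaler {
    static final long SCALE = 1_000L;

    private TimestampScaler() {}

    /** Client epoch millis -> stored HBase cell timestamp. */
    static long toCellTimestamp(long epochMillis) {
        // multiplyExact throws ArithmeticException rather than silently overflowing.
        return Math.multiplyExact(epochMillis, SCALE);
    }

    /** Stored cell timestamp -> epoch millis; any suffix in the low digits is dropped. */
    static long toEpochMillis(long cellTimestamp) {
        return cellTimestamp / SCALE;
    }

    /**
     * Exclusive upper bound for an "as of" read: every cell written at or
     * before asOfMillis (whatever its suffix) has a timestamp below this value.
     */
    static long asOfUpperBound(long asOfMillis) {
        return Math.multiplyExact(Math.addExact(asOfMillis, 1L), SCALE);
    }
}
```

Because asOfUpperBound is exclusive, it lines up with HBase time ranges, which are half-open [min, max).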

4) If the ATS client doesn't supply a timestamp, we'll grab the timestamp in
the ATS writer the moment the write arrives (before it is batched or buffered
in the buffered mutator, the HBase client, or the RS queue). We then multiply
this time by 1,000. Reads again divide by 1,000 to get back to epoch millis as
before.
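A sketch of point 4, under stated assumptions: ArrivalStampingWriter and PendingWrite are hypothetical names, an in-memory list stands in for the buffered mutator, and an empty OptionalLong stands for "the ATS client supplied no timestamp". The point is only the ordering: the clock is read inside write(), before anything is buffered.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.OptionalLong;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch of point 4: the writer stamps each write when it
 * arrives, before it reaches any buffer (buffered mutator, HBase client,
 * RS queue), so buffering delays cannot change the recorded time.
 */
final class ArrivalStampingWriter {
    /** A pending write: column, value and the already-scaled cell timestamp. */
    static final class PendingWrite {
        final String column;
        final long value;
        final long cellTimestamp;

        PendingWrite(String column, long value, long cellTimestamp) {
            this.column = column;
            this.value = value;
            this.cellTimestamp = cellTimestamp;
        }
    }

    private final LongSupplier clock;                            // epoch millis; injectable for tests
    private final List<PendingWrite> buffer = new ArrayList<>(); // stands in for the buffered mutator

    ArrivalStampingWriter(LongSupplier clock) {
        this.clock = clock;
    }

    void write(String column, long value, OptionalLong clientMillis) {
        // Prefer the client's timestamp; otherwise read the clock now, on arrival.
        long millis = clientMillis.orElseGet(clock);
        buffer.add(new PendingWrite(column, value, Math.multiplyExact(millis, 1_000L)));
    }

    List<PendingWrite> pending() {
        return Collections.unmodifiableList(buffer);
    }
}
```

In production code the clock would simply be System::currentTimeMillis; injecting it keeps the example testable.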

5) For the aggregation operations SUM, MIN, and MAX, we take the least
significant 3 digits of the app_id and add them to (timestamp * 1,000), so that
we create a unique timestamp per app in an active flow run. This should avoid
any collisions. It takes care of uniqueness (no collisions within a single ms),
but also handles older instances of a writer (for example, in case of a second
AM attempt) or any other kind of ordering issue, because the writes are
timestamped when they arrive at the writer.
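A sketch of the point 5 arithmetic. It assumes app ids of the usual form application_<clusterTimestamp>_<sequence> and takes "the least significant 3 digits" to mean the sequence number modulo 1,000; AppTimestamps is a hypothetical name. In YARN code, ApplicationId#getId() would be a more natural source for the sequence number than parsing the string.

```java
/**
 * Hypothetical sketch of point 5: (arrival millis * 1,000) plus the three
 * least significant digits of the application's sequence number.
 */
final class AppTimestamps {
    static final long SCALE = 1_000L;

    private AppTimestamps() {}

    /** Sequence number of an id such as "application_1441757100000_0042" (assumed format). */
    static int appSequence(String appId) {
        int lastUnderscore = appId.lastIndexOf('_');
        if (lastUnderscore < 0) {
            throw new IllegalArgumentException("unexpected app id: " + appId);
        }
        return Integer.parseInt(appId.substring(lastUnderscore + 1));
    }

    /** Unique-per-app cell timestamp for SUM/MIN/MAX columns. */
    static long cellTimestamp(long arrivalMillis, String appId) {
        long suffix = appSequence(appId) % SCALE; // 0..999
        return Math.addExact(Math.multiplyExact(arrivalMillis, SCALE), suffix);
    }
}
```

Dividing the result by 1,000 still yields the arrival millis, so readers are unaffected. One caveat of this encoding: two apps whose sequence numbers differ by a multiple of 1,000 (for example _0042 and _1042) get the same suffix and could still collide if they write in the same millisecond.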

6) If some piece of client code doesn't set any timestamp (this should be an
error), then we cannot effectively order the writes as per the previous point.
We still need to ensure that we don't have collisions. If the client-supplied
timestamp is Long.MAX_VALUE, then we can generate the timestamp in the
coprocessor on the server side, combined with a counter (modulo 1,000) to
ensure uniqueness. We should still multiply by 1K to leave the same amount of
room for the unique counter.
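A sketch of point 6, assuming Long.MAX_VALUE is the sentinel for "no timestamp set" and that the assigning code runs on the server side (for example in a coprocessor). ServerSideTimestamper is a hypothetical name; the counter is per instance, so uniqueness here only holds within one server and for up to 1,000 writes per millisecond.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch of point 6: a write carrying the sentinel
 * Long.MAX_VALUE gets (now * 1,000) + (counter mod 1,000) assigned on the
 * server side, so writes landing in the same millisecond stay distinct.
 */
final class ServerSideTimestamper {
    static final long SCALE = 1_000L;
    static final long UNSET = Long.MAX_VALUE; // sentinel: client set no timestamp

    private final LongSupplier clock;                         // epoch millis
    private final AtomicInteger counter = new AtomicInteger(); // per-instance counter

    ServerSideTimestamper(LongSupplier clock) {
        this.clock = clock;
    }

    long assign(long suppliedCellTimestamp) {
        if (suppliedCellTimestamp != UNSET) {
            return suppliedCellTimestamp; // already scaled by the writer
        }
        // floorMod keeps the suffix in [0, 999] even after the int counter wraps.
        long suffix = Math.floorMod(counter.getAndIncrement(), (int) SCALE);
        return Math.addExact(Math.multiplyExact(clock.getAsLong(), SCALE), suffix);
    }
}
```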

> Populate flow run data in the flow_run & flow activity tables
> -------------------------------------------------------------
>
>                 Key: YARN-3901
>                 URL: https://issues.apache.org/jira/browse/YARN-3901
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Vrushali C
>            Assignee: Vrushali C
>         Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.4.patch, YARN-3901-YARN-2928.5.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing this JIRA to track the creation and population of data in the flow
> run table. Some points that are being considered:
> - Stores per-flow-run information aggregated across applications, flow version
> - RM’s collector writes to it on app creation and app completion
> - Per-app collector writes to it for metric updates at a slower frequency
> than the metric updates to the application table
> - primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time, the RM writer will simply write a value with the tag
> for the applicationId. A coprocessor will return the min value of all written
> values.
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 
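To make the tag rules in the quoted description concrete, here is a hypothetical in-memory model (FlowRunColumnSketch is not the real coprocessor, and it ignores HBase cells, tag encoding, and the separate column of completed app ids). It shows SUM on read, MIN on read for min_start_time, and a compaction step that collapses only complete (tag type 2) cells into one untagged cell.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical in-memory model of the flow-run column rules (not the real
 * coprocessor): each cell has tag type 0 (not set), 1 (running) or
 * 2 (complete). SUM columns are summed on read; on compaction, complete
 * cells (and any previously collapsed cell) fold into one untagged cell,
 * while running cells are kept as-is.
 */
final class FlowRunColumnSketch {
    static final int TAG_NOT_SET = 0;
    static final int TAG_RUNNING = 1;
    static final int TAG_COMPLETE = 2;

    static final class Cell {
        final String appId; // null for a collapsed, untagged cell
        final int tagType;
        final long value;

        Cell(String appId, int tagType, long value) {
            this.appId = appId;
            this.tagType = tagType;
            this.value = value;
        }
    }

    private FlowRunColumnSketch() {}

    /** Read path for an m! (SUM) column: add up every cell. */
    static long sumOnRead(List<Cell> cells) {
        long total = 0;
        for (Cell c : cells) {
            total = Math.addExact(total, c.value);
        }
        return total;
    }

    /** Read path for min_start_time; throws NoSuchElementException if there are no cells. */
    static long minOnRead(List<Cell> cells) {
        return cells.stream().mapToLong(c -> c.value).min().getAsLong();
    }

    /** Compaction for a SUM column: collapse complete cells, keep running ones. */
    static List<Cell> compactSum(List<Cell> cells) {
        List<Cell> kept = new ArrayList<>();
        long collapsed = 0;
        boolean anyCollapsed = false;
        for (Cell c : cells) {
            if (c.tagType == TAG_COMPLETE || c.appId == null) {
                collapsed = Math.addExact(collapsed, c.value);
                anyCollapsed = true;
            } else {
                kept.add(c);
            }
        }
        if (anyCollapsed) {
            kept.add(new Cell(null, TAG_NOT_SET, collapsed));
        }
        return kept;
    }
}
```

The same compaction idea applies to min_start_time and max_end_time, except that the collapsed cell keeps the min (or max) instead of the sum.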



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
