[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14746239#comment-14746239
 ] 

Li Lu commented on YARN-3901:
-----------------------------

Hi [~vrushalic], thanks for the work! I feel the current patch is much closer. 
Some of my review comments:

- Column.java
Let's make it clear in the comments if attributes can be null. (I believe it 
can be null but let's make it explicit. )

- TimelineWriterUtils.java
Quick question: when checking application finish event and its timestamp, we 
only search for the very last event event. Will there be some corner cases 
where the application finish event is not the last one? Such as, another event 
accidentally has the same timestamp as the finish event? I'm not sure if we 
have any strong guarantee on this (so that the chance is low). 

- TimestampGenerator.java
The lock-free algorithm to generate (locally-) unique timestamps looks good to 
me. Depend on the usage maybe we'd like to promote this algorithm to a more 
common place if other part of the code needs it as well. Any special 
considerations on not directly using getAndAdd (fetch-and-increment) here? 

- TimelineWriterUtils.java
TimelineWriterUtils#getTagFromAttribute is much better than the previous 
versions. It may be better to let the key of attribute entry to identify its 
own type? Right now we're requiring AggregationOperations and 
AggregationCompactionDimension to have disjoint enum values. This implicit 
requirement may be error pruning: if two attributes accidentally have DEFAULT, 
we're confusing them. Not sure if this is the top priority for now but we may 
need to fix it sometime later. 

- FlowScanner.java
Unify findMaxCell and findMinCell? 
I'm a little bit confused on SUM and SUM_FINAL: what's the difference (we treat 
them in the same way in collectCells)? 
Right now we treat all metric values to be Long. Let's decide that later. 

- FlowRunTable.java
We're mixing real "info" with metrics (info.m!xxxxx) together in the info 
column family. I think this is worth noting somewhere? 

- TestHBaseStorageFlowRunFlowActivity.java
Split into test flowRun and test flowActivity? Although added in the same JIRA, 
the function of the flowRun table and flowActivity tables are different (online 
aggregation and real-time activity info). 
It's fine to directly read data out in the UT now. We may want to switch this 
to use readers sometime later?
Do we need to test single data aggregation? The two {{getEntityMetricsApp}} 
methods only generates time series metrics. 
checkFlowRunTable(), shall we have some better ways to check the values 141 and 
57? Say, is it possible to check them dynamically rather than hard code them? 

- With regard to FlowActivityTable, maybe I'm still missing some points here, 
but how do we maintain the active flow list for the past 24 hrs? I noticed we 
never remove or disable anything in this table, so do we do this by setting the 
ttl of the table? Or we create different rows for the same application to 
differentiate the day an entity was posted? I think we're using the latter but 
would like to confirm. Also, with the current design, when the reader tries to 
get latest activities on a cluster, it can craft a row key prefix 
{{cluster!inv(currTime-24h)}} and scan from the very beginning to this record, 
right? (I'm trying to connect the two pieces together. )

> Populate flow run data in the flow_run & flow activity tables
> -------------------------------------------------------------
>
>                 Key: YARN-3901
>                 URL: https://issues.apache.org/jira/browse/YARN-3901
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Vrushali C
>            Assignee: Vrushali C
>         Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.4.patch, YARN-3901-YARN-2928.5.patch, 
> YARN-3901-YARN-2928.6.patch, YARN-3901-YARN-2928.7.patch, 
> YARN-3901-YARN-2928.8.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to