Sangjin Lee commented on YARN-3901:

(1) FlowScanner cell order issue
When I added the reader code and started testing it against the flow run unit 
tests, I found that reading the END_TIME column on the flow run table didn't 
work. The flow run column read on END_TIME is essentially a 
{{Result.getValue()}} call. However, HBase was failing to find the END_TIME 
column even though it clearly existed in the result; the lookup was failing 
inside the binary search:

  public Cell getColumnLatestCell(byte [] family, byte [] qualifier) {
    Cell [] kvs = rawCells(); // side effect possibly.
    if (kvs == null || kvs.length == 0) {
      return null;
    }
    int pos = binarySearch(kvs, family, qualifier);
    if (pos == -1) {
      return null;
    }
    if (CellUtil.matchingColumn(kvs[pos], family, qualifier)) {
      return kvs[pos];
    }
    return null;
  }

The binary search was failing because the cells in the result were stored in 
the wrong order.
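
To illustrate the failure mode (this is a self-contained sketch with plain 
strings standing in for HBase cells and natural ordering standing in for the 
KeyValue comparator, not actual HBase code): a binary search only finds an 
element if the array is sorted by the same comparator the search uses, so an 
out-of-order cell array makes the lookup miss a column that is present.

```java
import java.util.Arrays;

// Sketch: binary search over an unsorted array misses a present element.
public class BinarySearchOrder {
  public static void main(String[] args) {
    // Qualifiers stored out of order, as the coprocessor was emitting them:
    String[] unsorted = {"metric", "min_start_time", "max_end_time"};
    // binarySearch over unsorted input is undefined; here it fails to find
    // "max_end_time" even though it is present.
    int bad = Arrays.binarySearch(unsorted, "max_end_time");

    // The same qualifiers in comparator order are found correctly.
    String[] sorted = unsorted.clone();
    Arrays.sort(sorted);
    int good = Arrays.binarySearch(sorted, "max_end_time");

    System.out.println(bad + " " + good); // prints "-1 0"
  }
}
```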

The cells were stored in the wrong order because they were added out of order 
by our coprocessor (in FlowScanner.nextInternal()):

        if (runningSum.size() > 0) {
          for (Map.Entry<String, Long> newCellSum : runningSum.entrySet()) {
            // create a new cell that represents the flow metric
            Cell c = newCell(metricCell.get(newCellSum.getKey()),
                newCellSum.getValue());
            cells.add(c);
          }
        }
        if (currentMinCell != null) {
          cells.add(currentMinCell);
        }
        if (currentMaxCell != null) {
          cells.add(currentMaxCell);
        }

And this order is preserved all the way to the reader. The fix is to add the 
cells in the correct order via the KeyValueComparator. This fix is included in 
my patch on YARN-4074, and it will be applied in this JIRA.
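
The shape of the fix can be sketched as follows (again with plain strings in 
place of HBase Cells and natural ordering in place of KeyValue.COMPARATOR; the 
names here are illustrative, not the patch's actual code): accumulate the 
generated cells in a comparator-sorted collection so that whatever order the 
coprocessor produces them in, the reader receives them in comparator order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Sketch: a TreeSet keeps cells in comparator order regardless of the
// order in which the coprocessor adds them.
public class SortedCellEmit {
  public static void main(String[] args) {
    TreeSet<String> sortedCells = new TreeSet<>();

    // Cells arrive in the order the coprocessor produced them:
    sortedCells.add("metric_sum");     // summed flow metric cell
    sortedCells.add("min_start_time"); // currentMinCell
    sortedCells.add("max_end_time");   // currentMaxCell

    // The list handed back to the reader is now in comparator order,
    // so Result.getColumnLatestCell()'s binary search works.
    List<String> cells = new ArrayList<>(sortedCells);
    System.out.println(cells); // prints "[max_end_time, metric_sum, min_start_time]"
  }
}
```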

> Populate flow run data in the flow_run & flow activity tables
> -------------------------------------------------------------
>                 Key: YARN-3901
>                 URL: https://issues.apache.org/jira/browse/YARN-3901
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Vrushali C
>            Assignee: Vrushali C
>         Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, 
> YARN-3901-YARN-2928.WIP.2.patch, YARN-3901-YARN-2928.WIP.patch
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being considered:
> - Stores per flow run information aggregated across applications, flow version
> - RM's collector writes to it on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> - primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values.
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
