[jira] [Created] (YARN-3901) Populate flow run data in the flow_run table

Vrushali C (JIRA) Wed, 08 Jul 2015 14:43:19 -0700

Vrushali C created YARN-3901:
--------------------------------

             Summary: Populate flow run data in the flow_run table
                 Key: YARN-3901
                 URL: https://issues.apache.org/jira/browse/YARN-3901
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Vrushali C
            Assignee: Vrushali C

As per the schema proposed in YARN-3815
(https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf),
filing this jira to track the creation of the flow run table and the
population of data in it.

Some points that are being considered:
- Stores per flow run information aggregated across applications, and the
flow version.
- The RM's collector writes to it on app creation and app completion.
- The per-app collector writes to it for metric updates, at a lower frequency
than the metric updates to the application table.
- Primary key: cluster ! user ! flow ! flow run id
- Only the latest version of flow-level aggregated metrics will be kept, even
if the entity and application level keep a timeseries.
- The running_apps column will be incremented on app creation, and decremented
on app completion.
- For min_start_time, the RM writer will simply write a value with the tag for
the applicationId. A coprocessor will return the min value of all written
values.
- Upon flush and compactions, the min value across all the cells of this
column will be written to the cell without any tag (empty tag) and all the
other cells will be discarded.
- Ditto for the max_end_time, but then the max will be kept.
- Tags are represented as #type:value. The type can be unset (0), or can
indicate running (1) or complete (2). For metrics, only the cells of
completed apps are collapsed on compaction.
- The m! values are aggregated (summed) upon read. Only when applications are
completed (indicated by tag type 2) can the values be collapsed.
- The application ids that have completed and been aggregated into the flow
numbers are retained in a separate column for historical tracking, so that
they are not re-aggregated upon replay.
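
To make the points above concrete, here is a rough sketch in plain Python (not
HBase coprocessor code) of the row key, the min/max collapse, and the metric
aggregation. All names (Cell, flow_run_row_key, compact_metric, and so on) are
hypothetical and only for illustration. The sketch assumes the key parts are
joined with a bare "!", and models the "separate column" of aggregated
application ids as a simple set.

```python
from dataclasses import dataclass
from typing import Optional

# Tag types from the list above; the constant names are illustrative.
TAG_NONE, TAG_RUNNING, TAG_COMPLETE = 0, 1, 2


def flow_run_row_key(cluster, user, flow, flow_run_id):
    """Build the primary key; a bare '!' separator is an assumption."""
    return "!".join([cluster, user, flow, str(flow_run_id)])


@dataclass(frozen=True)
class Cell:
    value: int
    app_id: Optional[str] = None  # None stands for the empty tag
    tag_type: int = TAG_NONE


def collapse_min(cells):
    """min_start_time on flush/compaction: one untagged cell with the min."""
    return [Cell(min(c.value for c in cells))]


def collapse_max(cells):
    """max_end_time on flush/compaction: one untagged cell with the max."""
    return [Cell(max(c.value for c in cells))]


def read_metric(cells):
    """Read path for an m! column: sum every cell, tagged or not."""
    return sum(c.value for c in cells)


def compact_metric(cells, aggregated_app_ids):
    """Fold completed apps into the untagged cell; keep the other cells."""
    total = 0
    kept = []
    for c in cells:
        if c.app_id is None:
            total += c.value  # sum carried over from an earlier compaction
        elif c.tag_type == TAG_COMPLETE:
            if c.app_id not in aggregated_app_ids:
                total += c.value
                aggregated_app_ids.add(c.app_id)
            # otherwise this is a replayed write that was already counted
        else:
            kept.append(c)  # running (or unset) apps cannot be collapsed yet
    return [Cell(total)] + kept
```

In this sketch a read returns the same sum before and after compaction, and a
replayed write for an application that was already aggregated is dropped
instead of being counted twice.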

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
