[
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14725845#comment-14725845
]
Sangjin Lee commented on YARN-3901:
-----------------------------------
Yeah, I can basically reproduce this issue too. What you said is largely
correct. If you try to read the flow run metrics via
{{readResultsWithTimestamps()}}, it will choke on
{{GenericObjectMapper.read()}}. This is what I see:
{noformat}
org.codehaus.jackson.JsonParseException: Illegal character ((CTRL-CHAR, code
0)): only regular white space (\r, \n, \t) is allowed between tokens
at [Source: [B@19fcfd96; line: 1, column: 2]
at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1433)
at
org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:521)
at
org.codehaus.jackson.impl.JsonParserMinimalBase._throwInvalidSpace(JsonParserMinimalBase.java:467)
at
org.codehaus.jackson.impl.ReaderBasedParser._skipWSOrEnd(ReaderBasedParser.java:1491)
at
org.codehaus.jackson.impl.ReaderBasedParser.nextToken(ReaderBasedParser.java:368)
at
org.codehaus.jackson.map.ObjectReader._initForReading(ObjectReader.java:828)
at
org.codehaus.jackson.map.ObjectReader._bindAndClose(ObjectReader.java:752)
at
org.codehaus.jackson.map.ObjectReader.readValue(ObjectReader.java:486)
at
org.apache.hadoop.yarn.server.timeline.GenericObjectMapper.read(GenericObjectMapper.java:93)
at
org.apache.hadoop.yarn.server.timeline.GenericObjectMapper.read(GenericObjectMapper.java:77)
at
org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnHelper.readResultsWithTimestamps(ColumnHelper.java:202)
at
org.apache.hadoop.yarn.server.timelineservice.storage.flow.FlowRunColumnPrefix.readResultsWithTimestamps(FlowRunColumnPrefix.java:155)
at
org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineReaderImpl.readMetrics(HBaseTimelineReaderImpl.java:513)
at
org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineReaderImpl.readFlowRunEntity(HBaseTimelineReaderImpl.java:669)
at
org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineReaderImpl.getFlowRunEntity(HBaseTimelineReaderImpl.java:625)
at
org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineReaderImpl.getEntity(HBaseTimelineReaderImpl.java:122)
{noformat}
Is that what you see, or is it different?
At any rate, it seems the data is being written in a format that the
GenericObjectMapper cannot parse.
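If so, the mismatch is easy to demonstrate without HBase. A minimal sketch, assuming the flow run cells are written as raw big-endian 8-byte longs (what {{Bytes.toBytes(long)}} produces, simulated here with a plain ByteBuffer) rather than through {{GenericObjectMapper.write()}}:

```java
import java.nio.ByteBuffer;

public class EncodingMismatch {
    public static void main(String[] args) {
        // A raw big-endian long, as Bytes.toBytes(long) would store it.
        byte[] raw = ByteBuffer.allocate(8).putLong(42L).array();
        // Any small long value starts with 0x00 bytes, which a JSON parser
        // rejects as an illegal control character ("CTRL-CHAR, code 0") --
        // matching the JsonParseException in the stack trace above.
        System.out.println(raw[0]); // prints 0
    }
}
```

That would explain why the reader fails at column 2 of the value: the very first byte is not valid JSON.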
> Populate flow run data in the flow_run table
> --------------------------------------------
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Vrushali C
> Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch,
> YARN-3901-YARN-2928.WIP.2.patch, YARN-3901-YARN-2928.WIP.patch
>
>
> As per the schema proposed in YARN-3815 in
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table.
> Some points that are being considered:
> - Stores per-flow-run information aggregated across applications, per flow
> version
> - The RM's collector writes to it on app creation and app completion
> - The per-app collector writes to it for metric updates, at a slower
> frequency than the metric updates to the application table
> - Primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for
> the applicationId. A coprocessor will return the min value of all written
> values.
> - Upon flush and compactions, the min value between all the cells of this
> column will be written to the cell without any tag (empty tag) and all the
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can
> indicate running (1) or complete (2). In those cases (for metrics) only
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow
> numbers are retained in a separate column for historical tracking: we don’t
> want to re-aggregate for those upon replay
>
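The min_start_time collapse described in the quoted points can be sketched as follows. This is a hedged illustration only (class and method names are hypothetical; the real logic lives in an HBase coprocessor operating on tagged cells at flush/compaction time):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class MinStartTimeCollapse {
    // Each application writes its own start time as a cell tagged with its
    // applicationId; on flush/compaction only the minimum survives and is
    // written back as a single untagged cell, all other cells discarded.
    static long collapse(Map<String, Long> taggedCells) {
        return Collections.min(taggedCells.values());
    }

    public static void main(String[] args) {
        Map<String, Long> cells = new HashMap<>();
        cells.put("app_1", 1000L); // hypothetical applicationId tags
        cells.put("app_2", 800L);
        System.out.println(collapse(cells)); // prints 800
    }
}
```

The max_end_time column would be the mirror image with {{Collections.max}}, per the "ditto" point above.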
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)