[
https://issues.apache.org/jira/browse/YARN-5109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15289207#comment-15289207
]
Sangjin Lee commented on YARN-5109:
-----------------------------------
Thanks for looking into this [~varun_saxena]. It might be good to write a small
unit test for this. FYI, the offending timestamp in the above example is
1463437148774.
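To illustrate why this particular value breaks parsing, here is a minimal sketch. It assumes the timestamp is stored as the 8-byte big-endian form of {{Long.MAX_VALUE - timestamp}}, which is an assumption on my part but is consistent with the bytes shown in the description below. The class and method names are hypothetical and are not part of the timeline service code.

```java
import java.nio.ByteBuffer;

class TimestampSeparatorDemo {
  // Assumption: the timestamp is inverted (Long.MAX_VALUE - ts) before being
  // written as 8 big-endian bytes, so that newer values sort first.
  static byte[] invertedBytes(long ts) {
    return ByteBuffer.allocate(Long.BYTES).putLong(Long.MAX_VALUE - ts).array();
  }

  // Returns the index of the first occurrence of target, or -1 if absent.
  static int indexOf(byte[] bytes, byte target) {
    for (int i = 0; i < bytes.length; i++) {
      if (bytes[i] == target) {
        return i;
      }
    }
    return -1;
  }

  // Renders the bytes as upper-case hex for inspection.
  static String toHex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      sb.append(String.format("%02X", b & 0xFF));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    byte[] b = invertedBytes(1463437148774L);
    System.out.println(toHex(b));              // 7FFFFEAB44593D99
    System.out.println(indexOf(b, (byte) '=')); // 6, i.e. the 0x3D byte
  }
}
```

The 0x3D byte at index 6 is the ASCII code for "=", so a parser that splits the raw bytes on "=" sees an extra field.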
Unfortunately, I think the scope of this issue is rather large. If I'm not
mistaken, any byte array that is passed into {{Separator.join()}} unencoded
needs to be encoded first. Of course, that also means we need to decode it on
the way out.
For example, {{ColumnHelper.getColumnQualifier()}} accepts both
{{columnPrefixBytes}} and {{qualifier}} as is and joins them.
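As a rough sketch of the encode-before-join and decode-after-split idea, the following uses a simple escape scheme. It is purely illustrative: {{SeparatorEncodingSketch}}, its methods, and the choice of "%" as an escape byte are all hypothetical, and this is not how the existing {{Separator}} class is implemented.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SeparatorEncodingSketch {
  static final byte SEP = '=';  // separator used when joining
  static final byte ESC = '%';  // hypothetical escape byte

  // Replaces ESC with "%0" and SEP with "%1", so the output never contains SEP.
  static byte[] encode(byte[] raw) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (byte b : raw) {
      if (b == ESC) {
        out.write(ESC);
        out.write('0');
      } else if (b == SEP) {
        out.write(ESC);
        out.write('1');
      } else {
        out.write(b);
      }
    }
    return out.toByteArray();
  }

  // Reverses encode(): "%0" becomes ESC and "%1" becomes SEP.
  static byte[] decode(byte[] enc) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (int i = 0; i < enc.length; i++) {
      if (enc[i] == ESC && i + 1 < enc.length) {
        out.write(enc[++i] == '0' ? ESC : SEP);
      } else {
        out.write(enc[i]);
      }
    }
    return out.toByteArray();
  }

  // Encodes every part, then joins the encoded parts with SEP.
  static byte[] join(byte[]... parts) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (int i = 0; i < parts.length; i++) {
      if (i > 0) {
        out.write(SEP);
      }
      byte[] enc = encode(parts[i]);
      out.write(enc, 0, enc.length);
    }
    return out.toByteArray();
  }

  // Splits on SEP, then decodes each part back to its original bytes.
  static List<byte[]> split(byte[] joined) {
    List<byte[]> parts = new ArrayList<>();
    int start = 0;
    for (int i = 0; i <= joined.length; i++) {
      if (i == joined.length || joined[i] == SEP) {
        parts.add(decode(Arrays.copyOfRange(joined, start, i)));
        start = i + 1;
      }
    }
    return parts;
  }
}
```

With a scheme along these lines, joining an event id, the raw timestamp bytes, and an info key and then splitting the result yields exactly three parts, even when the timestamp bytes contain "=".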
I can work with you to identify all the places where this is a problem and also
the solution. Let's discuss it here on the JIRA.
> timestamps are stored unencoded causing parse errors
> ----------------------------------------------------
>
> Key: YARN-5109
> URL: https://issues.apache.org/jira/browse/YARN-5109
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Affects Versions: YARN-2928
> Reporter: Sangjin Lee
> Assignee: Varun Saxena
> Priority: Blocker
> Labels: yarn-2928-1st-milestone
>
> When we store timestamps (for example as part of the row key or part of the
> column name for an event), the bytes are used as is without any encoding. If
> the byte value happens to contain a separator character we use (e.g. "!" or
> "="), it causes a parse failure when we read it.
> I came across this while looking into this error in the timeline reader:
> {noformat}
> 2016-05-17 21:28:38,643 WARN
> org.apache.hadoop.yarn.server.timelineservice.storage.common.TimelineStorageUtils:
> incorrectly formatted column name: it will be discarded
> {noformat}
> I traced the data that was causing this, and the column name (for the event)
> was the following:
> {noformat}
> i:e!YARN_RM_CONTAINER_CREATED=\x7F\xFF\xFE\xABDY=\x99=YARN_CONTAINER_ALLOCATED_HOST
> {noformat}
> Note that the column name is supposed to be of the format (event
> id)=(timestamp)=(event info key). However, observe the timestamp portion:
> {noformat}
> \x7F\xFF\xFE\xABDY=\x99
> {noformat}
> The presence of the separator ("=") causes the parse error.