[ https://issues.apache.org/jira/browse/YARN-5109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15302345#comment-15302345 ]
Joep Rottinghuis edited comment on YARN-5109 at 5/26/16 4:30 PM:
-----------------------------------------------------------------
Indeed, it is easy to do now with the way KeyConverter and Separator are
written, and yes, I was ambiguous about whether we should encode.
After thinking about it a bit more, I do think we should encode tabs as well. If
we encode both spaces and tabs, we should ensure that we encode and decode in
the same order.
As a general rule, we should encode/decode anything that is specified by
a user, especially values in which we can expect to see spaces (or tabs). As
good practice, that applies to any value that comes from a user and goes
into a column qualifier or rowkey.
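
To make the rule concrete, here is a minimal sketch of escaping a user-supplied value before it goes into a column qualifier or rowkey. It is not the actual Separator/KeyConverter code: the class QualifierEscaper, the '%' escape character, and the reserved set (the "!" and "=" separators from the description, plus space and tab) are assumptions made for illustration. Each reserved character is escaped in a single pass, so the order in which separators are handled cannot differ between encode and decode.

```java
// Hypothetical helper, not part of Hadoop: escapes reserved characters
// in user-supplied strings as %XX (two uppercase hex digits).
final class QualifierEscaper {
  private static final char ESCAPE = '%';
  // Assumed reserved set: the escape character itself, the "!" and "="
  // separators, space, and tab.
  private static final String RESERVED = "%!= \t";

  private QualifierEscaper() {
  }

  static String encode(String raw) {
    StringBuilder sb = new StringBuilder(raw.length());
    for (int i = 0; i < raw.length(); i++) {
      char c = raw.charAt(i);
      if (RESERVED.indexOf(c) >= 0) {
        // Replace the reserved character with '%' plus its hex code.
        sb.append(ESCAPE).append(String.format("%02X", (int) c));
      } else {
        sb.append(c);
      }
    }
    return sb.toString();
  }

  // Assumes well-formed input produced by encode(); malformed escapes throw.
  static String decode(String encoded) {
    StringBuilder sb = new StringBuilder(encoded.length());
    for (int i = 0; i < encoded.length(); i++) {
      char c = encoded.charAt(i);
      if (c == ESCAPE) {
        // Read the two hex digits that follow the escape character.
        sb.append((char) Integer.parseInt(encoded.substring(i + 1, i + 3), 16));
        i += 2;
      } else {
        sb.append(c);
      }
    }
    return sb.toString();
  }
}
```

With this sketch, QualifierEscaper.encode("my flow=1") yields "my%20flow%3D1", which contains no separator, space, or tab, and decode restores the original value.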
> timestamps are stored unencoded causing parse errors
> ----------------------------------------------------
>
> Key: YARN-5109
> URL: https://issues.apache.org/jira/browse/YARN-5109
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Affects Versions: YARN-2928
> Reporter: Sangjin Lee
> Assignee: Varun Saxena
> Priority: Blocker
> Labels: yarn-2928-1st-milestone
> Attachments: YARN-5109-YARN-2928.003.patch,
> YARN-5109-YARN-2928.01.patch, YARN-5109-YARN-2928.02.patch,
> YARN-5109-YARN-2928.03.patch, YARN-5109-YARN-2928.04.patch,
> YARN-5109-YARN-2928.05.patch, YARN-5109-YARN-2928.06.patch
>
>
> When we store timestamps (for example as part of the row key or part of the
> column name for an event), the bytes are used as is without any encoding. If
> the byte value happens to contain a separator character we use (e.g. "!" or
> "="), it causes a parse failure when we read it.
> I came across this while looking into this error in the timeline reader:
> {noformat}
> 2016-05-17 21:28:38,643 WARN
> org.apache.hadoop.yarn.server.timelineservice.storage.common.TimelineStorageUtils:
> incorrectly formatted column name: it will be discarded
> {noformat}
> I traced the data that was causing this, and the column name (for the event)
> was the following:
> {noformat}
> i:e!YARN_RM_CONTAINER_CREATED=\x7F\xFF\xFE\xABDY=\x99=YARN_CONTAINER_ALLOCATED_HOST
> {noformat}
> Note that the column name is supposed to be of the format (event
> id)=(timestamp)=(event info key). However, observe the timestamp portion:
> {noformat}
> \x7F\xFF\xFE\xABDY=\x99
> {noformat}
> The presence of the separator ("=") causes the parse error.
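
The failure described above can be reproduced with a small sketch. It assumes the eight timestamp bytes shown in the description are a big-endian long, and it uses a naive split on "=" as a stand-in for the reader's parsing. The class and method names are hypothetical and this is not the TimelineStorageUtils code.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not part of Hadoop.
final class RawTimestampSplitDemo {
  private static final byte SEPARATOR = (byte) '=';

  private RawTimestampSplitDemo() {
  }

  // Builds (event id)=(timestamp)=(event info key) with the raw,
  // unencoded bytes of the long in the middle.
  static byte[] buildColumnName(String eventId, long timestamp, String infoKey) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] id = eventId.getBytes(StandardCharsets.UTF_8);
    byte[] ts = ByteBuffer.allocate(Long.BYTES).putLong(timestamp).array();
    byte[] key = infoKey.getBytes(StandardCharsets.UTF_8);
    out.write(id, 0, id.length);
    out.write(SEPARATOR);
    out.write(ts, 0, ts.length);
    out.write(SEPARATOR);
    out.write(key, 0, key.length);
    return out.toByteArray();
  }

  // Naive split on every separator byte, with no awareness of field widths.
  static List<byte[]> split(byte[] source) {
    List<byte[]> parts = new ArrayList<>();
    int start = 0;
    for (int i = 0; i < source.length; i++) {
      if (source[i] == SEPARATOR) {
        parts.add(Arrays.copyOfRange(source, start, i));
        start = i + 1;
      }
    }
    parts.add(Arrays.copyOfRange(source, start, source.length));
    return parts;
  }

  public static void main(String[] args) {
    // The bytes \x7F\xFF\xFE\xABDY=\x99 from the description, read as a long.
    long stored = 0x7FFFFEAB44593D99L;
    byte[] name = buildColumnName(
        "YARN_RM_CONTAINER_CREATED", stored, "YARN_CONTAINER_ALLOCATED_HOST");
    System.out.println("parts found: " + split(name).size() + ", expected: 3");
  }
}
```

Because the seventh byte of the long is 0x3D ("="), the naive split returns four parts instead of three, and the timestamp is cut into a 6-byte piece and a 1-byte piece.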