[ 
https://issues.apache.org/jira/browse/YARN-5109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15302345#comment-15302345
 ] 

Joep Rottinghuis edited comment on YARN-5109 at 5/26/16 4:32 PM:
-----------------------------------------------------------------

Indeed, it is easy to do now with the way KeyConverter and Separator are 
written, and yes, I was ambiguous about whether we should encode.
After thinking about it a bit more, I do think we should encode tabs as well. 
If we encode both spaces and tabs, we should ensure that we decode in the 
reverse of the order in which we encoded.
As a general rule, we should encode/decode anything that is specified by a 
user, especially values in which we can expect to see spaces (or tabs). As a 
good practice, that should probably cover any value that comes from a user and 
goes into a column qualifier or row key.
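
To illustrate why the decode order matters, here is a minimal sketch. It is 
not the Separator class from the patch: the class name OrderedEscaper, the 
percent-style escapes, and the method decodeSameOrder are all hypothetical and 
chosen only to make the ordering problem visible. The escape character "%" is 
encoded first, so it has to be decoded last.

```java
/** Hypothetical sketch; not the actual Separator or KeyConverter code. */
final class OrderedEscaper {
  // Each row is {raw, escaped}. The escapes are assumptions for illustration.
  private static final String[][] STEPS = {
      {"%", "%25"},   // escape the escape character itself first
      {"=", "%3D"},   // value separator
      {" ", "%20"},   // space
      {"\t", "%09"},  // tab
  };

  private OrderedEscaper() {}

  /** Applies every step in the listed order. */
  static String encode(String token) {
    String result = token;
    for (String[] step : STEPS) {
      result = result.replace(step[0], step[1]);
    }
    return result;
  }

  /** Undoes the steps in reverse order: the last step applied is undone first. */
  static String decode(String token) {
    String result = token;
    for (int i = STEPS.length - 1; i >= 0; i--) {
      result = result.replace(STEPS[i][1], STEPS[i][0]);
    }
    return result;
  }

  /** Deliberately wrong: undoes the steps in the same order as encode. */
  static String decodeSameOrder(String token) {
    String result = token;
    for (String[] step : STEPS) {
      result = result.replace(step[1], step[0]);
    }
    return result;
  }

  public static void main(String[] args) {
    String raw = "user value\twith=signs and a literal %3D";
    String encoded = encode(raw);
    System.out.println(encoded);
    System.out.println(decode(encoded).equals(raw));
    // A user value that already looks like an escape exposes the wrong order.
    System.out.println(decodeSameOrder(encode("%3D")).equals("%3D"));
  }
}
```

With these steps, a user value of "%3D" encodes to "%253D". Decoding in 
reverse order leaves "%253D" alone until the final step and returns "%3D", 
while decoding in the same order first produces "%3D" and then turns it into 
"=", which is not what the user supplied.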


was (Author: jrottinghuis):
Indeed it is easy to do now with the way KeyConverter and Separator are 
written, and yeah I was ambiguous about whether we should encode.
After thinking about it a bit more I do think we should encode tabs as well. If 
we encode both we should ensure that we encode and decode in the same order.
Probably as a general rule we should encode/decode things that are specified by 
a user, especially those things that we can expect to see spaces (or tabs) in, 
but probably as a good practice any value that comes from a user and goes 
into a column qualifier or rowkey.

> timestamps are stored unencoded causing parse errors
> ----------------------------------------------------
>
>                 Key: YARN-5109
>                 URL: https://issues.apache.org/jira/browse/YARN-5109
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Sangjin Lee
>            Assignee: Varun Saxena
>            Priority: Blocker
>              Labels: yarn-2928-1st-milestone
>         Attachments: YARN-5109-YARN-2928.003.patch, 
> YARN-5109-YARN-2928.01.patch, YARN-5109-YARN-2928.02.patch, 
> YARN-5109-YARN-2928.03.patch, YARN-5109-YARN-2928.04.patch, 
> YARN-5109-YARN-2928.05.patch, YARN-5109-YARN-2928.06.patch
>
>
> When we store timestamps (for example as part of the row key or part of the 
> column name for an event), the bytes are used as is without any encoding. If 
> the byte value happens to contain a separator character we use (e.g. "!" or 
> "="), it causes a parse failure when we read it.
> I came across this while looking into this error in the timeline reader:
> {noformat}
> 2016-05-17 21:28:38,643 WARN 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.TimelineStorageUtils:
>  incorrectly formatted column name: it will be discarded
> {noformat}
> I traced the data that was causing this, and the column name (for the event) 
> was the following:
> {noformat}
> i:e!YARN_RM_CONTAINER_CREATED=\x7F\xFF\xFE\xABDY=\x99=YARN_CONTAINER_ALLOCATED_HOST
> {noformat}
> Note that the column name is supposed to be of the format (event 
> id)=(timestamp)=(event info key). However, observe the timestamp portion:
> {noformat}
> \x7F\xFF\xFE\xABDY=\x99
> {noformat}
> The presence of the separator ("=") causes the parse error.
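
The failure described in the issue can be reproduced with the bytes quoted 
above. The following is a minimal sketch, not code from the patch or from 
TimelineStorageUtils: the class TimestampSeparatorSketch and its methods are 
hypothetical, the "i:e!" prefix is left out, and the event id is assumed to 
contain no "=" of its own. It contrasts a naive split on every "=" byte with 
one possible alternative that reads the timestamp as a fixed 8-byte field.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the parse problem; not the timeline service code. */
final class TimestampSeparatorSketch {
  static final byte SEP = (byte) '=';

  private TimestampSeparatorSketch() {}

  /** Builds (event id)=(timestamp bytes)=(event info key) with no encoding. */
  static byte[] columnName(String eventId, byte[] timestamp, String infoKey) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] id = eventId.getBytes(StandardCharsets.UTF_8);
    byte[] key = infoKey.getBytes(StandardCharsets.UTF_8);
    out.write(id, 0, id.length);
    out.write(SEP);
    out.write(timestamp, 0, timestamp.length);
    out.write(SEP);
    out.write(key, 0, key.length);
    return out.toByteArray();
  }

  /** Naive split: cuts at every separator byte, even inside the timestamp. */
  static List<byte[]> naiveSplit(byte[] name) {
    List<byte[]> parts = new ArrayList<>();
    int start = 0;
    for (int i = 0; i < name.length; i++) {
      if (name[i] == SEP) {
        parts.add(Arrays.copyOfRange(name, start, i));
        start = i + 1;
      }
    }
    parts.add(Arrays.copyOfRange(name, start, name.length));
    return parts;
  }

  /** Width-aware split: after the first separator, reads exactly 8 bytes. */
  static List<byte[]> fixedWidthSplit(byte[] name) {
    int first = 0;
    while (first < name.length && name[first] != SEP) {
      first++;
    }
    int tsStart = first + 1;
    int tsEnd = tsStart + Long.BYTES;
    if (tsEnd >= name.length || name[tsEnd] != SEP) {
      throw new IllegalArgumentException("incorrectly formatted column name");
    }
    List<byte[]> parts = new ArrayList<>();
    parts.add(Arrays.copyOfRange(name, 0, first));
    parts.add(Arrays.copyOfRange(name, tsStart, tsEnd));
    parts.add(Arrays.copyOfRange(name, tsEnd + 1, name.length));
    return parts;
  }

  public static void main(String[] args) {
    // Timestamp bytes from the issue: \x7F\xFF\xFE\xABDY=\x99
    byte[] ts = {0x7F, (byte) 0xFF, (byte) 0xFE, (byte) 0xAB,
        'D', 'Y', '=', (byte) 0x99};
    byte[] name = columnName("YARN_RM_CONTAINER_CREATED", ts,
        "YARN_CONTAINER_ALLOCATED_HOST");
    System.out.println("naive parts: " + naiveSplit(name).size());
    System.out.println("fixed-width parts: " + fixedWidthSplit(name).size());
  }
}
```

The naive split sees four fields instead of three because the seventh 
timestamp byte is 0x3D, the same byte as "=". Reading the timestamp by width 
is only one way around this; encoding the value before it is written would 
be another.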



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

