[jira] [Commented] (YARN-5109) timestamps are stored unencoded causing parse errors

Joep Rottinghuis (JIRA) Wed, 25 May 2016 16:24:39 -0700

    [ https://issues.apache.org/jira/browse/YARN-5109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301051#comment-15301051 ]


Joep Rottinghuis commented on YARN-5109:
----------------------------------------

[~varun_saxena] please go ahead with the patch as you have it, including what 
we discussed. I was thinking about a couple of additional items, but those go 
beyond this patch, so I'll file a separate jira and patch for them.

Wrt. encoding a long, you're right, we should probably have a 
LongKeyConverter and use that to cleanly go back and forth. Note that we do 
already have a LongConverter implementing a slightly different interface 
(NumericValueConverter). We could either have that class implement an 
additional interface, or create a new class that delegates its implementation 
to the existing class.
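
To make the second option concrete, here is a rough sketch of a new class that 
delegates to the existing one. This is only an illustration under assumptions: 
the KeyConverter interface shown here is hypothetical, and the LongConverter 
below is a simplified stand-in written for this example, not the actual class 
in the timeline service code.

```java
import java.nio.ByteBuffer;

/** Hypothetical key-side interface; its name and shape are assumptions. */
interface KeyConverter<T> {
  byte[] encode(T key);

  T decode(byte[] bytes);
}

/** Simplified stand-in for the existing value-side LongConverter. */
final class LongConverter {
  byte[] encodeValue(Object value) {
    if (!(value instanceof Long)) {
      throw new IllegalArgumentException("expected a Long");
    }
    // Fixed-width, big-endian 8-byte representation.
    return ByteBuffer.allocate(Long.BYTES).putLong((Long) value).array();
  }

  Object decodeValue(byte[] bytes) {
    if (bytes == null || bytes.length != Long.BYTES) {
      throw new IllegalArgumentException("expected exactly 8 bytes");
    }
    return ByteBuffer.wrap(bytes).getLong();
  }
}

/** New class for keys; all of the work is delegated to LongConverter. */
final class LongKeyConverter implements KeyConverter<Long> {
  private final LongConverter delegate = new LongConverter();

  @Override
  public byte[] encode(Long key) {
    return delegate.encodeValue(key);
  }

  @Override
  public Long decode(byte[] bytes) {
    return (Long) delegate.decodeValue(bytes);
  }
}
```

Delegation keeps the byte layout defined in exactly one place, so the key and 
value representations of a long cannot drift apart.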

bq. "By the way, we are encoding spaces in column qualifiers. Any reason why we 
would not want spaces in column qualifiers ? We are not using space as a 
separator."

Yeah, while in HBase you can technically use non-printable characters or pretty 
much any series of bytes as column qualifiers, when working with the data, and 
especially when administering any of this through the HBase shell, spaces (or 
even tabs) make our life really difficult. We don't really expect tabs in names 
that are sent, but spaces are common in application names.
Spaces are similarly inconvenient to deal with in REST-style calls, but 
that is a slightly different matter. Tabs are also often used in MapReduce 
to separate keys and values, which adds a further headache. It would be 
relatively easy to encode (and decode) tabs in strings, which should now just 
happen in one or two methods, right?
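
As a minimal sketch of what such an encode/decode pair could look like: the 
class name QualifierEscaper and the percent-style escape sequences below are 
assumptions made up for this example, and are not necessarily the scheme the 
patch uses.

```java
/** Hypothetical helper; the escape sequences are chosen for illustration. */
final class QualifierEscaper {
  private QualifierEscaper() {
  }

  /** Escapes '%', space and tab so the result contains no space or tab. */
  static String encode(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      switch (c) {
        case '%':
          sb.append("%25"); // escape the escape character itself
          break;
        case ' ':
          sb.append("%20");
          break;
        case '\t':
          sb.append("%09");
          break;
        default:
          sb.append(c);
      }
    }
    return sb.toString();
  }

  /** Reverses encode(). */
  static String decode(String s) {
    // "%25" is restored last, so a literal "%20" in the original name
    // (stored as "%2520") is not mistaken for an escaped space.
    return s.replace("%20", " ").replace("%09", "\t").replace("%25", "%");
  }
}
```

Escaping the escape character itself is what keeps the round trip lossless for 
names that already happen to contain "%".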

> timestamps are stored unencoded causing parse errors
> ----------------------------------------------------
>
>                 Key: YARN-5109
>                 URL: https://issues.apache.org/jira/browse/YARN-5109
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Sangjin Lee
>            Assignee: Varun Saxena
>            Priority: Blocker
>              Labels: yarn-2928-1st-milestone
>         Attachments: YARN-5109-YARN-2928.003.patch, 
> YARN-5109-YARN-2928.01.patch, YARN-5109-YARN-2928.02.patch, 
> YARN-5109-YARN-2928.03.patch, YARN-5109-YARN-2928.04.patch, 
> YARN-5109-YARN-2928.05.patch, YARN-5109-YARN-2928.06.patch
>
>
> When we store timestamps (for example as part of the row key or part of the 
> column name for an event), the bytes are used as is without any encoding. If 
> the byte value happens to contain a separator character we use (e.g. "!" or 
> "="), it causes a parse failure when we read it.
> I came across this while looking into this error in the timeline reader:
> {noformat}
> 2016-05-17 21:28:38,643 WARN 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.TimelineStorageUtils:
>  incorrectly formatted column name: it will be discarded
> {noformat}
> I traced the data that was causing this, and the column name (for the event) 
> was the following:
> {noformat}
> i:e!YARN_RM_CONTAINER_CREATED=\x7F\xFF\xFE\xABDY=\x99=YARN_CONTAINER_ALLOCATED_HOST
> {noformat}
> Note that the column name is supposed to be of the format (event 
> id)=(timestamp)=(event info key). However, observe the timestamp portion:
> {noformat}
> \x7F\xFF\xFE\xABDY=\x99
> {noformat}
> The presence of the separator ("=") causes the parse error.
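
The failure in the quoted description can be reproduced in isolation. The 
sketch below uses the eight timestamp bytes from the column name quoted above; 
the class and its split and qualifier helpers are hypothetical and written only 
for this demonstration, they are not the project's parsing code.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SeparatorCollisionDemo {
  /** Naive split of src on every occurrence of the separator byte. */
  static List<byte[]> split(byte[] src, byte sep) {
    List<byte[]> parts = new ArrayList<>();
    int start = 0;
    for (int i = 0; i <= src.length; i++) {
      if (i == src.length || src[i] == sep) {
        parts.add(Arrays.copyOfRange(src, start, i));
        start = i + 1;
      }
    }
    return parts;
  }

  /** Builds (event id)=(timestamp)=(event info key) with raw timestamp bytes. */
  static byte[] qualifier(String eventId, byte[] timestamp, String infoKey) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] id = eventId.getBytes(StandardCharsets.UTF_8);
    byte[] key = infoKey.getBytes(StandardCharsets.UTF_8);
    out.write(id, 0, id.length);
    out.write('=');
    out.write(timestamp, 0, timestamp.length);
    out.write('=');
    out.write(key, 0, key.length);
    return out.toByteArray();
  }

  public static void main(String[] args) {
    // \x7F\xFF\xFE\xABDY=\x99 from the report; 'D' is 0x44, 'Y' is 0x59
    // and '=' is 0x3D, which is also the separator byte.
    byte[] ts = {0x7F, (byte) 0xFF, (byte) 0xFE, (byte) 0xAB,
        0x44, 0x59, 0x3D, (byte) 0x99};
    byte[] q = qualifier("YARN_RM_CONTAINER_CREATED", ts,
        "YARN_CONTAINER_ALLOCATED_HOST");
    // Prints 4: the 0x3D inside the timestamp yields one part too many.
    System.out.println(split(q, (byte) '=').size());
  }
}
```

A reader expecting exactly three parts would reject this column name, which 
matches the "incorrectly formatted column name" warning in the log.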



