[ 
https://issues.apache.org/jira/browse/YARN-5109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297844#comment-15297844
 ] 

Varun Saxena commented on YARN-5109:
------------------------------------

bq. Also, do we have a test that tests an encoded long having a separator in 
it? After all, that's what caused us to uncover this issue.
Yes, we have. In TestKeyConverters, I create the flow run id and the cluster 
timestamp (in the app id) in a manner that will have separators in them. The 
event column name issue is also simulated. In fact, it takes care of the case 
where QUALIFIER changes in the future as well. 
TestHBaseTimelineStorage#testEventsEscapeTs takes care of the issue with the 
event column name in an E2E test case.
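
For illustration only (this is not code from the patch or from the tests named 
above): a minimal stand-alone sketch of why the timestamp bytes quoted in the 
issue description break a separator-based parse. The class and helper names 
are hypothetical, and the naive splitter is an assumption made for the 
demonstration, not the timeline service's actual parsing code.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-alone illustration; none of these names come from the Hadoop code base.
class SeparatorCollisionDemo {

    // Big-endian 8-byte form of a long (ByteBuffer's default byte order).
    static byte[] toBytes(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    // Builds (event id)=(timestamp)=(event info key) with the raw timestamp bytes.
    static byte[] buildColumn(byte[] timestamp) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[][] chunks = {
            "YARN_RM_CONTAINER_CREATED".getBytes(StandardCharsets.UTF_8),
            {'='},
            timestamp,
            {'='},
            "YARN_CONTAINER_ALLOCATED_HOST".getBytes(StandardCharsets.UTF_8)
        };
        for (byte[] chunk : chunks) {
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }

    // Splits on every occurrence of the separator byte, ignoring segment sizes.
    static List<byte[]> naiveSplit(byte[] source, byte separator) {
        List<byte[]> parts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= source.length; i++) {
            if (i == source.length || source[i] == separator) {
                parts.add(Arrays.copyOfRange(source, start, i));
                start = i + 1;
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        // The eight bytes \x7F\xFF\xFE\xABDY=\x99 from the issue description.
        byte[] ts = toBytes(0x7FFFFEAB44593D99L);
        System.out.println(ts[6] == (byte) '=');   // true: 0x3D is '='

        List<byte[]> parts = naiveSplit(buildColumn(ts), (byte) '=');
        System.out.println(parts.size());          // 4 segments instead of 3
    }
}
```

The seventh byte of that long is 0x3D, which is "=", so the naive split 
returns four segments where the format calls for three.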

bq. Should we replace "" with Separator.EMPTY_BYTES? That should be equivalent, 
right?
Strictly speaking, they are not equivalent. We are calling joinEncoded, which 
takes strings. If we call join, we will have to encode the string first. In any 
case, I have added a constant EMPTY_STRING in Separator and am using it.

bq. I think NO_LIMIT_SPLIT and VARIABLE_SIZE are getting confusing. Since we're 
using VARIABLE_SIZE for the most part, can we remove NO_LIMIT_SPLIT
NO_LIMIT_SPLIT is meant to indicate that there is no limit on the number of 
splits returned. VARIABLE_SIZE is used to indicate that the size of a segment 
in a split is variable. That said, we can take VARIABLE_SIZE to also mean that 
the number of splits is not fixed.
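
For illustration only (not the patch's implementation): a minimal sketch of 
how a per-segment size hint could let a splitter step over a fixed-width 
timestamp without scanning it for separators. The name VARIABLE_SIZE is 
borrowed from the discussion above; its value of 0, the split signature, and 
the class name are assumptions made for this example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-alone illustration; the names and the marker value are hypothetical.
class SizeAwareSplitDemo {

    // Assumed marker: this segment runs up to the next separator.
    static final int VARIABLE_SIZE = 0;

    static byte[] toBytes(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    // Builds (event id)=(timestamp)=(event info key) with the raw timestamp bytes.
    static byte[] buildColumn(byte[] timestamp) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[][] chunks = {
            "YARN_RM_CONTAINER_CREATED".getBytes(StandardCharsets.UTF_8),
            {'='},
            timestamp,
            {'='},
            "YARN_CONTAINER_ALLOCATED_HOST".getBytes(StandardCharsets.UTF_8)
        };
        for (byte[] chunk : chunks) {
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }

    // sizes[i] is either a fixed byte count or VARIABLE_SIZE. Fixed-size
    // segments are copied as-is, so separator bytes inside them are ignored.
    static List<byte[]> split(byte[] source, byte separator, int[] sizes) {
        List<byte[]> segments = new ArrayList<>();
        int offset = 0;
        for (int i = 0; i < sizes.length; i++) {
            boolean last = (i == sizes.length - 1);
            int end;
            if (sizes[i] != VARIABLE_SIZE) {
                end = offset + sizes[i];
            } else if (last) {
                end = source.length;          // last variable segment takes the rest
            } else {
                end = offset;
                while (end < source.length && source[end] != separator) {
                    end++;
                }
            }
            if (end > source.length) {
                throw new IllegalArgumentException("segment " + i + " overruns input");
            }
            segments.add(Arrays.copyOfRange(source, offset, end));
            if (last) {
                if (end != source.length) {
                    throw new IllegalArgumentException("trailing bytes after last segment");
                }
            } else {
                if (end == source.length || source[end] != separator) {
                    throw new IllegalArgumentException("expected separator at " + end);
                }
                offset = end + 1;
            }
        }
        return segments;
    }

    public static void main(String[] args) {
        byte[] ts = toBytes(0x7FFFFEAB44593D99L);   // contains '=' at index 6
        int[] sizes = {VARIABLE_SIZE, Long.BYTES, VARIABLE_SIZE};
        List<byte[]> parts = split(buildColumn(ts), (byte) '=', sizes);
        System.out.println(parts.size());                     // 3
        System.out.println(Arrays.equals(parts.get(1), ts));  // true
    }
}
```

With the middle segment declared as eight bytes, the "=" inside the timestamp 
is never treated as a separator, and the column splits into the expected three 
parts.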

Other issues have been fixed.




> timestamps are stored unencoded causing parse errors
> ----------------------------------------------------
>
>                 Key: YARN-5109
>                 URL: https://issues.apache.org/jira/browse/YARN-5109
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Sangjin Lee
>            Assignee: Varun Saxena
>            Priority: Blocker
>              Labels: yarn-2928-1st-milestone
>         Attachments: YARN-5109-YARN-2928.003.patch, 
> YARN-5109-YARN-2928.01.patch, YARN-5109-YARN-2928.02.patch, 
> YARN-5109-YARN-2928.03.patch
>
>
> When we store timestamps (for example as part of the row key or part of the 
> column name for an event), the bytes are used as is without any encoding. If 
> the byte value happens to contain a separator character we use (e.g. "!" or 
> "="), it causes a parse failure when we read it.
> I came across this while looking into this error in the timeline reader:
> {noformat}
> 2016-05-17 21:28:38,643 WARN 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.TimelineStorageUtils:
>  incorrectly formatted column name: it will be discarded
> {noformat}
> I traced the data that was causing this, and the column name (for the event) 
> was the following:
> {noformat}
> i:e!YARN_RM_CONTAINER_CREATED=\x7F\xFF\xFE\xABDY=\x99=YARN_CONTAINER_ALLOCATED_HOST
> {noformat}
> Note that the column name is supposed to be of the format (event 
> id)=(timestamp)=(event info key). However, observe the timestamp portion:
> {noformat}
> \x7F\xFF\xFE\xABDY=\x99
> {noformat}
> The presence of the separator ("=") causes the parse error.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
