[ 
https://issues.apache.org/jira/browse/YARN-5109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297844#comment-15297844
 ] 

Varun Saxena edited comment on YARN-5109 at 5/24/16 9:14 AM:
-------------------------------------------------------------

[~sjlee0],
bq. Also, do we have a test that tests an encoded long having a separator in 
it? After all, that's what caused us to uncover this issue.
Yes, we have. In TestKeyConverters, I create the flow run id and the cluster 
timestamp (in the app id) in such a way that their encoded forms contain 
separators. The event column name issue is also simulated. In fact, the test 
also covers the case where QUALIFIER changes in the future. 
TestHBaseTimelineStorage#testEventsEscapeTs covers the event column name issue 
in an E2E test case.
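
As a rough illustration of what such a test guards against, here is a 
standalone sketch (not the actual TestKeyConverters code). The class and method 
names are hypothetical, and the long is encoded with a plain big-endian 
ByteBuffer rather than the storage layer's own converters. The long value is 
taken from the timestamp bytes in the issue description 
(\x7F\xFF\xFE\xABDY=\x99, where D, Y and = are 0x44, 0x59 and 0x3D).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it only illustrates the failure mode.
class SeparatorInLongDemo {
  static final byte SEP = (byte) '=';

  // Big-endian 8-byte encoding of a long, stored as-is (no escaping).
  public static byte[] toBytes(long value) {
    return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
  }

  // Counts how many bytes are equal to the separator byte.
  public static int countSeparators(byte[] bytes) {
    int count = 0;
    for (byte b : bytes) {
      if (b == SEP) {
        count++;
      }
    }
    return count;
  }

  // Builds (event id)=(timestamp)=(event info key) with the raw long bytes.
  public static byte[] columnName(String eventId, long ts, String infoKey) {
    byte[] id = eventId.getBytes(StandardCharsets.UTF_8);
    byte[] key = infoKey.getBytes(StandardCharsets.UTF_8);
    return ByteBuffer.allocate(id.length + 1 + Long.BYTES + 1 + key.length)
        .put(id).put(SEP).putLong(ts).put(SEP).put(key).array();
  }

  public static void main(String[] args) {
    long ts = 0x7FFFFEAB44593D99L; // bytes from the issue description
    // One of the eight bytes is 0x3D, i.e. '='.
    System.out.println(countSeparators(toBytes(ts)));
    byte[] name = columnName("YARN_RM_CONTAINER_CREATED", ts,
        "YARN_CONTAINER_ALLOCATED_HOST");
    // A naive split on '=' yields separators + 1 segments: 4 instead of 3.
    System.out.println(countSeparators(name) + 1);
  }
}
```

A reader that expects exactly three segments would reject this column name, 
which matches the "incorrectly formatted column name" warning in the issue.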

bq. Should we replace "" with Separator.EMPTY_BYTES? That should be equivalent, 
right?
Strictly speaking, it's not completely equivalent. We are calling joinEncoded, 
which takes strings. If we called join instead, we would first have to encode 
the string. In any case, I have added a constant EMPTY_STRING in Separator and 
am using it.
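
To make the distinction concrete, here is a minimal sketch of the "encode 
first, then join" idea. It is not the real Separator implementation: the class, 
the "%1$" escape sequence and the method bodies are assumptions for 
illustration, and the method names only mirror the ones discussed above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a separator helper; not the Hadoop class.
class EncodeThenJoinDemo {
  static final String SEP = "=";
  static final String ESCAPED = "%1$"; // assumed escape sequence
  static final String EMPTY_STRING = "";

  // Replaces every separator inside a token with its escape sequence.
  public static String encode(String token) {
    return token.replace(SEP, ESCAPED);
  }

  // Reverses encode().
  public static String decode(String token) {
    return token.replace(ESCAPED, SEP);
  }

  // Takes plain strings, encodes each one, then joins with the separator.
  public static String joinEncoded(String... tokens) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < tokens.length; i++) {
      if (i > 0) {
        sb.append(SEP);
      }
      sb.append(encode(tokens[i]));
    }
    return sb.toString();
  }

  // Splits on the separator (keeping empty segments) and decodes each part.
  public static List<String> splitEncoded(String joined) {
    List<String> out = new ArrayList<>();
    for (String part : joined.split(SEP, -1)) {
      out.add(decode(part));
    }
    return out;
  }

  public static void main(String[] args) {
    String joined = joinEncoded("a=b", EMPTY_STRING, "c");
    System.out.println(joined);               // a%1$b==c
    System.out.println(splitEncoded(joined)); // [a=b, , c]
  }
}
```

A plain join over already-built byte arrays would skip the encode step, so the 
caller would have to encode each token beforehand. Note that this sketch does 
not round-trip tokens that already contain the literal escape sequence.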

bq. I think NO_LIMIT_SPLIT and VARIABLE_SIZE are getting confusing. Since we're 
using VARIABLE_SIZE for the most part, can we remove NO_LIMIT_SPLIT
NO_LIMIT_SPLIT indicates that there is no limit on the number of splits 
returned. VARIABLE_SIZE indicates that the size of a segment in the split is 
variable. That said, we can also take VARIABLE_SIZE to mean that the number of 
splits is not fixed.

These issues are already handled, or will be handled in the new patch, as per 
the explanations given above.
Other issues have been fixed as per your comments.






> timestamps are stored unencoded causing parse errors
> ----------------------------------------------------
>
>                 Key: YARN-5109
>                 URL: https://issues.apache.org/jira/browse/YARN-5109
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Sangjin Lee
>            Assignee: Varun Saxena
>            Priority: Blocker
>              Labels: yarn-2928-1st-milestone
>         Attachments: YARN-5109-YARN-2928.003.patch, 
> YARN-5109-YARN-2928.01.patch, YARN-5109-YARN-2928.02.patch, 
> YARN-5109-YARN-2928.03.patch
>
>
> When we store timestamps (for example as part of the row key or part of the 
> column name for an event), the bytes are used as is without any encoding. If 
> the byte value happens to contain a separator character we use (e.g. "!" or 
> "="), it causes a parse failure when we read it.
> I came across this while looking into this error in the timeline reader:
> {noformat}
> 2016-05-17 21:28:38,643 WARN 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.TimelineStorageUtils:
>  incorrectly formatted column name: it will be discarded
> {noformat}
> I traced the data that was causing this, and the column name (for the event) 
> was the following:
> {noformat}
> i:e!YARN_RM_CONTAINER_CREATED=\x7F\xFF\xFE\xABDY=\x99=YARN_CONTAINER_ALLOCATED_HOST
> {noformat}
> Note that the column name is supposed to be of the format (event 
> id)=(timestamp)=(event info key). However, observe the timestamp portion:
> {noformat}
> \x7F\xFF\xFE\xABDY=\x99
> {noformat}
> The presence of the separator ("=") causes the parse error.


