[
https://issues.apache.org/jira/browse/YARN-5109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297844#comment-15297844
]
Varun Saxena edited comment on YARN-5109 at 5/24/16 9:14 AM:
-------------------------------------------------------------
[~sjlee0],
bq. Also, do we have a test that tests an encoded long having a separator in
it? After all, that's what caused us to uncover this issue.
Yes, we do. In TestKeyConverters, I construct the flow run id and the cluster
timestamp (in the app id) so that their encoded bytes contain separators. The
event column name issue is simulated as well. In fact, the test also covers the
case where QUALIFIER changes in the future.
TestHBaseTimelineStorage#testEventsEscapeTs covers the event column name issue
in an E2E test case.
bq. Should we replace "" with Separator.EMPTY_BYTES? That should be equivalent,
right?
Not exactly. We are calling joinEncoded, which takes strings. If we called
join instead, we would first have to encode the string. In any case, I have
added a constant EMPTY_STRING in Separator and am using it.
bq. I think NO_LIMIT_SPLIT and VARIABLE_SIZE are getting confusing. Since we're
using VARIABLE_SIZE for the most part, can we remove NO_LIMIT_SPLIT
NO_LIMIT_SPLIT indicates that there is no limit on the number of splits
returned. VARIABLE_SIZE indicates that the size of a segment in the split is
variable. That said, we can also take VARIABLE_SIZE to mean that the number of
splits is not fixed.
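To illustrate the idea of per-segment sizes, here is a minimal sketch and not the patch code. The class name, the split signature and the sentinel value chosen for VARIABLE_SIZE are assumptions made for this example. A fixed-size segment is read by length, so a separator byte inside it is never treated as a delimiter. The timestamp used is the one from the issue description.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: all names and the sentinel value here are hypothetical. */
final class SizedSplitSketch {
  /** Hypothetical sentinel for a segment whose length is not known up front. */
  static final int VARIABLE_SIZE = 0;
  static final byte SEP = (byte) '=';

  /** Splits src into sizes.length segments, honoring fixed sizes where given. */
  static List<byte[]> split(byte[] src, byte sep, int[] sizes) {
    List<byte[]> parts = new ArrayList<>();
    int pos = 0;
    for (int k = 0; k < sizes.length; k++) {
      boolean last = (k == sizes.length - 1);
      int end;
      if (sizes[k] == VARIABLE_SIZE) {
        if (last) {
          end = src.length; // the last variable segment takes the rest
        } else {
          end = pos; // otherwise scan forward to the next separator
          while (end < src.length && src[end] != sep) {
            end++;
          }
        }
      } else {
        end = pos + sizes[k]; // fixed size: separator bytes inside are data
        if (end > src.length) {
          throw new IllegalArgumentException("segment " + k + " is truncated");
        }
      }
      parts.add(Arrays.copyOfRange(src, pos, end));
      pos = end;
      if (!last) {
        if (pos >= src.length || src[pos] != sep) {
          throw new IllegalArgumentException("missing separator after segment " + k);
        }
        pos++; // skip the separator itself
      }
    }
    if (pos != src.length) {
      throw new IllegalArgumentException("unexpected trailing bytes");
    }
    return parts;
  }

  /** Builds (event id)=(8-byte big-endian timestamp)=(event info key). */
  static byte[] columnName(String eventId, long ts, String infoKey) {
    byte[] id = eventId.getBytes(StandardCharsets.UTF_8);
    byte[] key = infoKey.getBytes(StandardCharsets.UTF_8);
    ByteBuffer buf = ByteBuffer.allocate(id.length + 1 + Long.BYTES + 1 + key.length);
    buf.put(id).put(SEP).putLong(ts).put(SEP).put(key);
    return buf.array();
  }

  public static void main(String[] args) {
    long ts = 0x7FFFFEAB44593D99L; // bytes \x7F\xFF\xFE\xABDY=\x99
    byte[] name = columnName("YARN_RM_CONTAINER_CREATED", ts,
        "YARN_CONTAINER_ALLOCATED_HOST");
    List<byte[]> parts =
        split(name, SEP, new int[] {VARIABLE_SIZE, Long.BYTES, VARIABLE_SIZE});
    System.out.println(parts.size()); // 3
    System.out.println(ByteBuffer.wrap(parts.get(1)).getLong() == ts); // true
  }
}
```

In this sketch the number of segments is fixed by the sizes array, while a split with no limit would keep cutting at every separator it finds.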
These issues are either already handled or will be handled in the new patch, as
explained above.
The other issues have been fixed as per your comments.
> timestamps are stored unencoded causing parse errors
> ----------------------------------------------------
>
> Key: YARN-5109
> URL: https://issues.apache.org/jira/browse/YARN-5109
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Affects Versions: YARN-2928
> Reporter: Sangjin Lee
> Assignee: Varun Saxena
> Priority: Blocker
> Labels: yarn-2928-1st-milestone
> Attachments: YARN-5109-YARN-2928.003.patch,
> YARN-5109-YARN-2928.01.patch, YARN-5109-YARN-2928.02.patch,
> YARN-5109-YARN-2928.03.patch
>
>
> When we store timestamps (for example as part of the row key or part of the
> column name for an event), the bytes are used as is without any encoding. If
> the byte value happens to contain a separator character we use (e.g. "!" or
> "="), it causes a parse failure when we read it.
> I came across this while looking into this error in the timeline reader:
> {noformat}
> 2016-05-17 21:28:38,643 WARN
> org.apache.hadoop.yarn.server.timelineservice.storage.common.TimelineStorageUtils:
> incorrectly formatted column name: it will be discarded
> {noformat}
> I traced the data that was causing this, and the column name (for the event)
> was the following:
> {noformat}
> i:e!YARN_RM_CONTAINER_CREATED=\x7F\xFF\xFE\xABDY=\x99=YARN_CONTAINER_ALLOCATED_HOST
> {noformat}
> Note that the column name is supposed to be of the format (event
> id)=(timestamp)=(event info key). However, observe the timestamp portion:
> {noformat}
> \x7F\xFF\xFE\xABDY=\x99
> {noformat}
> The presence of the separator ("=") causes the parse error.
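The failure described above can be reproduced with a small standalone sketch. It is not the timeline service code: the class and method names are made up for illustration, and the split is a deliberately naive one that cuts at every "=" byte. The timestamp literal is the eight bytes quoted in the description, read as a big-endian long.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: a naive separator split over raw, unencoded bytes. */
final class NaiveSplitDemo {
  static final byte SEP = (byte) '=';

  /** Cuts src at every occurrence of sep, with no notion of segment sizes. */
  static List<byte[]> naiveSplit(byte[] src, byte sep) {
    List<byte[]> parts = new ArrayList<>();
    int start = 0;
    for (int i = 0; i <= src.length; i++) {
      if (i == src.length || src[i] == sep) {
        parts.add(Arrays.copyOfRange(src, start, i));
        start = i + 1;
      }
    }
    return parts;
  }

  /** Builds (event id)=(8-byte big-endian timestamp)=(event info key). */
  static byte[] columnName(String eventId, long ts, String infoKey) {
    byte[] id = eventId.getBytes(StandardCharsets.UTF_8);
    byte[] key = infoKey.getBytes(StandardCharsets.UTF_8);
    ByteBuffer buf = ByteBuffer.allocate(id.length + 1 + Long.BYTES + 1 + key.length);
    buf.put(id).put(SEP).putLong(ts).put(SEP).put(key);
    return buf.array();
  }

  public static void main(String[] args) {
    // \x7F\xFF\xFE\xAB D Y = \x99 : the seventh byte is 0x3D, which is '='.
    long ts = 0x7FFFFEAB44593D99L;
    byte[] name = columnName("YARN_RM_CONTAINER_CREATED", ts,
        "YARN_CONTAINER_ALLOCATED_HOST");
    List<byte[]> parts = naiveSplit(name, SEP);
    // Three segments were written, but the split finds four.
    System.out.println(parts.size()); // 4
  }
}
```

The timestamp is cut into a 6-byte piece and a 1-byte piece, so a reader expecting exactly three segments rejects the column name.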