[ 
https://issues.apache.org/jira/browse/YARN-3706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572000#comment-14572000
 ] 

Joep Rottinghuis commented on YARN-3706:
----------------------------------------

It turns out that TimelineWriterUtils.join has a bug where it returns an 
extra byte at the end of the return value if a null argument is passed. While 
attempting to fix this I realized we have a hard time distinguishing nulls 
from spaces.

As I was discussing the fix with [~sjlee0] I realized that we currently have a 
mix of replace, cleanse etc. Sometimes we replace, sometimes we strip. That is 
a bit of a mess. He wondered if we can simply URL encode all columns.
Rather than doing that, I'm now taking the approach of URL encoding only the 
separators that are needed, and of making sure we set a limit when splitting 
on the separators again.
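
To make the idea concrete, here is a minimal sketch of encoding only the 
separator and splitting with a limit. It is not the code from the patch: the 
class name, the "!" separator and the helper names are all assumptions made 
for illustration, and it works on Strings rather than byte arrays.

```java
import java.util.regex.Pattern;

class SeparatorSketch {
    // Hypothetical separator and its URL-encoded form; the patch may use others.
    static final String SEP = "!";
    static final String SEP_ENCODED = "%21";

    // Encode only the separator; a null token is written as an empty string.
    static String encode(String token) {
        return token == null ? "" : token.replace(SEP, SEP_ENCODED);
    }

    // Reverse of encode: turn the encoded separator back into the raw one.
    static String decode(String token) {
        return token.replace(SEP_ENCODED, SEP);
    }

    // Join encoded tokens, placing the separator only between tokens.
    static String join(String... parts) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append(SEP);
            }
            sb.append(encode(parts[i]));
        }
        return sb.toString();
    }

    // Split into at most 'limit' tokens, then decode each token.
    // A positive limit also keeps trailing empty tokens.
    static String[] split(String joined, int limit) {
        String[] parts = joined.split(Pattern.quote(SEP), limit);
        for (int i = 0; i < parts.length; i++) {
            parts[i] = decode(parts[i]);
        }
        return parts;
    }

    public static void main(String[] args) {
        String key = join("cluster!1", "user", "flow");
        System.out.println(key);                    // cluster%211!user!flow
        System.out.println(split(key, 3)[0]);       // cluster!1
        System.out.println(split("a!", 2).length);  // 2
    }
}
```

With a positive limit, String.split keeps the trailing empty token, so a 
joined value ending in a separator still yields the expected number of parts.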

The only downside is that we still cannot differentiate between null values and 
empty strings, but in most cases when we need to encode qualifiers in columns, 
this will not happen (entity IDs are never null). The other disadvantage is 
that if an identifier (rowkey, related entity key, etc.) contains URL encoded 
strings, we might end up decoding them. I think that is an acceptable trade-off.
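
Both limitations can be shown with a small sketch. As before, the "!" 
separator, the class name and the helpers are assumptions for illustration 
only, not the patch itself.

```java
class EncodingLimitsSketch {
    // Hypothetical separator and its URL-encoded form.
    static final String SEP = "!";
    static final String SEP_ENCODED = "%21";

    // Encode only the separator; null is written as an empty string.
    static String encode(String token) {
        return token == null ? "" : token.replace(SEP, SEP_ENCODED);
    }

    // Decode every occurrence of the encoded separator.
    static String decode(String token) {
        return token.replace(SEP_ENCODED, SEP);
    }

    public static void main(String[] args) {
        // Limitation 1: null and "" produce the same encoded value.
        System.out.println(encode(null).equals(encode("")));  // true

        // Limitation 2: an identifier that already contains "%21"
        // does not survive the round trip unchanged.
        System.out.println(decode(encode("app%21id")));       // app!id
    }
}
```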

New patch with these fixes coming up.

> Generalize native HBase writer for additional tables
> ----------------------------------------------------
>
>                 Key: YARN-3706
>                 URL: https://issues.apache.org/jira/browse/YARN-3706
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Joep Rottinghuis
>            Assignee: Joep Rottinghuis
>            Priority: Minor
>         Attachments: YARN-3706-YARN-2928.001.patch, 
> YARN-3726-YARN-2928.002.patch, YARN-3726-YARN-2928.003.patch, 
> YARN-3726-YARN-2928.004.patch
>
>
> When reviewing YARN-3411 we noticed that we could change the class hierarchy 
> a little in order to accommodate additional tables easily.
> In order to get ready for benchmark testing we left the original layout in 
> place, as performance would not be impacted by the code hierarchy.
> Here is a separate jira to address the hierarchy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
