[ 
https://issues.apache.org/jira/browse/YARN-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15313290#comment-15313290
 ] 

Sangjin Lee commented on YARN-5167:
-----------------------------------

OK, I've hit a snag with this idea.

Initially we thought that we could always handle values safely if, on 
encoding, we
# escape a naturally occurring encoded value sequence (by adding a preceding 
backslash for example)
# and encode raw values
and, on decoding, we
# decode encoded values *only if* the encoded value sequence is NOT escaped 
(i.e. not preceded by a backslash)
# and finally de-escape the backslash (remove the backslash if it is followed 
by the encoded value sequence) to get back the original naturally occurring 
encoded value sequence

I implemented this fairly easily, but I realized that we still have a pretty 
challenging ambiguity. The problem arises when *the raw value is preceded by a 
backslash*. For example, suppose the following is the original string:
{noformat}
\=%1$
{noformat}

Note that {{=}} is a value we want to encode, and {{%1$}} is the encoded 
equivalent. In this case, the user input contains both the raw value and a 
naturally occurring encoded value. If we put this through the above scheme, 
first we escape the naturally occurring encoded value:
{noformat}
\=\%1$
{noformat}

The next step is to encode the raw value ({{=}}). Then it becomes
{noformat}
\%1$\%1$
{noformat}

Note that now we have two identical parts. It is not possible to determine 
whether it was an encoded value that happened to be preceded by the escape 
character, or a naturally occurring encoded value that was escaped.
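
To make the collision concrete, here is a minimal sketch of the escape-based scheme described above. The class and method names are hypothetical and are not taken from the actual {{Separator}} code; only {{=}} and {{%1$}} come from the example. It shows two different inputs producing the same encoded string, so the first one cannot round-trip.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of the backslash-escape scheme; not the real Separator class.
public class EscapeAmbiguityDemo {
  static final String RAW = "=";                  // value we want to encode
  static final String ENCODED = "%1$";            // its encoded equivalent
  static final String ESCAPED = "\\" + ENCODED;   // backslash followed by %1$

  // Step 1: escape naturally occurring encoded sequences. Step 2: encode raw values.
  static String encode(String s) {
    return s.replace(ENCODED, ESCAPED).replace(RAW, ENCODED);
  }

  // Step 1: decode encoded sequences that are NOT preceded by a backslash.
  // Step 2: strip the backslash in front of the remaining (escaped) sequences.
  static String decode(String s) {
    String decoded = s.replaceAll(
        "(?<!\\\\)" + Pattern.quote(ENCODED), Matcher.quoteReplacement(RAW));
    return decoded.replace(ESCAPED, ENCODED);
  }

  public static void main(String[] args) {
    String a = "\\=%1$";   // backslash, raw value, naturally occurring encoded value
    String b = "%1$%1$";   // two naturally occurring encoded values
    System.out.println(encode(a));          // \%1$\%1$
    System.out.println(encode(b));          // \%1$\%1$ (identical to the line above)
    System.out.println(decode(encode(a)));  // %1$%1$, not the original \=%1$
  }
}
```

Because {{encode}} maps both inputs to the same string, no decoder can recover both of them, whatever rules it applies.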

It's not clear how we can handle this issue without adding a whole lot more 
complexity. We can get increasingly sophisticated in trying to disambiguate 
these combinations, but I am afraid we would hit the point of diminishing 
returns.

I am now thinking of a different idea, which is basically similar to how 
URL encoding works. We could consider {{%}} an implicit reserved character, as 
it starts all the encoded values. The idea is
# encode {{%}} before encoding a series of separator values
# proceed to encode other values
# on decoding, decode all values except {{%}}
# finally decode {{%}}

Suppose the original string is
{noformat}
%=%1$
{noformat}

If we follow the new idea, we first encode {{%}}, which gives {{%9$=%9$1$}}, 
and then encode {{=}}, which gives {{%9$%1$%9$1$}}. Conversely, on decoding we 
first decode {{%1$}}, which gives {{%9$=%9$1$}}, and finally decode {{%9$}}, 
which gives back {{%=%1$}}.
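
The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name and the separator table are hypothetical, {{%9$}} as the encoding of {{%}} is taken from the example above, and {{%1$}} / {{%2$}} are the encodings of {{=}} and space mentioned in this issue.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the "encode % first, decode % last" idea; not the real Separator class.
public class PercentFirstCodec {
  static final String PERCENT = "%";
  static final String PERCENT_ENCODED = "%9$";

  // Assumed separator table: raw value -> encoded value.
  static final Map<String, String> SEPARATORS = new LinkedHashMap<>();
  static {
    SEPARATORS.put("=", "%1$");
    SEPARATORS.put(" ", "%2$");
  }

  static String encode(String s) {
    // Encode % first, so every % left in the output starts an encoded value.
    String out = s.replace(PERCENT, PERCENT_ENCODED);
    for (Map.Entry<String, String> e : SEPARATORS.entrySet()) {
      out = out.replace(e.getKey(), e.getValue());
    }
    return out;
  }

  static String decode(String s) {
    // Decode every value except %, then decode % last.
    String out = s;
    for (Map.Entry<String, String> e : SEPARATORS.entrySet()) {
      out = out.replace(e.getValue(), e.getKey());
    }
    return out.replace(PERCENT_ENCODED, PERCENT);
  }

  public static void main(String[] args) {
    String original = "%=%1$";
    String encoded = encode(original);
    System.out.println(encoded);          // %9$%1$%9$1$
    System.out.println(decode(encoded));  // %=%1$
  }
}
```

The sketch relies on no raw separator containing {{%}}, so that after the first encoding step a {{%}} can only appear at the start of an encoded value.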

I believe this scheme would work in all cases, but I'd like you to poke holes 
in this idea to see if it stands up.

> Escaping occurences of encodedValues
> ------------------------------------
>
>                 Key: YARN-5167
>                 URL: https://issues.apache.org/jira/browse/YARN-5167
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Joep Rottinghuis
>            Assignee: Sangjin Lee
>            Priority: Critical
>              Labels: yarn-2928-1st-milestone
>
> We had earlier decided to punt on this, but in discussing YARN-5109 we 
> thought it would be best to just be safe rather than sorry later on.
> Encoded sequences can occur in the original string, especially in case of 
> "foreign key" if we decide to have lookups.
> For example, space is encoded as %2$.
> Encoding and then decoding "String with %2$ in it" would yield 
> "String with   in it".
> We thought we should first escape existing occurrences of encoded strings by 
> prefixing a backslash (even if there is already a backslash, that should be 
> ok). Then we should replace all unencoded strings.
> On the way out, we should replace all occurrences of our encoded string to 
> the original except when it is prefixed by an escape character. Lastly we 
> should strip off the one additional backslash in front of each remaining 
> (escaped) sequence.
> Adding the following entry to TestSeparator#testEncodeDecode() would 
> demonstrate what this jira should accomplish:
> {code}
>     testEncodeDecode(
>         "Double-escape %2$ and %3$ or \\%2$ or \\%3$, nor \\\\%2$ = no problem!",
>         Separator.QUALIFIERS, Separator.VALUES, Separator.SPACE, Separator.TAB);
> {code}



