[
https://issues.apache.org/jira/browse/YARN-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15313290#comment-15313290
]
Sangjin Lee commented on YARN-5167:
-----------------------------------
OK, I've hit a snag with this idea.
Initially we thought that we could always handle values safely if we
# escape a naturally occurring encoded value sequence (by adding a preceding
backslash for example)
# and encode naked values
and then, when decoding,
# decode encoded values *only if* the encoded value sequence is NOT escaped
(i.e. not preceded by a backslash)
# and finally de-escape the backslash (remove the backslash if it is followed
by the encoded value sequence) to get back the original naturally occurring
encoded value sequence
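A minimal Java sketch of the four steps above, assuming a single separator where {{=}} is the raw value and {{%1$}} its encoded form (the same pair used in the example below). The class and method names are illustrative only, not the actual Separator API.

```java
import java.util.regex.Pattern;

/**
 * Sketch of the backslash-escaping scheme for one separator.
 * Assumption: "=" is the raw value and "%1$" its encoded form.
 */
final class BackslashEscaping {
  static final String RAW = "=";
  static final String ENCODED = "%1$";
  static final String ESCAPED = "\\" + ENCODED; // the text \%1$

  // An encoded sequence that is NOT preceded by a backslash.
  private static final Pattern UNESCAPED_ENCODED =
      Pattern.compile("(?<!\\\\)" + Pattern.quote(ENCODED));

  static String encode(String s) {
    // 1. escape naturally occurring encoded sequences with a backslash
    String escaped = s.replace(ENCODED, ESCAPED);
    // 2. encode the naked raw values
    return escaped.replace(RAW, ENCODED);
  }

  static String decode(String s) {
    // 3. decode encoded sequences only if they are not escaped
    String decoded = UNESCAPED_ENCODED.matcher(s).replaceAll(RAW);
    // 4. drop the escaping backslash in front of the remaining sequences
    return decoded.replace(ESCAPED, ENCODED);
  }

  public static void main(String[] args) {
    // Java literals for the inputs a=b, a%1$b and \=%1$
    for (String s : new String[] {"a=b", "a%1$b", "\\=%1$"}) {
      System.out.println(s + " -> " + encode(s) + " -> " + decode(encode(s)));
    }
  }
}
```

With this sketch {{a=b}} and {{a%1$b}} round-trip, but {{\=%1$}} encodes to {{\%1$\%1$}} and comes back as {{%1$%1$}}, which is the ambiguity described next.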
I implemented this fairly easily, but I realized that we still have a pretty
challenging ambiguity. The problem arises when *the raw value is preceded by a
backslash*. For example, suppose the following is the original string:
{noformat}
\=%1$
{noformat}
Note that {{=}} is a value we want to encode, and {{%1$}} is the encoded
equivalent. In this case, the user input contains both the raw value and a
naturally occurring encoded value. If we put this through the above scheme,
first we escape the naturally occurring encoded value:
{noformat}
\=\%1$
{noformat}
The next step is to encode the raw value ({{=}}). Then it becomes
{noformat}
\%1$\%1$
{noformat}
Note that we now have two identical {{\%1$}} sequences. It is not possible to
determine whether each one is an encoded raw value that happened to be preceded
by a literal backslash, or a naturally occurring encoded value that was
escaped.
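The ambiguity is already present after encoding: the encoder is not injective, so no decoder could resolve it. A small sketch (same assumption that {{=}} encodes to {{%1$}}; the class name is hypothetical) shows three different inputs producing the same encoded string:

```java
/**
 * Shows that the backslash-escaping encoder maps distinct inputs to the
 * same output. Assumption: "=" is encoded as "%1$".
 */
final class EscapeCollision {
  static String encode(String s) {
    // escape naturally occurring "%1$", then encode the naked "="
    return s.replace("%1$", "\\%1$").replace("=", "%1$");
  }

  public static void main(String[] args) {
    // Java literals for the inputs \=%1$ , \=\= and %1$%1$
    for (String s : new String[] {"\\=%1$", "\\=\\=", "%1$%1$"}) {
      System.out.println(s + " -> " + encode(s));
    }
  }
}
```

All three inputs encode to {{\%1$\%1$}}, so whichever one a decoder returns, the other two are lost.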
It's not clear how we can handle this issue without adding a whole lot more
complexity. We could get increasingly sophisticated in trying to handle such
combinations, but I am afraid we would hit the point of diminishing
returns.
I am now thinking of a different idea, similar to how URL encoding works. We
could treat {{%}} as an implicit reserved character, since it starts every
encoded value. The idea is to
# encode {{%}} before encoding a series of separator values
# proceed to encode other values
# on decoding, decode all values except {{%}}
# finally decode {{%}}
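A minimal sketch of these four steps, assuming {{%}} is encoded as {{%9$}} and {{=}} as {{%1$}} (as in the example below), plus space as {{%2$}} (from the issue description). The separator table and class name are hypothetical, not the real Separator enum.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch of "encode % first, decode % last". Assumed encodings:
 * "%" -> "%9$", "=" -> "%1$", " " -> "%2$".
 */
final class PercentFirstEncoding {
  static final String PERCENT = "%";
  static final String ENCODED_PERCENT = "%9$";

  // Other separator values and their encoded forms (illustrative).
  static final Map<String, String> SEPARATORS = new LinkedHashMap<>();
  static {
    SEPARATORS.put("=", "%1$");
    SEPARATORS.put(" ", "%2$");
  }

  static String encode(String s) {
    // 1. encode % first, so every remaining % starts an encoded sequence
    String out = s.replace(PERCENT, ENCODED_PERCENT);
    // 2. encode the other separator values
    for (Map.Entry<String, String> e : SEPARATORS.entrySet()) {
      out = out.replace(e.getKey(), e.getValue());
    }
    return out;
  }

  static String decode(String s) {
    String out = s;
    // 3. decode all values except %
    for (Map.Entry<String, String> e : SEPARATORS.entrySet()) {
      out = out.replace(e.getValue(), e.getKey());
    }
    // 4. finally decode %
    return out.replace(ENCODED_PERCENT, PERCENT);
  }

  public static void main(String[] args) {
    String encoded = encode("%=%1$");
    System.out.println(encoded);         // %9$%1$%9$1$
    System.out.println(decode(encoded)); // %=%1$
  }
}
```

Why this should hold up: after step 1 every {{%}} in the string begins a three-character encoded sequence, so each replacement in step 3 can only match a whole sequence. As long as no separator value contains {{%}} and no encoded value contains a raw separator, step 4 only ever sees the {{%9$}} sequences produced in step 1.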
Suppose the original string is
{noformat}
%=%1$
{noformat}
If we follow the new idea (here {{%}} is encoded as {{%9$}}), encoding {{%}}
first gives {{%9$=%9$1$}}, and then encoding {{=}} gives {{%9$%1$%9$1$}}.
Conversely, decoding {{%1$}} first gives {{%9$=%9$1$}}, and decoding {{%9$}}
last gives back {{%=%1$}}.
I believe this scheme would work in all cases, but I'd like you to poke holes
in this idea to see if it stands up.
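One mechanical way to poke holes is a brute-force round-trip check. The sketch below (hypothetical class name; the alphabet, the maximum length of 6 and the encodings {{=}} to {{%1$}} and {{%}} to {{%9$}} are assumptions for illustration) enumerates every short string over the interesting characters and reports the first one for which {{decode(encode(s))}} differs from {{s}}, for both schemes:

```java
import java.util.function.UnaryOperator;
import java.util.regex.Pattern;

/** Exhaustive round-trip check over short strings (illustrative). */
final class RoundTripCheck {
  static final char[] ALPHABET = {'%', '=', '1', '9', '$', '\\'};

  // Backslash-escaping scheme (the first idea): "=" -> "%1$".
  static final Pattern UNESCAPED = Pattern.compile("(?<!\\\\)%1\\$");
  static String escEncode(String s) {
    return s.replace("%1$", "\\%1$").replace("=", "%1$");
  }
  static String escDecode(String s) {
    return UNESCAPED.matcher(s).replaceAll("=").replace("\\%1$", "%1$");
  }

  // Percent-first scheme (the new idea): "%" -> "%9$", "=" -> "%1$".
  static String pctEncode(String s) {
    return s.replace("%", "%9$").replace("=", "%1$");
  }
  static String pctDecode(String s) {
    return s.replace("%1$", "=").replace("%9$", "%");
  }

  /** First failing string, shortest first; null if none up to maxLen. */
  static String firstFailure(UnaryOperator<String> enc,
      UnaryOperator<String> dec, int maxLen) {
    for (int len = 0; len <= maxLen; len++) {
      String bad = search(new StringBuilder(), len, enc, dec);
      if (bad != null) {
        return bad;
      }
    }
    return null;
  }

  // Depth-first enumeration of all strings of exactly len characters.
  private static String search(StringBuilder prefix, int len,
      UnaryOperator<String> enc, UnaryOperator<String> dec) {
    if (prefix.length() == len) {
      String s = prefix.toString();
      return dec.apply(enc.apply(s)).equals(s) ? null : s;
    }
    for (char c : ALPHABET) {
      prefix.append(c);
      String bad = search(prefix, len, enc, dec);
      prefix.setLength(prefix.length() - 1);
      if (bad != null) {
        return bad;
      }
    }
    return null;
  }

  private static void report(String scheme, String failure) {
    System.out.println(scheme + ": "
        + (failure == null ? "no failure found" : "fails on [" + failure + "]"));
  }

  public static void main(String[] args) {
    report("backslash escaping",
        firstFailure(RoundTripCheck::escEncode, RoundTripCheck::escDecode, 6));
    report("percent first",
        firstFailure(RoundTripCheck::pctEncode, RoundTripCheck::pctDecode, 6));
  }
}
```

For the backslash scheme this reports {{\=}} (which encodes to {{\%1$}} and decodes to {{%1$}}); for the percent-first scheme it finds no failure up to that length. An exhaustive search over short strings is evidence rather than proof, but it is cheap to extend to more separators.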
> Escaping occurences of encodedValues
> ------------------------------------
>
> Key: YARN-5167
> URL: https://issues.apache.org/jira/browse/YARN-5167
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Joep Rottinghuis
> Assignee: Sangjin Lee
> Priority: Critical
> Labels: yarn-2928-1st-milestone
>
> We had earlier decided to punt on this, but in discussing YARN-5109 we
> thought it would be best to just be safe rather than sorry later on.
> Encoded sequences can occur in the original string, especially in case of
> "foreign key" if we decide to have lookups.
> For example, space is encoded as %2$.
> Encoding "String with %2$ in it" and then decoding it yields
> "String with   in it": the naturally occurring %2$ becomes a space too.
> We thought we should first escape existing occurrences of encoded strings by
> prefixing a backslash (even if there is already a backslash that should be
> ok). Then we should replace all unencoded strings.
> On the way out, we should replace all occurrences of our encoded string to
> the original except when it is prefixed by an escape character. Lastly we
> should strip off the one additional backslash in front of each remaining
> (escaped) sequence.
> Adding the following entry to TestSeparator#testEncodeDecode()
> demonstrates what this jira should accomplish:
> {code}
> testEncodeDecode("Double-escape %2$ and %3$ or \\%2$ or \\%3$, nor "
>     + "\\\\%2$ = no problem!", Separator.QUALIFIERS,
>     Separator.VALUES, Separator.SPACE, Separator.TAB);
> {code}
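The failure described in the issue can be reproduced with a naive encoder that does no escaping at all. This is a sketch assuming only space is encoded, as {{%2$}} (the example from the description); the class name is hypothetical and the real Separator class encodes more values.

```java
/** Naive encoding of space as "%2$" with no escaping (illustrative). */
final class NaiveSpaceEncoding {
  static String encode(String s) {
    return s.replace(" ", "%2$");
  }

  static String decode(String s) {
    // Every "%2$" is decoded, including ones that were in the input.
    return s.replace("%2$", " ");
  }

  public static void main(String[] args) {
    String roundTripped = decode(encode("String with %2$ in it"));
    System.out.println("[" + roundTripped + "]"); // [String with   in it]
  }
}
```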
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)