Re: [text] On the value of idempotent string escape methods?

2017-02-22 Thread sebb
On 22 February 2017 at 12:41, Raymond DeCampo  wrote:
> On Mon, Feb 20, 2017 at 11:40 PM, Sampanna Kahu  wrote:
>
>> Hi Guys,
>> Very good points are being made above. Please allow me to add my two cents
>> :-)
>>
>> What if the string contains syntactically valid HTML characters/tags and
>> our aim is to prevent rendering these tags in the browser when this string
>> is being served via a web application? Or prevent the execution of harmful
>> embedded scripts when serving it? The 'escapeOnce' method could be useful
>> here, right?
>>
>> To explain better, let's consider an example of the specific use-case that
>> I had in mind when building the 'escapeOnce' method:
>> Consider the scenario of a simple restful web application where users can
>> manipulate their text using simple CRUD operations. Let's assume that we do
>> not have the 'escapeOnce' method yet.
>> 1. A user comes and submits his string. We escape it and store it in our
>> database. If the string had any HTML characters, they would have gotten
>> escaped.
>>
>
> As others have pointed out, escaping the data and then storing it is a
> mistake.
>
>
>> 2. After some time, the same user fetches his string, adds some more HTML
>> characters and submits it. At this point, although the escape method would
>> correctly escape the freshly added HTML characters, it would escape the
>> older escaped HTML characters again! (for example &gt; would become
>> &amp;gt;)
>>
>
> Wouldn't you have unescaped the string before sending it to the user for
> editing?

+1

>
>>
>> And this effect gets magnified if step number 2 above is repeated.
>>
>> How do we solve the above problem without the 'escapeOnce' method?
>>
>
> I imagine that at this point you are stuck with the escaped data in the
> database.  If that is the case, the best solution is to unescape the string
> before allowing modifications.

+1

> If for some reason that is not an option, you can apply unescape to the
> modified string and then apply escape.

Unfortunately that does not work in all cases, see below for a simple example.

> This will prevent re-escaping of
> existing escape sequences and will escape anything new that needs it.

The unescape would have to ignore an ampersand that is not part of an
escape sequence.
Since that is invalid, I would expect unescape to throw an exception
in such cases.
But assuming unescape does ignore such invalid strings, there is a
further problem:

It will only work if the user does not add text containing a valid
escape sequence which needs to be double-escaped.

For example if they wish to add:

"To code less than (<) in HTML, use '&lt;'"

Assuming it passes the unescape phase, it will become:

"To code less than (<) in HTML, use '<'"

After the re-escape it will become:

"To code less than (&lt;) in HTML, use '&lt;'"

Not what was intended by the user.
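
A minimal sketch of that round trip, assuming a lenient unescape that
leaves bare ampersands alone. RoundTripDemo and its two helpers are
made up for illustration (they only handle &, < and >) and are not
Commons Text methods:

```java
// Illustrative only: hypothetical helpers, not the Commons Text API.
final class RoundTripDemo {

    // Escape '&' first so the ampersands introduced for < and > are not re-escaped.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Lenient unescape: bare ampersands are left alone. "&amp;" is handled
    // last so that "&amp;lt;" is unescaped by one level only, to "&lt;".
    static String unescape(String s) {
        return s.replace("&lt;", "<").replace("&gt;", ">").replace("&amp;", "&");
    }

    public static void main(String[] args) {
        String typed = "To code less than (<) in HTML, use '&lt;'";

        String roundTrip = escape(unescape(typed)); // unescape, then escape
        String plain = escape(typed);               // a single plain escape

        System.out.println(roundTrip); // To code less than (&lt;) in HTML, use '&lt;'
        System.out.println(plain);     // To code less than (&lt;) in HTML, use '&amp;lt;'

        // A browser effectively unescapes once when rendering.
        System.out.println(unescape(roundTrip).equals(typed)); // false
        System.out.println(unescape(plain).equals(typed));     // true
    }
}
```

Only the plain escape renders as the text the user typed; the
unescape-then-escape version has lost the literal entity.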

> Since this is just two consecutive calls to existing library functionality
> it hardly seems burdensome on the users to omit a convenience function for
> it.

However that process won't always do the right thing, as shown above.

i.e. no such convenience function is possible.

> (Especially since it encourages what are, IMHO, bad practices.)
>

Very true.

>
>>
>> On 20 February 2017 at 21:40, sebb  wrote:
>>
>> > On 20 February 2017 at 15:36, Rob Tompkins  wrote:
>> > >
>> > >> On Feb 20, 2017, at 10:30 AM, sebb  wrote:
>> > >>
>> > >> On 20 February 2017 at 14:55, Rob Tompkins 
>> wrote:
>> > >>>
>> >  On Feb 20, 2017, at 4:31 AM, sebb  wrote:
>> > 
>> >  On 19 February 2017 at 14:29, Raymond DeCampo > > > wrote:
>> > > I am trying to see how having the proposed unescape() method leads
>> > to a
>> > > useful escape method.
>> > >
>> > > E.g. clearly unescape("&amp;") would evaluate to "&".  So would
>> > > unescape("&amp;amp;").  That means the proposed escape() method
>> > would also
>> > > have the same output for "&amp;" and "&amp;amp;".
>> > >
>> > > I think a better approach for an idempotent escape would be to just
>> > > unescape the string once, and then run the traditional escape.
>> > 
>> >  That does not eliminate the problems, as you state below.
>> > 
>> > > You will
>> > > still have issues if the user intended to escape the string "&amp;"
>> > but you
>> > > are never going to crack that without some kind of state saving.
>> > 
>> >  That is my exact point.
>> > 
>> >  Since it's not possible for the function to work reliably, we should
>> >  not mislead users by pretending that there is a magic method that
>> >  works.
>> > 
>> > > Then given that the functionality is available via two consecutive
>> > calls to
>> > > existing methods, I would probably be disinclined to include it in
>> > the
>> > > library.
>> > 
>> >  +1
>> > >>>
>> 

Re: [text] On the value of idempotent string escape methods?

2017-02-22 Thread Raymond DeCampo
On Mon, Feb 20, 2017 at 11:40 PM, Sampanna Kahu  wrote:

> Hi Guys,
> Very good points are being made above. Please allow me to add my two cents
> :-)
>
> What if the string contains syntactically valid HTML characters/tags and
> our aim is to prevent rendering these tags in the browser when this string
> is being served via a web application? Or prevent the execution of harmful
> embedded scripts when serving it? The 'escapeOnce' method could be useful
> here, right?
>
> To explain better, let's consider an example of the specific use-case that
> I had in mind when building the 'escapeOnce' method:
> Consider the scenario of a simple restful web application where users can
> manipulate their text using simple CRUD operations. Let's assume that we do
> not have the 'escapeOnce' method yet.
> 1. A user comes and submits his string. We escape it and store it in our
> database. If the string had any HTML characters, they would have gotten
> escaped.
>

As others have pointed out, escaping the data and then storing it is a
mistake.


> 2. After some time, the same user fetches his string, adds some more HTML
> characters and submits it. At this point, although the escape method would
> correctly escape the freshly added HTML characters, it would escape the
> older escaped HTML characters again! (for example &gt; would become
> &amp;gt;)
>

Wouldn't you have unescaped the string before sending it to the user for
editing?


>
> And this effect gets magnified if step number 2 above is repeated.
>
> How do we solve the above problem without the 'escapeOnce' method?
>

I imagine that at this point you are stuck with the escaped data in the
database.  If that is the case, the best solution is to unescape the string
before allowing modifications.
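
A rough sketch of the difference, assuming the database already holds
escaped text. EditCycleDemo and its helpers are hypothetical names
for illustration (only &, < and > are handled), not library API:

```java
// Illustrative only: hypothetical helpers, not the Commons Text API.
final class EditCycleDemo {

    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // "&amp;" is replaced last so only one level of escaping is removed.
    static String unescape(String s) {
        return s.replace("&lt;", "<").replace("&gt;", ">").replace("&amp;", "&");
    }

    // The escaped value is handed to the user as-is and escaped again on save.
    static String saveWithoutUnescape(String storedEscaped, String appended) {
        return escape(storedEscaped + appended);
    }

    // The stored value is unescaped before editing, then escaped once on save.
    static String saveWithUnescape(String storedEscaped, String appended) {
        return escape(unescape(storedEscaped) + appended);
    }

    public static void main(String[] args) {
        String stored = escape("a > b"); // a &gt; b

        System.out.println(saveWithoutUnescape(stored, " < c")); // a &amp;gt; b &lt; c
        System.out.println(saveWithUnescape(stored, " < c"));    // a &gt; b &lt; c
    }
}
```

Here the unescape happens before the user edits, so anything the user
types afterwards is treated as raw text and escaped exactly once.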

If for some reason that is not an option, you can apply unescape to the
modified string and then apply escape.  This will prevent re-escaping of
existing escape sequences and will escape anything new that needs it.
Since this is just two consecutive calls to existing library functionality
it hardly seems burdensome on the users to omit a convenience function for
it.  (Especially since it encourages what are, IMHO, bad practices.)



>
> On 20 February 2017 at 21:40, sebb  wrote:
>
> > On 20 February 2017 at 15:36, Rob Tompkins  wrote:
> > >
> > >> On Feb 20, 2017, at 10:30 AM, sebb  wrote:
> > >>
> > >> On 20 February 2017 at 14:55, Rob Tompkins 
> wrote:
> > >>>
> >  On Feb 20, 2017, at 4:31 AM, sebb  wrote:
> > 
> >  On 19 February 2017 at 14:29, Raymond DeCampo  > > wrote:
> > > I am trying to see how having the proposed unescape() method leads
> > to a
> > > useful escape method.
> > >
> > > E.g. clearly unescape("&amp;") would evaluate to "&".  So would
> > > unescape("&amp;amp;").  That means the proposed escape() method
> > would also
> > > have the same output for "&amp;" and "&amp;amp;".
> > >
> > > I think a better approach for an idempotent escape would be to just
> > > unescape the string once, and then run the traditional escape.
> > 
> >  That does not eliminate the problems, as you state below.
> > 
> > > You will
> > > still have issues if the user intended to escape the string "&amp;"
> > but you
> > > are never going to crack that without some kind of state saving.
> > 
> >  That is my exact point.
> > 
> >  Since it's not possible for the function to work reliably, we should
> >  not mislead users by pretending that there is a magic method that
> >  works.
> > 
> > > Then given that the functionality is available via two consecutive
> > calls to
> > > existing methods, I would probably be disinclined to include it in
> > the
> > > library.
> > 
> >  +1
> > >>>
> > >>> I’m a (+1) for removal as well.
> > >>>
> > >>> Also, I didn’t mean for my example to sound like a proposal. I merely
> > was trying to get to a potentially valuable stateless idempotent string
> > escape function. Its contrivance is quite clear.
> > >>>
> > >>> Any other comments out there?
> > >>>
> > > >>> We could provide a stateful escaper (that figures out how many
> escapes
> > a string is in), or a method that returns the number of escapes a
> string
> > is in. Again, I’m not all that sure of the value of such methods.
> > >>
> > >> I don't think it's possible to work out the number of times a string
> > >> has been escaped.
> > >
> > > That may indeed be true, but it is possible to return the number of
> > times unescape need be run before subsequent unescapes yield the same
> > result.
> >
> > That in itself is potentially ambiguous.
> > Does the unescaper keep going until there are no valid escape
> > sequences left, or does it stop when there is at least one ampersand
> > which is not part of a valid escape sequence?
> >
> > > Again, I’m 

Re: [text] On the value of idempotent string escape methods?

2017-02-21 Thread Chas Honton
Not sufficiently useful to include in commons. 

Chas

> On Feb 21, 2017, at 1:31 PM, Bhowmik, Bindul  wrote:
> 
>> On Tue, Feb 21, 2017 at 7:55 AM, sebb  wrote:
>>> On 21 February 2017 at 12:40, Rob Tompkins  wrote:
>>> 
 On Feb 21, 2017, at 6:02 AM, sebb  wrote:
 
 On 21 February 2017 at 04:40, Sampanna Kahu > wrote:
> Hi Guys,
> Very good points are being made above. Please allow me to add my two cents
> :-)
> 
> What if the string contains syntactically valid HTML characters/tags and
> our aim is to prevent rendering these tags in the browser when this string
> is being served via a web application? Or prevent the execution of harmful
> embedded scripts when serving it? The 'escapeOnce' method could be useful
> here, right?
 
 I don't think so.
 
> To explain better, let's consider an example of the specific use-case that
> I had in mind when building the 'escapeOnce' method:
> Consider the scenario of a simple restful web application where users can
> manipulate their text using simple CRUD operations. Let's assume that we do
> not have the 'escapeOnce' method yet.
> 1. A user comes and submits his string. We escape it and store it in our
> database. If the string had any HTML characters, they would have gotten
> escaped.
> 
> 2. After some time, the same user fetches his string, adds some more HTML
> characters and submits it. At this point, although the escape method would
> correctly escape the freshly added HTML characters, it would escape the
> older escaped HTML characters again! (for example &gt; would become
> &amp;gt;)
> And this effect gets magnified if step number 2 above is repeated.
 
 Of course, that is my point.
 
 Also remember that you want to show the original string to the user.
 That's not possible in general if you use this approach.
 
 Suppose they originally entered
 
 "To code ampersand (&) in HTML, use '&amp;'"
 
 Using escapeOnce, this would become:
 
 "To code ampersand (&amp;) in HTML, use '&amp;'"
 
 You can either show that directly to the user, or use an unescapeOnce
 and show them:
 
 "To code ampersand (&) in HTML, use '&'"
> 
> I have had this use case in a project (enclosing XML/HTML content in an
> XML stream) and the expected output for escapeOnce in this case would
> be:
> "To code ampersand (&amp;) in HTML, use '&amp;amp;'"
> 
> And similarly unescapeOnce would generate back:
> "To code ampersand (&) in HTML, use '&amp;'"
> 
> Just my two cents, as I have had to write this code.
> 
 
 Neither makes any sense.
 
> How do we solve the above problem without the 'escapeOnce' method?
 
 Store the raw string in the database and escape it just before display.
 
 If you are using JavaScript, then use an approach such as this to escape
 it:
 
 document.getElementById("whereItGoes").appendChild(document.createTextNode(unsafe_str));
 
 See:
 
 http://shebang.brandonmintern.com/foolproof-html-escaping-in-javascript/ 
 
 
 This has a good discussion of some of the problems.
 
 ==
 
 Sorry, but it's not possible in general to do what you want, because
 one cannot reliably determine if a string has been escaped just from
 looking at the string.
>>> 
>>> Another thought occurred to me (again despite potential lack of value).
>>> 
>>> We should be able to quickly verify if there are any escape strings in the 
>>> string in question. A single application of unescape followed by checking 
>>> string equality with the original input would yield a predicate on the 
>>> existence of escapes in the input in question.
>> 
>> Again, what does unescape mean in this context?
>> Does it ignore incomplete escape sequences, or throw an error?
>> 
>>> From there we could: (1) escape if no escapes were present in the original, 
>>> or (2) throw an exception if there were escapes present in the original 
>>> string.
>>> Again, this feels contrived, so I’m not really suggesting that we add it. 
>>> I’m just playing with ideas here that could accomplish what Sampanna is 
>>> going for.
>> 
>> The request is impossible to fulfill reliably, and does not deserve to
>> be added to a Commons library.
>> 
>> I don't know why this is still being discussed.
>> 
>>> -Rob
>>> 
 
 The most one can do is to sanitise the string by escaping anything
 that is unescaped.
 However that process corrupts the input - a browser won't display the
 proper output.
 
>> On 20 February 2017 at 21:40, sebb  wrote:
>> 
>>> On 20 February 2017 at 15:36, Rob Tompkins  wrote:
>>> 
 

Re: [text] On the value of idempotent string escape methods?

2017-02-21 Thread Bhowmik, Bindul
On Tue, Feb 21, 2017 at 7:55 AM, sebb  wrote:
> On 21 February 2017 at 12:40, Rob Tompkins  wrote:
>>
>>> On Feb 21, 2017, at 6:02 AM, sebb  wrote:
>>>
>>> On 21 February 2017 at 04:40, Sampanna Kahu >> > wrote:
 Hi Guys,
 Very good points are being made above. Please allow me to add my two cents
 :-)

 What if the string contains syntactically valid HTML characters/tags and
 our aim is to prevent rendering these tags in the browser when this string
 is being served via a web application? Or prevent the execution of harmful
 embedded scripts when serving it? The 'escapeOnce' method could be useful
 here, right?
>>>
>>> I don't think so.
>>>
 To explain better, let's consider an example of the specific use-case that
 I had in mind when building the 'escapeOnce' method:
 Consider the scenario of a simple restful web application where users can
 manipulate their text using simple CRUD operations. Let's assume that we do
 not have the 'escapeOnce' method yet.
 1. A user comes and submits his string. We escape it and store it in our
 database. If the string had any HTML characters, they would have gotten
 escaped.

 2. After some time, the same user fetches his string, adds some more HTML
 characters and submits it. At this point, although the escape method would
 correctly escape the freshly added HTML characters, it would escape the
 older escaped HTML characters again! (for example &gt; would become
 &amp;gt;)
 And this effect gets magnified if step number 2 above is repeated.
>>>
>>> Of course, that is my point.
>>>
>>> Also remember that you want to show the original string to the user.
>>> That's not possible in general if you use this approach.
>>>
>>> Suppose they originally entered
>>>
>>> "To code ampersand (&) in HTML, use '&amp;'"
>>>
>>> Using escapeOnce, this would become:
>>>
>>> "To code ampersand (&amp;) in HTML, use '&amp;'"
>>>
>>> You can either show that directly to the user, or use an unescapeOnce
>>> and show them:
>>>
>>> "To code ampersand (&) in HTML, use '&'"

I have had this use case in a project (enclosing XML/HTML content in an
XML stream) and the expected output for escapeOnce in this case would
be:
"To code ampersand (&amp;) in HTML, use '&amp;amp;'"

And similarly unescapeOnce would generate back:
"To code ampersand (&) in HTML, use '&amp;'"

Just my two cents, as I have had to write this code.

>>>
>>> Neither makes any sense.
>>>
 How do we solve the above problem without the 'escapeOnce' method?
>>>
>>> Store the raw string in the database and escape it just before display.
>>>
>>> If you are using JavaScript, then use an approach such as this to escape it:
>>>
>>> document.getElementById("whereItGoes").appendChild(document.createTextNode(unsafe_str));
>>>
>>> See:
>>>
>>> http://shebang.brandonmintern.com/foolproof-html-escaping-in-javascript/ 
>>> 
>>>
>>> This has a good discussion of some of the problems.
>>>
>>> ==
>>>
>>> Sorry, but it's not possible in general to do what you want, because
>>> one cannot reliably determine if a string has been escaped just from
>>> looking at the string.
>>
>> Another thought occurred to me (again despite potential lack of value).
>>
>> We should be able to quickly verify if there are any escape strings in the 
>> string in question. A single application of unescape followed by checking 
>> string equality with the original input would yield a predicate on the 
>> existence of escapes in the input in question.
>
> Again, what does unescape mean in this context?
> Does it ignore incomplete escape sequences, or throw an error?
>
>> From there we could: (1) escape if no escapes were present in the original, 
>> or (2) throw an exception if there were escapes present in the original 
>> string.
>> Again, this feels contrived, so I’m not really suggesting that we add it. 
>> I’m just playing with ideas here that could accomplish what Sampanna is 
>> going for.
>
> The request is impossible to fulfill reliably, and does not deserve to
> be added to a Commons library.
>
> I don't know why this is still being discussed.
>
>> -Rob
>>
>>>
>>> The most one can do is to sanitise the string by escaping anything
>>> that is unescaped.
>>> However that process corrupts the input - a browser won't display the
>>> proper output.
>>>
 On 20 February 2017 at 21:40, sebb  wrote:

> On 20 February 2017 at 15:36, Rob Tompkins  wrote:
>>
>>> On Feb 20, 2017, at 10:30 AM, sebb  wrote:
>>>
>>> On 20 February 2017 at 14:55, Rob Tompkins  wrote:

> On Feb 20, 2017, at 4:31 AM, sebb  wrote:
>
> On 19 February 2017 at 14:29, Raymond DeCampo  

Re: [text] On the value of idempotent string escape methods?

2017-02-21 Thread sebb
On 21 February 2017 at 12:40, Rob Tompkins  wrote:
>
>> On Feb 21, 2017, at 6:02 AM, sebb  wrote:
>>
>> On 21 February 2017 at 04:40, Sampanna Kahu > > wrote:
>>> Hi Guys,
>>> Very good points are being made above. Please allow me to add my two cents
>>> :-)
>>>
>>> What if the string contains syntactically valid HTML characters/tags and
>>> our aim is to prevent rendering these tags in the browser when this string
>>> is being served via a web application? Or prevent the execution of harmful
>>> embedded scripts when serving it? The 'escapeOnce' method could be useful
>>> here, right?
>>
>> I don't think so.
>>
>>> To explain better, let's consider an example of the specific use-case that
>>> I had in mind when building the 'escapeOnce' method:
>>> Consider the scenario of a simple restful web application where users can
>>> manipulate their text using simple CRUD operations. Let's assume that we do
>>> not have the 'escapeOnce' method yet.
>>> 1. A user comes and submits his string. We escape it and store it in our
>>> database. If the string had any HTML characters, they would have gotten
>>> escaped.
>>>
>>> 2. After some time, the same user fetches his string, adds some more HTML
>>> characters and submits it. At this point, although the escape method would
>>> correctly escape the freshly added HTML characters, it would escape the
>>> older escaped HTML characters again! (for example &gt; would become
>>> &amp;gt;)
>>> And this effect gets magnified if step number 2 above is repeated.
>>
>> Of course, that is my point.
>>
>> Also remember that you want to show the original string to the user.
>> That's not possible in general if you use this approach.
>>
>> Suppose they originally entered
>>
>> "To code ampersand (&) in HTML, use '&amp;'"
>>
>> Using escapeOnce, this would become:
>>
>> "To code ampersand (&amp;) in HTML, use '&amp;'"
>>
>> You can either show that directly to the user, or use an unescapeOnce
>> and show them:
>>
>> "To code ampersand (&) in HTML, use '&'"
>>
>> Neither makes any sense.
>>
>>> How do we solve the above problem without the 'escapeOnce' method?
>>
>> Store the raw string in the database and escape it just before display.
>>
>> If you are using JavaScript, then use an approach such as this to escape it:
>>
>> document.getElementById("whereItGoes").appendChild(document.createTextNode(unsafe_str));
>>
>> See:
>>
>> http://shebang.brandonmintern.com/foolproof-html-escaping-in-javascript/ 
>> 
>>
>> This has a good discussion of some of the problems.
>>
>> ==
>>
>> Sorry, but it's not possible in general to do what you want, because
>> one cannot reliably determine if a string has been escaped just from
>> looking at the string.
>
> Another thought occurred to me (again despite potential lack of value).
>
> We should be able to quickly verify if there are any escape strings in the 
> string in question. A single application of unescape followed by checking 
> string equality with the original input would yield a predicate on the 
> existence of escapes in the input in question.

Again, what does unescape mean in this context?
Does it ignore incomplete escape sequences, or throw an error?

> From there we could: (1) escape if no escapes were present in the original, 
> or (2) throw an exception if there were escapes present in the original 
> string.
> Again, this feels contrived, so I’m not really suggesting that we add it. I’m 
> just playing with ideas here that could accomplish what Sampanna is going for.

The request is impossible to fulfill reliably, and does not deserve to
be added to a Commons library.

I don't know why this is still being discussed.

> -Rob
>
>>
>> The most one can do is to sanitise the string by escaping anything
>> that is unescaped.
>> However that process corrupts the input - a browser won't display the
>> proper output.
>>
>>> On 20 February 2017 at 21:40, sebb  wrote:
>>>
 On 20 February 2017 at 15:36, Rob Tompkins  wrote:
>
>> On Feb 20, 2017, at 10:30 AM, sebb  wrote:
>>
>> On 20 February 2017 at 14:55, Rob Tompkins  wrote:
>>>
 On Feb 20, 2017, at 4:31 AM, sebb  wrote:

 On 19 February 2017 at 14:29, Raymond DeCampo > wrote:
> I am trying to see how having the proposed unescape() method leads
 to a
> useful escape method.
>
> E.g. clearly unescape("&amp;") would evaluate to "&".  So would
> unescape("&amp;amp;").  That means the proposed escape() method
 would also
> have the same output for "&amp;" and "&amp;amp;".
>
> I think a better approach for an idempotent escape would be to just
> unescape the string once, and then run the traditional 

Re: [text] On the value of idempotent string escape methods?

2017-02-21 Thread Rob Tompkins

> On Feb 21, 2017, at 6:02 AM, sebb  wrote:
> 
> On 21 February 2017 at 04:40, Sampanna Kahu  > wrote:
>> Hi Guys,
>> Very good points are being made above. Please allow me to add my two cents
>> :-)
>> 
>> What if the string contains syntactically valid HTML characters/tags and
>> our aim is to prevent rendering these tags in the browser when this string
>> is being served via a web application? Or prevent the execution of harmful
>> embedded scripts when serving it? The 'escapeOnce' method could be useful
>> here, right?
> 
> I don't think so.
> 
>> To explain better, let's consider an example of the specific use-case that
>> I had in mind when building the 'escapeOnce' method:
>> Consider the scenario of a simple restful web application where users can
>> manipulate their text using simple CRUD operations. Let's assume that we do
>> not have the 'escapeOnce' method yet.
>> 1. A user comes and submits his string. We escape it and store it in our
>> database. If the string had any HTML characters, they would have gotten
>> escaped.
>> 
>> 2. After some time, the same user fetches his string, adds some more HTML
>> characters and submits it. At this point, although the escape method would
>> correctly escape the freshly added HTML characters, it would escape the
>> older escaped HTML characters again! (for example &gt; would become
>> &amp;gt;)
>> And this effect gets magnified if step number 2 above is repeated.
> 
> Of course, that is my point.
> 
> Also remember that you want to show the original string to the user.
> That's not possible in general if you use this approach.
> 
> Suppose they originally entered
> 
> "To code ampersand (&) in HTML, use '&amp;'"
> 
> Using escapeOnce, this would become:
> 
> "To code ampersand (&amp;) in HTML, use '&amp;'"
> 
> You can either show that directly to the user, or use an unescapeOnce
> and show them:
> 
> "To code ampersand (&) in HTML, use '&'"
> 
> Neither makes any sense.
> 
>> How do we solve the above problem without the 'escapeOnce' method?
> 
> Store the raw string in the database and escape it just before display.
> 
> If you are using JavaScript, then use an approach such as this to escape it:
> 
> document.getElementById("whereItGoes").appendChild(document.createTextNode(unsafe_str));
> 
> See:
> 
> http://shebang.brandonmintern.com/foolproof-html-escaping-in-javascript/ 
> 
> 
> This has a good discussion of some of the problems.
> 
> ==
> 
> Sorry, but it's not possible in general to do what you want, because
> one cannot reliably determine if a string has been escaped just from
> looking at the string.

Another thought occurred to me (again despite potential lack of value). 

We should be able to quickly verify if there are any escape strings in the 
string in question. A single application of unescape followed by checking 
string equality with the original input would yield a predicate on the 
existence of escapes in the input in question. From there we could:
(1) escape if no escapes were present in the original, or (2) throw an 
exception if there were escapes present in the original string.
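
A quick sketch of that predicate, assuming an unescape that ignores
bare ampersands rather than throwing. EscapePredicateDemo and its
methods are hypothetical names for illustration, not proposed API:

```java
// Illustrative only: hypothetical helpers, not the Commons Text API.
final class EscapePredicateDemo {

    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Lenient unescape: anything that is not a known entity is left untouched.
    static String unescape(String s) {
        return s.replace("&lt;", "<").replace("&gt;", ">").replace("&amp;", "&");
    }

    // True if one application of unescape changes the input.
    static boolean containsEscapes(String s) {
        return !unescape(s).equals(s);
    }

    // Option (1): escape when no escapes are present; option (2): throw otherwise.
    static String escapeIfClean(String s) {
        if (containsEscapes(s)) {
            throw new IllegalArgumentException("input already contains escape sequences");
        }
        return escape(s);
    }

    public static void main(String[] args) {
        System.out.println(containsEscapes("a < b"));       // false
        System.out.println(containsEscapes("a &lt; b"));    // true
        System.out.println(containsEscapes("Tom & Jerry")); // false
        System.out.println(escapeIfClean("a < b"));         // a &lt; b
    }
}
```

The predicate only says that an escape sequence occurs in the input. It
cannot say whether the sequence was produced by an earlier escape or
typed literally by the user.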

Again, this feels contrived, so I’m not really suggesting that we add it. I’m 
just playing with ideas here that could accomplish what Sampanna is going for.

-Rob

> 
> The most one can do is to sanitise the string by escaping anything
> that is unescaped.
> However that process corrupts the input - a browser won't display the
> proper output.
> 
>> On 20 February 2017 at 21:40, sebb  wrote:
>> 
>>> On 20 February 2017 at 15:36, Rob Tompkins  wrote:
 
> On Feb 20, 2017, at 10:30 AM, sebb  wrote:
> 
> On 20 February 2017 at 14:55, Rob Tompkins  wrote:
>> 
>>> On Feb 20, 2017, at 4:31 AM, sebb  wrote:
>>> 
>>> On 19 February 2017 at 14:29, Raymond DeCampo >> > wrote:
 I am trying to see how having the proposed unescape() method leads
>>> to a
 useful escape method.
 
 E.g. clearly unescape("&amp;") would evaluate to "&".  So would
 unescape("&amp;amp;").  That means the proposed escape() method
>>> would also
 have the same output for "&amp;" and "&amp;amp;".
 
 I think a better approach for an idempotent escape would be to just
 unescape the string once, and then run the traditional escape.
>>> 
>>> That does not eliminate the problems, as you state below.
>>> 
 You will
 still have issues if the user intended to escape the string "&amp;"
>>> but you
 are never going to crack that without some kind of state saving.
>>> 
>>> That is my exact point.
>>> 
>>> Since it's not possible for the function to work reliably, we should
>>> not mislead users by 

Re: [text] On the value of idempotent string escape methods?

2017-02-21 Thread sebb
On 21 February 2017 at 04:40, Sampanna Kahu  wrote:
> Hi Guys,
> Very good points are being made above. Please allow me to add my two cents
> :-)
>
> What if the string contains syntactically valid HTML characters/tags and
> our aim is to prevent rendering these tags in the browser when this string
> is being served via a web application? Or prevent the execution of harmful
> embedded scripts when serving it? The 'escapeOnce' method could be useful
> here, right?

I don't think so.

> To explain better, let's consider an example of the specific use-case that
> I had in mind when building the 'escapeOnce' method:
> Consider the scenario of a simple restful web application where users can
> manipulate their text using simple CRUD operations. Let's assume that we do
> not have the 'escapeOnce' method yet.
> 1. A user comes and submits his string. We escape it and store it in our
> database. If the string had any HTML characters, they would have gotten
> escaped.
>
> 2. After some time, the same user fetches his string, adds some more HTML
> characters and submits it. At this point, although the escape method would
> correctly escape the freshly added HTML characters, it would escape the
> older escaped HTML characters again! (for example &gt; would become
> &amp;gt;)
> And this effect gets magnified if step number 2 above is repeated.

Of course, that is my point.

Also remember that you want to show the original string to the user.
That's not possible in general if you use this approach.

Suppose they originally entered

"To code ampersand (&) in HTML, use '&amp;'"

Using escapeOnce, this would become:

"To code ampersand (&amp;) in HTML, use '&amp;'"

You can either show that directly to the user, or use an unescapeOnce
and show them:

"To code ampersand (&) in HTML, use '&'"

Neither makes any sense.
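
To make the ambiguity concrete, here is a guess at what an escapeOnce
might look like: it escapes only ampersands that do not already start
an entity. EscapeOnceDemo, its regex and the four-entity list are
assumptions for illustration, not the code that was proposed:

```java
import java.util.regex.Pattern;

// Illustrative only: a hypothetical escapeOnce, not the proposed implementation.
final class EscapeOnceDemo {

    // An ampersand that is NOT the start of one of the four entities handled here.
    private static final Pattern BARE_AMP = Pattern.compile("&(?!(?:amp|lt|gt|quot);)");

    static String escapeOnce(String s) {
        return BARE_AMP.matcher(s).replaceAll("&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }

    public static void main(String[] args) {
        String typed = "To code ampersand (&) in HTML, use '&amp;'";
        String other = "To code ampersand (&amp;) in HTML, use '&amp;'";

        // Two different inputs produce the same output.
        System.out.println(escapeOnce(typed)); // To code ampersand (&amp;) in HTML, use '&amp;'
        System.out.println(escapeOnce(other)); // To code ampersand (&amp;) in HTML, use '&amp;'
    }
}
```

Since two different inputs give the same result, nothing applied
afterwards can tell which one the user entered.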

> How do we solve the above problem without the 'escapeOnce' method?

Store the raw string in the database and escape it just before display.
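
On the server side that could look roughly like the sketch below. The
in-memory map stands in for the database, and StoreRawDemo with its
methods is a made-up name for illustration (only &, < and > are
escaped):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a map stands in for the database; names are hypothetical.
final class StoreRawDemo {

    private static final Map<String, String> DB = new HashMap<>();

    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Store exactly what the user typed.
    static void save(String id, String raw) {
        DB.put(id, raw);
    }

    // Hand the raw text back for editing.
    static String load(String id) {
        return DB.get(id);
    }

    // Escape once, at the point where the text is written into HTML.
    static String render(String id) {
        return escape(DB.get(id));
    }

    public static void main(String[] args) {
        save("note", "<b>Tom & Jerry</b>");
        System.out.println(load("note"));   // <b>Tom & Jerry</b>
        System.out.println(render("note")); // &lt;b&gt;Tom &amp; Jerry&lt;/b&gt;
    }
}
```

Because the stored value is never escaped, editing it any number of
times cannot cause double escaping.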

If you are using JavaScript, then use an approach such as this to escape it:

document.getElementById("whereItGoes").appendChild(document.createTextNode(unsafe_str));

See:

http://shebang.brandonmintern.com/foolproof-html-escaping-in-javascript/

This has a good discussion of some of the problems.

==

Sorry, but it's not possible in general to do what you want, because
one cannot reliably determine if a string has been escaped just from
looking at the string.

The most one can do is to sanitise the string by escaping anything
that is unescaped.
However that process corrupts the input - a browser won't display the
proper output.

> On 20 February 2017 at 21:40, sebb  wrote:
>
>> On 20 February 2017 at 15:36, Rob Tompkins  wrote:
>> >
>> >> On Feb 20, 2017, at 10:30 AM, sebb  wrote:
>> >>
>> >> On 20 February 2017 at 14:55, Rob Tompkins  wrote:
>> >>>
>>  On Feb 20, 2017, at 4:31 AM, sebb  wrote:
>> 
>>  On 19 February 2017 at 14:29, Raymond DeCampo > > wrote:
>> > I am trying to see how having the proposed unescape() method leads
>> to a
>> > useful escape method.
>> >
>> > E.g. clearly unescape("&amp;") would evaluate to "&".  So would
>> > unescape("&amp;amp;").  That means the proposed escape() method
>> would also
>> > have the same output for "&amp;" and "&amp;amp;".
>> >
>> > I think a better approach for an idempotent escape would be to just
>> > unescape the string once, and then run the traditional escape.
>> 
>>  That does not eliminate the problems, as you state below.
>> 
>> > You will
>> > still have issues if the user intended to escape the string "&amp;"
>> but you
>> > are never going to crack that without some kind of state saving.
>> 
>>  That is my exact point.
>> 
>>  Since it's not possible for the function to work reliably, we should
>>  not mislead users by pretending that there is a magic method that
>>  works.
>> 
>> > Then given that the functionality is available via two consecutive
>> calls to
>> > existing methods, I would probably be disinclined to include it in
>> the
>> > library.
>> 
>>  +1
>> >>>
>> >>> I’m a (+1) for removal as well.
>> >>>
>> >>> Also, I didn’t mean for my example to sound like a proposal. I merely
>> was trying to get to a potentially valuable stateless idempotent string
>> escape function. Its contrivance it quite clear.
>> >>>
>> >>> Any other comments out there?
>> >>>
>> >>> We could provide a stateful escaper (that figures out how many escapes
>> a string is in), or a method that returns the number of escapes in a string
>> is. Again, I’m not all that sure on the value of such methods.
>> >>
>> >> I don't think it's possible to work out the number of times a string
>> >> has been escaped.
>> >
>> > That may indeed be true, but it is possible to return the number of
>> times 

Re: [text] On the value of idempotent string escape methods?

2017-02-20 Thread Sampanna Kahu
Hi Guys,
Very good points are being made above. Please allow me to add my two cents
:-)

What if the string contains syntactically valid HTML characters/tags and
our aim is to prevent rendering these tags in the browser when this string
is being served via a web application? Or prevent the execution of harmful
embedded scripts when serving it? The 'escapeOnce' method could be useful
here, right?

To explain better, let's consider an example of the specific use-case that
I had in mind when building the 'escapeOnce' method:
Consider the scenario of a simple RESTful web application where users can
manipulate their text using simple CRUD operations. Let's assume that we do
not have the 'escapeOnce' method yet.
1. A user comes and submits his string. We escape it and store it in our
database. If the string had any HTML characters, they would have gotten
escaped.
2. After some time, the same user fetches his string, adds some more HTML
characters and submits it. At this point, although the escape method would
correctly escape the freshly added HTML characters, it would escape the
older escaped HTML characters again! (for example &gt; would become
&amp;gt;)

And this effect gets magnified if step number 2 above is repeated.

How do we solve the above problem without the 'escapeOnce' method?
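
The effect described in step 2 is easy to reproduce. The sketch below uses Python's standard `html.escape` and `html.unescape` as stand-ins for the escape and unescape methods under discussion; the two `edit_cycle_*` helpers are hypothetical names for the two possible flows.

```python
import html

def edit_cycle_escaped(stored, appended):
    """The flow from steps 1-2: the stored (already escaped) text is handed
    back to the user as-is, edited, then escaped again on submit."""
    return html.escape(stored + appended)

def edit_cycle_unescaped(stored, appended):
    """Same flow, but the stored text is unescaped before editing."""
    return html.escape(html.unescape(stored) + appended)

stored = html.escape("a > b")                  # step 1
grown = edit_cycle_escaped(stored, " <i>")     # step 2
grown2 = edit_cycle_escaped(grown, "")         # step 2 repeated, no new text

stable = edit_cycle_unescaped(stored, " <i>")
stable2 = edit_cycle_unescaped(stable, "")
```

The second flow stays stable without any 'escapeOnce' method, but only because the application knows the stored text was escaped exactly once.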

Re: [text] On the value of idempotent string escape methods?

2017-02-20 Thread sebb
On 20 February 2017 at 15:36, Rob Tompkins  wrote:
>> I don't think it's possible to work out the number of times a string
>> has been escaped.
>
> That may indeed be true, but it is possible to return the number of times 
> unescape need be run before subsequent unescapes yield the same result.

That in itself is potentially ambiguous.
Does the unescaper keep going until there are no valid escape
sequences left, or does it stop when there is at least one ampersand
which is not part of a valid escape sequence?
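
The two readings can give different counts for the same input. A sketch under stated assumptions: Python's standard `html.unescape` stands in for the unescape method, and both counting functions and the `BARE_AMP` pattern are hypothetical helpers written for this illustration.

```python
import html
import re

# "&" that does not start a named or numeric escape sequence.
BARE_AMP = re.compile(r"&(?![A-Za-z][A-Za-z0-9]*;|#[0-9]+;|#[xX][0-9A-Fa-f]+;)")

def count_until_fixed_point(text):
    """Rule A: unescape until another pass changes nothing."""
    count = 0
    while True:
        nxt = html.unescape(text)
        if nxt == text:
            return count
        text = nxt
        count += 1

def count_until_bare_ampersand(text):
    """Rule B: stop as soon as some "&" is not part of an escape sequence."""
    count = 0
    while not BARE_AMP.search(text):
        nxt = html.unescape(text)
        if nxt == text:
            break
        text = nxt
        count += 1
    return count

sample = "&amp; &amp;amp;"
```

For `sample`, rule A needs two passes while rule B stops after one, because the first pass already produces a bare ampersand next to a remaining `&amp;`.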

> Again, I’m not sure if this is a valuable measure to concern ourselves with.

I don't think it provides anything useful.

Re: [text] On the value of idempotent string escape methods?

2017-02-20 Thread Rob Tompkins

> On Feb 20, 2017, at 10:30 AM, sebb  wrote:
> 
> I don't think it's possible to work out the number of times a string
> has been escaped.

That may indeed be true, but it is possible to return the number of times 
unescape need be run before subsequent unescapes yield the same result. Again, 
I’m not sure if this is a valuable measure to concern ourselves with.

> 
> The most one can do is to determine if a string has not been escaped.
> That would be the case where a string has one or more unescaped
> characters in it.
> For example "This & that" has obviously not been escaped.
> 
> However if a string has no un-escaped characters in it, that does not
> necessarily mean that it has already been escaped.
> For example: "This &amp; that".
> This might have been escaped - or it might not.

Ah, I was using the definition of “having been escaped” to be that the string 
contains escape sequences.

> For example it could be the answer to: "How does one code 'This &
> that' in HTML?”
> 
> The application has to keep track of the escape-state of the string.

Definitely agreed with your definition of “having been escaped."

Re: [text] On the value of idempotent string escape methods?

2017-02-20 Thread sebb
On 20 February 2017 at 14:55, Rob Tompkins  wrote:
>
> I’m a (+1) for removal as well.
>
> Also, I didn’t mean for my example to sound like a proposal. I merely was 
> trying to get to a potentially valuable stateless idempotent string escape 
> function. Its contrivance is quite clear.
>
> Any other comments out there?
>
> We could provide a stateful escaper (that figures out how many escapes a
> string is in), or a method that returns the number of escapes a string is in.
> Again, I’m not all that sure on the value of such methods.

I don't think it's possible to work out the number of times a string
has been escaped.

The most one can do is to determine if a string has not been escaped.
That would be the case where a string has one or more unescaped
characters in it.
For example "This & that" has obviously not been escaped.

However if a string has no un-escaped characters in it, that does not
necessarily mean that it has already been escaped.
For example: "This &amp; that".
This might have been escaped - or it might not.
For example it could be the answer to: "How does one code 'This &
that' in HTML?"

The application has to keep track of the escape-state of the string.
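
One way to picture both points is the sketch below. It assumes Python's standard `html.escape` as the escaper; `definitely_not_escaped`, the `Escaped` marker type and `to_html` are hypothetical names introduced for this illustration, not an existing API.

```python
import html
import re

# "&" that does not start a named or numeric escape sequence.
BARE_AMP = re.compile(r"&(?![A-Za-z][A-Za-z0-9]*;|#[0-9]+;|#[xX][0-9A-Fa-f]+;)")

def definitely_not_escaped(text):
    """True only when the text contains a character an escaper would have
    replaced: "<", ">", or an "&" that starts no escape sequence.
    False means "cannot tell", not "already escaped"."""
    return "<" in text or ">" in text or BARE_AMP.search(text) is not None

class Escaped(str):
    """Marker type (hypothetical): a str known to be already escaped."""

def to_html(text):
    """Escape plain str values exactly once; pass Escaped values through."""
    if isinstance(text, Escaped):
        return text
    return Escaped(html.escape(text))

once = to_html("This &amp; that")   # plain str input, so it is escaped
twice = to_html(once)               # Escaped input, so it is left alone
```

Here the escape state travels with the value through its type, so `to_html` never has to inspect the characters to decide whether to escape.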

Re: [text] On the value of idempotent string escape methods?

2017-02-20 Thread Rob Tompkins

> On Feb 20, 2017, at 4:31 AM, sebb  wrote:
> 
>> Then, given that the functionality is available via two consecutive calls to
>> existing methods, I would probably be disinclined to include it in the
>> library.
> 
> +1

I’m a (+1) for removal as well. 

Also, I didn’t mean for my example to sound like a proposal. I merely was 
trying to get to a potentially valuable stateless idempotent string escape 
function. Its contrivance is quite clear.

Any other comments out there?

We could provide a stateful escaper (that figures out how many escapes a string
is in), or a method that returns the number of escapes a string is in. Again,
I’m not all that sure on the value of such methods.

Cheers,
-Rob

Re: [text] On the value of idempotent string escape methods?

2017-02-20 Thread sebb
On 19 February 2017 at 14:29, Raymond DeCampo  wrote:
> I am trying to see how having the proposed unescape() method leads to a
> useful escape method.
>
> E.g. clearly unescape("&amp;") would evaluate to "&".  So would
> unescape("&amp;amp;").  That means the proposed escape() method would also
> have the same output for "&amp;" and "&amp;amp;".
>
> I think a better approach for an idempotent escape would be to just
> unescape the string once, and then run the traditional escape.

That does not eliminate the problems, as you state below.

> You will
> still have issues if the user intended to escape the string "&amp;" but you
> are never going to crack that without some kind of state saving.

That is my exact point.

Since it's not possible for the function to work reliably, we should
not mislead users by pretending that there is a magic method that
works.

> Then, given that the functionality is available via two consecutive calls to
> existing methods, I would probably be disinclined to include it in the
> library.

+1

Re: [text] On the value of idempotent string escape methods?

2017-02-19 Thread Raymond DeCampo
I am trying to see how having the proposed unescape() method leads to a
useful escape method.

E.g. clearly unescape("&amp;") would evaluate to "&".  So would
unescape("&amp;amp;").  That means the proposed escape() method would also
have the same output for "&amp;" and "&amp;amp;".

I think a better approach for an idempotent escape would be to just
unescape the string once, and then run the traditional escape.  You will
still have issues if the user intended to escape the string "&amp;" but you
are never going to crack that without some kind of state saving.

Then, given that the functionality is available via two consecutive calls to
existing methods, I would probably be disinclined to include it in the
library.
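
The two approaches can be compared side by side. This is a sketch only: Python's standard `html.escape` and `html.unescape` stand in for the library's escape and unescape methods, and the three function names are hypothetical.

```python
import html

def unescape_fully(text):
    """Unescape repeatedly until the text stops changing."""
    while True:
        nxt = html.unescape(text)
        if nxt == text:
            return text
        text = nxt

def escape_via_full_unescape(text):
    """The proposal under discussion: unescape completely, then escape once."""
    return html.escape(unescape_fully(text))

def escape_via_single_unescape(text):
    """The alternative suggested above: unescape once, then escape."""
    return html.escape(html.unescape(text))
```

With the full unescape, `"&amp;"` and `"&amp;amp;"` both come out as `"&amp;"`. With the single unescape they stay distinct, but raw `"&"` and raw `"&amp;"` still collide, which is the case that needs saved state.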


[text] On the value of idempotent string escape methods?

2017-02-18 Thread Rob Tompkins
In preparation for the 1.0 release, I think we should address Sebb's concern in 
TEXT-40 about the attempt to create "idempotent" string escape methods. By 
idempotent I mean someMethod("some string") =
someMethod(someMethod(someMethod(...someMethod("some string")...))), i.e. a
single application of a method is equal to any number of applications of the
method on the same input.

Below I lay out a mechanism by which it is possible to write such methods, but 
I don’t know the value in writing such methods. I'm merely expressing that 
idempotency is a possibility.

For string "un-escaping", I believe that we can write a method that is indeed
idempotent by simply running the un-escape method repeatedly, for the finite
number of passes needed to reach the point at which the string remains
unchanged between applications of the un-escaping method. (I believe that I can
write a proof that all un-escape methods have such a point, if that is needed
for the sake of discussion).

If indeed we can create an idempotent un-escape method, then we can simply run
that method, and then run the escaping method one time. If we always completely
unescape and then escape once, then we do have an idempotent method.
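
The mechanism above can be sketched as follows. Python's standard `html.escape` and `html.unescape` stand in for the library's methods (an assumption made for brevity), and `unescape_all`, `escape_idempotent` and `is_idempotent_on` are hypothetical names. The check at the end is a spot check on a few samples, not a proof.

```python
import html

def unescape_all(text):
    """Apply unescape until a pass leaves the text unchanged (a fixed point)."""
    previous = None
    while text != previous:
        previous, text = text, html.unescape(text)
    return text

def escape_idempotent(text):
    """Completely unescape, then escape exactly once."""
    return html.escape(unescape_all(text))

def is_idempotent_on(func, samples):
    """Check func(func(s)) == func(s) for each sample."""
    return all(func(func(s)) == func(s) for s in samples)

samples = ["a < b", "a &lt; b", "a &amp;lt; b", "R&D", "x &#38; y"]
```

On these samples both `unescape_all` and `escape_idempotent` pass the check, while the plain `html.escape` does not. The price is that inputs such as `"a &lt; b"` and `"a &amp;lt; b"` become indistinguishable.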

Such a method might not be all that valuable to the user though. Furthermore,
this just explains one way to create such an idempotent method. Whether or not
other, or more valuable, methods exist would take some more thought.

Anyone have any thoughts? My feeling is that it might be more effort than it's
worth to ensure that any string is only “singly encoded.” Further, we probably
should give a look at the “escape_once” methods in StringEscapeUtils.

Cheers
-Rob
-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org