Re: JSON.canonicalize()

2018-03-19 Thread Michael J. Ryan
JSON is UTF-8 ... As far as 16-bit code points go, there are still astral
character pairs.  Binary data should be encoded to avoid this, such as
with base64.
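A minimal sketch of the workaround being suggested (my example; Node's `Buffer` is used here for base64, which is an assumption about the runtime, not part of any proposal):

```javascript
// Sketch: raw bytes cannot ride safely inside a JSON string (lone
// surrogates, normalization hazards), so encode them as base64 text first.
const bytes = new Uint8Array([0x00, 0xd8, 0xff]);        // arbitrary binary
const wrapped = JSON.stringify({ data: Buffer.from(bytes).toString('base64') });
const back = Uint8Array.from(Buffer.from(JSON.parse(wrapped).data, 'base64'));
// `back` holds the original bytes again
```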

On Fri, Mar 16, 2018, 09:23 Mike Samuel  wrote:

>
>
> On Fri, Mar 16, 2018 at 11:38 AM, C. Scott Ananian 
> wrote:
>
>> Canonical JSON is often used to imply a security property: two JSON blobs
>> with identical contents are expected to have identical canonical JSON forms
>> (and thus identical hashed values).
>>
>
> What does "identical contents" mean in the context of numbers?  JSON
> intentionally avoids specifying any precision for numbers.
>
> JSON.stringify(1/3) === '0.3333333333333333'
>
> What would happen with JSON from systems that allow higher precision?
> I.e., what would (JSON.canonicalize(JSON.stringify(1/3) + '3')) produce?
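A quick illustration of the precision loss in question (my example, not from the draft):

```javascript
// ES Numbers are IEEE-754 doubles; serializing re-derives the shortest
// round-tripping digit string, so extra textual precision is discarded.
JSON.stringify(1/3);                                   // '0.3333333333333333'
JSON.stringify(JSON.parse('0.33333333333333333333'));  // '0.3333333333333333'
JSON.parse('0.33333333333333333333') === 1/3;          // true — same double
```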
>
>
>
>
>
>> However, unicode normalization allows multiple representations of "the
>> same" string, which defeats this security property.  Depending on your
>> implementation language
>>
>
> We shouldn't normalize unicode in strings that contain packed binary
> data.  JSON strings are strings of UTF-16 code-units, not Unicode scalar
> values and any system that assumes the latter will break often.
>
>
>> and use, a string with precomposed accents could compare equal to a
>> string with separated accents, even though the canonical JSON or hash
>> differed.  In an extreme case (with a weak hash function, say MD5), this
>> can be used to break security by re-encoding all strings in multiple
>> variants until a collision is found.  This is just a slight variant on the
>> fact that JSON allows multiple ways to encode a character using escape
>> sequences.  You've already taken the trouble to disambiguate this case;
>> security-conscious applications should take care to perform unicode
>> normalization as well, for the same reason.
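The precomposed/decomposed hazard in concrete terms (my example; `normalize` is the standard ES method):

```javascript
// U+00E9 (precomposed 'é') vs 'e' + U+0301 (combining acute): semantically
// "the same" text, but different code units — so different canonical JSON
// and a different hash, unless one Unicode-normalizes first.
const nfc = '\u00e9';
const nfd = 'e\u0301';
nfc === nfd;                                   // false
JSON.stringify(nfc) === JSON.stringify(nfd);   // false — distinct JSON texts
nfc === nfd.normalize('NFC');                  // true after normalization
```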
>>
>> Similarly, if you don't offer a verifier to ensure that the input is in
>> "canonical JSON" format, then an attacker can try to create collisions by
>> violating the rules of canonical JSON format, whether by using different
>> escape sequences, adding whitespace, etc.  This can be used to make JSON
>> which is "the same" appear "different", violating the intent of the
>> canonicalization.  Any security application of canonical JSON will require
>> a strict mode for JSON.parse() as well as a strict mode for
>> JSON.stringify().
>>
>
> Given the dodginess of "identical" w.r.t. non-integral numbers, shouldn't
> endpoints be re-canonicalizing before hashing anyway?  Why would one want
> to ship the canonical form over the wire if it loses precision?
>
>
>
>>   --scott
>>
>> On Fri, Mar 16, 2018 at 4:48 AM, Anders Rundgren <
>> anders.rundgren@gmail.com> wrote:
>>
>>> On 2018-03-16 08:52, C. Scott Ananian wrote:
>>>
 See http://wiki.laptop.org/go/Canonical_JSON -- you should probably at
 least
 mention unicode normalization of strings.

>>>
>>> Yes, I could add that unicode normalization of strings is out of scope
>>> for this specification.
>>>
>>>
>>> You probably should also specify a validator: it doesn't matter if you
 emit canonical JSON if you can tweak the hash of the value by feeding
 non-canonical JSON as an input.

>>>
>>> Pardon me, but I don't understand what you are writing here.
>>>
>>> Hash functions' only raison d'être is providing collision-safe
>>> checksums.
>>>
>>> thanx,
>>> Anders
>>>
>>>
>>>--scott

 On Fri, Mar 16, 2018 at 3:16 AM, Anders Rundgren <
 anders.rundgren@gmail.com >
 wrote:

 Dear List,

 Here is a proposal that I would be very happy getting feedback on
 since it builds on ES but is not (at all) limited to ES.

 The request is for a complement to the ES "JSON" object called
 canonicalize() which would have identical parameters to the existing
 stringify() method.

>>>
> Why should canonicalize take a replacer?  Hasn't replacement already
> happened?
>
>
>
>> The JSON canonicalization scheme (including ES code for emulating
 it), is described in:

 https://cyberphone.github.io/doc/security/draft-rundgren-json-canonicalization-scheme.html

 Current workspace:
 https://github.com/cyberphone/json-canonicalization

 Thanx,
 Anders Rundgren
 ___
 es-discuss mailing list
 es-discuss@mozilla.org 
 https://mail.mozilla.org/listinfo/es-discuss



>>>
>>

Re: JSON.canonicalize()

2018-03-19 Thread Mike Samuel
On Mon, Mar 19, 2018 at 10:30 AM, Anders Rundgren <
anders.rundgren@gmail.com> wrote:

> On 2018-03-19 15:17, Mike Samuel wrote:
>
>>
>>
>> On Mon, Mar 19, 2018 at 9:53 AM, Anders Rundgren <
>> anders.rundgren@gmail.com >
>> wrote:
>>
>> On 2018-03-19 14:34, Mike Samuel wrote:
>>
>> How does the transform you propose differ from?
>>
>> JSON.canonicalize = (x) => JSON.stringify(
>>   x,
>>   (_, x) => {
>> if (x && typeof x === 'object' && !Array.isArray(x)) {
>>   const sorted = {}
>>   for (let key of Object.getOwnPropertyNames(x).sort()) {
>> sorted[key] = x[key]
>>   }
>>   return sorted
>> }
>> return x
>>   })
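For what it's worth, a quick check of that transform's behavior (restating the definition so the snippet stands alone):

```javascript
const canonicalize = (x) => JSON.stringify(
  x,
  (_, v) => {
    if (v && typeof v === 'object' && !Array.isArray(v)) {
      const sorted = {}
      for (let key of Object.getOwnPropertyNames(v).sort()) {
        sorted[key] = v[key]
      }
      return sorted
    }
    return v
  })

// Keys are sorted at every nesting level:
canonicalize({ b: { d: 4, c: 3 }, a: [2, 1] });
// '{"a":[2,1],"b":{"c":3,"d":4}}'
```

One wrinkle worth noting: integer-like keys ("2", "10") are enumerated in ascending numeric order regardless of insertion order, so this exact approach would not emit them in lexicographic order.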
>>
>>
>> Probably not all.  You are the JS guru, not me :-)
>>
>>
>> The proposal says "in lexical (alphabetical) order."
>> If "lexical order" differs from the lexicographic order that sort
>> uses, then
>> the above could be adjusted to pass a comparator function.
>>
>>
>> I hope (and believe) that this is just a terminology problem.
>>
>>
>> I think you're right.
>> http://www.ecma-international.org/ecma-262/6.0/#sec-sortcompare
>> is where it's specified.  After checking that no custom comparator is
>> present:
>>
>>  1. Let xString be ToString(x).
>>  2. ReturnIfAbrupt(xString).
>>  3. Let yString be ToString(y).
>>  4. ReturnIfAbrupt(yString).
>>  5. If xString < yString, return −1.
>>  6. If xString > yString, return 1.
>>  7. Return +0.
>>
>>
>> (<) and (>) do not themselves bring in any locale-specific collation
>> rules.
>> They bottom out on
>> http://www.ecma-international.org/ecma-262/6.0/#sec-abstract-relational-comparison
>>
>> If both px and py are Strings, then
>>
>>  1. If py is a prefix of px, return false. (A String value p is a prefix
>> of String value q if q can be the result of concatenating p and some other
>> String r. Note that any String is a prefix of itself, because r may be the
>> empty String.)
>>  2. If px is a prefix of py, return true.
>>  3. Let k be the smallest nonnegative integer such that the code unit at
>> index k within px is different from the code unit at index k within py.
>> (There must be such a k, for neither String is a prefix of the other.)
>>  4. Let m be the integer that is the code unit value at index k within px.
>>  5. Let n be the integer that is the code unit value at index k within py.
>>  6. If m < n, return true. Otherwise, return false.
>>
>> Those code unit values are UTF-16 code unit values per
>> http://www.ecma-international.org/ecma-262/6.0/#sec-ecmascript-language-types-string-type
>>
>> each element in the String is treated as a UTF-16 code unit value
>>
>> As someone mentioned earlier in this thread, lexicographic string
>> comparisons that use different code
>> unit sizes can compute different results for the same semantic string
>> value.  Between UTF-8 and UTF-32
>> you should see no difference, but UTF-16 can differ from those given
>> supplementary codepoints.
>>
>> It might be worth making explicit that your lexical order is over UTF-16
>> strings if that's what you intend.
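A concrete pair where the two orders disagree (my example):

```javascript
// U+FF61 is one 16-bit code unit (0xFF61); U+10000 is a surrogate pair
// (0xD800 0xDC00). UTF-16 code-unit order and code point order disagree:
const a = '\uFF61';      // U+FF61 HALFWIDTH IDEOGRAPHIC FULL STOP
const b = '\u{10000}';   // U+10000, a supplementary code point
b < a;                                    // true  — 0xD800 < 0xFF61
b.codePointAt(0) < a.codePointAt(0);      // false — 0x10000 > 0xFF61
```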
>>
>
> Right, it is actually already in 3.2.3:
>

My apologies.  I missed that.

  Property strings to be sorted depend on that strings are represented
>   as arrays of 16-bit unsigned integers where each integer holds a single
>   UCS2/UTF-16 [UNICODE] code unit. The sorting is based on pure value
>   comparisons, independent of locale settings.
>
> This maps "natively" to JS and Java.  Probably to .NET as well.
> Other systems may need a specific comparator.
>

Yep.  Off the top of my head:
Go and Rust use UTF-8.
Python3 is UTF-16, Python2 is usually UTF-16 but may be UTF-32 depending on
sizeof(wchar) when compiling the interpreter.
C++ as is its wont is all of them.



>
>> Applied to your example input,
>>
>> JSON.canonicalize({
>>   "escaping": "\u20ac$\u000F\u000aA'\u0042\u0022\u005c\\\"\/",
>>   "other":  [null, true, false],
>>   "numbers": [1E30, 4.50, 6, 2e-3, 0.000000000000000000000000001]
>> }) ===
>> String.raw`{"escaping":"€$\u000f\nA'B\"\"/","numbers":[1e+30,4.5,6,0.002,1e-27],"other":[null,true,false]}`
>> // proposed {"escaping":"\u20ac$\u000f\nA'B\"\"/","numbers":[1e+30,4.5,6,0.002,1e-27],"other":[null,true,false]}
>>
>>
>> The canonicalized example from section 3.2.3 seems to conflict
>> with the text of 3.2.2:
>>
>>
>> If you look just under the result you will find a pretty sad explanation:
>>
>>

Re: JSON.canonicalize()

2018-03-19 Thread Anders Rundgren

On 2018-03-19 15:17, Mike Samuel wrote:



On Mon, Mar 19, 2018 at 9:53 AM, Anders Rundgren wrote:

On 2018-03-19 14:34, Mike Samuel wrote:

How does the transform you propose differ from?

JSON.canonicalize = (x) => JSON.stringify(
      x,
      (_, x) => {
        if (x && typeof x === 'object' && !Array.isArray(x)) {
          const sorted = {}
          for (let key of Object.getOwnPropertyNames(x).sort()) {
            sorted[key] = x[key]
          }
          return sorted
        }
        return x
      })


Probably not all.  You are the JS guru, not me :-)


The proposal says "in lexical (alphabetical) order."
If "lexical order" differs from the lexicographic order that sort uses, 
then
the above could be adjusted to pass a comparator function.


I hope (and believe) that this is just a terminology problem.


I think you're right. 
http://www.ecma-international.org/ecma-262/6.0/#sec-sortcompare
is where it's specified.  After checking that no custom comparator is present:

 1. Let xString be ToString(x).
 2. ReturnIfAbrupt(xString).
 3. Let yString be ToString(y).
 4. ReturnIfAbrupt(yString).
 5. If xString < yString, return −1.
 6. If xString > yString, return 1.
 7. Return +0.


(<) and (>) do not themselves bring in any locale-specific collation rules.
They bottom out on 
http://www.ecma-international.org/ecma-262/6.0/#sec-abstract-relational-comparison

If both px and py are Strings, then

 1. If py is a prefix of px, return false. (A String value p is a prefix of
String value q if q can be the result of concatenating p and some other
String r. Note that any String is a prefix of itself, because r may be the
empty String.)
 2. If px is a prefix of py, return true.
 3. Let k be the smallest nonnegative integer such that the code unit at
index k within px is different from the code unit at index k within py.
(There must be such a k, for neither String is a prefix of the other.)
 4. Let m be the integer that is the code unit value at index k within px.
 5. Let n be the integer that is the code unit value at index k within py.
 6. If m < n, return true. Otherwise, return false.

Those code unit values are UTF-16 code unit values per
http://www.ecma-international.org/ecma-262/6.0/#sec-ecmascript-language-types-string-type

each element in the String is treated as a UTF-16 code unit value

As someone mentioned earlier in this thread, lexicographic string comparisons 
that use different code
unit sizes can compute different results for the same semantic string value.  
Between UTF-8 and UTF-32
you should see no difference, but UTF-16 can differ from those given 
supplementary codepoints.

It might be worth making explicit that your lexical order is over UTF-16 
strings if that's what you intend.


Right, it is actually already in 3.2.3:

  Property strings to be sorted depend on that strings are represented
  as arrays of 16-bit unsigned integers where each integer holds a single
  UCS2/UTF-16 [UNICODE] code unit. The sorting is based on pure value
  comparisons, independent of locale settings.

This maps "natively" to JS and Java.  Probably to .NET as well.
Other systems may need a specific comparator.
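Such a comparator is small; a sketch, in JS for illustration, of the UTF-16 code-unit comparison that other environments would need to reproduce:

```javascript
// Compare strings by raw UTF-16 code unit values (what ES sort does by
// default); a non-UTF-16 environment must emulate exactly this.
function utf16CodeUnitCompare(a, b) {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const d = a.charCodeAt(i) - b.charCodeAt(i);
    if (d !== 0) return d;
  }
  return a.length - b.length;          // shorter prefix sorts first
}

// Supplementary code point U+10000 sorts before U+FF61 in this order:
['\uFF61', '\u{10000}'].sort(utf16CodeUnitCompare);
```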





Applied to your example input,

JSON.canonicalize({
     "escaping": "\u20ac$\u000F\u000aA'\u0042\u0022\u005c\\\"\/",
     "other":  [null, true, false],
     "numbers": [1E30, 4.50, 6, 2e-3, 0.000000000000000000000000001]
   }) ===
String.raw`{"escaping":"€$\u000f\nA'B\"\"/","numbers":[1e+30,4.5,6,0.002,1e-27],"other":[null,true,false]}`
// proposed
{"escaping":"\u20ac$\u000f\nA'B\"\"/","numbers":[1e+30,4.5,6,0.002,1e-27],"other":[null,true,false]}


The canonicalized example from section 3.2.3 seems to conflict with the 
text of 3.2.2:


If you look just under the result you will find a pretty sad explanation:

         "Note: \u20ac denotes the Euro character, which not
          being ASCII, is currently not displayable in RFCs"


Cool.

After 30 years with RFCs, we can still only use ASCII :-( :-(

Updates:

https://github.com/cyberphone/json-canonicalization/blob/master/JSON.canonicalize.md 

https://cyberphone.github.io/doc/security/browser-json-canonicalization.html 



If this can be implemented in a small amount of library code, what do you need 
from TC39?


At this stage probably nothing, the BIG issue is the algorithm which I took the 
liberty 

Re: JSON.canonicalize()

2018-03-19 Thread Mike Samuel
On Mon, Mar 19, 2018 at 9:53 AM, Anders Rundgren <
anders.rundgren@gmail.com> wrote:

> On 2018-03-19 14:34, Mike Samuel wrote:
>
>> How does the transform you propose differ from?
>>
>> JSON.canonicalize = (x) => JSON.stringify(
>>   x,
>>   (_, x) => {
>>     if (x && typeof x === 'object' && !Array.isArray(x)) {
>>       const sorted = {}
>>       for (let key of Object.getOwnPropertyNames(x).sort()) {
>>         sorted[key] = x[key]
>>       }
>>       return sorted
>>     }
>>     return x
>>   })
>>
>
> Probably not all.  You are the JS guru, not me :-)
>
>
>> The proposal says "in lexical (alphabetical) order."
>> If "lexical order" differs from the lexicographic order that sort uses,
>> then
>> the above could be adjusted to pass a comparator function.
>>
>
> I hope (and believe) that this is just a terminology problem.
>

I think you're right.
http://www.ecma-international.org/ecma-262/6.0/#sec-sortcompare
is where it's specified.  After checking that no custom comparator is
present:

   1. Let *xString* be ToString(*x*).
   2. ReturnIfAbrupt(*xString*).
   3. Let *yString* be ToString(*y*).
   4. ReturnIfAbrupt(*yString*).
   5. If *xString* < *yString*, return −1.
   6. If *xString* > *yString*, return 1.
   7. Return +0.


(<) and (>) do not themselves bring in any locale-specific collation rules.
They bottom out on
http://www.ecma-international.org/ecma-262/6.0/#sec-abstract-relational-comparison

If both *px* and *py* are Strings, then

   1. If *py* is a prefix of *px*, return *false*. (A String value *p* is a
   prefix of String value *q* if *q* can be the result of concatenating *p* and
   some other String *r*. Note that any String is a prefix of itself,
   because *r* may be the empty String.)
   2. If *px* is a prefix of *py*, return *true*.
   3. Let *k* be the smallest nonnegative integer such that the code unit
   at index *k* within *px* is different from the code unit at index *k*
   within *py*. (There must be such a *k*, for neither String is a prefix
   of the other.)
   4. Let *m* be the integer that is the code unit value at index *k* within
*px*.
   5. Let *n* be the integer that is the code unit value at index *k* within
*py*.
   6. If *m* < *n*, return *true*. Otherwise, return *false*.

Those code unit values are UTF-16 code unit values per
http://www.ecma-international.org/ecma-262/6.0/#sec-ecmascript-language-types-string-type

each element in the String is treated as a UTF-16 code unit value

As someone mentioned earlier in this thread, lexicographic string
comparisons that use different code
unit sizes can compute different results for the same semantic string
value.  Between UTF-8 and UTF-32
you should see no difference, but UTF-16 can differ from those given
supplementary codepoints.

It might be worth making explicit that your lexical order is over UTF-16
strings if that's what you intend.



> Applied to your example input,
>>
>> JSON.canonicalize({
>>   "escaping": "\u20ac$\u000F\u000aA'\u0042\u0022\u005c\\\"\/",
>>   "other":  [null, true, false],
>>   "numbers": [1E30, 4.50, 6, 2e-3, 0.000000000000000000000000001]
>> }) ===
>> String.raw`{"escaping":"€$\u000f\nA'B\"\"/","numbers":[1e+30,4.5,6,0.002,1e-27],"other":[null,true,false]}`
>> // proposed {"escaping":"\u20ac$\u000f\nA'B\"\"/","numbers":[1e+30,4.5,6,0.002,1e-27],"other":[null,true,false]}
>>
>>
>> The canonicalized example from section 3.2.3 seems to conflict with the
>> text of 3.2.2:
>>
>
> If you look just under the result you will find a pretty sad explanation:
>
> "Note: \u20ac denotes the Euro character, which not
>  being ASCII, is currently not displayable in RFCs"
>

Cool.


> After 30 years with RFCs, we can still only use ASCII :-( :-(
>
> Updates:
> https://github.com/cyberphone/json-canonicalization/blob/master/JSON.canonicalize.md
> https://cyberphone.github.io/doc/security/browser-json-canonicalization.html
>

If this can be implemented in a small amount of library code, what do you
need from TC39?



> Anders
>
>
>> """
>> If the Unicode value is outside of the ASCII control character range, it
>> MUST be serialized "as is" unless it is equivalent to 0x005c (\) or
>> 0x0022 (") which MUST be serialized as \\ and \" respectively.
>> """
>>
>> So I think the "\u20ac" should actually be "€" and the implementation
>> above matches your proposal.
>>
>>
>> On Fri, Mar 16, 2018 at 3:16 AM, Anders Rundgren <
>> anders.rundgren@gmail.com >
>> wrote:
>>
>> Dear List,
>>
>> Here is a proposal that I would be very happy getting feedback on
>> since it builds on ES but is not 

Re: JSON.canonicalize()

2018-03-19 Thread Anders Rundgren

On 2018-03-19 14:34, Mike Samuel wrote:

How does the transform you propose differ from?

JSON.canonicalize = (x) => JSON.stringify(
     x,
     (_, x) => {
       if (x && typeof x === 'object' && !Array.isArray(x)) {
         const sorted = {}
         for (let key of Object.getOwnPropertyNames(x).sort()) {
           sorted[key] = x[key]
         }
         return sorted
       }
       return x
     })


Probably not all.  You are the JS guru, not me :-)



The proposal says "in lexical (alphabetical) order."
If "lexical order" differs from the lexicographic order that sort uses, then
the above could be adjusted to pass a comparator function.


I hope (and believe) that this is just a terminology problem.


Applied to your example input,

JSON.canonicalize({
     "escaping": "\u20ac$\u000F\u000aA'\u0042\u0022\u005c\\\"\/",
     "other":  [null, true, false],
     "numbers": [1E30, 4.50, 6, 2e-3, 0.001]
   }) ===
       
String.raw`{"escaping":"€$\u000f\nA'B\"\"/","numbers":[1e+30,4.5,6,0.002,1e-27],"other":[null,true,false]}`
// proposed 
{"escaping":"\u20ac$\u000f\nA'B\"\"/","numbers":[1e+30,4.5,6,0.002,1e-27],"other":[null,true,false]}


The canonicalized example from section 3.2.3 seems to conflict with the text of 
3.2.2:


If you look just under the result you will find a pretty sad explanation:

"Note: \u20ac denotes the Euro character, which not
 being ASCII, is currently not displayable in RFCs"

After 30 years with RFCs, we can still only use ASCII :-( :-(

Updates:
https://github.com/cyberphone/json-canonicalization/blob/master/JSON.canonicalize.md
https://cyberphone.github.io/doc/security/browser-json-canonicalization.html

Anders



"""
If the Unicode value is outside of the ASCII control character range, it MUST be serialized 
"as is" unless it is equivalent to 0x005c (\) or 0x0022 (") which MUST be serialized 
as \\ and \" respectively.
"""
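Read literally, the quoted rule (plus JSON's standard short escapes for control characters) sketches out as something like the following — my reading of it, not normative text:

```javascript
// Sketch of the quoted 3.2.2 string rule: control characters get escaped
// (short forms where JSON defines them, \u00xx otherwise), '\' and '"'
// get their two-character escapes, everything else is emitted "as is".
function serializeString(s) {
  let out = '"';
  for (const c of s) {
    const cp = c.codePointAt(0);
    if (c === '\\') out += '\\\\';
    else if (c === '"') out += '\\"';
    else if (cp < 0x20) {
      const short = { 8: '\\b', 9: '\\t', 10: '\\n', 12: '\\f', 13: '\\r' }[cp];
      out += short || '\\u' + cp.toString(16).padStart(4, '0');
    } else out += c;                   // non-ASCII like '€' stays literal
  }
  return out + '"';
}

serializeString('€$\u000f\nA\'B"\\');
```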

So I think the "\u20ac" should actually be "€" and the implementation above 
matches your proposal.


On Fri, Mar 16, 2018 at 3:16 AM, Anders Rundgren wrote:

Dear List,

Here is a proposal that I would be very happy getting feedback on since it 
builds on ES but is not (at all) limited to ES.

The request is for a complement to the ES "JSON" object called 
canonicalize() which would have identical parameters to the existing stringify() method.

The JSON canonicalization scheme (including ES code for emulating it), is 
described in:

https://cyberphone.github.io/doc/security/draft-rundgren-json-canonicalization-scheme.html
 


Current workspace: https://github.com/cyberphone/json-canonicalization 


Thanx,
Anders Rundgren


Re: JSON.canonicalize()

2018-03-19 Thread Mike Samuel
How does the transform you propose differ from?

JSON.canonicalize = (x) => JSON.stringify(
  x,
  (_, x) => {
    if (x && typeof x === 'object' && !Array.isArray(x)) {
      const sorted = {}
      for (let key of Object.getOwnPropertyNames(x).sort()) {
        sorted[key] = x[key]
      }
      return sorted
    }
    return x
  })


The proposal says "in lexical (alphabetical) order."
If "lexical order" differs from the lexicographic order that sort uses, then
the above could be adjusted to pass a comparator function.

Applied to your example input,

JSON.canonicalize({
  "escaping": "\u20ac$\u000F\u000aA'\u0042\u0022\u005c\\\"\/",
  "other":  [null, true, false],
  "numbers": [1E30, 4.50, 6, 2e-3, 0.000000000000000000000000001]
}) ===
String.raw`{"escaping":"€$\u000f\nA'B\"\"/","numbers":[1e+30,4.5,6,0.002,1e-27],"other":[null,true,false]}`
// proposed
{"escaping":"\u20ac$\u000f\nA'B\"\"/","numbers":[1e+30,4.5,6,0.002,1e-27],"other":[null,true,false]}
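The number outputs above are just ES6 Number-to-string behavior; a quick check (my example):

```javascript
// ES6 ToString(Number) already yields the shortest round-tripping form,
// which is what the scheme leans on for number canonicalization.
JSON.stringify([1E30, 4.50, 6, 2e-3, 0.000000000000000000000000001]);
// '[1e+30,4.5,6,0.002,1e-27]'
```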


The canonicalized example from section 3.2.3 seems to conflict with the
text of 3.2.2:

"""
If the Unicode value is outside of the ASCII control character range, it
MUST be serialized "as is" unless it is equivalent to 0x005c (\) or
0x0022 (") which MUST be serialized as \\ and \" respectively.
"""

So I think the "\u20ac" should actually be "€" and the implementation above
matches your proposal.


On Fri, Mar 16, 2018 at 3:16 AM, Anders Rundgren <
anders.rundgren@gmail.com> wrote:

> Dear List,
>
> Here is a proposal that I would be very happy getting feedback on since it
> builds on ES but is not (at all) limited to ES.
>
> The request is for a complement to the ES "JSON" object called
> canonicalize() which would have identical parameters to the existing
> stringify() method.
>
> The JSON canonicalization scheme (including ES code for emulating it), is
> described in:
> https://cyberphone.github.io/doc/security/draft-rundgren-json-canonicalization-scheme.html
>
> Current workspace: https://github.com/cyberphone/json-canonicalization
>
> Thanx,
> Anders Rundgren


Re: JSON.canonicalize()

2018-03-18 Thread Anders Rundgren

On 2018-03-18 21:53, Mike Samuel wrote:

For good or for worse, my proposal is indeed about leveraging ES6's take on 
JSON including limitations, {bugs}, and all.
I'm not backing from that position because then things get way more complex 
and probably never even happen.

Extending [*] the range of "Number" is pretty much (in practical terms) the 
same thing as changing JSON itself.


Your proposal is limiting Number; my alternative is not extending Number.



Quoting earlier messages from you:

  "Your proposal is less interoperable because you are quoting a SHOULD,
   interpreting it as MUST and saying inputs MUST fit into an IEEE 754 double 
without loss of precision.
   This makes it strictly less interoperable than a proposal that does not have that 
constraint"

  "JSON does not have numeric precision limits.  There are plenty of systems 
that use JSON
   that never involve JavaScript and which pack int64s"

Well, it took a while figuring this out.  No harm done.  Nobody died.

I think we can safely put this thread to rest now; you want to fix a problem
that was fixed 10+ years back through other measures [*].

Thanx,
Anders

*] Cryptographic systems using JSON exchange integers that are 256 bits long and more.
   Business systems using JSON exchange long decimal numbers.
   Scientific systems cramming 80-bit IEEE 754 into "Number" may exist, but then
we are probably talking about research projects using forked/home-grown JSON software.

"Number" was never sufficient and will (IMO MUST) remain in its crippled form,
at least if we stick to mainstream.
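The "integers in strings" fix alluded to above, in concrete terms (my example):

```javascript
// Values beyond 2**53 - 1 lose precision in the Number type, which is why
// 64-bit and larger integers travel as JSON strings in practice.
Number.isSafeInteger(9007199254740993);        // false — literal already rounded
JSON.parse('{"big": 9007199254740993}').big;   // 9007199254740992
JSON.stringify({ big: "9007199254740993" });   // precision preserved as text
```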






Re: JSON.canonicalize()

2018-03-18 Thread Mike Samuel
On Sun, Mar 18, 2018, 4:50 PM Anders Rundgren 
wrote:

> On 2018-03-18 20:15, Mike Samuel wrote:
> > I and others have been trying to move towards consensus on what a
> hashable form of
> > JSON should look like.
> >
> > We've identified key areas including
> > * property ordering,
> > * number canonicalization,
> > * string normalization,
> > * whether the input should be a JS value or a string of JSON,
> > * and others
> >
> > but, as in this case, you seem to be arguing both sides of a position to
> support your
> > proposal when you could just say "yes, the proposal could be adjusted
> along this
> > dimension and still provide what's required."
>
> For good or for worse, my proposal is indeed about leveraging ES6's take
> on JSON including limitations, {bugs}, and all.
> I'm not backing from that position because then things get way more
> complex and probably never even happen.
>
> Extending [*] the range of "Number" is pretty much (in practical terms)
> the same thing as changing JSON itself.
>

Your proposal is limiting Number; my alternative is not extending Number.

"Number" is indeed mindless crap but it is what it is.
>
> OTOH, the "Number" problem was effectively solved some 10 years ago
> through putting stuff in "strings".
> Using JSON Schema or "Old School" strongly typed programmatic solutions of
> the kind I use, this actually works great.
>
> Anders
>
> *] The RFC gives you the right to do that but existing implementations do
> not.
>


Re: JSON.canonicalize()

2018-03-18 Thread Anders Rundgren

On 2018-03-18 20:15, Mike Samuel wrote:

I and others have been trying to move towards consensus on what a hashable form 
of
JSON should look like.

We've identified key areas including
* property ordering,
* number canonicalization,
* string normalization,
* whether the input should be a JS value or a string of JSON,
* and others

but, as in this case, you seem to be arguing both sides of a position to 
support your
proposal when you could just say "yes, the proposal could be adjusted along this
dimension and still provide what's required."


For good or for worse, my proposal is indeed about leveraging ES6's take on 
JSON including limitations, {bugs}, and all.
I'm not backing from that position because then things get way more complex and 
probably never even happen.

Extending [*] the range of "Number" is pretty much (in practical terms) the 
same thing as changing JSON itself.

"Number" is indeed mindless crap but it is what it is.

OTOH, the "Number" problem was effectively solved some 10 years ago through putting stuff 
in "strings".
Using JSON Schema or "Old School" strongly typed programmatic solutions of the 
kind I use, this actually works great.

Anders

*] The RFC gives you the right to do that but existing implementations do not.


Re: Summary of Input. Re: JSON.canonicalize()

2018-03-18 Thread Anders Rundgren

On 2018-03-18 20:23, Mike Samuel wrote:


     F.Y.I: Using ES6 serialization methods for JSON primitive types is
headed for standardization in the IETF.
https://www.ietf.org/mail-archive/web/jose/current/msg05716.html

     This effort is backed by one of the main authors behind the
current de-facto standard for Signed and Encrypted JSON, aka JOSE.
     If this is in your opinion a bad idea, now is the right time to
shoot it down :-)


Does this main author prefer your particular JSON canonicalization 
scheme to
others?


This proposal does [currently] not rely on canonicalization but on ES6 
"predictive parsing and serialization".


Is this an informed opinion based on flaws in the others that make them 
less suitable for
JOSE's needs that are not present in the scheme you back?


A JSON canonicalization scheme has AFAIK never been considered in the 
relevant IETF groups (JOSE+JSON).
On the contrary, it has been dismissed as a daft idea.

I haven't yet submitted my [private] I-D. I'm basically here for collecting 
input and finding possible collaborators.


If so, please provide links to their reasoning.
If not, how is their backing relevant?


If the ES6/JSON.stringify() way of serializing JSON primitives becomes an IETF standard
backed by Microsoft, it may have an impact on the "market".


If you can't tell us anything concrete about your backers, what they back, or 
why they back it, then why bring it up?


Who they are, What they back, and Why they back it (Rationale), is in the
referred document above.
Here is a nicer HTML variant of the I-D: 
https://tools.ietf.org/id/draft-erdtman-jose-cleartext-jws-00.html

Anders


Re: Summary of Input. Re: JSON.canonicalize()

2018-03-18 Thread Anders Rundgren

On 2018-03-18 21:06, Mike Samuel wrote:



On Sun, Mar 18, 2018, 4:00 PM Anders Rundgren wrote:

On 2018-03-18 20:23, Mike Samuel wrote:
 >     It is possible that I don't understand what you are asking for here 
since I have no experience with toJSON.
 >
 >     Based on this documentation
 > https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/stringify
 >     JSON.canonicalize() would though work out of the box (when 
integrated in the JSON object NB...) since it would inherit all the functionality 
(and 99% of the code) of JSON.stringify()
 >
 >
 > JSON.stringify(new Date()) has specific semantics because 
Date.prototype.toJSON has specific semantics.
 > As currently written, JSON.canonicalize(new Date()) === 
JSON.canonicalize({})

It seems that you (deliberately?) misunderstand what I'm writing above.

JSON.canonicalize(new Date()) would do exactly the same thing as 
JSON.stringify(new Date()) since it apparently only returns a string.


Where in the spec do you handle this case?


It doesn't, it only describes a canonicalization algorithm.

Integration of the canonicalization algorithm in the ES JSON object might cost
as much as 5 lines of code + some refactoring.

Anders



Again, the sample code I provided is a bare bones solution with the only 
purpose showing the proposed canonicalization algorithm in code as a complement 
to the written specification.


Understood.  AFAICT neither the text nor the instructional code treat Dates 
differently from an empty object.


Anders





Re: Summary of Input. Re: JSON.canonicalize()

2018-03-18 Thread Mike Samuel
On Sun, Mar 18, 2018, 4:00 PM Anders Rundgren 
wrote:

> On 2018-03-18 20:23, Mike Samuel wrote:
> > It is possible that I don't understand what you are asking for here
> since I have no experience with toJSON.
> >
> > Based on this documentation
> >
> https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/stringify
> > JSON.canonicalize() would though work out of the box (when
> integrated in the JSON object NB...) since it would inherit all the
> functionality (and 99% of the code) of JSON.stringify()
> >
> >
> > JSON.stringify(new Date()) has specific semantics because
> Date.prototype.toJSON has specific semantics.
> > As currently written, JSON.canonicalize(new Date()) ===
> JSON.canonicalize({})
>
> It seems that you (deliberately?) misunderstand what I'm writing above.
>
> JSON.canonicalize(new Date()) would do exactly the same thing as
> JSON.stringify(new Date()) since it apparently only returns a string.
>

Where in the spec do you handle this case?

> Again, the sample code I provided is a bare-bones solution with the sole
> purpose of showing the proposed canonicalization algorithm in code as a
> complement to the written specification.
>

Understood.  AFAICT neither the text nor the instructional code treat Dates
differently from an empty object.


> Anders
>


Re: Summary of Input. Re: JSON.canonicalize()

2018-03-18 Thread Anders Rundgren

On 2018-03-18 20:23, Mike Samuel wrote:

It is possible that I don't understand what you are asking for here since I 
have no experience with toJSON.

Based on this documentation

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/stringify
 

JSON.canonicalize() would though work out of the box (when integrated in 
the JSON object NB...) since it would inherit all the functionality (and 99% of 
the code) of JSON.stringify()


JSON.stringify(new Date()) has specific semantics because Date.prototype.toJSON 
has specific semantics.
As currently written, JSON.canonicalize(new Date()) === JSON.canonicalize({})


It seems that you (deliberately?) misunderstand what I'm writing above.

JSON.canonicalize(new Date()) would do exactly the same thing as 
JSON.stringify(new Date()) since it apparently only returns a string.

Again, the sample code I provided is a bare-bones solution with the sole 
purpose of showing the proposed canonicalization algorithm in code as a complement 
to the written specification.

Anders


Re: Summary of Input. Re: JSON.canonicalize()

2018-03-18 Thread Mike Samuel
On Sun, Mar 18, 2018 at 12:50 PM, Anders Rundgren <
anders.rundgren@gmail.com> wrote:

> On 2018-03-18 15:13, Mike Samuel wrote:
>
>>
>>
>> On Sun, Mar 18, 2018 at 2:14 AM, Anders Rundgren <
>> anders.rundgren@gmail.com >
>> wrote:
>>
>> Hi Guys,
>>
>> Pardon me if you think I was hyperbolic,
>> The discussion got derailed by the bogus claims about hash functions'
>> vulnerability.
>>
>>
>> I didn't say I "think" you were being hyperbolic.  I asked whether you
>> were.
>>
>> You asserted a number that seemed high to me.
>> I demonstrated it was high by a factor of at least 25 by showing an
>> implementation that
>> used 80 lines instead of the 2000 you said was required.
>>
>> If you're going to put out a number as a reason to dismiss an argument,
>> you should own it
>> or retract it.
>> Were you being hyperbolic?  (Y/N)
>>
> N.
> To be completely honest, I have only considered full-blown serializers, and
> they typically come in the mentioned size.
>
> Your solution has existed for a couple of days; we may need a little bit
> more time to think about it :-)
>

Fair enough.


>
> Your claim and my counterclaim are in no way linked to hash function
>> vulnerability.
>> I never weighed in on that claim and have already granted that hashable
>> JSON is a
>> worthwhile use case.
>>
>
> Great!  So we can finally put that argument to rest.
>

No.  I don't disagree with you, but I don't speak for whoever did.



>
>
>> F.Y.I: Using ES6 serialization methods for JSON primitive types is
>> headed for standardization in the IETF.
>> https://www.ietf.org/mail-archive/web/jose/current/msg05716.html <
>> https://www.ietf.org/mail-archive/web/jose/current/msg05716.html>
>>
>> This effort is backed by one of the main authors behind the current
>> de-facto standard for Signed and Encrypted JSON, aka JOSE.
>> If this is in your opinion is a bad idea, now is the right time to
>> shoot it down :-)
>>
>>
>> Does this main author prefer your particular JSON canonicalization scheme
>> to
>> others?
>>
>
> This proposal does [currently] not rely on canonicalization but on ES6
> "predictive parsing and serialization".
>
>
> Is this an informed opinion based on flaws in the others that make them
>> less suitable for
>> JOSE's needs that are not present in the scheme you back?
>>
>
> A JSON canonicalization scheme has AFAIK never been considered in the
> relevant IETF groups (JOSE+JSON).
> On the contrary, it has been dismissed as a daft idea.
>
> I haven't yet submitted my [private] I-D. I'm basically here for
> collecting input and finding possible collaborators.
>
>
>> If so, please provide links to their reasoning.
>> If not, how is their backing relevant?
>>
>
> If the ES6/JSON.stringify() way of serializing JSON primitives becomes an
> IETF standard backed by Microsoft, it may have an impact on the "market".
>

If you can't tell us anything concrete about your backers, what they back,
or why they back it, then why bring it up?



>
>> This efforts also exploits the ability of JSON.parse() and
>> JSON.stringify() honoring object "Creation Order".
>>
>> JSON.canonicalize() would be a "Sorting" alternative to "Creation
>> Order" offering certain advantages with limiting deployment impact to JSON
>> serializers as the most important one.
>>
>> The ["completely broken"] sample code was only submitted as a
>> proof-of-concept. I'm sure you JS gurus can do this way better than I :-)
>>
>>
>> This is a misquote.  No-one has said your sample code was completely
>> broken.
>> Neither your sample code nor the spec deals with toJSON.  At some point
>> you're
>> going to have to address that if you want to keep your proposal moving
>> forward.
>>
>
> It is possible that I don't understand what you are asking for here since
> I have no experience with toJSON.
>
> Based on this documentation
> https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe
> rence/Global_Objects/JSON/stringify
> JSON.canonicalize() would though work out of the box (when integrated in
> the JSON object NB...) since it would inherit all the functionality (and
> 99% of the code) of JSON.stringify()


JSON.stringify(new Date()) has specific semantics because
Date.prototype.toJSON has specific semantics.
As currently written, JSON.canonicalize(new Date()) ===
JSON.canonicalize({})
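For the record, the Date behavior under discussion is directly observable (a sketch; nothing here assumes the proposed JSON.canonicalize, only standard JSON.stringify):

```javascript
// Date.prototype.toJSON makes JSON.stringify emit an ISO-8601 string,
// not a serialization of the Date's own enumerable properties.
const d = new Date(0);
console.log(JSON.stringify(d)); // "\"1970-01-01T00:00:00.000Z\""

// A Date has no own enumerable properties, so a serializer that ignores
// toJSON sees it as an empty object:
console.log(JSON.stringify({ ...d })); // "{}"
```

This is why a canonicalizer that skips the toJSON step collapses a Date to the same output as {}.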



>
>
> No amount of JS guru-ry is going to save your sample code from a
>> specification bug.
>>
>>
>> Creating an alternative based on [1,2,3] seems like a rather daunting
>> task.
>>
>>
>> Maybe if you spend more time laying out the criteria on which a
>> successful proposal
>> should be judged, we could move towards consensus on this claim.
>>
>
> Since you have already slashed my proposal there is probably not so much
> consensus to find...
>

I didn't mean to slash anything.

I like parts of your proposal and dislike others.  I talk more about the
bits that I don't like
because that's the purpose of 

Re: Summary of Input. Re: JSON.canonicalize()

2018-03-18 Thread C. Scott Ananian
IMO it belongs, at the level of a SHOULD recommendation when the data
represented is intended to be a Unicode string. (But not a MUST because
neither Javascript's 16-bit strings nor the 8-bit JSON representation
necessarily represent Unicode strings.)
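The distinction can be made concrete (a sketch; "é" is chosen only as a familiar example of a precomposed/decomposed pair):

```javascript
// "é" as one precomposed code point vs. "e" plus a combining acute accent:
const nfc = '\u00e9';   // U+00E9
const nfd = 'e\u0301';  // U+0065 U+0301

console.log(nfc === nfd); // false — distinct strings, distinct JSON
console.log(nfc.normalize('NFC') === nfd.normalize('NFC')); // true

// A canonicalizer that does not normalize therefore emits (and hashes)
// these two "equal-looking" strings differently:
console.log(JSON.stringify(nfc) === JSON.stringify(nfd)); // false
```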

But I've said this already.
  --scott

On Sun, Mar 18, 2018, 2:48 PM Anders Rundgren wrote:

> On 2018-03-18 19:08, C. Scott Ananian wrote:
> > On Fri, Mar 16, 2018 at 9:42 PM, Anders Rundgren
> > <anders.rundgren@gmail.com> wrote:
> >
> > Scott A:
> > https://en.wikipedia.org/wiki/Security_level <
> https://en.wikipedia.org/wiki/Security_level>
> > "For example, SHA-256 offers 128-bit collision resistance"
> > That is, the claims that there are cryptographic issues w.r.t. to
> Unicode Normalization are (fortunately) incorrect.
> > Well, if you actually do normalize Unicode, signatures would indeed
> break, so you don't.
> >
> >
> > Where do you specify SHA-256 signatures in your standard?
> >
> > If one were to use MD5 signatures, they would indeed break in the way I
> describe.
> >
> > It is good security practice to assume that currently-unbroken
> algorithms may eventually break in similar ways to discovered flaws in
> older algorithms.  But in any case, it is simply not good practice to allow
> multiple valid representations of content, if your aim is for a "canonical"
> representation.
>
> Other people could chime in on this since I have already declared my
> position on this topic.  BTW, my proposal comes without cryptographic
> algorithms.
>
> Does Unicode Normalization [naturally] belong to the canonicalization
> issue we are currently discussing?  I didn't see any of that in Richard's
> and Mike's specs. at least.
>
> Anders
>
>


Re: JSON.canonicalize()

2018-03-18 Thread Mike Samuel
On Sun, Mar 18, 2018 at 2:18 PM, Anders Rundgren <
anders.rundgren@gmail.com> wrote:

> On 2018-03-18 19:04, Mike Samuel wrote:
>
>> I think you misunderstood the criticism.  JSON does not have numeric
>> precision limits.
>>
>
> I think I understood that, yes.
>
> There are plenty of systems that use JSON that never
>> involve JavaScript and which pack int64s.
>>
>
> Sure, but if these systems use the "Number" type they belong to a
> proprietary world where disregarding recommendations and best practices is
> OK.
>

No.  They are simply not following a SHOULD recommendation.
I think you have a variance mismatch in your argument.



> BTW, this is an ECMAScript mailing list, why push non-JS-compliant ideas here?
>

Let's review.

You asserted "This discussion (at least from my point of view), is about
creating stuff that fits into standards."

I agreed and pointed out that not tying the definition to JavaScript's
current value limitations would allow it to fit into
standards that do not assume those limitations.

You leveled this criticism: "My guess is that it would be rejected due to
[quite valid] interoperability concerns."
Implicit in that is when one standard specifies that an input MUST have a
property that conflicts with
an output that a conforming implementation MAY or SHOULD produce then you
have an interoperability concern.


But, you are trying to argue that your proposal is more interoperable
because it works for fewer inputs in fewer contexts
and, if it were ported to other languages, would reject JSON that is
parseable without loss of precision in those languages.
How you can say with a straight face that being non-runtime-agnostic makes
a proposal more interoperable is beyond me.


Here's where variance comes in.
MUST on *output* makes a standard more interoperable.
MAY on *input* makes a standard more interoperable.

SHOULD and SHOULD NOT do not justify denying service.
They are guidelines that should be followed absent a compelling reason --
specific rules trumps the general.


Your proposal is less interoperable because you are quoting a SHOULD,
interpreting it as MUST and saying inputs MUST fit into an IEEE 754 double
without loss of precision.

This makes it strictly less interoperable than a proposal that does not
have that constraint.


ECMAScript SHOULD encourage interoperability since it is often a glue
language.

At the risk of getting meta-,
TC39 SHOULD prefer library functions that provide service for arbitrary
inputs in their range.
TC39 SHOULD prefer library functions that MUST NOT, by virtue of their
semantics,
lose precision silently.


Your proposal fails to be more interoperable inasmuch as it reproduces
JSON.stringify(JSON.parse('1e1000')) === 'null'


There is simply no need to convert a JSON string to JavaScript values in
order to hash it.
There is simply no need to specify this in terms of JavaScript values when
a runtime
agnostic implementation that takes a string and produces a string provides
the same value.
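The precision-loss point, and the text-to-text alternative, can both be shown in a few lines (a sketch; the regex is an illustrative number-token matcher, not part of any proposal):

```javascript
// Round-tripping JSON text through JS values destroys information:
// JSON.parse turns 1e1000 into Infinity, which JSON.stringify emits as null.
const roundTripped = JSON.stringify(JSON.parse('1e1000'));
console.log(roundTripped); // "null"

// A runtime-agnostic, string-to-string pass can instead keep each number
// token as source text, never coercing it into an IEEE-754 double:
const numberToken = /-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?/;
const preserved = numberToken.exec('1e1000')[0];
console.log(preserved); // "1e1000"
```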


This is all getting very tedious though.
I and others have been trying to move towards consensus on what a hashable
form of
JSON should look like.

We've identified key areas including
* property ordering,
* number canonicalization,
* string normalization,
* whether the input should be a JS value or a string of JSON,
* and others

but, as in this case, you seem to be arguing both sides of a position to
support your
proposal when you could just say "yes, the proposal could be adjusted along
this
dimension and still provide what's required."
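As a neutral reference point for the first two areas listed above (property ordering and number canonicalization), a bare-bones value-based canonicalizer fits in a few lines; the function name and the specific choices (sorted keys, ES number formatting) are illustrative only, not any party's proposal:

```javascript
// Recursively serialize a JS value with sorted property names,
// delegating primitives to JSON.stringify's ES serialization rules.
function canonicalize(value) {
  if (value === null || typeof value !== 'object') {
    return JSON.stringify(value); // numbers, strings, booleans
  }
  if (Array.isArray(value)) {
    return '[' + value.map(canonicalize).join(',') + ']';
  }
  // Default sort compares UTF-16 code units; for BMP-only keys this
  // coincides with code point order (full code-point order needs more care).
  return '{' + Object.keys(value).sort()
    .map(k => JSON.stringify(k) + ':' + canonicalize(value[k]))
    .join(',') + '}';
}

console.log(canonicalize({ b: 2, a: [1, 'x'], c: { z: null } }));
// {"a":[1,"x"],"b":2,"c":{"z":null}}
```

Note that, like the draft under discussion, this sketch ignores toJSON and undefined-valued edge cases — exactly the dimensions still open for feedback.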


If you plan on putting a proposal before TC39 are you willing to move on
any of these.
or are you asking for a YES/NO vote on a proposal that is largely the same
as what
you've presented?


If the former, then acknowledge that there is a range of options and
collect feedback
instead of sticking to "the presently drafted one is good enough."
If the latter, then I vote NO because I think the proposal in its current
form is a poor
solution to the problem.

That's not to say that you've done bad work.
Most non-incremental stage 0 proposals are poor, and the process is
designed to
integrate the ideas of people in different specialties to turn poor
solutions to interesting
problems into robust solutions to a wider range of problems than originally
envisioned.


Re: Summary of Input. Re: JSON.canonicalize()

2018-03-18 Thread Anders Rundgren

On 2018-03-18 19:08, C. Scott Ananian wrote:

On Fri, Mar 16, 2018 at 9:42 PM, Anders Rundgren wrote:

Scott A:
https://en.wikipedia.org/wiki/Security_level 

"For example, SHA-256 offers 128-bit collision resistance"
That is, the claims that there are cryptographic issues w.r.t. Unicode 
Normalization are (fortunately) incorrect.
Well, if you actually do normalize Unicode, signatures would indeed break, 
so you don't.


Where do you specify SHA-256 signatures in your standard?

If one were to use MD5 signatures, they would indeed break in the way I 
describe.

It is good security practice to assume that currently-unbroken algorithms may 
eventually break in similar ways to discovered flaws in older algorithms.  But in 
any case, it is simply not good practice to allow multiple valid representations of 
content, if your aim is for a "canonical" representation.


Other people could chime in on this since I have already declared my position 
on this topic.  BTW, my proposal comes without cryptographic algorithms.

Does Unicode Normalization [naturally] belong to the canonicalization issue we 
are currently discussing?  I didn't see any of that in Richard's and Mike's 
specs. at least.

Anders



Re: JSON.canonicalize()

2018-03-18 Thread Anders Rundgren

On 2018-03-18 19:04, Mike Samuel wrote:
I think you misunderstood the criticism.  JSON does not have numeric 
precision limits.  


I think I understood that, yes.


There are plenty of systems that use JSON that never
involve JavaScript and which pack int64s.


Sure, but if these systems use the "Number" type they belong to a proprietary 
world where disregarding recommendations and best practices is OK.

BTW, this is an ECMAScript mailing list, why push non-JS-compliant ideas here?

Anders



On Sun, Mar 18, 2018, 1:55 PM Anders Rundgren wrote:

On 2018-03-18 18:40, Mike Samuel wrote:
 > A definition of canonical that is not tied to JavaScript's current range 
of values would fit into more standards than the proposal as it stands.
Feel free submitting an Internet-Draft which addresses a more generic 
Number handling.
My guess is that it would be rejected due to [quite valid] interoperability 
concerns.

It would probably fall in the same category as "Fixing JSON" which has not 
happened either.
https://www.tbray.org/ongoing/When/201x/2016/08/20/Fixing-JSON

Anders

 >
 >      > On Sun, Mar 18, 2018, 12:15 PM Anders Rundgren wrote:
 >
 >     On 2018-03-18 16:47, Mike Samuel wrote:
 >      > Interop with systems that use 64b ints is not a .001% issue.
 >
 >     Certainly not but using "Number" for dealing with such data would 
never be considered by for example the IETF.
 >
 >     This discussion (at least from my point of view), is about creating 
stuff that fits into standards.
 >
 >     Anders
 >
 >      >
 >      >      > On Sun, Mar 18, 2018, 11:40 AM Anders Rundgren wrote:
 >      >     On 2018-03-18 15:47, Michał Wadas wrote:
 >      >      > JSON supports arbitrary precision numbers that can't be 
properly
 >      >      > represented as 64 bit floats. This includes numbers like 
eg. 1e or 1/1e.
 >      >
 >      >     rfc7159:
 >      >          Since software that implements
 >      >          IEEE 754-2008 binary64 (double precision) numbers 
[IEEE754] is
 >      >          generally available and widely used, good 
interoperability can be
 >      >          achieved by implementations that expect no more 
precision or range
 >      >          than these provide, in the sense that implementations 
will
 >      >          approximate JSON numbers within the expected precision
 >      >
 >      >     If interoperability is not an issue you are free to do 
whatever you feel useful.
 >      >     Targeting a 0.001% customer base with standards, I gladly 
leave to others to cater for.
 >      >
 >      >     The de-facto standard featured in any number of applications, 
is putting unusual/binary/whatever stuff in text strings.
 >      >
 >      >     Anders
 >      >
 >      >      >
 >      >      >
 >      >      > On Sun, 18 Mar 2018, 15:30 Anders Rundgren wrote:
 >      >      >
 >      >      >     On 2018-03-18 15:08, Richard Gibson wrote:
 >      >      >>     On Sunday, March 18, 2018, Anders Rundgren wrote:
 >      >      >>
 >      >      >>         On 

Re: Summary of Input. Re: JSON.canonicalize()

2018-03-18 Thread C. Scott Ananian
On Fri, Mar 16, 2018 at 9:42 PM, Anders Rundgren <
anders.rundgren@gmail.com> wrote:

> Scott A:
> https://en.wikipedia.org/wiki/Security_level
> "For example, SHA-256 offers 128-bit collision resistance"
> That is, the claims that there are cryptographic issues w.r.t. to Unicode
> Normalization are (fortunately) incorrect.
> Well, if you actually do normalize Unicode, signatures would indeed break,
> so you don't.
>

Where do you specify SHA-256 signatures in your standard?

If one were to use MD5 signatures, they would indeed break in the way I
describe.

It is good security practice to assume that currently-unbroken algorithms
may eventually break in similar ways to discovered flaws in older
algorithms.  But in any case, it is simply not good practice to allow
multiple valid representations of content, if your aim is for a "canonical"
representation.
  --scott


Re: JSON.canonicalize()

2018-03-18 Thread Mike Samuel
I think you misunderstood the criticism.  JSON does not have numeric
precision limits.  There are plenty of systems that use JSON that never
involve JavaScript and which pack int64s.
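The int64 interop point is easy to demonstrate (a sketch; 2^53 is where IEEE-754 binary64 first loses integer exactness):

```javascript
// A 64-bit-integer system can emit 9007199254740993 (2**53 + 1) as a
// perfectly valid JSON number, but a binary64-based parser rounds it:
console.log(Number.isSafeInteger(2 ** 53));  // false
console.log(JSON.parse('9007199254740993')); // 9007199254740992
```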

On Sun, Mar 18, 2018, 1:55 PM Anders Rundgren wrote:

> On 2018-03-18 18:40, Mike Samuel wrote:
> > A definition of canonical that is not tied to JavaScript's current range
> of values would fit into more standards than the proposal as it stands.
> Feel free submitting an Internet-Draft which addresses a more generic
> Number handling.
> My guess is that it would be rejected due to [quite valid]
> interoperability concerns.
>
> It would probably fall in the same category as "Fixing JSON" which has not
> happened either.
> https://www.tbray.org/ongoing/When/201x/2016/08/20/Fixing-JSON
>
> Anders
>
> >
> > On Sun, Mar 18, 2018, 12:15 PM Anders Rundgren <
> anders.rundgren@gmail.com >
> wrote:
> >
> > On 2018-03-18 16:47, Mike Samuel wrote:
> >  > Interop with systems that use 64b ints is not a .001% issue.
> >
> > Certainly not but using "Number" for dealing with such data would
> never be considered by for example the IETF.
> >
> > This discussion (at least from my point of view), is about creating
> stuff that fits into standards.
> >
> > Anders
> >
> >  >
> >  > On Sun, Mar 18, 2018, 11:40 AM Anders Rundgren <
> anders.rundgren@gmail.com 
> >> wrote:
> >  >
> >  > On 2018-03-18 15:47, Michał Wadas wrote:
> >  >  > JSON supports arbitrary precision numbers that can't be
> properly
> >  >  > represented as 64 bit floats. This includes numbers like
> eg. 1e or 1/1e.
> >  >
> >  > rfc7159:
> >  >  Since software that implements
> >  >  IEEE 754-2008 binary64 (double precision) numbers
> [IEEE754] is
> >  >  generally available and widely used, good
> interoperability can be
> >  >  achieved by implementations that expect no more
> precision or range
> >  >  than these provide, in the sense that implementations
> will
> >  >  approximate JSON numbers within the expected precision
> >  >
> >  > If interoperability is not an issue you are free to do
> whatever you feel useful.
> >  > Targeting a 0.001% customer base with standards, I gladly
> leave to others to cater for.
> >  >
> >  > The de-facto standard featured in any number of applications,
> is putting unusual/binary/whatever stuff in text strings.
> >  >
> >  > Anders
> >  >
> >  >  >
> >  >  >
> >  >  > On Sun, 18 Mar 2018, 15:30 Anders Rundgren, <
> anders.rundgren@gmail.com 
> >    anders.rundgren@gmail.com  wrote:
> >  >  >
> >  >  > On 2018-03-18 15:08, Richard Gibson wrote:
> >  >  >> On Sunday, March 18, 2018, Anders Rundgren <
> anders.rundgren@gmail.com 
> >    anders.rundgren@gmail.com  wrote:
> >  >  >>
> >  >  >> On 2018-03-16 20:24, Richard Gibson wrote:
> >  >  >>> Though ECMAScript JSON.stringify may suffice for
> certain Javascript-centric use cases or otherwise restricted subsets
> thereof as addressed by JOSE, it is not suitable for producing
> canonical/hashable/etc. JSON, which requires a fully general solution such
> as [1]. Both its number serialization [2] and string serialization [3]
> specify aspects that harm compatibility (the former having arbitrary
> branches dependent upon the value of numbers, the latter being capable of
> producing invalid UTF-8 octet sequences that represent unpaired surrogate
> code points—unacceptable for exchange outside of a closed ecosystem [4]).
> JSON is a general /language-agnostic/interchange format, and ECMAScript
> JSON.stringify is *not*a JSON canonicalization solution.
> >  >  >>>
> >  >  >>> [1]: _
> http://gibson042.github.io/canonicaljson-spec/_
> >  >  >>> [2]:
> http://ecma-international.org/ecma-262/7.0/#sec-tostring-applied-to-the-number-type
> >  >  >>> [3]:
> http://ecma-international.org/ecma-262/7.0/#sec-quotejsonstring
> >  >  >>> [4]:
> https://tools.ietf.org/html/rfc8259#section-8.1
> >  >  >>
> >  >  >> Richard, I may be wrong but AFAICT, our
> respective canoncalization 

Re: JSON.canonicalize()

2018-03-18 Thread Anders Rundgren

On 2018-03-18 18:40, Mike Samuel wrote:

A definition of canonical that is not tied to JavaScript's current range of 
values would fit into more standards than the proposal as it stands.

Feel free to submit an Internet-Draft which addresses more generic Number 
handling.
My guess is that it would be rejected due to [quite valid] interoperability 
concerns.

It would probably fall in the same category as "Fixing JSON" which has not 
happened either.
https://www.tbray.org/ongoing/When/201x/2016/08/20/Fixing-JSON

Anders



On Sun, Mar 18, 2018, 12:15 PM Anders Rundgren wrote:

On 2018-03-18 16:47, Mike Samuel wrote:
 > Interop with systems that use 64b ints is not a .001% issue.

Certainly not but using "Number" for dealing with such data would never be 
considered by for example the IETF.

This discussion (at least from my point of view), is about creating stuff 
that fits into standards.

Anders

 >
 >  > On Sun, Mar 18, 2018, 11:40 AM Anders Rundgren wrote:
 >
 >     On 2018-03-18 15:47, Michał Wadas wrote:
 >      > JSON supports arbitrary precision numbers that can't be properly
 >      > represented as 64 bit floats. This includes numbers like eg. 
1e or 1/1e.
 >
 >     rfc7159:
 >          Since software that implements
 >          IEEE 754-2008 binary64 (double precision) numbers [IEEE754] is
 >          generally available and widely used, good interoperability can 
be
 >          achieved by implementations that expect no more precision or 
range
 >          than these provide, in the sense that implementations will
 >          approximate JSON numbers within the expected precision
 >
 >     If interoperability is not an issue you are free to do whatever you 
feel useful.
 >     Targeting a 0.001% customer base with standards, I gladly leave to 
others to cater for.
 >
 >     The de-facto standard featured in any number of applications, is 
putting unusual/binary/whatever stuff in text strings.
 >
 >     Anders
 >
 >      >
 >      >
 >      >      > On Sun, 18 Mar 2018, 15:30 Anders Rundgren wrote:
 >      >     On 2018-03-18 15:08, Richard Gibson wrote:
 >      >>     On Sunday, March 18, 2018, Anders Rundgren wrote:
 >      >>         On 2018-03-16 20:24, Richard Gibson wrote:
 >      >>>         Though ECMAScript JSON.stringify may suffice for 
certain Javascript-centric use cases or otherwise restricted subsets thereof as addressed 
by JOSE, it is not suitable for producing canonical/hashable/etc. JSON, which requires a 
fully general solution such as [1]. Both its number serialization [2] and string 
serialization [3] specify aspects that harm compatibility (the former having arbitrary 
branches dependent upon the value of numbers, the latter being capable of producing invalid 
UTF-8 octet sequences that represent unpaired surrogate code points—unacceptable for 
exchange outside of a closed ecosystem [4]). JSON is a general 
/language-agnostic/interchange format, and ECMAScript JSON.stringify is *not*a JSON 
canonicalization solution.
 >      >>>
 >      >>>         [1]: _http://gibson042.github.io/canonicaljson-spec/_
 >      >>>         [2]: 
http://ecma-international.org/ecma-262/7.0/#sec-tostring-applied-to-the-number-type
 >      >>>         [3]: 
http://ecma-international.org/ecma-262/7.0/#sec-quotejsonstring
 >      >>>         [4]: https://tools.ietf.org/html/rfc8259#section-8.1
 >      >>
 >      >>         Richard, I may be wrong but AFAICT, our respective 
canoncalization schemes are in fact principally IDENTICAL.
 >      >>
 >      >>
 >      >>     In that they have the same goal, yes. In that they both 
achieve that goal, no. I'm not married to choices like exponential notation and 
uppercase escapes, but a JSON canonicalization scheme MUST cover all of JSON.
 >      >
 >      >     Here it gets interesting...  What in JSON cannot be expressed 
through JS and JSON.stringify()?
 >      >
 >      >>         That the number 

Re: JSON.canonicalize()

2018-03-18 Thread Mike Samuel
A definition of canonical that is not tied to JavaScript's current range of
values would fit into more standards than the proposal as it stands.

On Sun, Mar 18, 2018, 12:15 PM Anders Rundgren <
anders.rundgren@gmail.com> wrote:

> On 2018-03-18 16:47, Mike Samuel wrote:
> > Interop with systems that use 64b ints is not a .001% issue.
>
> Certainly not but using "Number" for dealing with such data would never be
> considered by for example the IETF.
>
> This discussion (at least from my point of view), is about creating stuff
> that fits into standards.
>
> Anders
>
> >
> > On Sun, Mar 18, 2018, 11:40 AM Anders Rundgren <
> anders.rundgren@gmail.com >
> wrote:
> >
> > On 2018-03-18 15:47, Michał Wadas wrote:
> >  > JSON supports arbitrary precision numbers that can't be properly
> >  > represented as 64 bit floats. This includes numbers like eg.
> 1e or 1/1e.
> >
> > rfc7159:
> >  Since software that implements
> >  IEEE 754-2008 binary64 (double precision) numbers [IEEE754] is
> >  generally available and widely used, good interoperability can
> be
> >  achieved by implementations that expect no more precision or
> range
> >  than these provide, in the sense that implementations will
> >  approximate JSON numbers within the expected precision
> >
> > If interoperability is not an issue you are free to do whatever you
> feel useful.
> > Targeting a 0.001% customer base with standards, I gladly leave to
> others to cater for.
> >
> > The de-facto standard featured in any number of applications, is
> putting unusual/binary/whatever stuff in text strings.
> >
> > Anders
> >
> >  >
> >  >
> >  > On Sun, 18 Mar 2018, 15:30 Anders Rundgren, <
> anders.rundgren@gmail.com 
> >> wrote:
> >  >
> >  > On 2018-03-18 15:08, Richard Gibson wrote:
> >  >> On Sunday, March 18, 2018, Anders Rundgren <
> anders.rundgren@gmail.com 
> >> wrote:
> >  >>
> >  >> On 2018-03-16 20:24, Richard Gibson wrote:
> >  >>> Though ECMAScript JSON.stringify may suffice for
> certain Javascript-centric use cases or otherwise restricted subsets
> thereof as addressed by JOSE, it is not suitable for producing
> canonical/hashable/etc. JSON, which requires a fully general solution such
> as [1]. Both its number serialization [2] and string serialization [3]
> specify aspects that harm compatibility (the former having arbitrary
> branches dependent upon the value of numbers, the latter being capable of
> producing invalid UTF-8 octet sequences that represent unpaired surrogate
> code points—unacceptable for exchange outside of a closed ecosystem [4]).
> JSON is a general /language-agnostic/interchange format, and ECMAScript
> JSON.stringify is *not*a JSON canonicalization solution.
> >  >>>
> >  >>> [1]: _http://gibson042.github.io/canonicaljson-spec/_
> >  >>> [2]:
> http://ecma-international.org/ecma-262/7.0/#sec-tostring-applied-to-the-number-type
> >  >>> [3]:
> http://ecma-international.org/ecma-262/7.0/#sec-quotejsonstring
> >  >>> [4]: https://tools.ietf.org/html/rfc8259#section-8.1
> >  >>
> >  >> Richard, I may be wrong but AFAICT, our respective
> canoncalization schemes are in fact principally IDENTICAL.
> >  >>
> >  >>
> >  >> In that they have the same goal, yes. In that they both
> achieve that goal, no. I'm not married to choices like exponential notation
> and uppercase escapes, but a JSON canonicalization scheme MUST cover all of
> JSON.
> >  >
> >  > Here it gets interesting...  What in JSON cannot be expressed
> through JS and JSON.stringify()?
> >  >
> >  >> That the number serialization provided by
> JSON.stringify() is unacceptable, is not generally taken as a fact.  I also
> think it looks a bit weird, but that's just a matter of esthetics.
> Compatibility is an entirely different issue.
> >  >>
> >  >>
> >  >> I concede this point. The modified algorithm is sufficient,
> but note that a canonicalization scheme will remain static even if
> ECMAScript changes.
> >  >
> >  > Agreed.
> >  >
> >  >>
> >  >> Sorting on Unicode Code Points is of course "technically
> 100% right" but strictly put not necessary.
> >  >>
> >  >>
> >  >> Certain scenarios call for different systems to
> _independently_ generate equivalent data structures, and it is a necessary
> property of canonical serialization that it yields identical results for
> equivalent data structures. JSON does not specify significance of object
> member ordering, so member ordering does 

Re: Summary of Input. Re: JSON.canonicalize()

2018-03-18 Thread Anders Rundgren

On 2018-03-18 15:13, Mike Samuel wrote:



On Sun, Mar 18, 2018 at 2:14 AM, Anders Rundgren wrote:

Hi Guys,

Pardon me if you think I was hyperbolic,
The discussion got derailed by the bogus claims about hash functions' 
vulnerability.


I didn't say I "think" you were being hyperbolic.  I asked whether you were.

You asserted a number that seemed high to me.
I demonstrated it was high by a factor of at least 25 by showing an 
implementation that
used 80 lines instead of the 2000 you said was required.

If you're going to put out a number as a reason to dismiss an argument, you 
should own it
or retract it.
Were you being hyperbolic?  (Y/N)

N.
To be completely honest, I have only considered full-blown serializers, and they typically come in at the size I mentioned.

Your solution has existed for a couple of days; we may need a little more time to think about it :-)



Your claim and my counterclaim are in no way linked to hash function 
vulnerability.
I never weighed in on that claim and have already granted that hashable JSON is 
a
worthwhile use case.


Great!  So we can finally put that argument to rest.




F.Y.I: Using ES6 serialization methods for JSON primitive types is headed 
for standardization in the IETF.
https://www.ietf.org/mail-archive/web/jose/current/msg05716.html 


This effort is backed by one of the main authors behind the current 
de-facto standard for Signed and Encrypted JSON, aka JOSE.
If this is, in your opinion, a bad idea, now is the right time to shoot it down :-)


Does this main author prefer your particular JSON canonicalization scheme to
others?


This proposal does not [currently] rely on canonicalization but on ES6 "predictive
parsing and serialization".



Is this an informed opinion based on flaws in the others that make them less 
suitable for
JOSE's needs that are not present in the scheme you back?


A JSON canonicalization scheme has AFAIK never been considered in the relevant 
IETF groups (JOSE+JSON).
On the contrary, it has been dismissed as a daft idea.

I haven't yet submitted my [private] I-D. I'm basically here for collecting 
input and finding possible collaborators.



If so, please provide links to their reasoning.
If not, how is their backing relevant?


If the ES6/JSON.stringify() way of serializing JSON primitives becomes an IETF standard backed by Microsoft, it may have an impact on the "market".



This effort also exploits the ability of JSON.parse() and JSON.stringify() to honor object "Creation Order".

JSON.canonicalize() would be a "Sorting" alternative to "Creation Order", offering certain advantages, the most important being that deployment impact is limited to JSON serializers.
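For the record, the "Creation Order" behavior is directly observable in any ES2015+ engine (a minimal demonstration; the specific literals are mine, not from any proposal text):

```javascript
// Member order survives a parse/stringify round trip in ES2015+,
// because ordinary string keys keep insertion ("creation") order.
const text = '{"zulu":1,"alpha":2,"mike":3}';
const roundTripped = JSON.stringify(JSON.parse(text));
console.log(roundTripped === text);  // true

// Caveat: array-index-like keys are always enumerated numerically first,
// so "creation order" is not preserved for them.
console.log(JSON.stringify(JSON.parse('{"2":"b","1":"a"}')));  // {"1":"a","2":"b"}
```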

The ["completely broken"] sample code was only submitted as a 
proof-of-concept. I'm sure you JS gurus can do this way better than I :-)


This is a misquote.  No-one has said your sample code was completely broken.
Neither your sample code nor the spec deals with toJSON.  At some point you're
going to have to address that if you want to keep your proposal moving forward.


It is possible that I don't understand what you are asking for here since I 
have no experience with toJSON.

Based on this documentation
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/stringify
JSON.canonicalize() would, though, work out of the box (when integrated into the JSON object, NB...) since it would inherit all the functionality (and 99% of the code) of JSON.stringify()
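For what it's worth, the toJSON hook Mike refers to is easy to demonstrate with built-ins (a sketch; the `wrapper` object below is invented for illustration):

```javascript
// JSON.stringify consults a value's toJSON() method, if present,
// *before* serializing -- Date is the classic built-in example.
console.log(JSON.stringify(new Date(0)));  // "1970-01-01T00:00:00.000Z"

// Any object can hook serialization the same way; a canonicalization
// spec has to say whether (and when) this hook runs.
const wrapper = { big: { toJSON: () => "123456789012345678901234567890" } };
console.log(JSON.stringify(wrapper));  // {"big":"123456789012345678901234567890"}
```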


No amount of JS guru-ry is going to save your sample code from a specification 
bug.


Creating an alternative based on [1,2,3] seems like a rather daunting task.


Maybe if you spend more time laying out the criteria on which a successful 
proposal
should be judged, we could move towards consensus on this claim.


Since you have already slashed my proposal, there is probably not much consensus to be found...

Anders




As it is, I have only your say so but I have reason to doubt your evaluation
of task complexity unless you were being hyperbolic before.


It is a free world, you may doubt my competence, motives, whatever.

___
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss


Re: JSON.canonicalize()

2018-03-18 Thread Anders Rundgren

On 2018-03-18 16:47, Mike Samuel wrote:

Interop with systems that use 64b ints is not a .001% issue.


Certainly not, but using "Number" for dealing with such data would never be considered by, for example, the IETF.

This discussion (at least from my point of view), is about creating stuff that 
fits into standards.

Anders



On Sun, Mar 18, 2018, 11:40 AM Anders Rundgren wrote:

On 2018-03-18 15:47, Michał Wadas wrote:
 > JSON supports arbitrary precision numbers that can't be properly
 > represented as 64 bit floats. This includes numbers like eg. 1e or 
1/1e.

rfc7159:
     Since software that implements
     IEEE 754-2008 binary64 (double precision) numbers [IEEE754] is
     generally available and widely used, good interoperability can be
     achieved by implementations that expect no more precision or range
     than these provide, in the sense that implementations will
     approximate JSON numbers within the expected precision

If interoperability is not an issue you are free to do whatever you feel 
useful.
Targeting a 0.001% customer base with standards, I gladly leave to others 
to cater for.

The de-facto standard featured in any number of applications, is putting 
unusual/binary/whatever stuff in text strings.

Anders

 >
 >
 > On Sun, 18 Mar 2018, 15:30 Anders Rundgren,  >> wrote:
 >
 >     On 2018-03-18 15:08, Richard Gibson wrote:
 >>     On Sunday, March 18, 2018, Anders Rundgren  >> wrote:
 >>
 >>         On 2018-03-16 20:24, Richard Gibson wrote:
 >>>         Though ECMAScript JSON.stringify may suffice for certain 
Javascript-centric use cases or otherwise restricted subsets thereof as addressed by 
JOSE, it is not suitable for producing canonical/hashable/etc. JSON, which requires a 
fully general solution such as [1]. Both its number serialization [2] and string 
serialization [3] specify aspects that harm compatibility (the former having arbitrary 
branches dependent upon the value of numbers, the latter being capable of producing 
invalid UTF-8 octet sequences that represent unpaired surrogate code points—unacceptable 
for exchange outside of a closed ecosystem [4]). JSON is a general 
/language-agnostic/interchange format, and ECMAScript JSON.stringify is *not*a JSON 
canonicalization solution.
 >>>
 >>>         [1]: _http://gibson042.github.io/canonicaljson-spec/_
 >>>         [2]: 
http://ecma-international.org/ecma-262/7.0/#sec-tostring-applied-to-the-number-type
 >>>         [3]: 
http://ecma-international.org/ecma-262/7.0/#sec-quotejsonstring
 >>>         [4]: https://tools.ietf.org/html/rfc8259#section-8.1
 >>
 >>         Richard, I may be wrong but AFAICT, our respective 
canonicalization schemes are in fact principally IDENTICAL.
 >>
 >>
 >>     In that they have the same goal, yes. In that they both achieve 
that goal, no. I'm not married to choices like exponential notation and uppercase 
escapes, but a JSON canonicalization scheme MUST cover all of JSON.
 >
 >     Here it gets interesting...  What in JSON cannot be expressed 
through JS and JSON.stringify()?
 >
 >>         That the number serialization provided by JSON.stringify() is 
unacceptable, is not generally taken as a fact.  I also think it looks a bit weird, 
but that's just a matter of esthetics.  Compatibility is an entirely different issue.
 >>
 >>
 >>     I concede this point. The modified algorithm is sufficient, but 
note that a canonicalization scheme will remain static even if ECMAScript changes.
 >
 >     Agreed.
 >
 >>
 >>         Sorting on Unicode Code Points is of course "technically 100% 
right" but strictly put not necessary.
 >>
 >>
 >>     Certain scenarios call for different systems to _independently_ 
generate equivalent data structures, and it is a necessary property of canonical 
serialization that it yields identical results for equivalent data structures. JSON 
does not specify significance of object member ordering, so member ordering does not 
distinguish otherwise equivalent objects, so canonicalization MUST specify member 
ordering that is deterministic with respect to all valid data.
 >
 >     Violently agree but do not understand (I guess I'm just dumb...) why 
(for example) sorting on UCS2/UTF-16 Code Units would not achieve the same goal 
(although the result would differ).
 >
 >>
 >>         Your claim about uppercase Unicode escapes is incorrect, there 
is no such requirement:
 >>
 >> 

Re: JSON.canonicalize()

2018-03-18 Thread Richard Gibson
On Sun, Mar 18, 2018 at 10:29 AM, Mike Samuel  wrote:

> Does this mean that the language below would need to be fixed at a
> specific version of Unicode or that we would need to cite a specific
> version for
> canonicalization but might allow a higher version for 
> String.prototype.normalize
> and in future versions of the spec require it?
>
> http://www.ecma-international.org/ecma-262/6.0/#sec-conformance
> """
> A conforming implementation of ECMAScript must interpret source text input
> in conformance with the Unicode Standard, Version 5.1.0 or later
> """
>
> and in ECMA 404
> 
>
> """
> For undated references, the latest edition of the referenced document
> (including any amendments) applies. ISO/IEC 10646, Information Technology –
> Universal Coded Character Set (UCS) The Unicode Consortium. The Unicode
> Standard http://www.unicode.org/versions/latest.
> """
>

I can't see why either would have to change. JSON canonicalization should
produce a JSON text in UTF-8, using JSON escape sequences only for double
quote, backslash, and ASCII control characters U+0000 through U+001F (which
are not valid in JSON strings) and unpaired surrogates U+D800 through
U+DFFF (which are not conforming UTF-8). The algorithm doesn't need to know
whether any given code point has a UCS assignment.
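The escaping rule described above can be sketched in a few lines (illustrative only; `canonicalizeString` is a name invented here, and this is not the normative algorithm of any spec):

```javascript
// Sketch of the string rule above: escape only '"', '\\', U+0000..U+001F,
// and unpaired surrogates; every other code point passes through literally
// (and becomes plain UTF-8 when the text is encoded).
function canonicalizeString(s) {
  let out = '"';
  for (let i = 0; i < s.length; i++) {
    const cu = s.charCodeAt(i);
    if (cu === 0x22) { out += '\\"'; }
    else if (cu === 0x5C) { out += '\\\\'; }
    else if (cu <= 0x1F) {
      out += '\\u' + cu.toString(16).padStart(4, '0');  // control character
    } else if (cu >= 0xD800 && cu <= 0xDBFF &&
               !(i + 1 < s.length && (s.charCodeAt(i + 1) & 0xFC00) === 0xDC00)) {
      out += '\\u' + cu.toString(16);  // unpaired high surrogate
    } else if (cu >= 0xDC00 && cu <= 0xDFFF &&
               !(i > 0 && (s.charCodeAt(i - 1) & 0xFC00) === 0xD800)) {
      out += '\\u' + cu.toString(16);  // unpaired low surrogate
    } else {
      out += s[i];  // literal, including well-paired surrogates
    }
  }
  return out + '"';
}

console.log(canonicalizeString('a"b'));          // "a\"b"
console.log(canonicalizeString('\u0001'));       // "\u0001"
console.log(canonicalizeString('\uD800'));       // "\ud800"  (unpaired -> escaped)
console.log(canonicalizeString('\uD800\uDC00')); // the quoted astral character, unescaped
```

(The choice of lowercase `\uXXXX` hex here is arbitrary; a real spec would have to fix the case and whether short escapes like `\n` are used.)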

Code points include orphaned surrogates in a way that scalar values do not,
> right?  So both "\uD800" and "\uD800\uDC00" are single codepoints.
> It seems like a strict prefix of a string should still sort before that
> string but prefix transitivity in general does not hold: "\uFFFF" <
> "\uD800\uDC00" && "\uFFFF" > "\uD800".
> That shouldn't cause problems for hashability but I thought I'd raise it
> just in case.
>

IMO, "\uD800\uDC00" should never be emitted because a proper
canonicalization would be "𐀀" (character sequence U+0022 QUOTATION MARK,
U+10000 LINEAR B SYLLABLE B008 A, U+0022 QUOTATION MARK; octet sequence
0x22, 0xF0, 0x90, 0x80, 0x80, 0x22).

As for sorting, using the represented code points makes sense to me, but is
not the only option (e.g., another option is using the literal characters
of the JSON text such that "Z" < "\"" < "\\" < "\u0000" < "\u001F" <
"\uD800" < "\uDC00" < "^" < "x" < "ä" < "가" < "A" < "" < ""). Any
specification of a total deterministic ordering would suffice, it's just
that some are less intuitive than others.
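The divergence between the default (code unit) string comparison and a code point ordering is easy to exhibit (a sketch; `compareCodePoints` is a name invented here):

```javascript
// Default JS string comparison orders by UTF-16 code unit; a code point
// ordering differs exactly when one side contains an astral character
// (i.e., a surrogate pair), because high surrogates (0xD800..0xDBFF)
// sort below 0xE000..0xFFFF as code units but represent >= 0x10000.
function compareCodePoints(a, b) {
  const ai = a[Symbol.iterator](), bi = b[Symbol.iterator]();
  for (;;) {
    const x = ai.next(), y = bi.next();
    // Shorter string that is a strict prefix sorts first.
    if (x.done || y.done) return (x.done ? 0 : 1) - (y.done ? 0 : 1);
    const d = x.value.codePointAt(0) - y.value.codePointAt(0);
    if (d) return d;
  }
}

const astral = '\uD800\uDC00';                         // U+10000
console.log('\uFFFF' < astral);                        // false -- code unit order
console.log(compareCodePoints('\uFFFF', astral) < 0);  // true  -- code point order
```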

On Sun, Mar 18, 2018 at 10:30 AM, Anders Rundgren <
anders.rundgren@gmail.com> wrote:

> On 2018-03-18 15:08, Richard Gibson wrote:
>
> In that they have the same goal, yes. In that they both achieve that goal,
> no. I'm not married to choices like exponential notation and uppercase
> escapes, but a JSON canonicalization scheme MUST cover all of JSON.
>
>
> Here it gets interesting...  What in JSON cannot be expressed through JS
> and JSON.stringify()?
>

JSON can express arbitrary numbers, but ECMAScript JSON.stringify is
limited to those with an exact IEEE 754 binary64 representation.

And probably more importantly (though not a gap with respect to JSON
specifically), it emits octet sequences that don't conform to UTF-8 when
serializing unpaired surrogates.
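The number limitation is easy to demonstrate with nothing but built-ins (the specific literals below are mine):

```javascript
// JSON the grammar allows any decimal number; JS Numbers are IEEE-754
// binary64, so a parse/stringify round trip can silently change digits.
console.log(JSON.stringify(JSON.parse('9007199254740993')));  // 9007199254740992 (2^53 + 1 is not representable)
console.log(JSON.stringify(JSON.parse('1e400')));             // null (overflows to Infinity)
console.log(JSON.stringify(JSON.parse('1.0')));               // 1 (trailing zero not preserved)
```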

Certain scenarios call for different systems to _independently_ generate
> equivalent data structures, and it is a necessary property of canonical
> serialization that it yields identical results for equivalent data
> structures. JSON does not specify significance of object member ordering,
> so member ordering does not distinguish otherwise equivalent objects, so
> canonicalization MUST specify member ordering that is deterministic with
> respect to all valid data.
>
>
> Violently agree but do not understand (I guess I'm just dumb...) why (for
> example) sorting on UCS2/UTF-16 Code Units would not achieve the same goal
> (although the result would differ).
>

Any specification of a total deterministic ordering would suffice. Relying
upon 16-bit code units would impose a greater burden on systems that do not
use such representations internally, but is not fundamentally broken.


Re: JSON.canonicalize()

2018-03-18 Thread Mike Samuel
Interop with systems that use 64b ints is not a .001% issue.

On Sun, Mar 18, 2018, 11:40 AM Anders Rundgren <
anders.rundgren@gmail.com> wrote:

> On 2018-03-18 15:47, Michał Wadas wrote:
> > JSON supports arbitrary precision numbers that can't be properly
> > represented as 64 bit floats. This includes numbers like eg. 1e or
> 1/1e.
>
> rfc7159:
> Since software that implements
> IEEE 754-2008 binary64 (double precision) numbers [IEEE754] is
> generally available and widely used, good interoperability can be
> achieved by implementations that expect no more precision or range
> than these provide, in the sense that implementations will
> approximate JSON numbers within the expected precision
>
> If interoperability is not an issue you are free to do whatever you feel
> useful.
> Targeting a 0.001% customer base with standards, I gladly leave to others
> to cater for.
>
> The de-facto standard featured in any number of applications, is putting
> unusual/binary/whatever stuff in text strings.
>
> Anders
>
> >
> >
> > On Sun, 18 Mar 2018, 15:30 Anders Rundgren, <
> anders.rundgren@gmail.com >
> wrote:
> >
> > On 2018-03-18 15:08, Richard Gibson wrote:
> >> On Sunday, March 18, 2018, Anders Rundgren <
> anders.rundgren@gmail.com >
> wrote:
> >>
> >> On 2018-03-16 20:24, Richard Gibson wrote:
> >>> Though ECMAScript JSON.stringify may suffice for certain
> Javascript-centric use cases or otherwise restricted subsets thereof as
> addressed by JOSE, it is not suitable for producing canonical/hashable/etc.
> JSON, which requires a fully general solution such as [1]. Both its number
> serialization [2] and string serialization [3] specify aspects that harm
> compatibility (the former having arbitrary branches dependent upon the
> value of numbers, the latter being capable of producing invalid UTF-8 octet
> sequences that represent unpaired surrogate code points—unacceptable for
> exchange outside of a closed ecosystem [4]). JSON is a general
> /language-agnostic/interchange format, and ECMAScript JSON.stringify is
> *not*a JSON canonicalization solution.
> >>>
> >>> [1]: _http://gibson042.github.io/canonicaljson-spec/_
> >>> [2]:
> http://ecma-international.org/ecma-262/7.0/#sec-tostring-applied-to-the-number-type
> >>> [3]:
> http://ecma-international.org/ecma-262/7.0/#sec-quotejsonstring
> >>> [4]: https://tools.ietf.org/html/rfc8259#section-8.1
> >>
> >> Richard, I may be wrong but AFAICT, our respective
> canonicalization schemes are in fact principally IDENTICAL.
> >>
> >>
> >> In that they have the same goal, yes. In that they both achieve
> that goal, no. I'm not married to choices like exponential notation and
> uppercase escapes, but a JSON canonicalization scheme MUST cover all of
> JSON.
> >
> > Here it gets interesting...  What in JSON cannot be expressed
> through JS and JSON.stringify()?
> >
> >> That the number serialization provided by JSON.stringify() is
> unacceptable, is not generally taken as a fact.  I also think it looks a
> bit weird, but that's just a matter of esthetics.  Compatibility is an
> entirely different issue.
> >>
> >>
> >> I concede this point. The modified algorithm is sufficient, but
> note that a canonicalization scheme will remain static even if ECMAScript
> changes.
> >
> > Agreed.
> >
> >>
> >> Sorting on Unicode Code Points is of course "technically 100%
> right" but strictly put not necessary.
> >>
> >>
> >> Certain scenarios call for different systems to _independently_
> generate equivalent data structures, and it is a necessary property of
> canonical serialization that it yields identical results for equivalent
> data structures. JSON does not specify significance of object member
> ordering, so member ordering does not distinguish otherwise equivalent
> objects, so canonicalization MUST specify member ordering that is
> deterministic with respect to all valid data.
> >
> > Violently agree but do not understand (I guess I'm just dumb...) why
> (for example) sorting on UCS2/UTF-16 Code Units would not achieve the same
> goal (although the result would differ).
> >
> >>
> >> Your claim about uppercase Unicode escapes is incorrect, there
> is no such requirement:
> >>
> >> https://tools.ietf.org/html/rfc8259#section-7
> >>
> >> I don't recall ever making a claim about uppercase Unicode escapes,
> other than observing that it is the preferred form for examples in the JSON
> RFCs... what are you talking about?
> >
> > You're right, I found it in the
> https://gibson042.github.io/canonicaljson-spec/#changelog
> >
> > Thanx,
> > Anders
> >

Re: JSON.canonicalize()

2018-03-18 Thread Anders Rundgren

On 2018-03-18 15:47, Michał Wadas wrote:
JSON supports arbitrary precision numbers that can't be properly 
represented as 64 bit floats. This includes numbers like eg. 1e or 1/1e.


rfc7159:
   Since software that implements
   IEEE 754-2008 binary64 (double precision) numbers [IEEE754] is
   generally available and widely used, good interoperability can be
   achieved by implementations that expect no more precision or range
   than these provide, in the sense that implementations will
   approximate JSON numbers within the expected precision

If interoperability is not an issue you are free to do whatever you feel useful.
Targeting a 0.001% customer base with standards, I gladly leave to others to 
cater for.

The de-facto standard, featured in any number of applications, is putting unusual/binary/whatever stuff in text strings.

Anders




On Sun, 18 Mar 2018, 15:30 Anders Rundgren wrote:

On 2018-03-18 15:08, Richard Gibson wrote:

On Sunday, March 18, 2018, Anders Rundgren wrote:

On 2018-03-16 20:24, Richard Gibson wrote:

Though ECMAScript JSON.stringify may suffice for certain 
Javascript-centric use cases or otherwise restricted subsets thereof as 
addressed by JOSE, it is not suitable for producing canonical/hashable/etc. 
JSON, which requires a fully general solution such as [1]. Both its number 
serialization [2] and string serialization [3] specify aspects that harm 
compatibility (the former having arbitrary branches dependent upon the value of 
numbers, the latter being capable of producing invalid UTF-8 octet sequences 
that represent unpaired surrogate code points—unacceptable for exchange outside 
of a closed ecosystem [4]). JSON is a general /language-agnostic/interchange 
format, and ECMAScript JSON.stringify is *not*a JSON canonicalization solution.

[1]: _http://gibson042.github.io/canonicaljson-spec/_
[2]: 
http://ecma-international.org/ecma-262/7.0/#sec-tostring-applied-to-the-number-type
[3]: http://ecma-international.org/ecma-262/7.0/#sec-quotejsonstring
[4]: https://tools.ietf.org/html/rfc8259#section-8.1


Richard, I may be wrong but AFAICT, our respective canonicalization
schemes are in fact principally IDENTICAL.


In that they have the same goal, yes. In that they both achieve that goal, 
no. I'm not married to choices like exponential notation and uppercase escapes, 
but a JSON canonicalization scheme MUST cover all of JSON.


Here it gets interesting...  What in JSON cannot be expressed through JS 
and JSON.stringify()?


That the number serialization provided by JSON.stringify() is 
unacceptable, is not generally taken as a fact.  I also think it looks a bit 
weird, but that's just a matter of esthetics.  Compatibility is an entirely 
different issue.


I concede this point. The modified algorithm is sufficient, but note that a 
canonicalization scheme will remain static even if ECMAScript changes.


Agreed.



Sorting on Unicode Code Points is of course "technically 100% right" 
but strictly put not necessary.


Certain scenarios call for different systems to _independently_ generate 
equivalent data structures, and it is a necessary property of canonical 
serialization that it yields identical results for equivalent data structures. 
JSON does not specify significance of object member ordering, so member 
ordering does not distinguish otherwise equivalent objects, so canonicalization 
MUST specify member ordering that is deterministic with respect to all valid 
data.


Violently agree but do not understand (I guess I'm just dumb...) why (for 
example) sorting on UCS2/UTF-16 Code Units would not achieve the same goal 
(although the result would differ).



Your claim about uppercase Unicode escapes is incorrect, there is no 
such requirement:

https://tools.ietf.org/html/rfc8259#section-7

I don't recall ever making a claim about uppercase Unicode escapes, other 
than observing that it is the preferred form for examples in the JSON RFCs... 
what are you talking about?


You're right, I found it in the
https://gibson042.github.io/canonicaljson-spec/#changelog

Thanx,
Anders



Re: JSON.canonicalize()

2018-03-18 Thread Mike Samuel
On Sun, Mar 18, 2018 at 10:47 AM, Michał Wadas 
wrote:

> JSON supports arbitrary precision numbers that can't be properly
> represented as 64 bit floats. This includes numbers like eg. 1e or
> 1/1e.
>

I posted this on the summary thread but not here.

https://gist.github.com/mikesamuel/20710f94a53e440691f04bf79bc3d756 is
structured as a string to string transform, so doesn't lose precision when
round-tripping, e.g. Python bigints and Java BigDecimals.

It also avoids a space explosion for 1e which might help blunt timing
attacks as discussed earlier in this thread.



> On Sun, 18 Mar 2018, 15:30 Anders Rundgren, 
> wrote:
>
>> On 2018-03-18 15:08, Richard Gibson wrote:
>>
>> On Sunday, March 18, 2018, Anders Rundgren 
>> wrote:
>>
>>> On 2018-03-16 20:24, Richard Gibson wrote:
>>>
>>> Though ECMAScript JSON.stringify may suffice for certain
>>> Javascript-centric use cases or otherwise restricted subsets thereof as
>>> addressed by JOSE, it is not suitable for producing
>>> canonical/hashable/etc. JSON, which requires a fully general solution such
>>> as [1]. Both its number serialization [2] and string serialization [3]
>>> specify aspects that harm compatibility (the former having arbitrary
>>> branches dependent upon the value of numbers, the latter being capable of
>>> producing invalid UTF-8 octet sequences that represent unpaired surrogate
>>> code points—unacceptable for exchange outside of a closed ecosystem [4]).
>>> JSON is a general *language-agnostic* interchange format, and
>>> ECMAScript JSON.stringify is *not* a JSON canonicalization solution.
>>>
>>> [1]: *http://gibson042.github.io/canonicaljson-spec/
>>> *
>>> [2]: http://ecma-international.org/ecma-262/7.
>>> 0/#sec-tostring-applied-to-the-number-type
>>> [3]: http://ecma-international.org/ecma-262/7.0/#sec-quotejsonstring
>>> [4]: https://tools.ietf.org/html/rfc8259#section-8.1
>>>
>>>
>>> Richard, I may be wrong but AFAICT, our respective canonicalization
>>> schemes are in fact principally IDENTICAL.
>>>
>>
>> In that they have the same goal, yes. In that they both achieve that
>> goal, no. I'm not married to choices like exponential notation and
>> uppercase escapes, but a JSON canonicalization scheme MUST cover all of
>> JSON.
>>
>>
>> Here it gets interesting...  What in JSON cannot be expressed through JS
>> and JSON.stringify()?
>>
>>
>>
>>> That the number serialization provided by JSON.stringify() is
>>> unacceptable, is not generally taken as a fact.  I also think it looks a
>>> bit weird, but that's just a matter of esthetics.  Compatibility is an
>>> entirely different issue.
>>>
>>
>> I concede this point. The modified algorithm is sufficient, but note that
>> a canonicalization scheme will remain static even if ECMAScript changes.
>>
>>
>> Agreed.
>>
>>
>> Sorting on Unicode Code Points is of course "technically 100% right" but
>>> strictly put not necessary.
>>>
>>
>> Certain scenarios call for different systems to _independently_ generate
>> equivalent data structures, and it is a necessary property of canonical
>> serialization that it yields identical results for equivalent data
>> structures. JSON does not specify significance of object member ordering,
>> so member ordering does not distinguish otherwise equivalent objects, so
>> canonicalization MUST specify member ordering that is deterministic with
>> respect to all valid data.
>>
>>
>> Violently agree but do not understand (I guess I'm just dumb...) why (for
>> example) sorting on UCS2/UTF-16 Code Units would not achieve the same goal
>> (although the result would differ).
>>
>>
>> Your claim about uppercase Unicode escapes is incorrect, there is no such
>>> requirement:
>>>
>> https://tools.ietf.org/html/rfc8259#section-7
>>>
>>
>> I don't recall ever making a claim about uppercase Unicode escapes, other
>> than observing that it is the preferred form for examples in the JSON
>> RFCs... what are you talking about?
>>
>>
>> You're right, I found it in the https://gibson042.github.io/
>> canonicaljson-spec/#changelog
>>
>> Thanx,
>> Anders
>>


Re: JSON.canonicalize()

2018-03-18 Thread Mike Samuel
On Sun, Mar 18, 2018 at 10:43 AM, C. Scott Ananian 
wrote:

> On Sun, Mar 18, 2018, 10:30 AM Anders Rundgren <
> anders.rundgren@gmail.com> wrote:
>
>> Violently agree but do not understand (I guess I'm just dumb...) why (for
>> example) sorting on UCS2/UTF-16 Code Units would not achieve the same goal
>> (although the result would differ).
>>
>
> Because there are JavaScript strings which do not form valid UTF-16 code
> units.  For example, the one-character string '\uD800'. On the input
> validation side, there are 8-bit strings which can not be decoded as
> UTF-8.  A complete sorting spec needs to describe how these are to be
> handled. For example, something like WTF-8: http://simonsapin.
> github.io/wtf-8/
>

Let's get terminology straight.
"\uD800" is a valid string of UTF-16 code units.   It is also a valid
string of codepoints.  It is not a valid string of scalar values.

http://www.unicode.org/glossary/#code_point : Any value in the Unicode
codespace; that is, the range of integers from 0 to 10FFFF16.
http://www.unicode.org/glossary/#code_unit : The minimal bit combination
that can represent a unit of encoded text for processing or interchange.
http://www.unicode.org/glossary/#unicode_scalar_value : Any Unicode code
point except high-surrogate
and low-surrogate code points. In other words, the ranges of integers 0 to
D7FF16 and E00016 to 10FFFF16 inclusive.
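In JavaScript terms, the three glossary notions can be seen directly (a quick demonstration; the literals are mine):

```javascript
const lone = '\uD800';        // one code unit, one code point, NOT a scalar value
const pair = '\uD800\uDC00';  // two code units, one code point (U+10000), a scalar value

console.log(lone.length, pair.length);          // 1 2   (length counts code units)
console.log([...pair].length);                  // 1     (string iteration walks code points)
console.log(pair.codePointAt(0).toString(16));  // 10000
console.log(lone.codePointAt(0).toString(16));  // d800  (a surrogate code point)
```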


Re: JSON.canonicalize()

2018-03-18 Thread Michał Wadas
JSON supports arbitrary precision numbers that can't be properly
represented as 64 bit floats. This includes numbers like eg. 1e or
1/1e.


On Sun, 18 Mar 2018, 15:30 Anders Rundgren, 
wrote:

> On 2018-03-18 15:08, Richard Gibson wrote:
>
> On Sunday, March 18, 2018, Anders Rundgren 
> wrote:
>
>> On 2018-03-16 20:24, Richard Gibson wrote:
>>
>> Though ECMAScript JSON.stringify may suffice for certain
>> Javascript-centric use cases or otherwise restricted subsets thereof as
>> addressed by JOSE, it is not suitable for producing
>> canonical/hashable/etc. JSON, which requires a fully general solution such
>> as [1]. Both its number serialization [2] and string serialization [3]
>> specify aspects that harm compatibility (the former having arbitrary
>> branches dependent upon the value of numbers, the latter being capable of
>> producing invalid UTF-8 octet sequences that represent unpaired surrogate
>> code points—unacceptable for exchange outside of a closed ecosystem [4]).
>> JSON is a general *language-agnostic* interchange format, and ECMAScript
>> JSON.stringify is *not* a JSON canonicalization solution.
>>
>> [1]: *http://gibson042.github.io/canonicaljson-spec/
>> *
>> [2]:
>> http://ecma-international.org/ecma-262/7.0/#sec-tostring-applied-to-the-number-type
>> [3]: http://ecma-international.org/ecma-262/7.0/#sec-quotejsonstring
>> [4]: https://tools.ietf.org/html/rfc8259#section-8.1
>>
>>
>> Richard, I may be wrong but AFAICT, our respective canonicalization
>> schemes are in fact principally IDENTICAL.
>>
>
> In that they have the same goal, yes. In that they both achieve that goal,
> no. I'm not married to choices like exponential notation and uppercase
> escapes, but a JSON canonicalization scheme MUST cover all of JSON.
>
>
> Here it gets interesting...  What in JSON cannot be expressed through JS
> and JSON.stringify()?
>
>
>
>> That the number serialization provided by JSON.stringify() is
>> unacceptable, is not generally taken as a fact.  I also think it looks a
>> bit weird, but that's just a matter of esthetics.  Compatibility is an
>> entirely different issue.
>>
>
> I concede this point. The modified algorithm is sufficient, but note that
> a canonicalization scheme will remain static even if ECMAScript changes.
>
>
> Agreed.
>
>
> Sorting on Unicode Code Points is of course "technically 100% right" but
>> strictly put not necessary.
>>
>
> Certain scenarios call for different systems to _independently_ generate
> equivalent data structures, and it is a necessary property of canonical
> serialization that it yields identical results for equivalent data
> structures. JSON does not specify significance of object member ordering,
> so member ordering does not distinguish otherwise equivalent objects, so
> canonicalization MUST specify member ordering that is deterministic with
> respect to all valid data.
>
>
> Violently agree but do not understand (I guess I'm just dumb...) why (for
> example) sorting on UCS2/UTF-16 Code Units would not achieve the same goal
> (although the result would differ).
>
>
> Your claim about uppercase Unicode escapes is incorrect, there is no such
>> requirement:
>>
> https://tools.ietf.org/html/rfc8259#section-7
>>
>
> I don't recall ever making a claim about uppercase Unicode escapes, other
> than observing that it is the preferred form for examples in the JSON
> RFCs... what are you talking about?
>
>
> You're right, I found it in the
> https://gibson042.github.io/canonicaljson-spec/#changelog
>
> Thanx,
> Anders
>


Re: JSON.canonicalize()

2018-03-18 Thread C. Scott Ananian
On Sun, Mar 18, 2018, 10:30 AM Anders Rundgren <
anders.rundgren@gmail.com> wrote:

> Violently agree but do not understand (I guess I'm just dumb...) why (for
> example) sorting on UCS2/UTF-16 Code Units would not achieve the same goal
> (although the result would differ).
>

Because there are JavaScript strings which do not form valid UTF-16 code
units.  For example, the one-character string '\uD800'. On the input
validation side, there are 8-bit strings which can not be decoded as
UTF-8.  A complete sorting spec needs to describe how these are to be
handled. For example, something like WTF-8:
http://simonsapin.github.io/wtf-8/
  --scott


>


Re: JSON.canonicalize()

2018-03-18 Thread Anders Rundgren

On 2018-03-18 15:08, Richard Gibson wrote:

On Sunday, March 18, 2018, Anders Rundgren wrote:

On 2018-03-16 20:24, Richard Gibson wrote:

Though ECMAScript JSON.stringify may suffice for certain Javascript-centric 
use cases or otherwise restricted subsets thereof as addressed by JOSE, it is 
not suitable for producing canonical/hashable/etc. JSON, which requires a fully 
general solution such as [1]. Both its number serialization [2] and string 
serialization [3] specify aspects that harm compatibility (the former having 
arbitrary branches dependent upon the value of numbers, the latter being 
capable of producing invalid UTF-8 octet sequences that represent unpaired 
surrogate code points—unacceptable for exchange outside of a closed ecosystem 
[4]). JSON is a general /language-agnostic/interchange format, and ECMAScript 
JSON.stringify is *not* a JSON canonicalization solution.

[1]: http://gibson042.github.io/canonicaljson-spec/
[2]: http://ecma-international.org/ecma-262/7.0/#sec-tostring-applied-to-the-number-type
[3]: http://ecma-international.org/ecma-262/7.0/#sec-quotejsonstring
[4]: https://tools.ietf.org/html/rfc8259#section-8.1



Richard, I may be wrong but AFAICT, our respective canonicalization schemes 
are in fact principally IDENTICAL.


In that they have the same goal, yes. In that they both achieve that goal, no. 
I'm not married to choices like exponential notation and uppercase escapes, but 
a JSON canonicalization scheme MUST cover all of JSON.


Here it gets interesting...  What in JSON cannot be expressed through JS and 
JSON.stringify()?


That the number serialization provided by JSON.stringify() is unacceptable, 
is not generally taken as a fact.  I also think it looks a bit weird, but 
that's just a matter of esthetics.  Compatibility is an entirely different 
issue.
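For concreteness, the serialization in question is ECMAScript's shortest round-trip Number-to-String conversion (a sketch, not from the original message):

```javascript
// ECMAScript Number-to-String: the shortest digit string that
// round-trips to the same IEEE-754 double.
console.log(JSON.stringify(1 / 3));      // 0.3333333333333333
console.log(JSON.stringify(1e21));       // 1e+21  (branches to exponential form)
console.log(JSON.stringify(0.0000001));  // 1e-7

// The round-trip property that motivates this choice:
console.log(Number('0.3333333333333333') === 1 / 3); // true
```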


I concede this point. The modified algorithm is sufficient, but note that a 
canonicalization scheme will remain static even if ECMAScript changes.


Agreed.



Sorting on Unicode Code Points is of course "technically 100% right" but 
strictly put not necessary.


Certain scenarios call for different systems to _independently_ generate 
equivalent data structures, and it is a necessary property of canonical 
serialization that it yields identical results for equivalent data structures. 
JSON does not specify significance of object member ordering, so member 
ordering does not distinguish otherwise equivalent objects, so canonicalization 
MUST specify member ordering that is deterministic with respect to all valid 
data.


Violently agree but do not understand (I guess I'm just dumb...) why (for 
example) sorting on UCS2/UTF-16 Code Units would not achieve the same goal 
(although the result would differ).



Your claim about uppercase Unicode escapes is incorrect, there is no such 
requirement:

https://tools.ietf.org/html/rfc8259#section-7 


I don't recall ever making a claim about uppercase Unicode escapes, other than 
observing that it is the preferred form for examples in the JSON RFCs... what 
are you talking about?


You're right, I found it it in the 
https://gibson042.github.io/canonicaljson-spec/#changelog

Thanx,
Anders



Re: JSON.canonicalize()

2018-03-18 Thread Mike Samuel
On Sun, Mar 18, 2018 at 10:08 AM, Richard Gibson 
wrote:

> On Sunday, March 18, 2018, Anders Rundgren 
> wrote:
>
>> On 2018-03-16 20:24, Richard Gibson wrote:
>>
>> Though ECMAScript JSON.stringify may suffice for certain
>> Javascript-centric use cases or otherwise restricted subsets thereof as
>> addressed by JOSE, it is not suitable for producing
>> canonical/hashable/etc. JSON, which requires a fully general solution such
>> as [1]. Both its number serialization [2] and string serialization [3]
>> specify aspects that harm compatibility (the former having arbitrary
>> branches dependent upon the value of numbers, the latter being capable of
>> producing invalid UTF-8 octet sequences that represent unpaired surrogate
>> code points—unacceptable for exchange outside of a closed ecosystem [4]).
>> JSON is a general *language-agnostic* interchange format, and ECMAScript
>> JSON.stringify is *not* a JSON canonicalization solution.
>>
>> [1]: http://gibson042.github.io/canonicaljson-spec/
>> [2]: http://ecma-international.org/ecma-262/7.0/#sec-tostring-applied-to-the-number-type
>> [3]: http://ecma-international.org/ecma-262/7.0/#sec-quotejsonstring
>> [4]: https://tools.ietf.org/html/rfc8259#section-8.1
>>
>>
>> Richard, I may be wrong but AFAICT, our respective canonicalization
>> schemes are in fact principally IDENTICAL.
>>
>
> In that they have the same goal, yes. In that they both achieve that goal,
> no. I'm not married to choices like exponential notation and uppercase
> escapes, but a JSON canonicalization scheme MUST cover all of JSON.
>
>
>> That the number serialization provided by JSON.stringify() is
>> unacceptable, is not generally taken as a fact.  I also think it looks a
>> bit weird, but that's just a matter of esthetics.  Compatibility is an
>> entirely different issue.
>>
>
> I concede this point. The modified algorithm is sufficient, but note that
> a canonicalization scheme will remain static even if ECMAScript changes.
>

Does this mean that the language below would need to be fixed at a specific
version of Unicode or that we would need to cite a specific version for
canonicalization but might allow a higher version for
String.prototype.normalize
and in future versions of the spec require it?

http://www.ecma-international.org/ecma-262/6.0/#sec-conformance
"""
A conforming implementation of ECMAScript must interpret source text input
in conformance with the Unicode Standard, Version 5.1.0 or later
"""

and in ECMA 404


"""
For undated references, the latest edition of the referenced document
(including any amendments) applies. ISO/IEC 10646, Information Technology –
Universal Coded Character Set (UCS) The Unicode Consortium. The Unicode
Standard http://www.unicode.org/versions/latest.
"""


Sorting on Unicode Code Points is of course "technically 100% right" but
>> strictly put not necessary.
>>
>
> Certain scenarios call for different systems to _independently_ generate
> equivalent data structures, and it is a necessary property of canonical
> serialization that it yields identical results for equivalent data
> structures. JSON does not specify significance of object member ordering,
> so member ordering does not distinguish otherwise equivalent objects, so
> canonicalization MUST specify member ordering that is deterministic with
> respect to all valid data.
>

Code points include orphaned surrogates in a way that scalar values do not,
right?  So both "\uD800" and "\uD800\uDC00" are single codepoints.
It seems like a strict prefix of a string should still sort before that
string but prefix transitivity in general does not hold under code point
order: "\uFFFF" < "\uD800\uDC00" && "\uFFFF" > "\uD800".
That shouldn't cause problems for hashability but I thought I'd raise it
just in case.
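This can be demonstrated concretely (a sketch; `cpCompare` is an illustrative helper, not part of any proposal):

```javascript
const a = '\uFFFF';        // U+FFFF, one code unit
const b = '\uD800\uDC00';  // U+10000 as a surrogate pair
const c = '\uD800';        // lone surrogate, a strict prefix of b

// Default JS string comparison is by UTF-16 code unit, so a sorts
// after both (0xFFFF > 0xD800):
console.log(a < b); // false
console.log(a > c); // true

// An illustrative code point comparator flips the first result:
const cpCompare = (x, y) => {
  const xs = [...x], ys = [...y]; // iteration yields code points
  for (let i = 0; i < Math.min(xs.length, ys.length); i += 1) {
    const d = xs[i].codePointAt(0) - ys[i].codePointAt(0);
    if (d !== 0) return d;
  }
  return xs.length - ys.length;
};
console.log(cpCompare(a, b) < 0); // true: U+FFFF < U+10000
console.log(cpCompare(a, c) > 0); // true: a sits between c and b
```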



> Your claim about uppercase Unicode escapes is incorrect, there is no such
>> requirement:
>>
> https://tools.ietf.org/html/rfc8259#section-7
>>
>
> I don't recall ever making a claim about uppercase Unicode escapes, other
> than observing that it is the preferred form for examples in the JSON
> RFCs... what are you talking about?
>


Re: Summary of Input. Re: JSON.canonicalize()

2018-03-18 Thread Mike Samuel
On Sun, Mar 18, 2018 at 2:14 AM, Anders Rundgren <
anders.rundgren@gmail.com> wrote:

> Hi Guys,
>
> Pardon me if you think I was hyperbolic,
> The discussion got derailed by the bogus claims about hash functions'
> vulnerability.
>

I didn't say I "think" you were being hyperbolic.  I asked whether you were.

You asserted a number that seemed high to me.
I demonstrated it was high by a factor of at least 25 by showing an
implementation that
used 80 lines instead of the 2000 you said was required.

If you're going to put out a number as a reason to dismiss an argument, you
should own it
or retract it.
Were you being hyperbolic?  (Y/N)

Your claim and my counterclaim are in no way linked to hash function
vulnerability.
I never weighed in on that claim and have already granted that hashable
JSON is a
worthwhile use case.



> F.Y.I: Using ES6 serialization methods for JSON primitive types is headed
> for standardization in the IETF.
> https://www.ietf.org/mail-archive/web/jose/current/msg05716.html
>
> This effort is backed by one of the main authors behind the current
> de-facto standard for Signed and Encrypted JSON, aka JOSE.
> If this, in your opinion, is a bad idea, now is the right time to shoot
> it down :-)
>

Does this main author prefer your particular JSON canonicalization scheme to
others?
Is this an informed opinion based on flaws in the others that make them
less suitable for
JOSE's needs that are not present in the scheme you back?

If so, please provide links to their reasoning.
If not, how is their backing relevant?



> This effort also exploits the ability of JSON.parse() and
> JSON.stringify() honoring object "Creation Order".
>
> JSON.canonicalize() would be a "Sorting" alternative to "Creation Order"
> offering certain advantages with limiting deployment impact to JSON
> serializers as the most important one.
>
> The ["completely broken"] sample code was only submitted as a
> proof-of-concept. I'm sure you JS gurus can do this way better than I :-)
>

This is a misquote.  No-one has said your sample code was completely broken.
Neither your sample code nor the spec deals with toJSON.  At some point
you're
going to have to address that if you want to keep your proposal moving
forward.
No amount of JS guru-ry is going to save your sample code from a
specification bug.



> Creating an alternative based on [1,2,3] seems like a rather daunting task.
>

Maybe if you spend more time laying out the criteria on which a successful
proposal
should be judged, we could move towards consensus on this claim.

As it is, I have only your say so but I have reason to doubt your evaluation
of task complexity unless you were being hyperbolic before.


Re: Summary of Input. Re: JSON.canonicalize()

2018-03-18 Thread Anders Rundgren

Hi Guys,

Pardon me if you think I was hyperbolic,
The discussion got derailed by the bogus claims about hash functions' 
vulnerability.

F.Y.I: Using ES6 serialization methods for JSON primitive types is headed for 
standardization in the IETF.
https://www.ietf.org/mail-archive/web/jose/current/msg05716.html

This effort is backed by one of the main authors behind the current de-facto 
standard for Signed and Encrypted JSON, aka JOSE.
If this, in your opinion, is a bad idea, now is the right time to shoot it 
down :-)

This effort also exploits the ability of JSON.parse() and JSON.stringify() honoring 
object "Creation Order".

JSON.canonicalize() would be a "Sorting" alternative to "Creation Order" 
offering certain advantages with limiting deployment impact to JSON serializers as the most 
important one.

The ["completely broken"] sample code was only submitted as a proof-of-concept. 
I'm sure you JS gurus can do this way better than I :-)

Creating an alternative based on [1,2,3] seems like a rather daunting task.

Thanx,
Anders
https://github.com/cyberphone/json-canonicalization

1] http://wiki.laptop.org/go/Canonical_JSON
2] https://gibson042.github.io/canonicaljson-spec/
3] https://gist.github.com/mikesamuel/20710f94a53e440691f04bf79bc3d756

On 2018-03-17 22:29, Mike Samuel wrote:



On Fri, Mar 16, 2018 at 9:42 PM, Anders Rundgren wrote:

Scott A:
https://en.wikipedia.org/wiki/Security_level 

"For example, SHA-256 offers 128-bit collision resistance"
That is, the claims that there are cryptographic issues w.r.t. to Unicode 
Normalization are (fortunately) incorrect.
Well, if you actually do normalize Unicode, signatures would indeed break, 
so you don't.

Richard G:
Is the [highly involuntary] "inspiration" to the JSON.canonicalize() 
proposal:
https://www.ietf.org/mail-archive/web/json/current/msg04257.html 

Why not fork your go library? Then there would be three implementations!

Mike S:
Wants to build a 2000+ line standalone JSON canonicalizer working on string 
data.
That's great but I think that it will be a hard sell getting these guys to 
accept the Pull Request:
https://developers.google.com/v8/ 
JSON.canonicalize(JSON.parse("json string data to be canonicalized")) would 
IMHO do the same job.
My (working) code example was only provided to show the principle as well 
as being able to test/verify.


I don't know where you get the 2000+ line number.
https://gist.github.com/mikesamuel/20710f94a53e440691f04bf79bc3d756 comes in at 
80 lines.
That's roughly twice as long as your demonstrably broken example code, but far 
shorter than the number you provided.

If you're being hyperbolic, please stop.
If that was a genuine guesstimate, but you just happened to be off by a factor 
of 25, then I have less confidence that
you can weigh the design complexity tradeoffs when comparing yours to other 
proposals.


On my part I added canonicalization to my ES6.JSON compliant Java-based 
JSON tools.  A single line did 99% of the job:
https://github.com/cyberphone/openkeystore/blob/jose-compatible/library/src/org/webpki/json/JSONObjectWriter.java#L928  


for (String property : canonicalized ? new 
TreeSet<String>(object.properties.keySet()) : object.properties.keySet()) {


Other mentioned issues like HTML safety, embedded nulls etc. would apply to 
JSON.stringify() as well.
JSON.canonicalize() would inherit all the features (and weaknesses) of 
JSON.stringify().


Please, when you attribute a summary to me, don't ignore the summary that I 
myself wrote of my arguments.

You're ignoring the context.  JSON.canonicalize is not generally useful because 
it undoes safety precautions.
That tied into one argument of mine that you left out: JSON.canonicalize is not 
generally useful.  It should probably not
be used as a wire or storage format, and is entirely unsuitable for embedding 
into other commonly used web application
languages.

You also make no mention of backwards compatibility concerns when this depends 
on things like toJSON, which is hugely important
when dealing with long lived hashes.

When I see that you've summarized my own thoughts incorrectly, even though I 
provided you with a summary of my own arguments,
I lose confidence that you've correctly summarized other's positions.





Re: Summary of Input. Re: JSON.canonicalize()

2018-03-17 Thread Mike Samuel
On Fri, Mar 16, 2018 at 9:42 PM, Anders Rundgren <
anders.rundgren@gmail.com> wrote:

>
>
> On my part I added canonicalization to my ES6.JSON compliant Java-based
> JSON tools.  A single line did 99% of the job:
> https://github.com/cyberphone/openkeystore/blob/jose-compatible/library/src/org/webpki/json/JSONObjectWriter.java#L928
>
> for (String property : canonicalized ? new TreeSet<String>(object.properties.keySet())
> : object.properties.keySet()) {
>

If this is what you want then can't you just use a replacer to substitute a
record with sorted keys?

JSON.canonicalize = (value) => JSON.stringify(value, (_, value) => {
  if (value && typeof value === 'object' && !Array.isArray(value)) {
    const withSortedKeys = {}
    const keys = Object.getOwnPropertyNames(value)
    keys.sort()
    keys.forEach(key => withSortedKeys[key] = value[key])
    value = withSortedKeys
  }
  return value
})
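A self-contained usage sketch of this replacer approach (the `canonicalize` name and the integer-key caveat are illustrative additions, not part of the thread):

```javascript
// Sketch of the replacer approach: sort object keys, leave arrays alone.
const canonicalize = (value) => JSON.stringify(value, (_, v) => {
  if (v && typeof v === 'object' && !Array.isArray(v)) {
    const sorted = {};
    for (const key of Object.keys(v).sort()) sorted[key] = v[key];
    return sorted;
  }
  return v;
});

console.log(canonicalize({ b: 1, a: [2, { d: 3, c: 4 }] }));
// {"a":[2,{"c":4,"d":3}],"b":1}

// Caveat: integer-like keys are always ordered numerically by the
// engine, regardless of sort(), so "2" precedes "10" here:
console.log(canonicalize({ '10': 0, '2': 0 })); // {"2":0,"10":0}
```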


Re: Summary of Input. Re: JSON.canonicalize()

2018-03-17 Thread Mike Samuel
On Fri, Mar 16, 2018 at 9:42 PM, Anders Rundgren <
anders.rundgren@gmail.com> wrote:

> Scott A:
> https://en.wikipedia.org/wiki/Security_level
> "For example, SHA-256 offers 128-bit collision resistance"
> That is, the claims that there are cryptographic issues w.r.t. to Unicode
> Normalization are (fortunately) incorrect.
> Well, if you actually do normalize Unicode, signatures would indeed break,
> so you don't.
>
> Richard G:
> Is the [highly involuntary] "inspiration" to the JSON.canonicalize()
> proposal:
> https://www.ietf.org/mail-archive/web/json/current/msg04257.html
> Why not fork your go library? Then there would be three implementations!
>
> Mike S:
> Wants to build a 2000+ line standalone JSON canonicalizer working on
> string data.
> That's great but I think that it will be a hard sell getting these guys
> to accept the Pull Request:
> https://developers.google.com/v8/
> JSON.canonicalize(JSON.parse("json string data to be canonicalized"))
> would IMHO do the same job.
> My (working) code example was only provided to show the principle as well
> as being able to test/verify.
>

I don't know where you get the 2000+ line number.
https://gist.github.com/mikesamuel/20710f94a53e440691f04bf79bc3d756 comes
in at 80 lines.
That's roughly twice as long as your demonstrably broken example code, but
far shorter than the number you provided.

If you're being hyperbolic, please stop.
If that was a genuine guesstimate, but you just happened to be off by a
factor of 25, then I have less confidence that
you can weigh the design complexity tradeoffs when comparing yours to
other proposals.


> On my part I added canonicalization to my ES6.JSON compliant Java-based
> JSON tools.  A single line did 99% of the job:
> https://github.com/cyberphone/openkeystore/blob/jose-compatible/library/src/org/webpki/json/JSONObjectWriter.java#L928

> for (String property : canonicalized ? new TreeSet<String>(object.properties.keySet())
> : object.properties.keySet()) {
>
>
> Other mentioned issues like HTML safety, embedded nulls etc. would apply
> to JSON.stringify() as well.
> JSON.canonicalize() would inherit all the features (and weaknesses) of
> JSON.stringify().
>

Please, when you attribute a summary to me, don't ignore the summary that I
myself wrote of my arguments.

You're ignoring the context.  JSON.canonicalize is not generally useful
because it undoes safety precautions.
That tied into one argument of mine that you left out: JSON.canonicalize is
not generally useful.  It should probably not
be used as a wire or storage format, and is entirely unsuitable for
embedding into other commonly used web application
languages.

You also make no mention of backwards compatibility concerns when this
depends on things like toJSON, which is hugely important
when dealing with long lived hashes.

When I see that you've summarized my own thoughts incorrectly, even though
I provided you with a summary of my own arguments,
I lose confidence that you've correctly summarized other's positions.


Hashable vs Canonicalizable. Re: JSON.canonicalize()

2018-03-17 Thread Anders Rundgren

A "Hashable" format does not have to comply with the original; the only 
requirement is that it is reproducible.
However, I have difficulties coming up with a good argument for not sticking to 
the original.
If you stick to the original, then the terms Hashable and Canonicalizable 
become fully interchangeable.

I could though imagine representing "Number" as IEEE-754 8-byte binary blobs 
instead of a textual format but the availability of a useful definition and 
implementation in ES6, makes this less appetizing.
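A sketch of that alternative (the `numberToHex` helper is purely illustrative):

```javascript
// Hypothetical alternative: represent a Number as its raw IEEE-754
// double-precision bytes (big-endian hex) instead of decimal text.
const numberToHex = (n) => {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, n); // big-endian by default
  return [...new Uint8Array(view.buffer)]
    .map((b) => b.toString(16).padStart(2, '0'))
    .join('');
};

console.log(numberToHex(1));     // 3ff0000000000000
console.log(numberToHex(1 / 3)); // 3fd5555555555555
```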

Note that the availability of canonicalization DOES NOT mean that you MUST use it as the 
"wire format".

In my own applications [*], I do not intend to use "JSON.canonicalize()" except 
internally for crypto related operations.
Why is that?  Because it breaks the "natural order" provided by 
JSON.stringify().

thanx,
Anders

*] https://cyberphone.github.io/doc/saturn/


Browser version on-line. Re: JSON.canonicalize()

2018-03-17 Thread Anders Rundgren

F.Y.I.

https://cyberphone.github.io/doc/security/browser-json-canonicalization.html

thanx,
Anders


Re: JSON.canonicalize()

2018-03-17 Thread Isiah Meadows
With files frequently that size, it might be worth considering whether
you should use a custom format+validator\* instead. It'd take a lot
less memory, which could be helpful since the first row alone of [this
file][1] takes about 4-5K in Firefox when deserialized - I verified
this in the console (To be exact, 5032 the first time, 4128 the
second, and 4416 the third). Also, a megabyte is a *lot* to send down
the wire in Web terms.

\* In this case, you'd need a validator that uses minimal perfect
hashes and a compact binary data representation that doesn't rely on a
concrete start/end. That would avoid the mess of constantly having to
look things up in memory, while leaving your IR much smaller. Another
item of note: JS strings are 16-bit, which is wasteful in memory for
your entire object.

[1]: 
https://raw.githubusercontent.com/kaizhu256/node-swgg-github-all/2018.2.2/assets.swgg.swagger.json

-

Isiah Meadows
m...@isiahmeadows.com

Looking for web consulting? Or a new website?
Send me an email and we can get started.
www.isiahmeadows.com


On Fri, Mar 16, 2018 at 11:53 PM, kai zhu  wrote:
> stepping aside from the security aspect, having your code-base’s json-files
> normalized with sorted-keys is good-housekeeping, especially when you want
> to sanely maintain ones >1mb in size (e.g. large swagger
> json-documentations) [1].
>
> and you can easily operationalize your build-process / pre-commit-checks to
> auto-key-sort json-files with the following simple shell-function [2].
>
> [1]
> https://github.com/kaizhu256/node-swgg-github-all/blob/2018.2.2/assets.swgg.swagger.json
> [2]
> https://github.com/kaizhu256/node-utility2/blob/2018.1.13/lib.utility2.sh#L1513
>
>
>
> ```shell
> #!/bin/sh
> # .bashrc
> : '
> # to install, copy-paste the shell-function shFileJsonNormalize below
> # into your shell startup script (.bashrc, .profile, etc...)
>
>
> # example shell-usage:
>
> source ~/.bashrc
> printf "{
> \"version\": \"0.0.1\",
> \"name\": \"my-app\",
> \"aa\": {
> \"zz\": 1,
> \"yy\": {
> \"xx\": 2,
> \"ww\": 3
> }
> },
> \"bb\": [
> 3,
> 2,
> 1,
> null
> ]
> }" > package.json
> shFileJsonNormalize package.json
> cat package.json
>
>
> # key-sorted output:
> {
> "aa": {
> "yy": {
> "ww": 3,
> "xx": 2
> },
> "zz": 1
> },
> "bb": [
> 3,
> 2,
> 1,
> null
> ],
> "name": "my-app",
> "version": "0.0.1"
> }
> '
>
>
> shFileJsonNormalize() {(set -e
> # this shell-function will
> # 1. read the json-data from $FILE
> # 2. normalize the json-data
> # 3. write the normalized json-data back to $FILE
> FILE="$1"
> node -e "
> // 
> /*jslint
> bitwise: true,
> browser: true,
> maxerr: 8,
> maxlen: 100,
> node: true,
> nomen: true,
> regexp: true,
> stupid: true
> */
> 'use strict';
> var local;
> local = {};
> local.fs = require('fs');
> local.jsonStringifyOrdered = function (jsonObj, replacer, space) {
> /*
>  * this function will JSON.stringify the jsonObj,
>  * with object-keys sorted and circular-references removed
>  */
> var circularList, stringify, tmp;
> stringify = function (jsonObj) {
> /*
>  * this function will recursively JSON.stringify the jsonObj,
>  * with object-keys sorted and circular-references removed
>  */
> // if jsonObj is an object, then recurse its items with object-keys sorted
> if (jsonObj &&
> typeof jsonObj === 'object' &&
> typeof jsonObj.toJSON !== 'function') {
> // ignore circular-reference
> if (circularList.indexOf(jsonObj) >= 0) {
> return;
> }
> circularList.push(jsonObj);
> // if jsonObj is an array, then recurse its jsonObjs
> if (Array.isArray(jsonObj)) {
> return '[' + jsonObj.map(function (jsonObj) {
> // recurse
> tmp = stringify(jsonObj);
> return typeof tmp === 'string'
> ? tmp
> : 'null';
> }).join(',') + ']';
> }
> return '{' + Object.keys(jsonObj)
> // sort object-keys
> .sort()
> .map(function (key) {
> // recurse
> tmp = stringify(jsonObj[key]);
> if (typeof tmp === 'string') {
> return JSON.stringify(key) + ':' + tmp;
> }
> })
> .filter(function (jsonObj) {
> return typeof jsonObj === 'string';
> })
> .join(',') + '}';
> }
> // else JSON.stringify as normal
> return JSON.stringify(jsonObj);
> };
> circularList = [];
>  

Summary of Input. Re: JSON.canonicalize()

2018-03-16 Thread Anders Rundgren

Scott A:
https://en.wikipedia.org/wiki/Security_level
"For example, SHA-256 offers 128-bit collision resistance"
That is, the claims that there are cryptographic issues w.r.t. to Unicode 
Normalization are (fortunately) incorrect.
Well, if you actually do normalize Unicode, signatures would indeed break, so 
you don't.

Richard G:
Is the [highly involuntary] "inspiration" to the JSON.canonicalize() proposal:
https://www.ietf.org/mail-archive/web/json/current/msg04257.html
Why not fork your go library? Then there would be three implementations!

Mike S:
Wants to build a 2000+ line standalone JSON canonicalizer working on string 
data.
That's great but I think that it will be a hard sell getting these guys to accept 
the Pull Request:
https://developers.google.com/v8/
JSON.canonicalize(JSON.parse("json string data to be canonicalized")) would 
IMHO do the same job.
My (working) code example was only provided to show the principle as well as 
being able to test/verify.


On my part I added canonicalization to my ES6.JSON compliant Java-based JSON 
tools.  A single line did 99% of the job:
https://github.com/cyberphone/openkeystore/blob/jose-compatible/library/src/org/webpki/json/JSONObjectWriter.java#L928

for (String property : canonicalized ? new 
TreeSet<String>(object.properties.keySet()) : object.properties.keySet()) {


Other mentioned issues like HTML safety, embedded nulls etc. would apply to 
JSON.stringify() as well.
JSON.canonicalize() would inherit all the features (and weaknesses) of 
JSON.stringify().


thanx,
Anders


Re: JSON.canonicalize()

2018-03-16 Thread Mike Samuel
On Fri, Mar 16, 2018, 4:58 PM Anders Rundgren 
wrote:

> On 2018-03-16 21:41, Mike Samuel wrote:
> >
> >
> > On Fri, Mar 16, 2018 at 4:34 PM, C. Scott Ananian  > wrote:
> >
> > On Fri, Mar 16, 2018 at 4:07 PM, Anders Rundgren <
> anders.rundgren@gmail.com >
> wrote:
> >
>
> > To restate my main objections:
> >
> > I think any proposal to offer an alternative stringify instead of a
> string->string transform is not very good
> > and could be easily improved by rephrasing it as a string->string
> transform.
>
> Could you give a concrete example on that?
>
>
>
I've given three.  As written, the proposal produces invalid or low-quality
output given undefined, objects with toJSON methods, and symbols as either
keys or values.  These would not be problems for a real canonicalizer
since none are present in a string of JSON.

In addition, two distant users of the canonicalizer who wish to check
hashes need to agree on the ancillary arguments like the replacer if
canonicalize takes the same arguments and actually uses them.  They also
need to agree on implementation details of toJSON methods which is a
backward compatibility hazard.

If you did solve the toJSON problem by incorporating calls to that method
you've now complicated cross-platform behavior.  If you phrase in terms of
string->string it is much easier to disentangle the definition of
canonicalizers JSON from JS and make it language agnostic.

Finally, your proposal is not the VHS of canonicalizers.  That would be
x=>JSON.stringify(JSON.parse(x)) since it's deployed and used.
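The string->string shape being argued for can be sketched as follows (illustrative only; the sort-based replacer is an assumption, not part of any proposal here):

```javascript
// Sketch: a canonicalizer as a string -> string transform. Because the
// input is JSON text, toJSON methods, symbols, and undefined can never
// appear; parsing and re-stringifying with sorted keys is the whole job.
const canonicalizeText = (jsonText) =>
  JSON.stringify(JSON.parse(jsonText), (_, v) =>
    v && typeof v === 'object' && !Array.isArray(v)
      ? Object.fromEntries(Object.keys(v).sort().map((k) => [k, v[k]]))
      : v);

console.log(canonicalizeText('{"b":2,"a":1}')); // {"a":1,"b":2}
```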


Re: JSON.canonicalize()

2018-03-16 Thread C. Scott Ananian
My main feedback is that since this topic has been covered so many times in
the past, any serious standardization proposal should include a section
surveying existing "canonical JSON" standards and implementations and
comparing the proposed standard with prior work.  A standard should be a
"best of breed" implementation, which adequately replaces existing work,
not just another average implementation narrowly tailored to the proposer's
own particular use cases.

I don't think Unicode Normalization should necessarily be a requirement of
a canonical JSON standard.  But any reasonable proposal should at least
acknowledge the issues raised, as well as the issues of embedded nulls,
HTML safety, and the other points that have been raised in this thread (and
the many other points addressed by the dozen other "canonical JSON"
implementations I linked to).  If you're just going to say, "my proposal is
good enough", well then mine is "good enough" too, and so are the other
dozen, and none of them need to be the "official JavaScript canonical
form".  What's your compelling argument that your proposal is better than
any of the other dozen?  And why start the discussion on this list if
you're not going to do anything with the information you learn?
 --scott


Re: JSON.canonicalize()

2018-03-16 Thread Mathias Bynens
On Fri, Mar 16, 2018 at 9:04 PM, Mike Samuel  wrote:

>
> The output of JSON.canonicalize would also not be in the subset of JSON
> that is also a subset of JavaScript's PrimaryExpression.
>
>JSON.canonicalize(JSON.stringify("\u2028\u2029")) === `"\u2028\u2029"`
>

Soon U+2028 and U+2029 will no longer be edge cases. A Stage 3 proposal
(currently shipping in Chrome) makes them valid in ECMAScript string
literals, making JSON a strict subset of ECMAScript:
https://github.com/tc39/proposal-json-superset
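The edge case in question is easy to observe (a sketch):

```javascript
// JSON.stringify leaves U+2028/U+2029 unescaped, which historically
// made some JSON.stringify output invalid as a JS string literal
// (fixed by the json-superset proposal referenced above).
const s = JSON.stringify('\u2028\u2029');
console.log(s === '"\u2028\u2029"');           // true: raw separators, no escaping
console.log(JSON.parse(s) === '\u2028\u2029'); // true: JSON always accepted them
```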


Re: JSON.canonicalize()

2018-03-16 Thread Anders Rundgren

On 2018-03-16 21:41, Mike Samuel wrote:



On Fri, Mar 16, 2018 at 4:34 PM, C. Scott Ananian wrote:

On Fri, Mar 16, 2018 at 4:07 PM, Anders Rundgren wrote:

Perfection is often the enemy of good.


So, to be clear: you don't plan on actually incorporating any feedback into your 
proposal, since it's already "good"?


I'm not going to incorporate Unicode Normalization because it is better 
addressed at the application level.



To restate my main objections:

I think any proposal to offer an alternative stringify instead of a 
string->string transform is not very good
and could be easily improved by rephrasing it as a string->string transform.


Could you give a concrete example on that?


Also, presenting this as a better wire format I think is misleading 


This was not my intention, I just expressed it poorly.  It was rather mixed 
with my objection to Unicode Normalization.



since I think it has no advantages as a wire format over JSON.stringify's 
output,


Right, JSON.stringify() is much better for creating the external format since it honors 
"creation order".



and recommending canonical JSON, except for the short duration needed to hash 
it creates more problems than it solves.


Wrong, this is exactly what I had in mind.  If the hashable/canonicalizable 
method works as described (does it not?) it solves the hashing problem.

Anders


Re: JSON.canonicalize()

2018-03-16 Thread Mike Samuel
On Fri, Mar 16, 2018 at 4:34 PM, C. Scott Ananian 
wrote:

> On Fri, Mar 16, 2018 at 4:07 PM, Anders Rundgren <
> anders.rundgren@gmail.com> wrote:
>
>> Perfection is often the enemy of good.
>>
>
> So, to be clear: you don't plan on actually incorporating any feedback
> into your proposal, since it's already "good"?
>

To restate my main objections:

I think any proposal to offer an alternative stringify instead of a
string->string transform is not very good
and could be easily improved by rephrasing it as a string->string transform.

Also, presenting this as a better wire format I think is misleading since I
think it has no advantages as a wire format over JSON.stringify's
output, and recommending canonical JSON, except for the short duration
needed to hash it creates more problems than it solves.


>   --scott
>
>


Re: JSON.canonicalize()

2018-03-16 Thread C. Scott Ananian
On Fri, Mar 16, 2018 at 4:07 PM, Anders Rundgren <
anders.rundgren@gmail.com> wrote:

> Perfection is often the enemy of good.
>

So, to be clear: you don't plan on actually incorporating any feedback into
your proposal, since it's already "good"?
  --scott


Re: JSON.canonicalize()

2018-03-16 Thread Anders Rundgren

On 2018-03-16 20:24, Richard Gibson wrote:

Though ECMAScript JSON.stringify may suffice for certain Javascript-centric use 
cases or otherwise restricted subsets thereof as addressed by JOSE, it is not 
suitable for producing canonical/hashable/etc. JSON, which requires a fully 
general solution such as [1]. Both its number serialization [2] and string 
serialization [3] specify aspects that harm compatibility (the former having 
arbitrary branches dependent upon the value of numbers, the latter being 
capable of producing invalid UTF-8 octet sequences that represent unpaired 
surrogate code points—unacceptable for exchange outside of a closed ecosystem 
[4]). JSON is a general *language-agnostic* interchange format, and ECMAScript 
JSON.stringify is *not* a JSON canonicalization solution.


It effectively depends on your objectives.

#2 is not really a problem; you would typically not output canonicalized JSON, 
it is only used internally since there is no requirement that input be 
canonicalized.
#3 yes, if you create bad data you can [always] screw up.  It sounds, BTW, like 
a bug which will presumably get fixed some day.
#4 If you are targeting Node.js, Browsers, OpenAPI, and all other platforms 
compatible with those, JSON.stringify() seems to suffice.

The JSON.canonicalize() method proposal was intended for the systems specified 
in #4.

Perfection is often the enemy of good.

Anders



[1]: http://gibson042.github.io/canonicaljson-spec/
[2]: http://ecma-international.org/ecma-262/7.0/#sec-tostring-applied-to-the-number-type
[3]: http://ecma-international.org/ecma-262/7.0/#sec-quotejsonstring
[4]: https://tools.ietf.org/html/rfc8259#section-8.1


On Fri, Mar 16, 2018 at 3:09 PM, Mike Samuel wrote:



On Fri, Mar 16, 2018 at 3:03 PM, Anders Rundgren wrote:

On 2018-03-16 19:51, Mike Samuel wrote:



    On Fri, Mar 16, 2018 at 2:43 PM, Anders Rundgren wrote:

    On 2018-03-16 19:30, Mike Samuel wrote:

        2. Any numbers with minimal changes: dropping + signs, 
normalizing zeros,
              using a fixed threshold for scientific notation.
              PROS: supports whole JSON value-space
              CONS: less useful for hashing
              CONS: risks loss of precision when decoders decide 
based on presence of
                 decimal point whether to represent as double or 
int.


    Have you actually looked into the specification?

https://cyberphone.github.io/doc/security/draft-rundgren-json-canonicalization-scheme.html#rfc.section.3.2.2
    ES6 has all what it takes.


Yes, but other notions of canonical equivalence have been mentioned 
here
so reasons to prefer one to another seem in scope.


Availability beats perfection anytime.  This is the VHS (if anybody 
remember that old story) of canonicalization and I don't feel too bad about 
that :-)


Perhaps.  Any thoughts on my question about the merits of "Hashable" vs 
"Canonical"?








Re: JSON.canonicalize()

2018-03-16 Thread Mike Samuel
On Fri, Mar 16, 2018 at 3:23 PM, Anders Rundgren <
anders.rundgren@gmail.com> wrote:

> On 2018-03-16 20:09, Mike Samuel wrote:
>
>>
>> Availability beats perfection anytime.  This is the VHS (if anybody
>> remember that old story) of canonicalization and I don't feel too bad about
>> that :-)
>>
>>
>> Perhaps.  Any thoughts on my question about the merits of "Hashable" vs
>> "Canonical"?
>>
>
> No, there was so much noise here that I may need a more condensed
> description, if possible.
>

In the email to which you responded "Have you actually looked ..." look for
"If that is correct, Would people be averse to marketing this as "hashable
JSON" instead of "canonical JSON?""


Re: JSON.canonicalize()

2018-03-16 Thread Richard Gibson
Though ECMAScript JSON.stringify may suffice for certain Javascript-centric
use cases or otherwise restricted subsets thereof as addressed by JOSE, it
is not suitable for producing canonical/hashable/etc. JSON, which requires
a fully general solution such as [1]. Both its number serialization [2] and
string serialization [3] specify aspects that harm compatibility (the
former having arbitrary branches dependent upon the value of numbers, the
latter being capable of producing invalid UTF-8 octet sequences that
represent unpaired surrogate code points—unacceptable for exchange outside
of a closed ecosystem [4]). JSON is a general *language-agnostic*
interchange format, and ECMAScript JSON.stringify is *not* a JSON
canonicalization solution.

[1]: http://gibson042.github.io/canonicaljson-spec/
[2]: http://ecma-international.org/ecma-262/7.0/#sec-tostring-applied-to-the-number-type
[3]: http://ecma-international.org/ecma-262/7.0/#sec-quotejsonstring
[4]: https://tools.ietf.org/html/rfc8259#section-8.1
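
The unpaired-surrogate hazard described above can be observed concretely in Node (a sketch; `Buffer` behavior is Node-specific, not part of any proposal):

```javascript
// A lone surrogate is a legal element of a JSON/ECMAScript string, but it
// has no valid UTF-8 encoding; Node's UTF-8 encoder substitutes U+FFFD.
const lone = "\uD800"; // unpaired high surrogate
const roundTripped = Buffer.from(lone, "utf8").toString("utf8");
console.log(roundTripped === lone);     // false: not encodable as UTF-8
console.log(roundTripped === "\uFFFD"); // true: replaced by REPLACEMENT CHARACTER
```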

On Fri, Mar 16, 2018 at 3:09 PM, Mike Samuel  wrote:

>
>
> On Fri, Mar 16, 2018 at 3:03 PM, Anders Rundgren <
> anders.rundgren@gmail.com> wrote:
>
>> On 2018-03-16 19:51, Mike Samuel wrote:
>>
>>>
>>>
>>> On Fri, Mar 16, 2018 at 2:43 PM, Anders Rundgren <
>>> anders.rundgren@gmail.com >
>>> wrote:
>>>
>>> On 2018-03-16 19:30, Mike Samuel wrote:
>>>
>>> 2. Any numbers with minimal changes: dropping + signs,
>>> normalizing zeros,
>>>   using a fixed threshold for scientific notation.
>>>   PROS: supports whole JSON value-space
>>>   CONS: less useful for hashing
>>>   CONS: risks loss of precision when decoders decide based
>>> on presence of
>>>  decimal point whether to represent as double or int.
>>>
>>>
>>> Have you actually looked into the specification?
>>> https://cyberphone.github.io/doc/security/draft-rundgren-json-canonicalization-scheme.html#rfc.section.3.2.2
>>> ES6 has all what it takes.
>>>
>>>
>>> Yes, but other notions of canonical equivalence have been mentioned here
>>> so reasons to prefer one to another seem in scope.
>>>
>>
>> Availability beats perfection anytime.  This is the VHS (if anybody
>> remember that old story) of canonicalization and I don't feel too bad about
>> that :-)
>
>
> Perhaps.  Any thoughts on my question about the merits of "Hashable" vs
> "Canonical"?
>


Re: JSON.canonicalize()

2018-03-16 Thread Anders Rundgren

On 2018-03-16 20:09, Mike Samuel wrote:


Availability beats perfection anytime.  This is the VHS (if anybody 
remember that old story) of canonicalization and I don't feel too bad about 
that :-)


Perhaps.  Any thoughts on my question about the merits of "Hashable" vs 
"Canonical"?


No, there was so much noise here that I may need a more condensed description, 
if possible.

Anders



Re: JSON.canonicalize()

2018-03-16 Thread C. Scott Ananian
I think the horse is out of the barn re hashable-vs-canonical.  It has
(independently) been invented and named canonical JSON many many times,
starting 11 years ago.

https://gibson042.github.io/canonicaljson-spec/
https://www.npmjs.com/package/another-json
https://www.npmjs.com/package/canonical-json
https://www.npmjs.com/package/keyify
https://www.npmjs.com/package/canonical-tent-json
https://www.npmjs.com/package/content-addressable-json
https://godoc.org/github.com/gibson042/canonicaljson-go
https://tools.ietf.org/html/draft-staykov-hu-json-canonical-form-00
https://keybase.io/docs/api/1.0/canonical_packings#json
https://tools.ietf.org/html/rfc7638#section-3.3
http://wiki.laptop.org/go/Canonical_JSON
https://github.com/mirkokiefer/canonical-json
https://github.com/davidchambers/CANON

"Content Addressable JSON" is a variant of your "hashable JSON" proposal,
though.  But the "canonicals" seem to vastly outnumber the "hashables".

My question for Anders is: do you actually plan to incorporate any feedback
into changes to your proposal?  Or were you really just looking for us to
validate your work, not actually contribute to it?
 --scott

On Fri, Mar 16, 2018 at 3:09 PM, Mike Samuel  wrote:

>
>
> On Fri, Mar 16, 2018 at 3:03 PM, Anders Rundgren <
> anders.rundgren@gmail.com> wrote:
>
>> On 2018-03-16 19:51, Mike Samuel wrote:
>>
>>>
>>>
>>> On Fri, Mar 16, 2018 at 2:43 PM, Anders Rundgren <
>>> anders.rundgren@gmail.com >
>>> wrote:
>>>
>>> On 2018-03-16 19:30, Mike Samuel wrote:
>>>
>>> 2. Any numbers with minimal changes: dropping + signs,
>>> normalizing zeros,
>>>   using a fixed threshold for scientific notation.
>>>   PROS: supports whole JSON value-space
>>>   CONS: less useful for hashing
>>>   CONS: risks loss of precision when decoders decide based
>>> on presence of
>>>  decimal point whether to represent as double or int.
>>>
>>>
>>> Have you actually looked into the specification?
>>> https://cyberphone.github.io/doc/security/draft-rundgren-json-canonicalization-scheme.html#rfc.section.3.2.2
>>> ES6 has all what it takes.
>>>
>>>
>>> Yes, but other notions of canonical equivalence have been mentioned here
>>> so reasons to prefer one to another seem in scope.
>>>
>>
>> Availability beats perfection anytime.  This is the VHS (if anybody
>> remember that old story) of canonicalization and I don't feel too bad about
>> that :-)
>
>
> Perhaps.  Any thoughts on my question about the merits of "Hashable" vs
> "Canonical"?
>


Re: JSON.canonicalize()

2018-03-16 Thread Mike Samuel
On Fri, Mar 16, 2018 at 3:03 PM, Anders Rundgren <
anders.rundgren@gmail.com> wrote:

> On 2018-03-16 19:51, Mike Samuel wrote:
>
>>
>>
>> On Fri, Mar 16, 2018 at 2:43 PM, Anders Rundgren <
>> anders.rundgren@gmail.com >
>> wrote:
>>
>> On 2018-03-16 19:30, Mike Samuel wrote:
>>
>> 2. Any numbers with minimal changes: dropping + signs,
>> normalizing zeros,
>>   using a fixed threshold for scientific notation.
>>   PROS: supports whole JSON value-space
>>   CONS: less useful for hashing
>>   CONS: risks loss of precision when decoders decide based on
>> presence of
>>  decimal point whether to represent as double or int.
>>
>>
>> Have you actually looked into the specification?
>> https://cyberphone.github.io/doc/security/draft-rundgren-json-canonicalization-scheme.html#rfc.section.3.2.2
>> ES6 has all what it takes.
>>
>>
>> Yes, but other notions of canonical equivalence have been mentioned here
>> so reasons to prefer one to another seem in scope.
>>
>
> Availability beats perfection anytime.  This is the VHS (if anybody
> remember that old story) of canonicalization and I don't feel too bad about
> that :-)


Perhaps.  Any thoughts on my question about the merits of "Hashable" vs
"Canonical"?


Re: JSON.canonicalize()

2018-03-16 Thread Anders Rundgren

On 2018-03-16 19:51, Mike Samuel wrote:



On Fri, Mar 16, 2018 at 2:43 PM, Anders Rundgren wrote:

On 2018-03-16 19:30, Mike Samuel wrote:

2. Any numbers with minimal changes: dropping + signs, normalizing 
zeros,
      using a fixed threshold for scientific notation.
      PROS: supports whole JSON value-space
      CONS: less useful for hashing
      CONS: risks loss of precision when decoders decide based on 
presence of
         decimal point whether to represent as double or int.


Have you actually looked into the specification?

https://cyberphone.github.io/doc/security/draft-rundgren-json-canonicalization-scheme.html#rfc.section.3.2.2
 

ES6 has all what it takes.


Yes, but other notions of canonical equivalence have been mentioned here
so reasons to prefer one to another seem in scope.


Availability beats perfection anytime.  This is the VHS (if anybody remember 
that old story) of canonicalization and I don't feel too bad about that :-)

Anders



Re: JSON.canonicalize()

2018-03-16 Thread Mike Samuel
On Fri, Mar 16, 2018 at 2:43 PM, Anders Rundgren <
anders.rundgren@gmail.com> wrote:

> On 2018-03-16 19:30, Mike Samuel wrote:
>
>> 2. Any numbers with minimal changes: dropping + signs, normalizing zeros,
>>  using a fixed threshold for scientific notation.
>>  PROS: supports whole JSON value-space
>>  CONS: less useful for hashing
>>  CONS: risks loss of precision when decoders decide based on presence
>> of
>> decimal point whether to represent as double or int.
>>
>
> Have you actually looked into the specification?
> https://cyberphone.github.io/doc/security/draft-rundgren-json-canonicalization-scheme.html#rfc.section.3.2.2
> ES6 has all what it takes.
>

Yes, but other notions of canonical equivalence have been mentioned here
so reasons to prefer one to another seem in scope.


Re: JSON.canonicalize()

2018-03-16 Thread Anders Rundgren

On 2018-03-16 19:30, Mike Samuel wrote:

2. Any numbers with minimal changes: dropping + signs, normalizing zeros,
     using a fixed threshold for scientific notation.
     PROS: supports whole JSON value-space
     CONS: less useful for hashing
     CONS: risks loss of precision when decoders decide based on presence of
        decimal point whether to represent as double or int.


Have you actually looked into the specification?
https://cyberphone.github.io/doc/security/draft-rundgren-json-canonicalization-scheme.html#rfc.section.3.2.2
ES6 has all what it takes.

Anders



3. Preserve textual representation.
     PROS: avoids loss of precision
     PROS: can support whole JSON value-space
     CONS: not very useful for hashing

It seems that there is a tradeoff between usefulness for hashing and the 
ability to
support the whole JSON value-space.

Recommending this as a wire / storage format further complicates that tradeoff.

Regardless of which fork is chosen, there are some risks with the current 
design.
For example, 1e10 takes up some space in memory.  This might allow timing 
attacks.
Imagine an attacker can get Alice to embed 1e10 or another number in her 
JSON.
Alice sends that message to Bob over an encrypted channel.  Bob converts the 
JSON to
canonical JSON.  If Bob refuses some JSON payloads over a threshold size or the
time to process is noticeably different for 1e10 vs 1e1 then the attacker can
tell, via traffic analysis alone, when Alice communicates with Bob.
We should avoid that in-memory blowup if possible.




   --scott

On Fri, Mar 16, 2018 at 1:46 PM, Mike Samuel wrote:



On Fri, Mar 16, 2018 at 1:30 PM, Anders Rundgren wrote:

On 2018-03-16 18:04, Mike Samuel wrote:

It is entirely unsuitable to embedding in HTML or XML though.
IIUC, with an implementation based on this

    JSON.canonicalize(JSON.stringify("")) === `""` 
&&
JSON.canonicalize(JSON.stringify("]]>")) === `"]]>"`


I don't know what you are trying to prove here :-)


Only that canonical JSON is useful in a very narrow context.
It cannot be embedded in an HTML script tag.
It cannot be embedded in an XML or HTML foreign content context without 
extra care.
If it contains a string literal that embeds a NUL it cannot be embedded 
in XML period even if extra care is taken.


The output of JSON.canonicalize would also not be in the subset 
of JSON that is also a subset of JavaScript's PrimaryExpression.

     JSON.canonicalize(JSON.stringify("\u2028\u2029")) === 
`"\u2028\u2029"`

It also is not suitable for use internally within systems that 
internally use cstrings.

    JSON.canonicalize(JSON.stringify("\u")) === `"\u"`


JSON.canonicalize() would be [almost] identical to JSON.stringify()


You're correct.  Many JSON producers have a web-safe version, but the 
JavaScript builtin does not.
My point is that JSON.canonicalize undoes those web-safety tweaks.

JSON.canonicalize(JSON.parse('"\u2028\u2029"')) === 
'"\u2028\u2029"'  // Returns true

"Emulator":

var canonicalize = function(object) {

     var buffer = '';
     serialize(object);


I thought canonicalize took in a string of JSON and produced the same.  
Am I wrong?
"Canonicalize" to my mind means a function that returns the canonical 
member of an
equivalence class given any member from that same equivalence class, so is 
always 'a -> 'a.

     return buffer;

     function serialize(object) {
         if (object !== null && typeof object === 'object') {


JSON.stringify(new Date(0)) === "\"1970-01-01T00:00:00.000Z\""
because Date.prototype.toJSON exists.

If you operate as a JSON_string -> JSON_string function then you
can avoid this complexity.

             if (Array.isArray(object)) {
                 buffer += '[';
                 let next = false;
                 object.forEach((element) => {
                     if (next) {
                         buffer += ',';
                     }
                     next = true;
                     serialize(element);
                 });
                 buffer += ']';
             } else {
                 buffer += '{';
                 let next = false;
                 Object.keys(object).sort().forEach((property) => {
                     if (next) {
                         buffer += 

Re: JSON.canonicalize()

2018-03-16 Thread Mike Samuel
On Fri, Mar 16, 2018 at 1:54 PM, C. Scott Ananian 
wrote:

> And just to be clear: I'm all for standardizing a canonical JSON form.  In
> addition to my 11-year-old attempt, there have been countless others, and
> still no *standard*.  I just want us to learn from the previous attempts
> and try to make something at least as good as everything which has come
> before, especially in terms of the various non-obvious considerations which
> individual implementors have discovered the hard way over the years.
>

I think the hashing use case is an important one.  At the risk of
bikeshedding, "canonical" seems to overstate the usefulness.  Many assume
that the canonical form of something is usually the one you use in
preference to any other equivalent.

If the integer-only restriction is relaxed (see below), then
* The proposed canonical form seems useful as an input to strong hash
functions.
* It seems usable as a complete message body, but not preferable due to
potential loss of precision.
* It seems usable but not preferable as a long-term storage format.
* It seems a source of additional risk when used in conjunction with other
common web languages.

If that is correct, Would people be averse to marketing this as "hashable
JSON" instead of "canonical JSON?"

--

Numbers

There seem to be 3 main forks in the design space w.r.t. numbers.  I'm sure
cscott has thought of more, but these make it clear why I think canonical JSON
is not very useful as a wire/storage format.

1. Integers only
PROS: avoids floating point equality issues that have bedeviled many
systems
CONS: can support only a small portion of the JSON value space
CONS: small loss of precision risk with integers encoded from Decimal
values.
For example, won't roundtrip Java BigDecimals.
2. Any numbers with minimal changes: dropping + signs, normalizing zeros,
using a fixed threshold for scientific notation.
PROS: supports whole JSON value-space
CONS: less useful for hashing
CONS: risks loss of precision when decoders decide based on presence of
   decimal point whether to represent as double or int.
3. Preserve textual representation.
PROS: avoids loss of precision
PROS: can support whole JSON value-space
CONS: not very useful for hashing

It seems that there is a tradeoff between usefulness for hashing and the
ability to
support the whole JSON value-space.

Recommending this as a wire / storage format further complicates that
tradeoff.
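
As a quick aside (illustrative only, not from the draft), fork 2's normalization largely falls out of ES6's Number-to-String conversion, which already collapses many source spellings into one form at the cost of fixing one precision (IEEE-754 double):

```javascript
// Parsing then re-serializing collapses "+10", "10.0" and "1e1"-style
// variants to a single canonical spelling.
for (const src of ["+10", "10.0", "1e1", "0.5e2", "1e10", "1e21"]) {
  console.log(src, "->", JSON.stringify(Number(src)));
}
// "+10", "10.0" and "1e1" all become "10"; "1e10" becomes
// "10000000000"; "1e21" stays exponential as "1e+21".
```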

Regardless of which fork is chosen, there are some risks with the current
design.
For example, 1e10 takes up some space in memory.  This might allow
timing attacks.
Imagine an attacker can get Alice to embed 1e10 or another number in
her JSON.
Alice sends that message to Bob over an encrypted channel.  Bob converts
the JSON to
canonical JSON.  If Bob refuses some JSON payloads over a threshold size or
the
time to process is noticeably different for 1e10 vs 1e1 then the
attacker can
tell, via traffic analysis alone, when Alice communicates with Bob.
We should avoid that in-memory blowup if possible.






>   --scott
>
> On Fri, Mar 16, 2018 at 1:46 PM, Mike Samuel  wrote:
>
>>
>>
>> On Fri, Mar 16, 2018 at 1:30 PM, Anders Rundgren <
>> anders.rundgren@gmail.com> wrote:
>>
>>> On 2018-03-16 18:04, Mike Samuel wrote:
>>>
>>> It is entirely unsuitable to embedding in HTML or XML though.
 IIUC, with an implementation based on this

JSON.canonicalize(JSON.stringify("")) === `""` &&
 JSON.canonicalize(JSON.stringify("]]>")) === `"]]>"`

>>>
>>> I don't know what you are trying to prove here :-)
>>>
>>
>> Only that canonical JSON is useful in a very narrow context.
>> It cannot be embedded in an HTML script tag.
>> It cannot be embedded in an XML or HTML foreign content context without
>> extra care.
>> If it contains a string literal that embeds a NUL it cannot be embedded
>> in XML period even if extra care is taken.
>>
>>
>>
>>>
>>> The output of JSON.canonicalize would also not be in the subset of JSON
 that is also a subset of JavaScript's PrimaryExpression.

 JSON.canonicalize(JSON.stringify("\u2028\u2029")) ===
 `"\u2028\u2029"`

 It also is not suitable for use internally within systems that
 internally use cstrings.

JSON.canonicalize(JSON.stringify("\u")) === `"\u"`


>>> JSON.canonicalize() would be [almost] identical to JSON.stringify()
>>>
>>
>> You're correct.  Many JSON producers have a web-safe version, but the
>> JavaScript builtin does not.
>> My point is that JSON.canonicalize undoes those web-safety tweaks.
>>
>>
>>
>>> JSON.canonicalize(JSON.parse('"\u2028\u2029"')) === '"\u2028\u2029"'
>>> // Returns true
>>>
>>> "Emulator":
>>>
>>> var canonicalize = function(object) {
>>>
>>> var buffer = '';
>>> serialize(object);
>>>
>>
>> I thought canonicalize took in a string of JSON and produced the same.
>> 

Re: JSON.canonicalize()

2018-03-16 Thread Anders Rundgren

On 2018-03-16 18:46, Mike Samuel wrote:



On Fri, Mar 16, 2018 at 1:30 PM, Anders Rundgren wrote:

On 2018-03-16 18:04, Mike Samuel wrote:

It is entirely unsuitable to embedding in HTML or XML though.
IIUC, with an implementation based on this

    JSON.canonicalize(JSON.stringify("")) === `""` &&
JSON.canonicalize(JSON.stringify("]]>")) === `"]]>"`


I don't know what you are trying to prove here :-)


Only that canonical JSON is useful in a very narrow context.
It cannot be embedded in an HTML script tag.
It cannot be embedded in an XML or HTML foreign content context without extra 
care.
If it contains a string literal that embeds a NUL it cannot be embedded in XML 
period even if extra care is taken.


If we stick to browsers, JSON.canonicalize() would presumably be used with 
WebCrypto, WebSocket etc.

Node.js is probably a more important target.

Related stuff:
https://tools.ietf.org/id/draft-erdtman-jose-cleartext-jws-00.html
JSON signatures without canonicalization.




The output of JSON.canonicalize would also not be in the subset of JSON 
that is also a subset of JavaScript's PrimaryExpression.

     JSON.canonicalize(JSON.stringify("\u2028\u2029")) === 
`"\u2028\u2029"`

It also is not suitable for use internally within systems that 
internally use cstrings.

    JSON.canonicalize(JSON.stringify("\u")) === `"\u"`


JSON.canonicalize() would be [almost] identical to JSON.stringify()


You're correct.  Many JSON producers have a web-safe version, but the 
JavaScript builtin does not.
My point is that JSON.canonicalize undoes those web-safety tweaks.

JSON.canonicalize(JSON.parse('"\u2028\u2029"')) === '"\u2028\u2029"'  // 
Returns true

"Emulator":

var canonicalize = function(object) {

     var buffer = '';
     serialize(object);


I thought canonicalize took in a string of JSON and produced the same.  Am I 
wrong?


Yes, it is just a variant of JSON.stringify().


"Canonicalize" to my mind means a function that returns the canonical member of 
an
equivalence class given any member from that same equivalence class, so is always 
'a -> 'a.


This is rather a canonicalizing serializer.



     return buffer;

     function serialize(object) {
         if (object !== null && typeof object === 'object') {


JSON.stringify(new Date(0)) === "\"1970-01-01T00:00:00.000Z\""
because Date.prototype.toJSON exists.

If you operate as a JSON_string -> JSON_string function then you
can avoid this complexity.

             if (Array.isArray(object)) {
                 buffer += '[';
                 let next = false;
                 object.forEach((element) => {
                     if (next) {
                         buffer += ',';
                     }
                     next = true;
                     serialize(element);
                 });
                 buffer += ']';
             } else {
                 buffer += '{';
                 let next = false;
                 Object.keys(object).sort().forEach((property) => {
                     if (next) {
                         buffer += ',';
                     }
                     next = true; 


                     buffer += JSON.stringify(property);


I think you need a symbol check here.  JSON.stringify(Symbol.for('foo')) === 
undefined

                     buffer += ':';
                     serialize(object[property]);
                 });
                 buffer += '}';
             }
         } else {
             buffer += JSON.stringify(object);


This fails to distinguish non-integral numbers from integral ones, and produces 
non-standard output
when object === undefined.  Again, not a problem if the input is required to be 
valid JSON.


Well, a proper implementation would build on JSON.stringify() with property 
sorting as the only enhancement.



         }
     }
};
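
For reference, here is a self-contained, runnable version of the "emulator" above (property sorting only; it deliberately leaves the Date#toJSON, Symbol, and undefined issues raised in this thread unaddressed, as a real implementation would have to pin those down):

```javascript
// JSON.stringify semantics with lexicographically sorted property names.
// Illustrative sketch only, not a complete JSON.canonicalize.
function canonicalize(object) {
  let buffer = "";
  serialize(object);
  return buffer;

  function serialize(value) {
    if (value !== null && typeof value === "object") {
      if (Array.isArray(value)) {
        buffer += "[";
        let next = false;
        value.forEach((element) => {
          if (next) buffer += ",";
          next = true;
          serialize(element);
        });
        buffer += "]";
      } else {
        buffer += "{";
        let next = false;
        Object.keys(value).sort().forEach((property) => {
          if (next) buffer += ",";
          next = true;
          buffer += JSON.stringify(property); // quote + escape the key
          buffer += ":";
          serialize(value[property]);
        });
        buffer += "}";
      }
    } else {
      buffer += JSON.stringify(value); // primitives and null
    }
  }
}

console.log(canonicalize({b: [2, 1], a: "x"}));
// {"a":"x","b":[2,1]}
```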






Re: JSON.canonicalize()

2018-03-16 Thread C. Scott Ananian
And just to be clear: I'm all for standardizing a canonical JSON form.  In
addition to my 11-year-old attempt, there have been countless others, and
still no *standard*.  I just want us to learn from the previous attempts
and try to make something at least as good as everything which has come
before, especially in terms of the various non-obvious considerations which
individual implementors have discovered the hard way over the years.
  --scott

On Fri, Mar 16, 2018 at 1:46 PM, Mike Samuel  wrote:

>
>
> On Fri, Mar 16, 2018 at 1:30 PM, Anders Rundgren <
> anders.rundgren@gmail.com> wrote:
>
>> On 2018-03-16 18:04, Mike Samuel wrote:
>>
>> It is entirely unsuitable to embedding in HTML or XML though.
>>> IIUC, with an implementation based on this
>>>
>>>JSON.canonicalize(JSON.stringify("")) === `""` &&
>>> JSON.canonicalize(JSON.stringify("]]>")) === `"]]>"`
>>>
>>
>> I don't know what you are trying to prove here :-)
>>
>
> Only that canonical JSON is useful in a very narrow context.
> It cannot be embedded in an HTML script tag.
> It cannot be embedded in an XML or HTML foreign content context without
> extra care.
> If it contains a string literal that embeds a NUL it cannot be embedded in
> XML period even if extra care is taken.
>
>
>
>>
>> The output of JSON.canonicalize would also not be in the subset of JSON
>>> that is also a subset of JavaScript's PrimaryExpression.
>>>
>>> JSON.canonicalize(JSON.stringify("\u2028\u2029")) ===
>>> `"\u2028\u2029"`
>>>
>>> It also is not suitable for use internally within systems that
>>> internally use cstrings.
>>>
>>>JSON.canonicalize(JSON.stringify("\u")) === `"\u"`
>>>
>>>
>> JSON.canonicalize() would be [almost] identical to JSON.stringify()
>>
>
> You're correct.  Many JSON producers have a web-safe version, but the
> JavaScript builtin does not.
> My point is that JSON.canonicalize undoes those web-safety tweaks.
>
>
>
>> JSON.canonicalize(JSON.parse('"\u2028\u2029"')) === '"\u2028\u2029"'  //
>> Returns true
>>
>> "Emulator":
>>
>> var canonicalize = function(object) {
>>
>> var buffer = '';
>> serialize(object);
>>
>
> I thought canonicalize took in a string of JSON and produced the same.  Am
> I wrong?
> "Canonicalize" to my mind means a function that returns the canonical
> member of an
> equivalence class given any member from that same equivalence class, so is
> always 'a -> 'a.
>
>
>> return buffer;
>>
>> function serialize(object) {
>> if (object !== null && typeof object === 'object') {
>>
>
> JSON.stringify(new Date(0)) === "\"1970-01-01T00:00:00.000Z\""
> because Date.prototype.toJSON exists.
>
> If you operate as a JSON_string -> JSON_string function then you
> can avoid this complexity.
>
> if (Array.isArray(object)) {
>> buffer += '[';
>> let next = false;
>> object.forEach((element) => {
>> if (next) {
>> buffer += ',';
>> }
>> next = true;
>> serialize(element);
>> });
>> buffer += ']';
>> } else {
>> buffer += '{';
>> let next = false;
>> Object.keys(object).sort().forEach((property) => {
>> if (next) {
>> buffer += ',';
>> }
>> next = true;
>
> buffer += JSON.stringify(property);
>>
>
> I think you need a symbol check here.  JSON.stringify(Symbol.for('foo'))
> === undefined
>
>
>> buffer += ':';
>> serialize(object[property]);
>> });
>> buffer += '}';
>> }
>> } else {
>> buffer += JSON.stringify(object);
>>
>
> This fails to distinguish non-integral numbers from integral ones, and
> produces non-standard output
> when object === undefined.  Again, not a problem if the input is required
> to be valid JSON.
>
>
>> }
>> }
>> };
>>
>
>


Re: JSON.canonicalize()

2018-03-16 Thread C. Scott Ananian
On Fri, Mar 16, 2018 at 1:30 PM, Anders Rundgren <
anders.rundgren@gmail.com> wrote:

> On 2018-03-16 18:04, Mike Samuel wrote:
>
> It is entirely unsuitable to embedding in HTML or XML though.
>> IIUC, with an implementation based on this
>>
>>    JSON.canonicalize(JSON.stringify("</script>")) === `"</script>"` &&
>> JSON.canonicalize(JSON.stringify("]]>")) === `"]]>"`
>>
>
> I don't know what you are trying to prove here :-)


He wants to ship it as application/json and have it be safe if the browser
happens to ignore the mime type and interpret it as HTML or XML, I
believe.  Mandatory encoding of < as an escape would make the output "safe"
for such use.  I'm not convinced this is in-scope, but it's an interesting
case to consider when determining which characters ought to be escaped.

(I think he's writing `JSON.canonicalize(JSON.stringify(...))` where he
means to write `JSON.canonicalize(...)`, at least if I understand the
proposed API correctly.)


> The output of JSON.canonicalize would also not be in the subset of JSON
>> that is also a subset of JavaScript's PrimaryExpression.
>>
>> JSON.canonicalize(JSON.stringify("\u2028\u2029")) ===
>> `"\u2028\u2029"`
>>
>
I'm not sure about this, but I think he's saying you can't just `eval` the
canonical JSON output, because newlines appear literally, not escaped. I
believe I actually ran into some compatibility issues with this back when I
was playing around with canonical JSON as well; certain JSON parsers
wouldn't accept "JSON" with embedded literal newlines.

OTOH, I don't think anyone should be encouraged to eval JSON!  As noted
previously, there should be a strict parse function to go along with the
strict serialize function.


> It also is not suitable for use internally within systems that internally
>> use cstrings.
>>
>>    JSON.canonicalize(JSON.stringify("\u0000")) === `"\u0000"`
>>
>
A literal NUL character is unrepresentable in a naive C implementation.
You need to use pascal-style strings in your low-level implementation.
This is an important consideration for non-JavaScript use.  In my page I
noted, "Because only two byte values are escaped, be aware that
JSON-encoded data may contain embedded control characters and nulls."  A
similar warning is at least called for here.


> On Fri, Mar 16, 2018 at 12:23 PM, Mike Samuel 
> wrote:
> I also see
> """
> It is suggested that unicode strings be represented as the UTF-8 encoding
> of unicode Normalization Form C  (UAX
> #15). However, arbitrary content may be represented as a string: it is not
> guaranteed that string contents can be meaningfully parsed as UTF-8.
> """
> which seems to be mixing concerns about the wire format used to encode
> JSON as octets and NFC which would apply to the text of the JSON string.
>

Yes, it is rather unfortunate that we have only one datatype here and a bit
of an impedance mismatch.  JSON serialization is usually considered
literally as a byte-stream, but JavaScript wants to parse those bytes as
some encoding (usually UTF-8) of a UTF-16 string.

My suggestion is just to make this very plain in a SHOULD comment to the
potential implementor.  If the underlying data is unicode string data, it
SHOULD be represented as the UTF-8 encoding of unicode Normalization Form C
(UAX #15).   However, the consumer should be aware that the data may be
binary bits and not interpretable as a valid UTF-8 string.

Re:

> Escape normalization: If you don't do this normalization, signatures would
> typically break and that's not really a "security" (=attacker) problem; it
> is rather a "nuisance" of the same caliber as a server not responding.


Consider signatures for malware detection.  If an attacker can trivially
modify their (in this example) JSON-encoded payload so that it is still
"canonical" and still passes whatever input verifier exists (so much easier
if there is not strict parsing!), then they can bypass your signature-based
detection system.  That's a security problem.

Both sides must be true: equal hashes should mean equal content (to high
probability) and unequal hashes should mean different content.  Otherwise
there is a security problem.
 --scott


Re: JSON.canonicalize()

2018-03-16 Thread Mike Samuel
On Fri, Mar 16, 2018 at 1:30 PM, Anders Rundgren <
anders.rundgren@gmail.com> wrote:

> On 2018-03-16 18:04, Mike Samuel wrote:
>
> It is entirely unsuitable to embedding in HTML or XML though.
>> IIUC, with an implementation based on this
>>
>>    JSON.canonicalize(JSON.stringify("</script>")) === `"</script>"` &&
>> JSON.canonicalize(JSON.stringify("]]>")) === `"]]>"`
>>
>
> I don't know what you are trying to prove here :-)
>

Only that canonical JSON is useful in a very narrow context.
It cannot be embedded in an HTML script tag.
It cannot be embedded in an XML or HTML foreign content context without
extra care.
If it contains a string literal that embeds a NUL it cannot be embedded in
XML period even if extra care is taken.



>
> The output of JSON.canonicalize would also not be in the subset of JSON
>> that is also a subset of JavaScript's PrimaryExpression.
>>
>> JSON.canonicalize(JSON.stringify("\u2028\u2029")) ===
>> `"\u2028\u2029"`
>>
>> It also is not suitable for use internally within systems that internally
>> use cstrings.
>>
>>    JSON.canonicalize(JSON.stringify("\u0000")) === `"\u0000"`
>>
>>
> JSON.canonicalize() would be [almost] identical to JSON.stringify()
>

You're correct.  Many JSON producers have a web-safe version, but the
JavaScript builtin does not.
My point is that JSON.canonicalize undoes those web-safety tweaks.



> JSON.canonicalize(JSON.parse('"\u2028\u2029"')) === '"\u2028\u2029"'  //
> Returns true
>
> "Emulator":
>
> var canonicalize = function(object) {
>
> var buffer = '';
> serialize(object);
>

I thought canonicalize took in a string of JSON and produced the same.  Am
I wrong?
"Canonicalize" to my mind means a function that returns the canonical
member of an
equivalence class given any member from that same equivalence class, so is
always 'a -> 'a.


> return buffer;
>
> function serialize(object) {
> if (object !== null && typeof object === 'object') {
>

JSON.stringify(new Date(0)) === "\"1970-01-01T00:00:00.000Z\""
because Date.prototype.toJSON exists.

If you operate as a JSON_string -> JSON_string function then you
can avoid this complexity.

if (Array.isArray(object)) {
> buffer += '[';
> let next = false;
> object.forEach((element) => {
> if (next) {
> buffer += ',';
> }
> next = true;
> serialize(element);
> });
> buffer += ']';
> } else {
> buffer += '{';
> let next = false;
> Object.keys(object).sort().forEach((property) => {
> if (next) {
> buffer += ',';
> }
> next = true;

buffer += JSON.stringify(property);
>

I think you need a symbol check here.  JSON.stringify(Symbol.for('foo'))
=== undefined


> buffer += ':';
> serialize(object[property]);
> });
> buffer += '}';
> }
> } else {
> buffer += JSON.stringify(object);
>

This fails to distinguish non-integral numbers from integral ones, and
produces non-standard output
when object === undefined.  Again, not a problem if the input is required
to be valid JSON.


> }
> }
> };
>


Re: JSON.canonicalize()

2018-03-16 Thread Anders Rundgren

On 2018-03-16 18:04, Mike Samuel wrote:


It is entirely unsuitable to embedding in HTML or XML though.
IIUC, with an implementation based on this

   JSON.canonicalize(JSON.stringify("</script>")) === `"</script>"` &&
JSON.canonicalize(JSON.stringify("]]>")) === `"]]>"`


I don't know what you are trying to prove here :-)



The output of JSON.canonicalize would also not be in the subset of JSON that is 
also a subset of JavaScript's PrimaryExpression.

    JSON.canonicalize(JSON.stringify("\u2028\u2029")) === `"\u2028\u2029"`

It also is not suitable for use internally within systems that internally use 
cstrings.

   JSON.canonicalize(JSON.stringify("\u0000")) === `"\u0000"`



JSON.canonicalize() would be [almost] identical to JSON.stringify()

JSON.canonicalize(JSON.parse('"\u2028\u2029"')) === '"\u2028\u2029"'  // 
Returns true

"Emulator":

var canonicalize = function(object) {

    var buffer = '';
    serialize(object);
    return buffer;

    function serialize(object) {
        if (object !== null && typeof object === 'object') {
            if (Array.isArray(object)) {
                // Arrays: keep element order.
                buffer += '[';
                let next = false;
                object.forEach((element) => {
                    if (next) {
                        buffer += ',';
                    }
                    next = true;
                    serialize(element);
                });
                buffer += ']';
            } else {
                // Objects: emit properties in sorted key order.
                buffer += '{';
                let next = false;
                Object.keys(object).sort().forEach((property) => {
                    if (next) {
                        buffer += ',';
                    }
                    next = true;
                    buffer += JSON.stringify(property);
                    buffer += ':';
                    serialize(object[property]);
                });
                buffer += '}';
            }
        } else {
            // Primitives: defer to JSON.stringify.
            buffer += JSON.stringify(object);
        }
    }
};
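A sketch of the strict verifier discussed elsewhere in this thread, paired with a compact sorted-keys canonicalizer equivalent to the emulator above (the helper names are invented):

```javascript
// Minimal sorted-keys canonicalizer, behaviorally matching the emulator.
function canonicalize(value) {
  if (value !== null && typeof value === 'object') {
    if (Array.isArray(value)) {
      return '[' + value.map(canonicalize).join(',') + ']';
    }
    return '{' + Object.keys(value).sort()
      .map((k) => JSON.stringify(k) + ':' + canonicalize(value[k]))
      .join(',') + '}';
  }
  return JSON.stringify(value);
}

// Hypothetical strict verifier: accept a JSON text only if re-canonicalizing
// it reproduces the exact text received.
function isCanonicalJson(text) {
  try {
    return canonicalize(JSON.parse(text)) === text;
  } catch (e) {
    return false;   // syntactically invalid JSON is never canonical
  }
}

console.log(isCanonicalJson('{"a":1,"b":2}'));  // true
console.log(isCanonicalJson('{"b":2, "a":1}')); // false: key order, whitespace
console.log(isCanonicalJson('"\\u0041"'));      // false: canonical form is "A"
```

Without such a check, an attacker can feed non-canonical variants of "the same" value, which is exactly the collision concern raised above.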


Re: JSON.canonicalize()

2018-03-16 Thread Mike Samuel
On Fri, Mar 16, 2018 at 12:44 PM, C. Scott Ananian 
wrote:

> On Fri, Mar 16, 2018 at 12:23 PM, Mike Samuel 
> wrote:
>>
>>
>> On Fri, Mar 16, 2018 at 11:38 AM, C. Scott Ananian > > wrote:
>>
>>> Canonical JSON is often used to imply a security property: two JSON
>>> blobs with identical contents are expected to have identical canonical JSON
>>> forms (and thus identical hashed values).
>>>
>>
>> What does "identical contents" mean in the context of numbers?  JSON
>> intentionally avoids specifying any precision for numbers.
>>
>> JSON.stringify(1/3) === '0.3333333333333333'
>>
>> What would happen with JSON from systems that allow higher precision?
>> I.e., what would (JSON.canonicalize(JSON.stringify(1/3) + '3')) produce?
>>
>> However, unicode normalization allows multiple representations of "the
>>> same" string, which defeats this security property.  Depending on your
>>> implementation language
>>>
>>
>> We shouldn't normalize unicode in strings that contain packed binary
>> data.  JSON strings are strings of UTF-16 code-units, not Unicode scalar
>> values and any system that assumes the latter will break often.
>>
>
> Both of these points are made on the URL I originally cited:
> http://wiki.laptop.org/go/Canonical_JSON
>

Thanks, I see
"""
Floating point numbers are not allowed in canonical JSON. Neither are
leading zeros or "minus 0" for integers.
"""
which answers my question.

I also see
"""
A previous version of this specification required strings to be valid
unicode, and relied on JSON's \u escape. This was abandoned as it doesn't
allow representing arbitrary binary data in a string, and it doesn't
preserve the identity of non-canonical unicode strings.
"""
which addresses my question.

I also see
"""
It is suggested that unicode strings be represented as the UTF-8 encoding
of unicode Normalization Form C  (UAX
#15). However, arbitrary content may be represented as a string: it is not
guaranteed that string contents can be meaningfully parsed as UTF-8.
"""
which seems to be mixing concerns about the wire format used to encode JSON
as octets and NFC which would apply to the text of the JSON string.


If that confusion is cleaned up, then it seems a fine subset of JSON to
ship over the wire with a JSON content-type.


It is entirely unsuitable to embedding in HTML or XML though.
IIUC, with an implementation based on this

  JSON.canonicalize(JSON.stringify("</script>")) === `"</script>"` &&
  JSON.canonicalize(JSON.stringify("]]>")) === `"]]>"`

The output of JSON.canonicalize would also not be in the subset of JSON
that is also a subset of JavaScript's PrimaryExpression.

   JSON.canonicalize(JSON.stringify("\u2028\u2029")) === `"\u2028\u2029"`

It also is not suitable for use internally within systems that internally
use cstrings.

  JSON.canonicalize(JSON.stringify("\u0000")) === `"\u0000"`


Re: JSON.canonicalize()

2018-03-16 Thread Anders Rundgren

On 2018-03-16 16:38, C. Scott Ananian wrote:

Canonical JSON is often used to imply a security property: two JSON blobs
with identical contents are expected to have identical canonical JSON
forms (and thus identical hashed values).


Right.


However, unicode normalization allows multiple representations
of "the same" string, which defeats this security property.


This is an aspect that I believe belongs to the "application" level.  This specification 
is only about "on the wire" format.

Rationale: if this was a part of the SPECIFICATION it would either be ignored 
(=useless) or be a showstopper (=dead) due to complexity.

If applications using the received data want to address this issue they can for 
example call
https://msdn.microsoft.com/en-us/library/windows/desktop/dd318671(v=vs.85).aspx
and reject if they want.

Or always normalize: 
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/normalize



Depending on your implementation language and use, a string with
precomposed accents could compare equal to a string with separated
accents, even though the canonical JSON or hash differed.  


I don't want to go there for the reasons mentioned.



In an extreme case (with a weak hash function, say MD5), this can be used to
break security by re-encoding all strings in multiple variants
until a collision is found.  This is just a slight variant on the fact
that JSON allows multiple ways to encode a character using escape sequences.
You've already taken the trouble to disambiguate this case; security-conscious
applications should take care to perform unicode normalization as well, for the 
same reason.


If you are able to break the hash function all bets are off anyway because then 
you can presumably change *any* part of the object and it would still appear 
authentic.

Escape normalization: If you don't do this normalization, signatures would typically break and 
that's not really a "security" (=attacker) problem; it is rather a "nuisance" 
of the same caliber as a server not responding.



Similarly, if you don't offer a verifier to ensure that the input is
in "canonical JSON" format, then an attacker can try to create collisions 
by violating the rules of canonical JSON format, whether by using different

escape sequences, adding whitespace, etc.  This can be used to make JSON which
is "the same" appear "different", violating the intent of the canonicalization.


Again, if the hash function is broken, there's nothing to do except maybe cry 
:-(

This is a Unicode problem, not a cryptographic problem.


Any security application of canonical JSON will require a strict mode for 
JSON.parse() as well as a strict mode for JSON.stringify().


Indeed, you ALWAYS must verify that the input data conforms to the agreed conventions.

Anyway, feel free to push a different JSON canonicalization scheme!

Here is another: http://gibson042.github.io/canonicaljson-spec/
It claims that you should support "lone surrogates" (invalid Unicode), which
for example the JDK doesn't.
I don't go there either.

Anders


   --scott

On Fri, Mar 16, 2018 at 4:48 AM, Anders Rundgren > wrote:

On 2018-03-16 08:52, C. Scott Ananian wrote:

See http://wiki.laptop.org/go/Canonical_JSON -- you should probably at least
mention unicode normalization of strings.


Yes, I could add that unicode normalization of strings is out of scope for 
this specification.


You probably should also specify a validator: it doesn't matter if you 
emit canonical JSON if you can tweak the hash of the value by feeding 
non-canonical JSON as an input.


Pardon me, but I don't understand what you are writing here.

Hash functions' only "raison d'être" is providing collision-safe checksums.

thanx,
Anders


    --scott

On Fri, Mar 16, 2018 at 3:16 AM, Anders Rundgren  >> wrote:

     Dear List,

     Here is a proposal that I would be very happy getting feedback on 
since it builds on ES but is not (at all) limited to ES.

     The request is for a complement to the ES "JSON" object called 
canonicalize() which would have identical parameters to the existing stringify() method.

     The JSON canonicalization scheme (including ES code for emulating 
it), is described in:

https://cyberphone.github.io/doc/security/draft-rundgren-json-canonicalization-scheme.html

     Current workspace: https://github.com/cyberphone/json-canonicalization

Re: JSON.canonicalize()

2018-03-16 Thread Carsten Bormann
On Mar 16, 2018, at 16:23, Mike Samuel  wrote:
> 
> JSON strings are strings of UTF-16 code-units

No.

(You are confusing this with JavaScript strings.)

Grüße, Carsten



Re: JSON.canonicalize()

2018-03-16 Thread C. Scott Ananian
On Fri, Mar 16, 2018 at 12:23 PM, Mike Samuel  wrote:
>
>
> On Fri, Mar 16, 2018 at 11:38 AM, C. Scott Ananian 
> wrote:
>
>> Canonical JSON is often used to imply a security property: two JSON blobs
>> with identical contents are expected to have identical canonical JSON forms
>> (and thus identical hashed values).
>>
>
> What does "identical contents" mean in the context of numbers?  JSON
> intentionally avoids specifying any precision for numbers.
>
> JSON.stringify(1/3) === '0.3333333333333333'
>
> What would happen with JSON from systems that allow higher precision?
> I.e., what would (JSON.canonicalize(JSON.stringify(1/3) + '3')) produce?
>
> However, unicode normalization allows multiple representations of "the
>> same" string, which defeats this security property.  Depending on your
>> implementation language
>>
>
> We shouldn't normalize unicode in strings that contain packed binary
> data.  JSON strings are strings of UTF-16 code-units, not Unicode scalar
> values and any system that assumes the latter will break often.
>

Both of these points are made on the URL I originally cited:
http://wiki.laptop.org/go/Canonical_JSON
 --scott


Re: JSON.canonicalize()

2018-03-16 Thread Mike Samuel
On Fri, Mar 16, 2018 at 11:38 AM, C. Scott Ananian 
wrote:

> Canonical JSON is often used to imply a security property: two JSON blobs
> with identical contents are expected to have identical canonical JSON forms
> (and thus identical hashed values).
>

What does "identical contents" mean in the context of numbers?  JSON
intentionally avoids specifying any precision for numbers.

JSON.stringify(1/3) === '0.3333333333333333'

What would happen with JSON from systems that allow higher precision?
I.e., what would (JSON.canonicalize(JSON.stringify(1/3) + '3')) produce?





> However, unicode normalization allows multiple representations of "the
> same" string, which defeats this security property.  Depending on your
> implementation language
>

We shouldn't normalize unicode in strings that contain packed binary data.
JSON strings are strings of UTF-16 code-units, not Unicode scalar values
and any system that assumes the latter will break often.


> and use, a string with precomposed accents could compare equal to a string
> with separated accents, even though the canonical JSON or hash differed.
> In an extreme case (with a weak hash function, say MD5), this can be used
> to break security by re-encoding all strings in multiple variants until a
> collision is found.  This is just a slight variant on the fact that JSON
> allows multiple ways to encode a character using escape sequences.  You've
> already taken the trouble to disambiguate this case; security-conscious
> applications should take care to perform unicode normalization as well, for
> the same reason.
>
> Similarly, if you don't offer a verifier to ensure that the input is in
> "canonical JSON" format, then an attacker can try to create collisions by
> violating the rules of canonical JSON format, whether by using different
> escape sequences, adding whitespace, etc.  This can be used to make JSON
> which is "the same" appear "different", violating the intent of the
> canonicalization.  Any security application of canonical JSON will require
> a strict mode for JSON.parse() as well as a strict mode for
> JSON.stringify().
>

Given the dodginess of "identical" w.r.t. non-integral numbers, shouldn't
endpoints be re-canonicalizing before hashing anyway?  Why would one want
to ship the canonical form over the wire if it loses precision?



>   --scott
>
> On Fri, Mar 16, 2018 at 4:48 AM, Anders Rundgren <
> anders.rundgren@gmail.com> wrote:
>
>> On 2018-03-16 08:52, C. Scott Ananian wrote:
>>
>>> See http://wiki.laptop.org/go/Canonical_JSON -- you should probably at
>>> least
>>> mention unicode normalization of strings.
>>>
>>
>> Yes, I could add that unicode normalization of strings is out of scope
>> for this specification.
>>
>>
>> You probably should also specify a validator: it doesn't matter if you
>>> emit canonical JSON if you can tweak the hash of the value by feeding
>>> non-canonical JSON as an input.
>>>
>>
>> Pardon me, but I don't understand what you are writing here.
>>
>> Hash functions' only "raison d'être" is providing collision-safe
>> checksums.
>>
>> thanx,
>> Anders
>>
>>
>>--scott
>>>
>>> On Fri, Mar 16, 2018 at 3:16 AM, Anders Rundgren <
>>> anders.rundgren@gmail.com >
>>> wrote:
>>>
>>> Dear List,
>>>
>>> Here is a proposal that I would be very happy getting feedback on
>>> since it builds on ES but is not (at all) limited to ES.
>>>
>>> The request is for a complement to the ES "JSON" object called
>>> canonicalize() which would have identical parameters to the existing
>>> stringify() method.
>>>
>>
Why should canonicalize take a replacer?  Hasn't replacement already
happened?



> The JSON canonicalization scheme (including ES code for emulating it),
>>> is described in:
>>> https://cyberphone.github.io/doc/security/draft-rundgren-json-canonicalization-scheme.html
>>>
>>> Current workspace: https://github.com/cyberphone/json-canonicalization
>>>
>>> Thanx,
>>> Anders Rundgren
>>>
>>>
>>
>


Re: JSON.canonicalize()

2018-03-16 Thread C. Scott Ananian
Canonical JSON is often used to imply a security property: two JSON blobs
with identical contents are expected to have identical canonical JSON forms
(and thus identical hashed values).

However, unicode normalization allows multiple representations of "the
same" string, which defeats this security property.  Depending on your
implementation language and use, a string with precomposed accents could
compare equal to a string with separated accents, even though the canonical
JSON or hash differed.  In an extreme case (with a weak hash function, say
MD5), this can be used to break security by re-encoding all strings in
multiple variants until a collision is found.  This is just a slight
variant on the fact that JSON allows multiple ways to encode a character
using escape sequences.  You've already taken the trouble to disambiguate
this case; security-conscious applications should take care to perform
unicode normalization as well, for the same reason.

Similarly, if you don't offer a verifier to ensure that the input is in
"canonical JSON" format, then an attacker can try to create collisions by
violating the rules of canonical JSON format, whether by using different
escape sequences, adding whitespace, etc.  This can be used to make JSON
which is "the same" appear "different", violating the intent of the
canonicalization.  Any security application of canonical JSON will require
a strict mode for JSON.parse() as well as a strict mode for
JSON.stringify().
  --scott

On Fri, Mar 16, 2018 at 4:48 AM, Anders Rundgren <
anders.rundgren@gmail.com> wrote:

> On 2018-03-16 08:52, C. Scott Ananian wrote:
>
>> See http://wiki.laptop.org/go/Canonical_JSON -- you should probably at
>> least
>> mention unicode normalization of strings.
>>
>
> Yes, I could add that unicode normalization of strings is out of scope for
> this specification.
>
>
> You probably should also specify a validator: it doesn't matter if you
>> emit canonical JSON if you can tweak the hash of the value by feeding
>> non-canonical JSON as an input.
>>
>
> Pardon me, but I don't understand what you are writing here.
>
> Hash functions' only "raison d'être" is providing collision-safe checksums.
>
> thanx,
> Anders
>
>
>--scott
>>
>> On Fri, Mar 16, 2018 at 3:16 AM, Anders Rundgren <
>> anders.rundgren@gmail.com >
>> wrote:
>>
>> Dear List,
>>
>> Here is a proposal that I would be very happy getting feedback on
>> since it builds on ES but is not (at all) limited to ES.
>>
>> The request is for a complement to the ES "JSON" object called
>> canonicalize() which would have identical parameters to the existing
>> stringify() method.
>>
>> The JSON canonicalization scheme (including ES code for emulating
>> it), is described in:
>> https://cyberphone.github.io/doc/security/draft-rundgren-json-canonicalization-scheme.html
>>
>> Current workspace: https://github.com/cyberphone/json-canonicalization
>>
>> Thanx,
>> Anders Rundgren
>>
>>
>>
>


Re: JSON.canonicalize()

2018-03-16 Thread Anders Rundgren

On 2018-03-16 08:52, C. Scott Ananian wrote:

See http://wiki.laptop.org/go/Canonical_JSON -- you should probably at least
mention unicode normalization of strings. 


Yes, I could add that unicode normalization of strings is out of scope for this 
specification.


You probably should also specify a validator: it doesn't matter if you emit 
canonical JSON if you can tweak the hash of the value by feeding non-canonical 
JSON as an input.


Pardon me, but I don't understand what you are writing here.

Hash functions' only "raison d'être" is providing collision-safe checksums.

thanx,
Anders



   --scott

On Fri, Mar 16, 2018 at 3:16 AM, Anders Rundgren > wrote:

Dear List,

Here is a proposal that I would be very happy getting feedback on since it 
builds on ES but is not (at all) limited to ES.

The request is for a complement to the ES "JSON" object called 
canonicalize() which would have identical parameters to the existing stringify() method.

The JSON canonicalization scheme (including ES code for emulating it), is 
described in:

https://cyberphone.github.io/doc/security/draft-rundgren-json-canonicalization-scheme.html
 


Current workspace: https://github.com/cyberphone/json-canonicalization 


Thanx,
Anders Rundgren







Re: JSON.canonicalize()

2018-03-16 Thread C. Scott Ananian
See http://wiki.laptop.org/go/Canonical_JSON -- you should probably at
least mention unicode normalization of strings.  You probably should also
specify a validator: it doesn't matter if you emit canonical JSON if you
can tweak the hash of the value by feeding non-canonical JSON as an input.
  --scott

On Fri, Mar 16, 2018 at 3:16 AM, Anders Rundgren <
anders.rundgren@gmail.com> wrote:

> Dear List,
>
> Here is a proposal that I would be very happy getting feedback on since it
> builds on ES but is not (at all) limited to ES.
>
> The request is for a complement to the ES "JSON" object called
> canonicalize() which would have identical parameters to the existing
> stringify() method.
>
> The JSON canonicalization scheme (including ES code for emulating it), is
> described in:
> https://cyberphone.github.io/doc/security/draft-rundgren-json-canonicalization-scheme.html
>
> Current workspace: https://github.com/cyberphone/json-canonicalization
>
> Thanx,
> Anders Rundgren