The hash is calculated over the "normalized" JSON output, where "normalized" 
basically means stripped of all whitespace by the "generator". This is as 
canonical as it gets. The same data are then transmitted in "loose" form, i.e. 
with some indentation so that it is human-readable. The other party has two 
options for verifying the hash.
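A minimal sketch of the generator side, assuming the generator is Python's `json` module and the hash is SHA-256 (the post names neither; `make_payload` is a hypothetical helper). The normalized form uses `separators=(",", ":")` so no whitespace appears between tokens, while the loose form only adds indentation:

```python
import hashlib
import json

def make_payload(data) -> tuple[str, str]:
    """Hypothetical helper: return (loose_text, sha256_hex) for a JSON-serializable object."""
    # Normalized form: compact separators, no whitespace between tokens.
    normalized = json.dumps(data, separators=(",", ":"))
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    # Loose form: the same tokens, indented for human readers.
    loose = json.dumps(data, indent=2)
    return loose, digest
```

Because both forms come from the same encoder, they differ only in insignificant whitespace, which is what both verification options rely on.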

1) Take the file as a text file and remove all the whitespace, while doing some 
hard-coded, primitive "JSON parsing" (probably very limited and very 
error-prone) so that whitespace inside string literals is kept, and recalculate 
the hash from that. Since it only uses the data already present in the original 
text input, it cannot corrupt or change them in any way; it just needs to know 
how to remove all the whitespace correctly.
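Option 1 could look roughly like this, assuming SHA-256 and that the "primitive JSON parsing" only has to track string literals and backslash escapes (the names `strip_json_whitespace` and `verify_text` are hypothetical). Whitespace here means the four characters RFC 8259 treats as insignificant:

```python
import hashlib
import json

# Insignificant whitespace in JSON (RFC 8259): space, tab, line feed, carriage return.
JSON_WS = " \t\n\r"

def strip_json_whitespace(text: str) -> str:
    """Remove whitespace outside string literals, keeping every other character."""
    out = []
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False      # this character was escaped; keep it as-is
            elif ch == "\\":
                escaped = True       # next character belongs to an escape sequence
            elif ch == '"':
                in_string = False    # unescaped quote closes the string
        elif ch not in JSON_WS:
            out.append(ch)
            if ch == '"':
                in_string = True     # opening quote of a key or string value
    return "".join(out)

def verify_text(loose_text: str, expected_hex: str) -> bool:
    """Hash the whitespace-stripped text and compare with the expected digest."""
    compact = strip_json_whitespace(loose_text)
    return hashlib.sha256(compact.encode("utf-8")).hexdigest() == expected_hex

# Usage: whitespace inside the string value survives, the indentation does not.
sample = {"msg": "keep  these spaces", "n": [1, 2.5]}
normalized = json.dumps(sample, separators=(",", ":"))
loose = json.dumps(sample, indent=2)
assert strip_json_whitespace(loose) == normalized
```

This only works if the loose text spells every token exactly as the normalized text does (same escapes, same number formatting), which holds when both come from the same encoder.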

2) Use a JSON decoder to decode it (hopefully without losing anything in the 
process), then dump it into the "normalized" form and compute the hash over 
that. This carries a risk of conversion errors, but if I could avoid that risk 
by using a custom type which does not have such an error, it would be a much 
easier and more maintainable solution.
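One possible sketch of option 2, assuming SHA-256 and default `json.dumps` string escaping on the generator side. The "custom type" here is a hypothetical `RawNumber` (a `str` subclass) passed as `parse_float`/`parse_int`, which keeps each number's source text; since `json.dumps` would quote a `str` subclass, the sketch uses its own small compact serializer (`dumps_normalized`, also hypothetical) rather than any standard API:

```python
import hashlib
import json

class RawNumber(str):
    """Hypothetical marker type: keeps a JSON number's exact source text."""

def dumps_normalized(obj) -> str:
    """Compact serializer that writes RawNumber values verbatim.

    json.dumps() would quote a str subclass, so this sketch walks the
    structure itself and delegates strings, booleans and null to json.dumps().
    """
    if isinstance(obj, dict):
        return "{" + ",".join(json.dumps(k) + ":" + dumps_normalized(v)
                              for k, v in obj.items()) + "}"
    if isinstance(obj, list):
        return "[" + ",".join(dumps_normalized(v) for v in obj) + "]"
    if isinstance(obj, RawNumber):
        return str(obj)  # exact digits from the input, no float/int conversion
    return json.dumps(obj)  # str, bool, None

def verify_decoded(loose_text: str, expected_hex: str) -> bool:
    # parse_float/parse_int receive each number's source text, so nothing is rounded.
    data = json.loads(loose_text, parse_float=RawNumber, parse_int=RawNumber)
    normalized = dumps_normalized(data)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest() == expected_hex
```

With the default `float` decoding, a literal such as `1.10` comes back as `1.1`, so the re-dumped text and its hash no longer match; keeping the source text avoids that particular conversion error.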

Recoding the data into some other format (binary or textual) for the hash would 
just add another level of complexity and would face exactly the same issues. 
Moreover, the goal of the hash is to protect the information in its transmitted 
(i.e. textual) form, because that is the only form available to both the sender 
and the receiver, and not to authenticate some other representation of the same 
data, which may be subject to "rounding errors" depending on the situation.

But as I said, discussing this was not the point of the OP.
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/IYU6MGEHPLGK4HYZNITMKU3HN2V5VVFK/
Code of Conduct: http://python.org/psf/codeofconduct/