Eric Appelt added the comment:

Thanks for the feedback. I agree that the comment is incorrect for several 
iterations of the loop that really don't need to be tested at all for '5'. I 
read the previous issue 17909 more carefully along with RFC 4627, 7159, and 
EMCA 404 to properly understand the context here.

I notice that although the issue 17909 patch was written in order to implement 
the patterns in RFC 4627, it is also valid for the additional pattern 
introduced for utf-16-le in RFC 7159 as described in [1].

Specifically, in RFC 4627 the only valid pattern for utf-16-le begins with XX 
00 XX 00 since the first two characters must represent ASCII codepoints. With 
strings the second character can be higher codepoints allowing for XX 00 XX XX 
or XX 00 00 XX. In the implementation from issue 17909 the pattern XX 00 XX is 
first detected and results in utf-16-le, and then the 4th byte is checked for 
the pattern XX 00 00 XX or XX 00 00 00 to result in utf-16-le or utf-32 
respectively.

In the issue 17909 patch the special case of a single character JSON document 
(i.e. '5') is also specifically accounted for. This case is not mentioned in 
[1].

So for everything I can try (or think of), this implementation can correctly 
determine the encoding of any valid JSON document according to RFC 7159. This 
is good since the documentation [2] makes no distinction that json.loads() 
would only accept JSON documents in bytes if they adhere to 4627.

The encoding failure mode described in issue 17909 is still an invalid JSON 
document according to RFC 7159 as the control character \u0000 is not escaped.

So I think overall the implementation is perfectly valid for RFC 7159 / EMCA 
404 as suggested by the standard library documentation [2].

To me it seems reasonable to have a unit test to specifically check that the 
pattern XX 00 00 XX works. For the goal of hitting 100% test coverage as 
measured by line, I believe that the 2-byte case still needs to be tested.

I have attached a patch that covers these cases along with (maybe too) verbose 
comments explaining the edge cases. I also changed the comments in the 
json/__init__.py module itself to clarify the detection patterns as implemented.

I'm not sure how helpful this is to the improvement in of the standard library, 
but it has been very educational to be so far.

[1] http://www.ietf.org/mail-archive/web/json/current/msg01959.html
[2] https://docs.python.org/3.7/library/json.html

----------
Added file: http://bugs.python.org/file45246/mywork3.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue28541>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to