Adrián Orive <aor...@ikerlan.es> added the comment:

I ran into the same problem. My case seems less exotic: I am trying to parse 
some of these strings into decimal.Decimal or datetime.datetime objects. 
Returning a decimal as a string is becoming quite common in REST APIs, to 
ensure there are no floating-point errors.
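
A minimal sketch of that situation (the payload and its key names are made up 
for illustration): parse_float already lets json.loads build decimal.Decimal 
from JSON numbers, but a decimal sent as a JSON string comes back as a plain 
str, because there is no equivalent parameter for strings.

```python
import json
from decimal import Decimal

# Hypothetical REST payload: "price" is a decimal sent as a string,
# "tax" is a bare JSON number.
payload = '{"price": "19.99", "tax": 1.60}'

# parse_float is called with the textual form of every JSON float.
data = json.loads(payload, parse_float=Decimal)

print(type(data["tax"]).__name__)    # Decimal
print(type(data["price"]).__name__)  # str: no hook was applied to it
```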

This is not simply a "missing parameter" problem:

1) JSONDecoder has 6 parse_XXX attributes (parse_int, parse_float, 
parse_constant, parse_string, parse_object, parse_array) and only the first 3 
of those are offered as parameters. The last three fall into a different 
category, as they are not actually parsers but part of the scanner logic, but 
the first 3 handle simple JSON types, so why keep only 3 parsers plus the 2 
additional object hooks instead of providing a full set of parsers (arrays, 
strings, keys)?

2) JSONDecoder.__init__ calls json.scanner.make_scanner, so subclassing 
JSONDecoder and modifying some attributes after calling super().__init__ has 
no effect: the scanner has already been built and needs to be rebuilt.

3) make_scanner is implemented in both C (c_make_scanner) and Python 
(py_make_scanner); the latter is used as a fallback in case the former cannot 
be imported. The behaviour of the C and Python versions IS NOT CONSISTENT.
  - c_make_scanner IGNORES JSONDecoder's parse_string attribute. This also 
applies to the parse_array and parse_object attributes.
  - py_make_scanner ONLY uses it for JSON object values; keys have 
json.decoder.scanstring hardcoded.

4) ONLY make_scanner IS BEING "EXPORTED" (__all__ = ['make_scanner']) so 
knowing the existence of the two versions requires getting deep into json's 
code. This also applies to json.decoder's scanstring, JSONObject and JSONArray.
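
The inconsistency in point 3 can be reproduced with a short sketch. The 
shouting_string function and the two decoder classes are hypothetical names 
introduced here for illustration; the sketch assumes the CPython layout in 
which json.scanner exposes py_make_scanner and c_make_scanner (the latter is 
None when the C accelerator is unavailable).

```python
import json
from json import decoder, scanner


def shouting_string(s, end, strict=True):
    """Hypothetical parse_string: same contract as scanstring, upper-cased."""
    value, end = decoder.scanstring(s, end, strict)
    return value.upper(), end


class PyScannerDecoder(json.JSONDecoder):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.parse_string = shouting_string
        # Rebuild the scanner with the pure-Python implementation.
        self.scan_once = scanner.py_make_scanner(self)


class CScannerDecoder(json.JSONDecoder):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.parse_string = shouting_string
        # Rebuild with the C implementation when it exists; otherwise the
        # scanner built by __init__ (before parse_string was set) is kept.
        if scanner.c_make_scanner is not None:
            self.scan_once = scanner.c_make_scanner(self)


doc = '{"key": "value"}'
py_result = PyScannerDecoder().decode(doc)
c_result = CScannerDecoder().decode(doc)

print(py_result)  # value upper-cased, key untouched
print(c_result)   # nothing upper-cased
```

With the Python scanner only the value goes through parse_string, while the 
key is still read by the hardcoded scanstring; with the C scanner the 
attribute is never consulted.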


The second point would be solved by providing all the needed parameters, as 
you would then not need to modify any attribute after calling 
JSONDecoder.__init__. This makes more sense than moving the make_scanner call 
out of the __init__ method, as it is clearly part of the initialization. It 
should be noted, however, that moving the make_scanner call from __init__ to 
the raw_decode method, despite making less sense, would only degrade 
performance for the default JSONDecoder, as the rest are only used once.
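
To illustrate the second point, here is a small sketch (the class names are 
hypothetical) comparing an attribute set after super().__init__, the same 
attribute followed by a scanner rebuild, and the parameter passed up front. 
It assumes the scanner is stored on the decoder as scan_once, as it is in 
CPython's json.decoder.

```python
import json
from decimal import Decimal
from json import scanner


class LateDecoder(json.JSONDecoder):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Too late: the scanner built in __init__ already captured
        # the default parse_float.
        self.parse_float = Decimal


class RebuiltDecoder(json.JSONDecoder):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.parse_float = Decimal
        # Rebuild the scanner so it picks up the modified attribute.
        self.scan_once = scanner.make_scanner(self)


late = LateDecoder().decode("1.5")
rebuilt = RebuiltDecoder().decode("1.5")
via_param = json.JSONDecoder(parse_float=Decimal).decode("1.5")

print(type(late).__name__)       # float
print(type(rebuilt).__name__)    # Decimal
print(type(via_param).__name__)  # Decimal
```

Passing the value as a parameter avoids the rebuild entirely, which is why 
exposing every parser as a parameter removes the need for this workaround.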

The fourth point would be solved if both the first and the third points are 
solved, as these names (c_make_scanner, py_make_scanner, scanstring, 
JSONObject and JSONArray) would be implementation details that the user does 
not need, so not exporting them would be the right choice.

So my proposal focuses on fixing the first and third points, keeping in mind 
that it needs to be backwards compatible:

The process of decoding a JSON string into a Python object can be conceptually 
divided into two steps: interpreting the characters, and then transforming 
them into the corresponding Python object. The first step is what the scanner 
does with the character matching, the number regex, scanstring, JSONObject and 
JSONArray. The second step is what the parse_int, parse_float, parse_constant, 
object_hook and object_pairs_hook attributes are for. Separating these two 
steps is important, as the first one is an implementation detail and can stay 
hardcoded (keeping the C and Python versions consistent), while the second one 
is where the user is given some hooks to slightly modify the behaviour.

Adding additional hooks for arrays, strings and objects' keys will give the 
users every customization tool available. This change plus refactoring the 
first steps to use names that do not get confused with these hooks or parsers 
will solve all the points described above.
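
As a rough illustration of what such hooks would allow, the sketch below 
emulates them on top of the existing API by post-processing the result of 
json.loads. This is not the attached implementation: apply_hooks and its 
string_hook, key_hook and array_hook parameters are hypothetical names, and a 
real version would apply the hooks during decoding rather than afterwards.

```python
import json
from decimal import Decimal, InvalidOperation


def apply_hooks(value, string_hook=None, key_hook=None, array_hook=None):
    """Walk a decoded JSON value and apply the (hypothetical) hooks."""
    if isinstance(value, str):
        return string_hook(value) if string_hook else value
    if isinstance(value, list):
        items = [apply_hooks(v, string_hook, key_hook, array_hook)
                 for v in value]
        return array_hook(items) if array_hook else items
    if isinstance(value, dict):
        return {
            (key_hook(k) if key_hook else k):
                apply_hooks(v, string_hook, key_hook, array_hook)
            for k, v in value.items()
        }
    # Numbers, booleans and None are left to the existing parsers.
    return value


def decimal_or_str(text):
    """Turn decimal-looking strings into Decimal, leave the rest alone."""
    try:
        return Decimal(text)
    except InvalidOperation:
        return text


decoded = json.loads('{"amount": "10.50", "tags": ["a", "b"]}')
result = apply_hooks(decoded, string_hook=decimal_or_str,
                     key_hook=str.upper, array_hook=tuple)
print(result)
```

Here string values become Decimal where possible, keys are upper-cased and 
arrays become tuples, which are three customizations the current parameters 
cannot express directly.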

The following files represent an operational version of the json module with 
these changes applied. encoder.py and tool.py have not been modified.

Note that some C accelerations have been disabled, as the C _json module has 
not been modified and thus differs from the new version in either behaviour or 
function signature. If these changes gain the community's approval and are 
going to be applied to the standard library, then in addition to adapting the 
C _json module to this new version, lines 123 and 311, marked with '# SWAP:', 
also need to be modified in order to use the C accelerations.

----------
nosy: +Adrián Orive
Added file: https://bugs.python.org/file47374/scanner.py

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue29992>
_______________________________________