[issue29992] Expose parse_string in JSONDecoder

2022-02-16 Thread Inada Naoki


Inada Naoki  added the comment:

> Generally speaking, parsing some things as decimal or datetime is
> schema-dependent.

Totally agree with this.

> In order to provide maximal flexibility it would be much nicer to have a 
> streaming interface available (like SAX for XML parsing), but that is not 
> what this is.

I think it is too difficult and complicated.
I think a post-processing approach (e.g. dataclass_json, pydantic) is enough.
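Here is a minimal, stdlib-only sketch of the post-processing approach. It decodes with plain json.loads and then lets a schema decide which strings become Decimal or datetime. Payment, CONVERTERS and payment_from_json are hypothetical names used only for illustration. This is not how dataclass_json or pydantic are implemented.

```python
import json
from dataclasses import dataclass, fields
from datetime import datetime
from decimal import Decimal


@dataclass
class Payment:
    """Hypothetical schema: the field types say how each JSON string is interpreted."""
    amount: Decimal
    created: datetime


# Map each annotated field type to a converter applied after decoding.
CONVERTERS = {Decimal: Decimal, datetime: datetime.fromisoformat}


def payment_from_json(text: str) -> Payment:
    raw = json.loads(text)  # plain decode: every string stays a str
    # Only fields named in the schema are converted; nothing else is inspected.
    return Payment(**{f.name: CONVERTERS[f.type](raw[f.name]) for f in fields(Payment)})


p = payment_from_json('{"amount": "19.99", "created": "2022-02-16T12:00:00"}')
print(p.amount, p.created)
```

Because the schema supplies the context, the decoder never has to guess whether a string "looks like" a decimal.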

--
nosy: +methane

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29992] Expose parse_string in JSONDecoder

2022-02-16 Thread Tobias Oberstein


Tobias Oberstein  added the comment:

> It's unlikely that you would want to parse every string that looks enough 
> like a decimal as a decimal, or that you would want to pay the cost of 
> checking every string in the whole document to see if it's a decimal.

FWIW, yes, that's what I do, and yes, it needs to check every string.

https://github.com/crossbario/autobahn-python/blob/bc98e4ea5a2a81e41209ea22d9acc53258fb96be/autobahn/wamp/serializer.py#L410

> Returning a decimal as a string is becoming quite common in REST APIs to
> ensure there are no floating-point errors.

Exactly. It is simply required if money values are involved.

Since JSON doesn't have a native Decimal type, strings need to be used (the only
scalar type in JSON that allows one to encode the needed arbitrary-precision
decimals).
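The next example shows why strings are used. A float picks up binary rounding error, while a Decimal serialized as a string keeps its exact digits. encode_decimal is a hypothetical helper passed as the default= argument of json.dumps. Turning the string back into a Decimal is left to the receiving side.

```python
import json
from decimal import Decimal


def encode_decimal(obj):
    """default= hook (hypothetical helper): emit a Decimal as its exact string form."""
    if isinstance(obj, Decimal):
        return str(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


as_float = json.dumps({"amount": 0.1 + 0.2})
as_string = json.dumps({"amount": Decimal("0.10") + Decimal("0.20")}, default=encode_decimal)
print(as_float)   # {"amount": 0.30000000000000004}
print(as_string)  # {"amount": "0.30"}
```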

CBOR has a tagged decimal fraction encoding, as described in RFC 7049, section
2.4.3.
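For comparison, a CBOR decimal fraction (tag 4) carries a two-element array [exponent, mantissa] whose value is mantissa * 10**exponent. The sketch below only computes that pair from a Python Decimal. decimal_fraction is a hypothetical helper, and producing the actual CBOR bytes is left to a CBOR library.

```python
from decimal import Decimal
from typing import Tuple


def decimal_fraction(value: Decimal) -> Tuple[int, int]:
    """Return (exponent, mantissa) such that value == mantissa * 10**exponent."""
    sign, digits, exponent = value.as_tuple()
    if not isinstance(exponent, int):
        # NaN and infinities carry a string exponent ('n', 'N', 'F') instead.
        raise ValueError("NaN and infinities have no decimal-fraction form")
    mantissa = int("".join(map(str, digits)))
    return exponent, -mantissa if sign else mantissa


print(decimal_fraction(Decimal("273.15")))  # (-2, 27315)
```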

FWIW, we've added round-trip and cross-trip testing between CBOR and JSON in our
hacked Python JSON, and it works.

https://github.com/crossbario/autobahn-python/blob/bc98e4ea5a2a81e41209ea22d9acc53258fb96be/autobahn/wamp/test/test_wamp_serializer.py#L235

--




[issue29992] Expose parse_string in JSONDecoder

2018-01-11 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:

I concur with Bob.

--




[issue29992] Expose parse_string in JSONDecoder

2018-01-11 Thread Bob Ippolito

Bob Ippolito  added the comment:

Generally speaking, parsing some things as decimal or datetime is
schema-dependent. It's unlikely that you would want to parse every string that looks 
enough like a decimal as a decimal, or that you would want to pay the cost of 
checking every string in the whole document to see if it's a decimal. This use 
case is probably better served using something like object_pairs_hook where you 
have some context available.
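Here is a minimal sketch of that suggestion, assuming the schema can be identified by key name. object_pairs_hook receives each object's (key, value) pairs. That means only strings under known keys are converted, and every other string is left alone. DECIMAL_KEYS and decimal_pairs_hook are hypothetical names.

```python
import json
from decimal import Decimal

# Hypothetical schema knowledge: only these keys hold decimal-as-string values.
DECIMAL_KEYS = {"amount", "price"}


def decimal_pairs_hook(pairs):
    """Build a dict, converting string values of known keys to Decimal."""
    return {
        key: Decimal(value) if key in DECIMAL_KEYS and isinstance(value, str) else value
        for key, value in pairs
    }


data = json.loads(
    '{"amount": "19.99", "note": "1.50", "items": [{"price": "0.10"}]}',
    object_pairs_hook=decimal_pairs_hook,
)
print(data)
```

The string "1.50" under "note" stays a str, even though it looks just like a decimal.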

Ultimate flexibility is not the goal of this interface. It's grown a bit too 
much of that over time. At this point I'm a lot more interested in proposals 
that remove options rather than add them.

In order to provide maximal flexibility it would be much nicer to have a 
streaming interface available (like SAX for XML parsing), but that is not what 
this is.

--




[issue29992] Expose parse_string in JSONDecoder

2018-01-10 Thread Adrián Orive

Adrián Orive  added the comment:

Third file

--
Added file: https://bugs.python.org/file47376/__init__.py




[issue29992] Expose parse_string in JSONDecoder

2018-01-10 Thread Adrián Orive

Adrián Orive  added the comment:

Second file

--
Added file: https://bugs.python.org/file47375/decoder.py




[issue29992] Expose parse_string in JSONDecoder

2018-01-10 Thread Adrián Orive

Adrián Orive  added the comment:

I found the same problem. My case seems to be less exotic: I'm trying to parse
some of these strings into decimal.Decimal or datetime.datetime objects.
Returning a decimal as a string is becoming quite common in REST APIs to ensure
there are no floating-point errors.

This is not simply a "missing parameter" problem:

1) JSONDecoder has 6 parse_XXX attributes (parse_int, parse_float,
parse_constant, parse_string, parse_object, parse_array), and only the first 3
are offered as constructor parameters. The last 3 fall into a different
category, as they are not really parsers but part of the scanner logic. The
first 3, however, correspond to simple JSON types. So why offer only 3 parsers
plus the 2 object hooks (object_hook, object_pairs_hook) instead of a full set
of parsers (arrays, strings, keys)?

2) The JSONDecoder.__init__ method calls json.scanner.make_scanner. So if you
subclass JSONDecoder and modify some attributes after calling super().__init__(),
it does not work: the scanner was built from the attribute values at that time
and needs to be rebuilt.

3) make_scanner is implemented in both C (c_make_scanner) and Python
(py_make_scanner). The latter is used as a fallback if the former cannot be
imported. The behaviour of the C and Python versions IS NOT CONSISTENT.
  - c_make_scanner IGNORES JSONDecoder's parse_string attribute. The same
applies to the parse_array and parse_object attributes.
  - py_make_scanner uses it ONLY for string values; object keys have
json.decoder.scanstring hardcoded (inside JSONObject).

4) ONLY make_scanner IS "EXPORTED" (__all__ = ['make_scanner']), so you have to
dig deep into the json module's code just to find out that two versions exist.
The same applies to json.decoder's scanstring, JSONObject and JSONArray.
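Points 2 and 3 can be observed directly with a small experiment. This is a sketch. upper_scanstring is a hypothetical hook. json.scanner.py_make_scanner and json.scanner.c_make_scanner are the module's internal, undocumented names, and c_make_scanner is None when the _json C extension is unavailable.

```python
import json
import json.decoder
import json.scanner


def upper_scanstring(s, end, strict=True):
    """Hypothetical string hook: decode with the stock scanstring, then upper-case."""
    value, end = json.decoder.scanstring(s, end, strict)
    return value.upper(), end


doc = '{"key": "value", "items": ["a", "b"]}'
dec = json.JSONDecoder()
dec.parse_string = upper_scanstring  # set after __init__ has already built scan_once

# Point 2: the scanner built in __init__ never sees the new attribute.
print(dec.scan_once(doc, 0)[0])  # {'key': 'value', 'items': ['a', 'b']}

# Point 3: a fresh pure-Python scanner honours parse_string, but only for values;
# object keys still go through the hardcoded scanstring inside JSONObject.
py_scan = json.scanner.py_make_scanner(dec)
print(py_scan(doc, 0)[0])  # {'key': 'VALUE', 'items': ['A', 'B']}

# Point 3: the C scanner does not read parse_string at all.
if json.scanner.c_make_scanner is not None:
    c_scan = json.scanner.c_make_scanner(dec)
    print(c_scan(doc, 0)[0])  # {'key': 'value', 'items': ['a', 'b']}
```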


The second point would be solved by providing all the needed parameters, since
you would then not need to modify the attributes after calling
JSONDecoder.__init__. This makes more sense than moving the make_scanner call
out of the __init__ method, as that call is clearly part of the initialization.
It should be noted, however, that moving the make_scanner call from __init__ to
the raw_decode method would only degrade performance for the shared default
JSONDecoder instance, since the other instances are only used once. It would
still make less sense, though.

The fourth point would be solved if both the first and third points are solved.
These functions (c_make_scanner, py_make_scanner, scanstring, JSONObject and
JSONArray) would then be implementation details that the user does not need, so
not exporting them would be the right choice.

So my proposal focuses on fixing the first and third points, keeping in mind
that it needs to be backwards compatible:

The process of decoding a JSON string into a Python object can be conceptually
divided into two steps: interpreting the characters, and then transforming them
into the corresponding Python object. The scanner does the first step with the
character matching, the number regex, scanstring, JSONObject and JSONArray. The
parse_int, parse_float, parse_constant, object_hook and object_pairs_hook
attributes handle the second step. Keeping these two steps separate is
important. The first step is an implementation detail, so it can stay hardcoded
and keep the C and Python versions consistent. The second step is where the user
is given hooks to slightly modify the behaviour.
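For reference, the existing second-step hooks can be passed straight to json.loads. There is no equivalent for strings, arrays or object keys. The Infinity literal and the replacement value None below are only illustrative choices.

```python
import json
from decimal import Decimal

data = json.loads(
    '{"price": 19.99, "qty": 3, "limit": Infinity}',
    parse_float=Decimal,               # receives the number's source text, e.g. "19.99"
    parse_int=int,                     # the default behaviour, shown for completeness
    parse_constant=lambda name: None,  # receives "-Infinity", "Infinity" or "NaN"
    object_pairs_hook=dict,            # receives the list of (key, value) pairs
)
print(data)  # {'price': Decimal('19.99'), 'qty': 3, 'limit': None}
```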

Adding hooks for arrays, strings and object keys will give users a complete set
of customization tools. This change, plus refactoring the first step to use
names that cannot be confused with these hooks or parsers, will solve all the
points described above.

The following files represent a working version of the json module with these
changes applied. encoder.py and tool.py have not been modified.

Note that some C accelerations have been disabled. The C _json module hasn't
been modified, so it now differs from the new version in operation or method
signature. If these changes get the community's approval and are applied to the
standard library, the C _json module will need to be adapted to this new
version. In addition, lines 123 and 311, marked with '# SWAP:', will also need
to be modified in order to use the C accelerations.

--
nosy: +Adrián Orive
Added file: https://bugs.python.org/file47374/scanner.py




[issue29992] Expose parse_string in JSONDecoder

2017-05-02 Thread Bob Ippolito

Bob Ippolito added the comment:

That's not a very convincing argument. Python 2 only returns byte strings if 
the input is a byte string and the contents of the string are all ASCII. 
Facilitating that sort of behavior in 3 would probably cause more issues than 
it solves.

--




[issue29992] Expose parse_string in JSONDecoder

2017-05-01 Thread Levi Cameron

Levi Cameron added the comment:

A less exotic use case than oberstet's: converting code from Python 2 to 3 while
trying to keep the Python 2 behaviour of returning all strings as bytes rather
than unicode strings.

oberstet's solution works, but it is very much tied to the current
implementation.

--
nosy: +Levi Cameron




[issue29992] Expose parse_string in JSONDecoder

2017-04-05 Thread Bob Ippolito

Bob Ippolito added the comment:

I agree with that sentiment. If we were to want to support this use case I 
would rather put together a coherent way to augment the parsing/encoding of 
anything than bolt it on to what we have.

--




[issue29992] Expose parse_string in JSONDecoder

2017-04-05 Thread Raymond Hettinger

Changes by Raymond Hettinger :


--
components: +Library (Lib)
versions: +Python 3.7




[issue29992] Expose parse_string in JSONDecoder

2017-04-05 Thread Raymond Hettinger

Raymond Hettinger added the comment:

I agree with Serhiy that the JSON module is already complex enough that adding 
more features will have a negative net effect on usability.

Bob, what do you think?

--
assignee:  -> bob.ippolito
nosy: +bob.ippolito




[issue29992] Expose parse_string in JSONDecoder

2017-04-05 Thread Tobias Oberstein

Tobias Oberstein added the comment:

I agree, my use case is probably exotic: transparent round-tripping of binaries
over JSON, using a leading \0 byte marker to distinguish plain strings from
base64-encoded binaries.
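Here is a minimal sketch of that convention, assuming bytes are the only binary type. encode_binary is a hypothetical default= hook for json.dumps. restore is a hypothetical walk run after decoding that turns marked strings back into bytes. Without a parse_string hook, every decoded string has to be visited a second time like this. json.dumps escapes the NUL marker as \u0000 on the wire.

```python
import base64
import json

MARKER = "\0"  # a leading NUL marks a base64-encoded binary value


def encode_binary(obj):
    """default= hook: bytes -> MARKER + base64 text."""
    if isinstance(obj, (bytes, bytearray)):
        return MARKER + base64.b64encode(obj).decode("ascii")
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def restore(obj):
    """Walk a decoded value and turn marked strings back into bytes."""
    if isinstance(obj, dict):
        return {key: restore(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [restore(item) for item in obj]
    if isinstance(obj, str) and obj.startswith(MARKER):
        return base64.b64decode(obj[len(MARKER):])
    return obj


text = json.dumps({"payload": b"\x01\x02", "name": "plain"}, default=encode_binary)
print(text)  # {"payload": "\u0000AQI=", "name": "plain"}
print(restore(json.loads(text)))  # {'payload': b'\x01\x02', 'name': 'plain'}
```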

FWIW, I do think however that adding "parse_string" kw param to the ctor of 
JSONDecoder would at least fit the current approach: there are parse_xxx 
parameters for all the other things already.

If overriding string parsing were done via subclassing while all the others
stay with the kw parameter approach, that could be slightly confusing too,
because it loses consistency.

Switching everything to subclassing/overriding for _all_ parse_XXX is, I guess,
a no-go, because it breaks existing stuff?

> For me in my situation, it'll be messy anyways, because I need to support Py2 
> and 3, and CPy and PyPy .. I just filed the issue for "completeness".

--




[issue29992] Expose parse_string in JSONDecoder

2017-04-05 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

The JSONDecoder constructor already has too many parameters. Adding new
parameters will decrease usability. For such an uncommon case, I think
overriding a method in a subclass is the best solution.

--
nosy: +ezio.melotti, rhettinger, serhiy.storchaka




[issue29992] Expose parse_string in JSONDecoder

2017-04-05 Thread Tobias Oberstein

New submission from Tobias Oberstein:

Though the JSONDecoder already has all the hooks internally to allow for a 
custom parse_string 
(https://github.com/python/cpython/blob/master/Lib/json/decoder.py#L330), this 
currently is not exposed in the constructor JSONDecoder.__init__.

It would be nice to expose it. Currently, I need to hack it:
https://gist.github.com/oberstet/fa8b8e04b8d532912bd616d9db65101a
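Here is one way such a hack can look. It is a sketch, not necessarily what the linked gist does. The JSONDecoder subclass accepts a hypothetical string_hook keyword and wraps the stock parse_string. It then rebuilds scan_once with the pure-Python json.scanner.py_make_scanner, because the scanner is created in __init__ and the C scanner ignores parse_string. The cost is losing the C-accelerated scanner. Object keys are still decoded by the hardcoded scanstring.

```python
import json
import json.scanner


class StringHookDecoder(json.JSONDecoder):
    """Emulates a parse_string-style keyword; string_hook is NOT part of the real API."""

    def __init__(self, *, string_hook=None, **kwargs):
        super().__init__(**kwargs)
        if string_hook is not None:
            base = self.parse_string  # the stock json.decoder.scanstring

            def parse_string(s, end, strict=True):
                value, end = base(s, end, strict)
                return string_hook(value), end

            self.parse_string = parse_string
            # make_scanner(self) already ran in super().__init__(); rebuild it so the
            # new attribute is picked up. The C scanner ignores parse_string, hence py_.
            self.scan_once = json.scanner.py_make_scanner(self)


# Extra keyword arguments to json.loads are forwarded to the cls constructor.
result = json.loads('["a", {"k": "v"}]', cls=StringHookDecoder, string_hook=str.upper)
print(result)  # ['A', {'k': 'V'}]
```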

--
messages: 291167
nosy: oberstet
priority: normal
severity: normal
status: open
title: Expose parse_string in JSONDecoder
type: enhancement
