On Tue, Sep 15, 2020 at 9:09 AM Wes Turner <wes.tur...@gmail.com> wrote:
> json.load and json.dump already default to UTF8 and already have
> parameters for json loading and dumping.

Yes, of course.

> json.loads and json.dumps exist only because there was no way to
> distinguish between a string containing JSON and a file path string.
> (They probably should've been .loadstr and .dumpstr, but it's too late for
> that now)

I think they exist because that was the pickle API from years ago -- though maybe that's why the pickle API had them. Though I think you have it a bit backwards -- you can't pass a path into loads/dumps for that reason. If they were created because that distinction couldn't be made, then load/dump would have accepted a string path back in the day.

> TBH, I think it would be great to just have .load and .dump read the file
> with standard params when a path-like ( hasattr(obj, '__path__') ) is
> passed, but the suggested disadvantages of this are:
>
> - https://docs.python.org/3/library/functions.html#open
>
> > The default encoding is platform dependent (whatever
> > locale.getpreferredencoding() returns), but any text encoding supported by
> > Python can be used. See the codecs module for the list of supported
> > encodings.

That's not a reason at all -- the reason is that some folks think overloading a function like this is bad API design. And it's been the way it's been for a long time, so it's probably better to add a new function (or functions) rather than extend the API of an existing one.

> - .load and .dump don't default to UTF8?
>   AFAIU, they do default to UTF-8. Do they instead currently default to
>   locale.getpreferredencoding() instead of the JSON spec(s)
>   * encoding= was removed from .loads and was never accepted by json.load or
>     json.dump

I think dump defaults to UTF-8. But load is a bit odd (and not that well documented). It appears to accept a file-like object that returns either a string or a bytes object from its read() method. If strings, then the decoding is already done. If bytes, then I assume that it's using UTF-8.
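
The load behavior described above can be sketched as follows. This is a minimal illustration, assuming Python 3.6 or later, where json.loads accepts bytes and detects UTF-8, UTF-16, or UTF-32 before parsing. The payload and variable names are made up for the example.

```python
import io
import json

payload = {"name": "café", "n": 1}
text = json.dumps(payload, ensure_ascii=False)

# Text-mode file-like object: read() returns str, so the json module
# does no decoding of its own.
from_text = json.load(io.StringIO(text))

# Binary file-like object: read() returns bytes, and json.loads()
# detects the encoding (UTF-8 here) before parsing.
from_utf8 = json.load(io.BytesIO(text.encode("utf-8")))

# UTF-16 bytes (with a BOM, as str.encode("utf-16") produces) are
# detected the same way.
from_utf16 = json.load(io.BytesIO(text.encode("utf-16")))
```

All three calls should return an object equal to the original payload. Note that json.dump has no matching bytes path: it always writes str chunks to the file object it is given.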
How load handles str versus bytes, by the way, should be better documented.

> - .load and .dump would also need to accept an encoding= parameter for
>   non-spec data that don't want to continue handling the file themselves
> - pickle.load has an encoding= parameter

.loads doesn't now, so I don't see why they would need to with the proposed change. You can always encode/decode ahead of time however you want, either in the file-like object or by passing a decoded str to .loads/.dumps.

> - Should we be using open(pth, 'rb') and open(pth, 'wb')? (Binary mode)

No, I think that's clear. In fact, you can't currently dump to a binary file:

In [26]: json.dump(obj, open('tiny-enc.json', 'wb'))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-26-02e9bcd47a3e> in <module>
----> 1 json.dump(obj, open('tiny-enc.json', 'wb'))

~/miniconda3/envs/py3/lib/python3.8/json/__init__.py in dump(obj, fp, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
    178         # a debuggability cost
    179         for chunk in iterable:
--> 180             fp.write(chunk)
    181
    182

TypeError: a bytes-like object is required, not 'str'

That's the beauty of Python 3's text model :-)

> JSON Specs:
> - https://tools.ietf.org/html/rfc7159#section-8.1 :
>
> > JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32. The default
> > encoding is UTF-8,

So THAT is interesting. But the current implementation does not directly support anything but UTF-8, and I think it's fine that that still be the case. If anyone is using the other two, it's an esoteric case, and they can encode/decode by hand.

> So, could we just have .load and .dump accept a path-like and an encoding=
> parameter (because they need to be able to specify UTF-8 / UTF-16 / UTF-32
> anyway)?

These are separate questions, but I'll say:

Yes, it could take a path-like. But I think there was not much support for that in this discussion.

No -- there is no need for an encoding parameter -- the other two options are rare and can be done by hand.

BTW: .dumps() dumps to, well, a string, so it's not assuming any encoding. A user can encode it any way they want when passing it along. This, in fact, is all very Python 3 text model compatible -- the encoding/decoding should happen as close to IO as possible.

If there were no backward compatibility concerns, and it were me, I would only use strings in/out of the json module, but I think that ship has sailed.

Anyway -- if anyone wants to push for overloading .load()/.dump(), rather than making two new loadf() and dumpf() functions, then speak now -- that will take more discussion, and maybe a PEP.

-CHB

-- 
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
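
For illustration, here is one way the two ideas in this message could look in code. This is only a sketch under assumptions: loadf() and dumpf() are the names floated in the thread, but their signatures and bodies here are hypothetical and not part of the json module. They always use UTF-8 and take no encoding parameter. The second half shows UTF-16 handled "by hand", with the json module only ever seeing str.

```python
import json
import os
import tempfile


def loadf(path, **kwargs):
    """Hypothetical helper: read JSON from a path, always as UTF-8."""
    with open(path, "r", encoding="utf-8") as fp:
        return json.load(fp, **kwargs)


def dumpf(obj, path, **kwargs):
    """Hypothetical helper: write JSON to a path, always as UTF-8."""
    with open(path, "w", encoding="utf-8") as fp:
        json.dump(obj, fp, **kwargs)


obj = {"greeting": "héllo"}

with tempfile.TemporaryDirectory() as tmp:
    # Path-based round trip with a fixed UTF-8 encoding.
    path = os.path.join(tmp, "example.json")
    dumpf(obj, path, ensure_ascii=False)
    round_tripped = loadf(path)

    # UTF-16 by hand: encode/decode at the IO boundary, outside json.
    utf16_path = os.path.join(tmp, "example-utf16.json")
    with open(utf16_path, "wb") as fp:
        fp.write(json.dumps(obj, ensure_ascii=False).encode("utf-16"))
    with open(utf16_path, "rb") as fp:
        by_hand = json.loads(fp.read().decode("utf-16"))
```

Keeping the encoding fixed in the helpers avoids the platform-dependent default of open(), while the rarer encodings stay possible without any change to the json API.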
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/5BMEZFULCGGQTJHSRN3RIEGB4P3TVGK6/
Code of Conduct: http://python.org/psf/codeofconduct/