json.load and json.dump already default to UTF-8 (AFAIU) and already accept the parameters for JSON loading and dumping.
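For reference, a quick sketch of where the encoding handling seems to live today. This is my reading of CPython 3.6+ behaviour, not something stated in the docs quoted in this thread. The sample data and variable names are made up for illustration.

```python
import io
import json

data = {"name": "café"}  # illustrative sample data

# Binary input: json.load() passes the bytes on to json.loads(), which
# detects UTF-8, UTF-16 or UTF-32 by itself (Python 3.6+).
decoded = {
    enc: json.load(io.BytesIO(json.dumps(data, ensure_ascii=False).encode(enc)))
    for enc in ("utf-8", "utf-16", "utf-32")
}

# Text output: json.dump() only writes str, so the on-disk encoding is
# whatever the file object was opened with, not something json picks.
text_out = io.StringIO()
json.dump(data, text_out, ensure_ascii=False)

# Binary output: json.dump() never encodes, so a binary file is rejected.
try:
    json.dump(data, io.BytesIO())
    binary_dump_ok = True
except TypeError:
    binary_dump_ok = False
```

So for a file opened in text mode, the encoding is decided by open(), which is where the locale.getpreferredencoding() question comes from.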
json.loads and json.dumps exist only because there was no way to distinguish between a string containing JSON and a file path string.
(They probably should've been .loadstr and .dumpstr, but it's too late for that now.)

TBH, I think it would be great to just have .load and .dump read the file with standard params when a path-like ( hasattr(obj, '__fspath__') ) is passed, but the suggested disadvantages of this are:

- https://docs.python.org/3/library/functions.html#open
  > The default encoding is platform dependent (whatever locale.getpreferredencoding() returns), but any text encoding supported by Python can be used. See the codecs module for the list of supported encodings.

  - .load and .dump don't default to UTF-8? AFAIU, they do default to UTF-8. Or do they currently default to locale.getpreferredencoding() instead of what the JSON spec(s) say?
    * encoding= was removed from .loads and was never accepted by json.load or json.dump
  - .load and .dump would also need to accept an encoding= parameter, for callers with non-spec data who don't want to continue handling the file themselves
    - pickle.load has an encoding= parameter
    - marshal.load does not have (and probably doesn't need?) an encoding= parameter

- What if you need to specify parameters for the file context manager?
  Accepting a path-like object should not break any existing code: you could always still open and close a file-like yourself.

    with open('file', 'rb') as _file:
        json.load(_file)

- Should we be using open(pth, 'rb') and open(pth, 'wb')? (Binary mode)

JSON Specs:

- https://tools.ietf.org/html/rfc7159#section-8.1 :
  > JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32. The default encoding is UTF-8, and JSON texts that are encoded in UTF-8 are interoperable in the sense that they will be read successfully by the maximum number of implementations; there are many implementations that cannot successfully read texts in other encodings (such as UTF-16 and UTF-32).
  >
  > Implementations MUST NOT add a byte order mark to the beginning of a JSON text. In the interests of interoperability, implementations that parse JSON texts MAY ignore the presence of a byte order mark rather than treating it as an error.

- https://www.json.org/
  > http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf (PDF!)
  > JSON syntax describes a sequence of Unicode code points. JSON also depends on Unicode in the hex numbers used in the \u escapement notation

So, could we just have .load and .dump accept a path-like and an encoding= parameter (because they need to be able to specify UTF-8 / UTF-16 / UTF-32 anyway)?

On Tue, Sep 15, 2020 at 3:22 AM Stephen J. Turnbull <turnbull.stephen...@u.tsukuba.ac.jp> wrote:

> Joao S. O. Bueno writes:
>
>  > If .load and .dump are super-charged, people coding with these
>  > methods in mind have _one_ less_ thing to worry about: if the
>  > method accepts a path or an open file becomes irrelevant.
>
> But then you either lose the primary benefit of this three line
> function (defaulting to the UTF-8 encoding to conform to the JSON
> standard), or you have a situation where what encoding you get can
> depend on whether you use the name of a file or that file already
> opened.
>
> I consider that worse because it's precisely the kind of thing that
> people *don't* worry about and *do* have some difficulty debugging.
> _______________________________________________
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/KO3ZZNTDMFZD26QGPTSNEXP2ALRDWOMF/
> Code of Conduct: http://python.org/psf/codeofconduct/
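To make the proposal concrete, here is a rough sketch of what "accept a path-like and an encoding= parameter" could look like as a pair of wrappers. The load/dump functions below are hypothetical helpers, not the json module's actual API, and the UTF-8 default and the str/bytes/os.PathLike check are my assumptions about how it might work.

```python
import json
import os
import tempfile
from pathlib import Path


def load(fp_or_path, *, encoding="utf-8", **kwargs):
    """Hypothetical json.load variant that also accepts a path-like."""
    if isinstance(fp_or_path, (str, bytes, os.PathLike)):
        # We open the file ourselves, so we control the encoding.
        with open(fp_or_path, "r", encoding=encoding) as f:
            return json.load(f, **kwargs)
    # Already-open file: encoding= is ignored, the file's own mode wins.
    return json.load(fp_or_path, **kwargs)


def dump(obj, fp_or_path, *, encoding="utf-8", **kwargs):
    """Hypothetical json.dump variant that also accepts a path-like."""
    if isinstance(fp_or_path, (str, bytes, os.PathLike)):
        with open(fp_or_path, "w", encoding=encoding) as f:
            json.dump(obj, f, **kwargs)
        return
    json.dump(obj, fp_or_path, **kwargs)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.json"
    dump({"name": "café"}, path, encoding="utf-16", ensure_ascii=False)
    roundtrip = load(path, encoding="utf-16")
    # Passing an open file still works, as it does today.
    with open(path, encoding="utf-16") as f:
        from_file = load(f)
```

Note that this sketch has exactly the asymmetry raised in the quoted message: encoding= only takes effect on the path branch, and an already-open file keeps whatever encoding it was opened with.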