On Tue, Sep 15, 2020 at 9:09 AM Wes Turner <wes.tur...@gmail.com> wrote:

> json.load and json.dump already default to UTF8 and already have
> parameters for json loading and dumping.
>

yes, of course.

> json.loads and json.dumps exist only because there was no way to
> distinguish between a string containing JSON and a file path string.
> (They probably should've been .loadstr and .dumpstr, but it's too late for
> that now)
>

I think they exist because that was the pickle API from years ago -- though
maybe that's why the pickle API had them. Though I think you have it a bit
backwards -- you can't pass a path into loads/dumps for that reason. If
they were created because that distinction couldn't be made, then load/dump
would have accepted a string path back in the day.
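
A minimal sketch of the distinction as it stands today, assuming a current Python 3 json module. The file name "data.json" is only a placeholder and is never opened:

```python
import io
import json

text = '{"a": 1}'

# loads() parses a str that already contains JSON.
parsed_from_str = json.loads(text)

# load() reads from a file-like object (anything with a .read() method).
parsed_from_file = json.load(io.StringIO(text))

# A path string is not a file-like object, so load() fails on it ...
try:
    json.load("data.json")
except AttributeError as exc:
    load_error = exc

# ... and loads() tries to parse the path string itself as JSON.
try:
    json.loads("data.json")
except json.JSONDecodeError as exc:
    loads_error = exc
```

Neither function treats a str as a path, which is the point: a path has to be opened by the caller first.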

> TBH, I think it would be great to just have .load and .dump read the file
> with standard params when a path-like ( hasattr(obj, '__path__') ) is
> passed, but the suggested disadvantages of this are:
>
> - https://docs.python.org/3/library/functions.html#open
>
>   > The default encoding is platform dependent (whatever
> locale.getpreferredencoding() returns), but any text encoding supported by
> Python can be used. See the codecs module for the list of supported
> encodings.
>

that's not a reason at all -- the reason is that some folks think
overloading a function like this is bad API design. And it's been the way
it's been for a long time, so probably better to add a new function(s),
rather than extend the API of an existing one.


> - .load and .dump don't default to UTF8?
>   AFAIU, they do default to UTF-8. Do they instead currently default to
> locale.getpreferredencoding() instead of the JSON spec(s) *
>   encoding= was removed from .loads and was never accepted by json.load or
> json.dump
>

I think dump defaults to UTF-8. But load is a bit odd (and not that well
documented).

It appears to accept a file-like object whose read() method returns either a
str or a bytes object. If it returns str, then the decoding has already been
done; if bytes, then I assume that it's using utf-8.

This, by the way, should be better documented.
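
Here is a small check of that behavior, using in-memory file-like objects rather than real files. As far as I can tell from the json docs for Python 3.6 and later, bytes input may be UTF-8, UTF-16 or UTF-32, so the UTF-16 case is included as well; treat this as a sketch, not a statement about every version:

```python
import io
import json

payload = {"name": "café"}
text = json.dumps(payload, ensure_ascii=False)

# read() returns str: the text is already decoded, json just parses it.
from_text = json.load(io.StringIO(text))

# read() returns bytes: json decodes the bytes itself before parsing.
from_utf8_bytes = json.load(io.BytesIO(text.encode("utf-8")))

# "utf-16" writes a BOM, which the json module uses to pick the decoder.
from_utf16_bytes = json.load(io.BytesIO(text.encode("utf-16")))
```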


> - .load and .dump would also need to accept an encoding= parameter for
> non-spec data that don't want to continue handling the file themselves
>   - pickle.load has an encoding= parameter
>

.loads doesn't now, so I don't see why they would need to with the proposed
change. You can always encode/decode ahead of time however you want, either
in the file-like object or by passing decoded str to .loads/dumps.
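
For example, a sketch of handling a non-spec encoding by hand with .loads/.dumps. Latin-1 is just an arbitrary choice for illustration:

```python
import json

# Bytes in an encoding the JSON spec does not cover.
raw = '{"city": "Zürich"}'.encode("latin-1")

# Decode first, then hand a str to loads().
data = json.loads(raw.decode("latin-1"))

# dumps() returns a str; encode it however the consumer expects.
out = json.dumps(data, ensure_ascii=False).encode("latin-1")
```

The json module never needs to know which encoding was involved.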


> - Should we be using open(pth, 'rb') and open(pth, 'wb')? (Binary mode)
>

no, I think that's clear. in fact, you can't currently dump to a binary
file:

In [26]: json.dump(obj, open('tiny-enc.json', 'wb'))

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-26-02e9bcd47a3e> in <module>
----> 1 json.dump(obj, open('tiny-enc.json', 'wb'))

~/miniconda3/envs/py3/lib/python3.8/json/__init__.py in dump(obj, fp,
skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators,
default, sort_keys, **kw)
    178     # a debuggability cost
    179     for chunk in iterable:
--> 180         fp.write(chunk)
    181
    182

TypeError: a bytes-like object is required, not 'str'

That's the beauty of Python 3's text model :-)
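
A sketch of the two ways that do work, assuming you want UTF-8 on disk. The file is written into a temporary directory so the snippet cleans up after itself:

```python
import json
import os
import tempfile

obj = {"greeting": "héllo"}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tiny-enc.json")

    # Text mode with an explicit encoding: dump() writes str chunks
    # and the file object does the encoding.
    with open(path, "w", encoding="utf-8") as fp:
        json.dump(obj, fp)

    # Binary mode: build the str with dumps() and encode it yourself.
    with open(path, "wb") as fp:
        fp.write(json.dumps(obj).encode("utf-8"))

    with open(path, "r", encoding="utf-8") as fp:
        round_tripped = json.load(fp)
```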

> JSON Specs:
> - https://tools.ietf.org/html/rfc7159#section-8.1  :
>
>   > JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32.  The default
>    encoding is UTF-8,
>

So THAT is interesting. But the current implementation does not directly
support anything but UTF-8, and I think it's fine that that still be the
case. If anyone is using the other two, it's an esoteric case, and they can
encode/decode by hand.
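
Doing it by hand is not much work. A sketch for UTF-16, letting the text-mode file object do the encoding and decoding (the file name is made up):

```python
import json
import os
import tempfile

obj = {"k": "値"}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data-utf16.json")

    # The file object encodes to UTF-16; dump() only ever sees str.
    with open(path, "w", encoding="utf-16") as fp:
        json.dump(obj, fp, ensure_ascii=False)

    # Same on the way back in: the file object decodes, load() parses str.
    with open(path, "r", encoding="utf-16") as fp:
        loaded = json.load(fp)
```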

> So, could we just have .load and .dump accept a path-like and an
> encoding= parameter (because they need to be able to specify UTF-8 / UTF-16
> / UTF-32 anyway)?

These are separate questions, but I'll say:

Yes, it could take a path-like. But I think there was not much support for
that in this discussion.

No -- there is no need for an encoding parameter -- the other two options
are rare and can be done by hand.
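
For the record, a rough sketch of what the path-like overload might look like. load_overloaded is a hypothetical name, not anything in the json module, and UTF-8 is assumed. Note that the path protocol is os.PathLike / __fspath__, so that (plus plain str) is what gets checked here:

```python
import io
import json
import os
import tempfile
from pathlib import Path


def load_overloaded(fp_or_path, **kwargs):
    """Hypothetical: accept either a path (str / os.PathLike) or a file-like object."""
    if isinstance(fp_or_path, (str, os.PathLike)):
        # Path case: open as UTF-8 text and delegate to json.load().
        with open(fp_or_path, "r", encoding="utf-8") as fp:
            return json.load(fp, **kwargs)
    # File-like case: same behavior as json.load() today.
    return json.load(fp_or_path, **kwargs)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.json"
    path.write_text('{"a": [1, 2]}', encoding="utf-8")
    from_path = load_overloaded(path)
    from_str_path = load_overloaded(str(path))

from_file_like = load_overloaded(io.StringIO('{"a": [1, 2]}'))
```

The type check on the argument is exactly the kind of overloading that some folks consider bad API design.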

BTW: .dumps() dumps to, well, a string, so it's not assuming any encoding.
A user can encode it any way they want when passing it along.

This, in fact, is all very Python3 text model compatible -- the
encoding/decoding should happen as close to IO as possible.

If there were no backward compatibility constraints, and it were me, I would
only use strings in/out of the json module, but I think that ship has
sailed.

Anyway -- if anyone wants to push for overloading .load()/dump(), rather
than making two new loadf() and dumpf() functions, then speak now -- that
will take more discussion, and maybe a PEP.
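
To make the two-new-functions option concrete, a minimal sketch. loadf() and dumpf() do not exist in the json module; the signatures below are my assumption of what they could look like, with UTF-8 hard-coded and extra keyword arguments passed straight through:

```python
import json
import os
import tempfile


def dumpf(obj, path, **kwargs):
    """Hypothetical: serialize obj as JSON to the UTF-8 file at path."""
    with open(path, "w", encoding="utf-8") as fp:
        json.dump(obj, fp, **kwargs)


def loadf(path, **kwargs):
    """Hypothetical: parse the UTF-8 JSON file at path."""
    with open(path, "r", encoding="utf-8") as fp:
        return json.load(fp, **kwargs)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.json")
    original = {"x": 1, "y": ["a", "b"]}
    dumpf(original, path, indent=2)
    restored = loadf(path)
```

The existing load()/dump() stay exactly as they are; the new names only add the open() call.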

-CHB



-- 
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/5BMEZFULCGGQTJHSRN3RIEGB4P3TVGK6/
Code of Conduct: http://python.org/psf/codeofconduct/