On Aug 13, 2019, at 11:21, Chris Angelico <ros...@gmail.com> wrote:
>
> On Wed, Aug 14, 2019 at 3:12 AM Andrew Barnert via Python-ideas
> <python-ideas@python.org> wrote:
>>
>>> On Aug 13, 2019, at 01:04, Richard Musil <risa20...@gmail.com> wrote:
>>>
>>> Concerning the custom separators, and why the implementation supports them,
>>> I believe it is not actually about the separators themselves, but the
>>> whitespace around them. When you generate JSON with indentation, you
>>> probably also want some spacing after the colon or comma so it looks
>>> "pretty".
>>
>> Ignoring safety, it might be a bit simpler to use, but probably a tiny bit
>> slower...
>
> Something to bear in mind is that the JSON module is often called upon
> to deal with gobs of data.
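(For reference, here's how the existing separators parameter already bundles each separator together with its surrounding whitespace; this is plain stdlib json, nothing proposed:)

```python
import json

data = {"a": [1, 2], "b": 3}

# Compact form: separators carrying no whitespace at all.
print(json.dumps(data, separators=(",", ":")))  # {"a":[1,2],"b":3}

# Pretty form: with indent, the default item separator drops its trailing
# space (a newline follows it anyway) and the key separator keeps one,
# so the output looks "pretty" without any extra arguments.
print(json.dumps(data, indent=2))
```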
Sure. Replacing the separator strings with whitespace strings would mean that generating each array element in dumps now takes one extra single-character string emit for the comma (and each object element takes two). I suspect this would have negligible impact on the C implementation, but maybe not on the pure Python one. And at any rate, if that were a serious proposal, I'd certainly benchmark it rather than guessing. But that would be a silly change to make to the module at this point, so I'm not proposing it. We've lived with separators for all these years, and I don't think it's a serious wart that needs fixing.

> Just last night I was showcasing a
> particular script that has to load a 200MB JSON file mapping Twitch
> emote names to their emote IDs, and it takes a solid 5-6 seconds to
> parse it. A small slowdown or speedup can have significant impact on
> real-world programs.

I've never had to parse a 200MB JSON doc, but I have had to parse a massive JSON Lines doc with zillions of 1KB JSON docs, which is not nearly as bad for memory use, but just as bad for parse time. At that point, it's worth looking into other JSON packages outside the stdlib, unless you're only doing it once.

> That's one of the reasons that a simple solution of "make JSONEncoder
> respect decimal.Decimal" was rejected - it would require that the json
> module import decimal, which is extremely costly.

To be fair, your program only imports json once, and so does mine, and the linear-in-size-of-doc parsing cost isn't affected by the import time. Still, there will be someone out there who runs a script zillions of times on a bunch of separate JSON docs, and for that someone, import time will matter. But I think the lazy-import-decimal-on-first-dump-with-use_decimal solves that, and solves it even better than __json__, even besides the fact that it's a better API than exposing "dump raw text into any JSON, and it's up to you to get it right".
No import time if you're not using it, just setting a global to None. Even if you are using it, the cost of importing it from the sys.modules cache is pretty tiny. (After all, you won't have any Decimal objects without having imported decimal, unless you do some nasty tricks, at which point monkeypatching json to fake the import isn't any nastier.)

And the code within the exporter itself should be unaffected. You don't need to check for Decimal until after all the other types have failed, so on successful dumps without use_decimal there's no cost at all, and on failed dumps it's just one extra check before raising. And even when you need use_decimal, the isinstance is faster than a getattr or special method lookup to find a __json__ method.

But of course you'd still want to actually implement and benchmark it to be sure. Maybe the code for that extra check, even if you never reach it, pushes an inner loop out of the cache or something; who knows?

_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/3MM6ZBBEYPPKO3KTIWR4ZGZLY7GTIQIJ/
Code of Conduct: http://python.org/psf/codeofconduct/