On Aug 13, 2019, at 11:21, Chris Angelico <ros...@gmail.com> wrote:
>
> On Wed, Aug 14, 2019 at 3:12 AM Andrew Barnert via Python-ideas
> <python-ideas@python.org> wrote:
>>
>>> On Aug 13, 2019, at 01:04, Richard Musil <risa20...@gmail.com> wrote:
>>>
>>> Concerning the custom separators, and why the implementation supports them,
>>> I believe it is not actually about the separators themselves, but the
>>> whitespace around them. When you generate JSON with indentation, you
>>> probably also want some spacing after the colon or comma so it looks
>>> "pretty".
>>
>> Ignoring safety, it might be a bit simpler to use, but probably a tiny bit
>> slower...
>
> Something to bear in mind is that the JSON module is often called upon
> to deal with gobs of data.
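(For reference, here's how the existing separators parameter already bundles each separator together with its surrounding whitespace; this is plain stdlib json, nothing proposed:)

```python
import json

data = {"a": [1, 2], "b": 3}

# Compact form: separators carrying no whitespace at all.
print(json.dumps(data, separators=(",", ":")))  # {"a":[1,2],"b":3}

# Pretty form: with indent, the default item separator drops its trailing
# space (a newline follows it anyway) and the key separator keeps one,
# so the output looks "pretty" without any extra arguments.
print(json.dumps(data, indent=2))
```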
Sure. Replacing the separator strings with whitespace strings would mean that generating each array element in dumps now takes one extra single-character string emit for the comma (and each object element takes two). I suspect this would have negligible impact on the C implementation, but maybe not on the pure Python one. And at any rate, if that were a serious proposal, I'd certainly benchmark it rather than guessing. But that would be a silly change to make to the module at this point, so I'm not proposing it. We've lived with separators for all these years, and I don't think it's a serious wart that needs fixing.

> Just last night I was showcasing a
> particular script that has to load a 200MB JSON file mapping Twitch
> emote names to their emote IDs, and it takes a solid 5-6 seconds to
> parse it. A small slowdown or speedup can have significant impact on
> real-world programs.

I've never had to parse a 200MB JSON doc, but I have had to parse a massive JSON Lines doc with zillions of 1KB JSON docs, which is not nearly as bad for memory use, but just as bad for parse time. At that point, it's worth looking into other JSON packages outside the stdlib, unless you're only doing it once.

> That's one of the reasons that a simple solution of "make JSONEncoder
> respect decimal.Decimal" was rejected - it would require that the json
> module import decimal, which is extremely costly.

To be fair, your program only imports json once, and so does mine, and the linear-in-size-of-doc parsing cost isn't affected by the import time. Still, there will be someone out there who runs a script zillions of times on a bunch of separate JSON docs, and for that someone, import time will matter. But I think the lazy-import-decimal-on-first-dump-with-use_decimal solves that, and solves it even better than __json__, even besides the fact that it's a better API than exposing "dump raw text into any JSON, and it's up to you to get it right".
No import time if you're not using it, just setting a global to None. Even if you are using it, the cost of importing it from the sys.modules cache is pretty tiny. (After all, you won't have any Decimal objects without having imported decimal, unless you do some nasty tricks, at which point monkeypatching json to fake the import isn't any nastier.)

And the code within the exporter itself should be unaffected. You don't need to check for Decimal until after all the other types have failed, so on successful dumps without use_decimal there's no cost at all, and on failed dumps it's just one extra check before raising. And even when you need use_decimal, the isinstance is faster than a getattr or special method lookup to find a __json__ method.

But of course you'd still want to actually implement and benchmark it to be sure. Maybe the code for that extra check, even if you never reach it, pushes an inner loop out of the cache or something; who knows?

_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/3MM6ZBBEYPPKO3KTIWR4ZGZLY7GTIQIJ/
Code of Conduct: http://python.org/psf/codeofconduct/