[Python-ideas] Re: adding support for a "raw output" in JSON serializer

2019-08-09 Thread Greg Ewing

Andrew Barnert wrote:

No you can’t. Decimal accepts strings that aren’t valid as JSON numbers, like
`.3`,


That's not a problem as long as it doesn't serialise them that
way, which it wouldn't:

>>> str(Decimal('.3'))
'0.3'

> or `nan`,

The serialiser could raise an exception in that case.

BTW, I just checked what it does with floats:

>>> json.dumps(float('nan'))
'NaN'

So it seems that it currently doesn't care about strict conformance
here.
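For the record, strict behaviour is already available as an opt-in through the
real `allow_nan` parameter of `json.dumps`:

```python
import json

# By default the serialiser happily emits the non-standard NaN token:
lenient = json.dumps(float('nan'))  # 'NaN'

# Strict RFC-style conformance is opt-in via allow_nan=False:
try:
    json.dumps(float('nan'), allow_nan=False)
    strict_raised = False
except ValueError:
    strict_raised = True  # out-of-range float values are rejected
```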

--
Greg
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/OO34LREKZCZXEQ226CY3CO3ZDTGHXS3N/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: adding support for a "raw output" in JSON serializer

2019-08-09 Thread Greg Ewing

Andrew Barnert wrote:

Except that it doesn’t allow that. Using Decimal doesn’t preserve the
difference between 1.E+3 and 1000.0, or between +12 and 12.


That's true. But it does preserve everything that's important for
interpreting it as a numerical value without losing any precision,
which I think is enough of an improvement to recommend having it
as an option.
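A quick illustration of which distinctions Decimal keeps and which it
normalises away (a sketch of the behaviour being discussed, not part of any
proposal):

```python
from decimal import Decimal

# The numeric value and significant trailing zeros survive:
kept = str(Decimal("1000.0"))      # "1000.0"

# ...but purely lexical spellings collapse to a canonical form:
exp_form = str(Decimal("1.E+3"))   # "1E+3", not the original "1.E+3"
plus_form = str(Decimal("+12"))    # "12", the explicit "+" is gone
```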

--
Greg
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/MPIFT2SW3V5FQPMQCIYKHWWFOVMEAACA/


[Python-ideas] Re: adding support for a "raw output" in JSON serializer

2019-08-09 Thread Andrew Barnert via Python-ideas
On Aug 9, 2019, at 17:44, Greg Ewing  wrote:
> 
> Since Python's Decimal preserves all the information in the JSON
> representation of a float (including trailing zeroes!), anything
> else you might want can be achieved by pre/postprocessing the
> decoded data structure.
> 
> It's just occurred to me that this isn't quite true, since
> Decimal doesn't preserve *leading* zeroes.

That one isn’t a problem, because JSON doesn’t allow leading zeroes. If you 
want to use 00.012 vs. 0.012 to say something meaningful about your precision, 
you’d have to encode that meaning in JSON as something like 0.0012E1 vs. 
0.012E0.

Of course Decimal will throw away _that_ distinction, but only for the same 
reason it throws away all of the distinctions between 0.012E0 vs. 0.012 vs 
1.2E-2 vs. 1.2e-2 and so on, so you’re only hitting the everyday problem that 
Decimal can’t solve for anyone; there’s no additional problem that Decimal 
can’t solve here that’s only faced by your weird app.

> But an application
> that cared about that would be pretty weird. So I think a
> "use decimal" option would cover the vast majority of use
> cases.
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/Y2BPGI5QBIOYWOV42OD73AQZIDYD6INX/


[Python-ideas] Re: adding support for a "raw output" in JSON serializer

2019-08-09 Thread Andrew Barnert via Python-ideas
On Aug 9, 2019, at 17:53, Greg Ewing  wrote:
> 
> If you really insist on being strict, it could require you to
> return a special wrapper type that takes a string of digits and
> checks that it conforms to the syntax of a JSON number.
> 
> Come to think of it... you could use Decimal as that wrapper
> type!

No you can’t. Decimal accepts strings that aren’t valid as JSON numbers, like 
`.3`, or `nan`, just as float does and C’s atoi, etc. (And it accepts strings 
that are valid as JSON numbers, but not recommended for interoperability, like 
`1.0E999`.)

If you want to know whether something can be parsed as a JSON number, the 
obvious thing to do is call json.loads on it. (And if you want to know whether 
it’s something that’s “interoperable”, check whether loads returns a float—and 
maybe whether that float is equal to your input.) It’s not exactly the hardest 
thing to parse in the world, but why duplicate the parser (and potentially 
create new bugs) if you already have one?
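A sketch of that approach (the helper name is made up; it leans on the real
`parse_constant` hook to reject NaN/Infinity, which the default parser accepts
even though they are not JSON numbers):

```python
import json

def is_json_number(s):
    # Reuse the stdlib parser rather than writing a new one.
    def reject_constant(name):
        raise ValueError(name)  # NaN/Infinity parse, but aren't JSON numbers
    try:
        value = json.loads(s, parse_constant=reject_constant)
    except ValueError:
        return False
    # bool is a subclass of int, so screen out true/false explicitly
    return isinstance(value, (int, float)) and not isinstance(value, bool)
```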
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/7FEUBOEK5BFPLL7IJ5BG3XHOZU4R2FWW/


[Python-ideas] Re: adding support for a "raw output" in JSON serializer

2019-08-09 Thread Joao S. O. Bueno
On Fri, 9 Aug 2019 at 19:53, Richard Musil  wrote:

> Joao S. O. Bueno wrote:
> > yes,  just that it should be called dump_as_float and take either a
> > class
> > or a tuple-of-classes
>
> I saw kind of symmetry with the `parse_float` which only accepted one
> class for having only one class on the output as well. Besides there are
> probably not many (different) ways how to write a custom type for JSON
> float in one application. But I cannot see any argument for not having a
> tuple.
>
> > (or maybe just another argument that when set to
> > "True" would
> > work for any object for which isinstance(obj, numbers.Number) is True)
>
> I cannot verify it right now, but if integer (or big int) are derived from
> `numbers.Number` than it would not work as a distinction for a float. Big
> int is already handled by standard module correctly.
>

Maybe we are all just over-discussing a lot of things that could be solved
in a straightforward way by allowing decimal.Decimal to be JSON-encoded
as a number by default, with no drawbacks whatsoever. The fact that
arbitrarily long integers are encoded with no complaints up
to now seems to indicate so.
[clip]
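For concreteness, the int behaviour being referenced is easy to check (a quick
sketch):

```python
import json

# Arbitrary-precision ints already pass through the encoder untouched:
big = 10 ** 40
encoded = json.dumps(big)  # forty-one digits, no complaint, no precision loss
```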

___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/AL7U2NYUVLVWVC4KAMNHB3OROJCYALLX/


[Python-ideas] Re: adding support for a "raw output" in JSON serializer

2019-08-09 Thread Andrew Barnert via Python-ideas
On Aug 9, 2019, at 17:16, Greg Ewing  wrote:
> 
> I think it can be justified on the grounds that it allows all of the
> information in the JSON text to be preserved during both deserialisation
> and serialisation.

Except that it doesn’t allow that. Using Decimal doesn’t preserve the 
difference between 1.E+3 and 1000.0, or between +12 and 12. Not to mention 
things like which characters in your strings, and even object keys, were read 
from backslash escapes, or which of your lists have spaces after their commas.

JSON is not canonicalizable or round-trippable, by design. So I don’t think 
giving people the illusion of round-tripping their input is a good thing. 
Especially if smart people who’ve read this thread still don’t get that it’s an 
illusion.

People who need to actually preserve 100% of the JSON that gets thrown at them 
are going to think this does it (because the docs imply it, or some random guy 
on StackOverflow says so, or it passed the couple of unit tests they thought 
of), then deploy code that relies on that, and only discover later that it's 
broken and can't be fixed without changing big chunks of their design.

I mean, the OP wants to use this for secure hashes; think about what kind of 
debugging nightmare that’s likely to lead to. (And I hope nobody actually tries 
to attack him while he’s debugging the secure hash failures that are just side 
effects of this bug…)

The fact that a feature _can_ be misused isn’t a reason to reject it. But the 
fact that a feature will _almost always_* be misused is a different story.

--

* I say “almost” because there are presumably some cases where being able to 
preserve 98.8% of the input that gets thrown at you is qualitatively better 
than being able to preserve 97.8%, or where the only JSON docs you ever receive 
are just individual numbers, and they’re all between -0.1 and -0.2, so the only 
possible error that can arise is this one. And so on. But I doubt any of those 
is common.

___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/Q2IVUG2VJNK5NUHGRVSBOAIXCERJ6RBP/


[Python-ideas] Re: adding support for a "raw output" in JSON serializer

2019-08-09 Thread Greg Ewing

Rhodri James wrote:
I get what you want -- "this string of digits is the 
representation I want to use, please don't put quotes around it" -- but 
I can't help but feel that it will only encourage more unrealistic 
expectations.


I think "consenting adults" applies here. Yes, you could use it to
produce invalid JSON, so it's your responsibility to not do that.
And if you do so accidentally, you'll find out about it when other
things (and quite possibly your own thing) fail to read it.

If you really insist on being strict, it could require you to
return a special wrapper type that takes a string of digits and
checks that it conforms to the syntax of a JSON number.

Come to think of it... you could use Decimal as that wrapper
type!

--
Greg
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/BB3PTCSLL6YWDHSTUMOC4G5EC5NSU4R5/


[Python-ideas] Re: adding support for a "raw output" in JSON serializer

2019-08-09 Thread Greg Ewing

Joao S. O. Bueno wrote:

Short of a long term solution, like a __json__ protocol, or at least special
support in Python json module for objects of type "numbers.Number", 
the only way to go, is, as you are asking, being able to insert "raw 
strings into json".


No, that's not the only way. It would be sufficient just to add a
"use decimal" option to the stdlib json module.

Since Python's Decimal preserves all the information in the JSON
representation of a float (including trailing zeroes!), anything
else you might want can be achieved by pre/postprocessing the
decoded data structure.

It's just occurred to me that this isn't quite true, since
Decimal doesn't preserve *leading* zeroes. But an application
that cared about that would be pretty weird. So I think a
"use decimal" option would cover the vast majority of use
cases.
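The decoding half of this already exists: `json.loads` takes a real
`parse_float` argument, so only the serialising option would be new. A quick
check of the trailing-zero claim:

```python
import json
from decimal import Decimal

# parse_float routes every JSON float literal through Decimal:
data = json.loads('{"x": 0.64417266845703130}', parse_float=Decimal)
# The trailing zero is preserved exactly:
value = str(data["x"])  # "0.64417266845703130"
```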

--
Greg
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/GBHJ7QXKQN5ZRHUKDYA52PHBUSOIJF4J/


[Python-ideas] Re: adding support for a "raw output" in JSON serializer

2019-08-09 Thread Greg Ewing

Paul Moore wrote:

So you're proposing a change to the Python
language stdlib implementation of that translation. Fine. But you have
yet to provide a justification for such a change,


I think it can be justified on the grounds that it allows all of the
information in the JSON text to be preserved during both deserialisation
and serialisation.

Seems to me this is objectively better. You can always discard
information you don't need, but you can't get it back if you need
it and it's not there.

--
Greg
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/S2MDQ4M4CM6MLY2YQ4CUCI263RT7PMAW/


[Python-ideas] Re: adding support for a "raw output" in JSON serializer

2019-08-09 Thread Greg Ewing

Richard Musil wrote:

Even the number
0.64417266845703130 (note the last 0) is different JSON object from
0.6441726684570313 (without the last 0).


And Python's Decimal type preserves that distinction:

>>> Decimal("0.64417266845703130")
Decimal('0.64417266845703130')
>>> Decimal("0.6441726684570313")
Decimal('0.6441726684570313')

--
Greg
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/RKCGJ7XZVHIKM2THOFPFV4O74WVPTVF4/


[Python-ideas] Re: Proposal: Use "for x = value" to assign in scope

2019-08-09 Thread Dominik Vilsmeier
Peter O'Connor wrote:
> Alright hear me out here:
> I've often found that it would be useful for the following type of
> expression to be condensed to a one-liner:
> def running_average(x_seq):
> averages = []
> avg = 0
> for t, x in enumerate(x_seq):
> avg =  avg*t/(t+1) + x/(t+1)
> averages.append(avg)
> return averages
> Because really, there's only one line doing the heavy lifting here, the
> rest is kind of boilerplate.
> Then I learned about the beautiful and terrible "for x in [value]":
> def running_average(x_seq):
> return [avg for avg in [0] for t, x in enumerate(x_seq) for avg in
> [avg*t/(t+1) + x/(t+1)]]
> Many people find this objectionable because it looks like there are 3 for
> loops, but really there's only one: loops 0 and 2 are actually assignments.

You can solve this via `itertools.accumulate` in a concise and clear way:

[x/n for n, x in enumerate(it.accumulate(x_seq), 1)]
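Spelled out as a runnable function, the `accumulate` version is:

```python
from itertools import accumulate

def running_average(x_seq):
    # cumulative sum divided by the running count gives the running mean
    return [total / n for n, total in enumerate(accumulate(x_seq), 1)]
```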

> My Proposal
> What if we just officially bless this "using for as a temporary assignment"
> arrangement, and allow "for x=value" to mean "assign within the scope of
> this for".  It would be identical to "for x in [value]", just more
> readable.  The running average function would then be:
> def running_average(x_seq):
> return [avg for avg=0 for t, x in enumerate(x_seq) for avg = avg *
> t/(t+1) + x / (t+1)]
> -- P.S. 1
> I am aware of Python 3.8's new "walrus" operator, which would make it:
> def running_average(x_seq):
> avg = 0
> return [avg := avg*t/(t+1) + x / (t+1) for t, x in enumerate(x_seq)]
> But it seems ugly and bug-prone to be initializing an in-comprehension
> variable OUTSIDE the comprehension.
> -- P.S. 2
> The "for x = value" syntax can achieve things that are not nicely
> achievable using the := walrus.  Consider the following example (wherein we
> carry forward a "hidden" variable h but do not return it):
> y_seq = [y for h=0 for x in x_seq for y, h = update(x, h)]
> There's not really a nice way to do this with the walrus because you can't
> (as far as I understand) combine it with tuple-unpacking.  You'd have to do
> something awkward like:
> yh = None, 0
> y_seq, _ = zip(*(yh := update(x, yh[1]) for x in x_seq))

You can't use `:=` with tuple unpacking but you can use it with tuples 
directly; this requires a definition of the initial tuple (preferably outside 
the loop) but this is i.m.o. a plus since it clearly marks the initial 
conditions for your algorithm:

yh = (None, 2)  # initial values
[(yh := update(x, yh[1]))[0] for x in x_seq]
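For example, with a toy stand-in for `update()` (the original post leaves it
abstract) and the hidden state initialised to 0, on Python 3.8+:

```python
# Stand-in update() purely for illustration: returns (output, new_hidden_state).
def update(x, h):
    s = x + h
    return s, s

x_seq = [1, 2, 3]
yh = (None, 0)  # initial values
# The walrus rebinds the whole tuple; [0] extracts the visible output.
y_seq = [(yh := update(x, yh[1]))[0] for x in x_seq]
```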

If you really ever have the need to carry a variable over into a comprehension 
(which might be a valid thing, for example in a class body) then you can still 
resort to an additional `for` loop (as you've already indicated); after all 
it's not too bad and you can even put the different loops on different lines + 
add a comment if necessary:

class Foo:
a = 1
# this won't work:
b = [x*a for x in range(5)]
# we can carry `a` over to the comprehension as follows:
b = [x*a for a in [a] for x in range(5)]
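As a runnable version of that class-body example:

```python
class Foo:
    a = 1
    # [x * a for x in range(5)] would raise NameError here: the
    # comprehension's implicit function scope cannot see class-level names.
    b = [x * a for a in [a] for x in range(5)]  # carries `a` in explicitly
```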

Using just `for x = ...` is not much of a difference, especially since most 
people will see the `for` and immediately assume it's a loop (so in that sense 
it's even more confusing).
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/XKMPOMS3QSUURFCI5YDEAPJRGCCQAXBI/


[Python-ideas] Re: adding support for a "raw output" in JSON serializer

2019-08-09 Thread Greg Ewing

Stephen J. Turnbull wrote:

Richard Musil writes:

 > After some thinking it seems that the float is only case where this
 > "loss of a precision" happen.

This is true of Unicode normalized forms as well as floats.


But it's not true in Python, where you have the option not to normalise
your deserialised JSON strings. What's missing is the ability to
deserialise JSON floats to a non-lossy type.

This seems like a reasonable thing to want, even if you're not
intending to round-trip anything.

--
Greg
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/YYIEDO6YI75XJPGLVC2XBHYWAWTFJWQH/


[Python-ideas] Re: Exceptions with Message Templates

2019-08-09 Thread Andrew Barnert via Python-ideas
On Aug 9, 2019, at 12:00, Ryan Fox  wrote:
> 
> > If you’re wondering why this works, it’s because Error and InputError don’t 
> > override __new__. Which should make it obvious why a tutorial aimed at 
> > novices doesn’t get into the details, but that’s why Python has reference 
> > manuals instead of just a tutorial.

> Okay, so this __new__/__init__ divide is sort of a hack because you can't 
> trust users to call super().__init__(...) on their user-defined exceptions?

No, it's not a hack; almost every immutable class that allows mutable 
subclasses in Python has a __new__ that the mutable subclasses rarely override, 
and a do-nothing __init__ that the mutable subclasses do override.

This also removes one opportunity for novices to make mistakes, and it removes 
a line of boilerplate from most subclasses, but I don't think either of those 
is the point; they're just bonuses.

Anyway, you just define a new exception class the way the tutorial shows, and 
you get a round-trippable repr and proper tracebacks and args like magic; 
that’s all a novice needs to know—but it’s something we have to not break for 
them.
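For reference, the "magic" in question, using a bare tutorial-style exception
(repr formatting as in CPython 3.7+):

```python
class InputError(Exception):
    """Defined exactly as the tutorial shows: no __init__, no __new__."""

err = InputError('bad value', 42)
# BaseException.__new__ stashes the positional args, and the default
# __str__/__repr__ are derived from them with no code in the subclass.
```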

> It sounds like BaseException.args is mostly used for generating strings? 
> Couldn't I just get away with overriding __str__, __repr__?

Yes, but that means every class that doesn’t do the usual thing has to instead 
implement two or three extra methods, which is hardly reducing boilerplate, 
which was the goal.

> I understand that existing built-in exception classes have some semantic 
> value associated with arguments in given positions, but that wouldn't apply 
> to new user-defined exceptions. And going forward, if you wanted to make your 
> tool extract a value, it doesn't seem like it should make a big difference to 
> access exc.arg[3] or exc.thing.

As I said, if we were designing a new language from scratch, we’d almost surely 
want exceptions to be more like dataclasses or namedtuples than things that 
store the positional args in a tuple. But we’re obviously not going to change 
all of the builtin exceptions, the hundreds of other exceptions in the stdlib 
and popular libraries, and the thousands in specific projects people are 
working on overnight. Adding named attributes, and structure in general, to 
exceptions is a great goal, but it’s a very long-term goal (that’s been going 
on since 3.0); a new feature that requires us to already be most of the way 
there before it’s useful isn’t as useful as it appears.

Something that preserves the existing magic for args and repr and half of str, 
while adding the other half of str (and maybe named attributes too?), would be 
a step forward. But adding the other half of str while breaking the existing 
magic (and making named attributes harder rather than easier) doesn't seem 
like it is.
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/6JW7FR3N65BHXEUUAMEODHWBRW3DHW4R/


[Python-ideas] Re: adding support for a "raw output" in JSON serializer

2019-08-09 Thread Richard Musil
Joao S. O. Bueno wrote:
> yes,  just that it should be called dump_as_float and take either a
> class
> or a tuple-of-classes

I saw a kind of symmetry with `parse_float`, which only accepts one class, in 
having only one class on the output as well. Besides, there are probably not 
many (different) ways to write a custom type for JSON float in one 
application. But I cannot see any argument for not having a tuple.

> (or maybe just another argument that when set to
> "True" would
> work for any object for which isinstance(obj, numbers.Number) is True)

I cannot verify it right now, but if integers (or big ints) are derived from 
`numbers.Number` then it would not work as a distinction for a float. Big ints 
are already handled correctly by the standard module.

> is not the role of the language or its libraries
> to prevent any way that the JSON encoded string is valid-json. So, maybe
> emitting a warning there, but

From the other responses I got the impression that ensuring the validity of 
the output was an important part of the standard implementation. But 
regardless of that, here I believe the check with `float(dump_val)` is 
actually a check to validate the contract with the custom serializer, which 
seems reasonable; whether it should be an error or a warning I have no idea. 
I hope Andrew or Paul could comment on that.

> raising TypeError will only make someone intending to encode numbers in
> Hexadecimal in her  custom JSON to pop
> here crying tomorrow.

I am not sure hexadecimal representation is officially recognized as a number 
in JSON (and a float number in particular), so in that case she will probably 
be encoding it as a string already.
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/UT2UVSAY7U576ALJWIF7KAJG4TCSQ5KJ/


[Python-ideas] Re: Exceptions with Message Templates

2019-08-09 Thread Ryan Fox
> If you’re wondering why this works, it’s because Error and InputError
don’t override __new__. Which should make it obvious why a tutorial aimed
at novices doesn’t get into the details, but that’s why Python has
reference manuals instead of just a tutorial.

Between the links I've found, none of them refer to BaseException.__new__
setting up args, and the references to args make it sound like an optional
thing or something the built-in exceptions handle themselves.

https://docs.python.org/3/library/exceptions.html
https://docs.python.org/3/c-api/exceptions.html
https://docs.python.org/3/tutorial/errors.html

Okay, so this __new__/__init__ divide is sort of a hack because you can't
trust users to call super().__init__(...) on their user-defined exceptions?
No judgement, just trying to wrap my head around this.

It sounds like BaseException.args is mostly used for generating strings?
Couldn't I just get away with overriding __str__, __repr__? I understand
that existing built-in exception classes have some semantic value
associated with arguments in given positions, but that wouldn't apply to
new user-defined exceptions. And going forward, if you wanted to make your
tool extract a value, it doesn't seem like it should make a big difference
to access exc.arg[3] or exc.thing.

I've also looked at giving the formatted message as args[0], but I haven't
figured out how to handle a class with an explicit __init__ that defines
a POSITIONAL_OR_KEYWORD argument, like this:
https://github.com/rcfox/exception-template/blob/master/tests/test.py#L58 The
'key' argument ends up getting thrown in with kwargs. Maybe I should just get
rid of the error check for extra arguments given?

> I also just realized that your class won't have any helpful introspection
on the call signature either, so all users of your class will have to
document in the docstring very clearly what is expected in the constructor
call as their code editors won't be able to tell them and help()/inspect
won't be able to help either.

I won't try to convince you that this is a good solution, but
help()/inspect could at least be tricked with a metaclass that replaces the
signature of __init__:

class _Signaturizer(type):
    def __new__(cls, name, bases, classdict):
        result = super().__new__(cls, name, bases, classdict)
        args = set(name for _, name, _, _ in
                   result.formatter.parse(result.message) if name)
        params = [inspect.Parameter(a, inspect.Parameter.KEYWORD_ONLY)
                  for a in sorted(args)]
        result.__init__.__signature__ = inspect.signature(
            result.__init__).replace(parameters=params)
        return result

class ExceptionTemplate(Exception, metaclass=_Signaturizer):
# ...

Of course, tools that rely on static analysis wouldn't be fooled.
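Assembled into a runnable sketch, with a minimal stand-in for the
ExceptionTemplate base (the real one lives in the linked repo; the names here
are only illustrative):

```python
import inspect
import string

class _Signaturizer(type):
    def __new__(cls, name, bases, classdict):
        result = super().__new__(cls, name, bases, classdict)
        fields = {field for _, field, _, _ in
                  result.formatter.parse(result.message) if field}
        params = [inspect.Parameter(f, inspect.Parameter.KEYWORD_ONLY)
                  for f in sorted(fields)]
        # Note: __init__ is shared with the base class, so each subclass
        # creation overwrites the shared __signature__ (a limitation the
        # original sketch has too).
        result.__init__.__signature__ = inspect.signature(
            result.__init__).replace(parameters=params)
        return result

class ExceptionTemplate(Exception, metaclass=_Signaturizer):
    formatter = string.Formatter()
    message = ''

    def __init__(self, **kwargs):
        super().__init__(self.message.format(**kwargs))

class MissingThing(ExceptionTemplate):
    message = 'no {thing} found in {place}'

sig = str(inspect.signature(MissingThing.__init__))
err = MissingThing(thing='key', place='mapping')
```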


On Fri, Aug 9, 2019 at 1:57 PM Sebastian Kreft  wrote:

>
>
> On Fri, Aug 9, 2019 at 12:55 PM Brett Cannon  wrote:
>
>>
>>
>> On Thu, Aug 8, 2019 at 5:24 PM Sebastian Kreft  wrote:
>>
>>>
>>>
>>> On Thu, Aug 8, 2019 at 7:09 PM Andrew Barnert via Python-ideas <
>>> python-ideas@python.org> wrote:
>>>
 On Aug 8, 2019, at 15:01, Ryan Fox  wrote:

 I don't see why you would want to access arguments by their position.


 Because that’s the way it’s worked since Python 1.x, and there’s tons
 of existing code that expects it, including the default __str__ and
 __repr__ for exceptions and the code that formats tracebacks.

>>> I don't really understand what you mean here. This property was broken
>>> since ImportError started accepting keyword arguments.
>>>
>>
>> The property isn't broken for ImportError, it just isn't being given the
>> keyword arguments because it didn't makes sense to pass them down with no
>> information attached to it. The 'args' attribute still gets the message
>> which is the key detail.
>>
> The "broken" property was alluding to Andrew's comment that exception
> arguments need to be positional and cannot/shouldn't be keywords and I was
> giving an example from the standard library in which we can pass keyword
> arguments which are not stored in the exception's arguments.
>
> Also note that my comment mentioned that passing the formatted message as
> the only argument to the super constructor would solve the compatibility
> problem.
>
>
>>
>> -Brett
>>
>>
>>>
>>> For example:
>>>
>>> >>> ImportError("message", name="name", path="path").args
>>> ('message',)
>>>
>>> >>> ImportError("message", "foo", name="name", path="path").args
>>> ('message', 'foo')
>>>
>>> For the case of str and repr, one could just call super with the
>>> formatted message as the only positional argument.
>>>
>>>
>>> I suggest taking a look at PEP 473
>>>  for ideas on why having
>>> structured arguments is a good idea.
>>>

 The user-defined exceptions in the Python documentation don't pass
 arguments to the base class either:
 

[Python-ideas] Re: adding support for a "raw output" in JSON serializer

2019-08-09 Thread Joao S. O. Bueno
On Fri, 9 Aug 2019 at 14:49, Richard Musil  wrote:

> Joao S. O. Bueno wrote:
>
> > However, as noted, there is no way to customize Python JSON encoder
> > to encode an arbitrary decimal number in JSON, even though the standard
> does
> > allow it, and Python supports then via decimal.Decimal.
> > Short of a long term solution, like a __json__ protocol, or at least
> special
> > support in Python json module for objects of type "numbers.Number",
> > the only way to go, is, as you are asking, being able to insert "raw
> > strings into json".
>
> Would the approach I outlined in my answer to Dominik be acceptable?:
>
> 1) Add keyword argument to `json.dump(s)` called `dump_float` which will
> act as a counter part to `parse_float` keyword argument in `json.load(s)`.
> The argument will accept custom type (class) for the user's "float"
> representation (for example `decimal.Decimal`).
>

Yes, just that it should be called `dump_as_float` and take either a class
or a tuple-of-classes (or maybe just another argument that, when set to
"True", would work for any object for which isinstance(obj, numbers.Number)
is True).


>
> 2) If specified by the client code, JSONEncoder, when identifying object
> of that type in the input data will encode it using the special rule
> suggested by Dominik:
> ```
> # if o is custom float type
> if isinstance(o, ):
> dump_val = str(o)
> try:
> float(dump_val)
> except ValueError:
> raise TypeError('... is not JSON serializable float number') from
> None
> 
> ```
> This would have following implications/consequences:
> 1) str(o) may return invalid float, but the check will not let it into the
> stream.
> 2) the contract between the custom float class implementation and standard
> `json` module will be pretty clear - it must implement the serialization in
> its __str__ function and must return valid float.
> 3) the standard implementation does not need to `import decimal`. If the
> client code needs this feature, it will `import decimal` itself.
> 4) definition which class/type objects should be handled by this rule will
> be pretty clear, it will be the only one specified in `dump_float` argument
> (if specified at all).
>

Yes, but I don't know if the reverse float checking is over-policing it - it
is not the role of the language or its libraries to guarantee that the
JSON-encoded string is valid JSON. So, maybe emit a warning there, but
raising TypeError will only make someone intending to encode numbers in
hexadecimal in her custom JSON show up here crying tomorrow.
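Pulling the quoted sketch together as standalone code (the helper name and the
whole mechanism are hypothetical, not stdlib behaviour):

```python
from decimal import Decimal

def encode_float_like(o):
    # Sketch of the contract check being debated: str(o) must at least
    # parse back as a float, or we refuse to emit it.
    dump_val = str(o)
    try:
        float(dump_val)
    except ValueError:
        raise TypeError(
            f'{dump_val!r} is not a JSON serializable float number') from None
    # Note: float() also accepts 'nan'/'inf' spellings, so this check alone
    # does not guarantee valid JSON output.
    return dump_val

kept = encode_float_like(Decimal('0.30'))
```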



___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/OYZ3YDZFC4UO3FU6XCDIUCQOSZSBSIJL/


[Python-ideas] Re: Proposal: Use "for x = value" to assign in scope

2019-08-09 Thread Rhodri James

On 09/08/2019 18:10, Brett Cannon wrote:

On Fri, Aug 9, 2019 at 9:03 AM Peter O'Connor 
wrote:


Alright hear me out here:

I've often found that it would be useful for the following type of
expression to be condensed to a one-liner:

def running_average(x_seq):
 averages = []
 avg = 0
 for t, x in enumerate(x_seq):
 avg =  avg*t/(t+1) + x/(t+1)
 averages.append(avg)
 return averages

Because really, there's only one line doing the heavy lifting here, the
rest is kind of boilerplate.



But it's boilerplate that communicates the starting state of your loop
which is useful to know and to have very clearly communicated.


+1.  The intent and operation of your code is clear.  Win.


Then I learned about the beautiful and terrible "for x in [value]":

def running_average(x_seq):
 return [avg for avg in [0] for t, x in enumerate(x_seq) for avg in
[avg*t/(t+1) + x/(t+1)]]

Many people find this objectionable because it looks like there are 3 for
loops, but really there's only one: loops 0 and 2 are actually assignments.


I find it objectionable because it's unreadable.  I would reject this in 
a code review as "too clever for its own good," therefore unnecessarily 
hard to maintain.



**My Proposal**

What if we just officially bless this "using for as a temporary
assignment" arrangement, and allow "for x=value" to mean "assign within the
scope of this for".  It would be identical to "for x in [value]", just more
readable.  The running average function would then be:

def running_average(x_seq):
 return [avg for avg=0 for t, x in enumerate(x_seq) for avg = avg *
t/(t+1) + x / (t+1)]



I personally don't find that more readable than the unrolled version you're
trying to avoid. And based on the amount of grief we got for the walrus
operator I wouldn't expect much uptake on this as being considered more
readable by others either. (And remember that "Readability counts").


Agreed.

--
Rhodri James *-* Kynesim Ltd
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/MO4G7T7H7BPWI2KLWDYQJQ7B7XEHJO2J/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Exceptions with Message Templates

2019-08-09 Thread Sebastian Kreft
On Fri, Aug 9, 2019 at 12:55 PM Brett Cannon  wrote:

>
>
> On Thu, Aug 8, 2019 at 5:24 PM Sebastian Kreft  wrote:
>
>>
>>
>> On Thu, Aug 8, 2019 at 7:09 PM Andrew Barnert via Python-ideas <
>> python-ideas@python.org> wrote:
>>
>>> On Aug 8, 2019, at 15:01, Ryan Fox  wrote:
>>>
>>> I don't see why you would want to access arguments by their position.
>>>
>>>
>>> Because that’s the way it’s worked since Python 1.x, and there’s tons of
>>> existing code that expects it, including the default __str__ and __repr__
>>> for exceptions and the code that formats tracebacks.
>>>
>> I don't really understand what you mean here. This property was broken
>> since ImportError started accepting keyword arguments.
>>
>
> The property isn't broken for ImportError, it just isn't being given the
> keyword arguments because it didn't makes sense to pass them down with no
> information attached to it. The 'args' attribute still gets the message
> which is the key detail.
>
The "broken" property was alluding to Andrew's comment that exception
arguments need to be positional and cannot/shouldn't be keywords and I was
giving an example from the standard library in which we can pass keyword
arguments which are not stored in the exception's arguments.

Also note that my comment mentioned that passing the formatted message as
the only argument to the super constructor would solve the compatibility
problem.
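A minimal sketch of that approach (hypothetical names, not the reference implementation linked elsewhere in the thread): a template base class that checks its placeholders up front and passes the formatted message as the single positional argument to the base constructor, so `args` and the default str/repr keep working:

```python
import string

class ExceptionTemplate(Exception):
    message = ''

    def __init__(self, **kwargs):
        # Collect the placeholder names used in the class's message template.
        fields = {name for _, name, _, _ in string.Formatter().parse(self.message)
                  if name is not None}
        missing = fields - kwargs.keys()
        if missing:
            raise ValueError(f'missing arguments: {sorted(missing)}')
        self.kwargs = kwargs
        # Pass the formatted message as the only positional argument,
        # so BaseException.args stays populated as usual.
        super().__init__(self.message.format(**kwargs))

class MyException(ExceptionTemplate):
    message = 'Bad thing happened during {action} in {context}'

e = MyException(action='parsing', context='startup')
assert e.args == ('Bad thing happened during parsing in startup',)
```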


>
> -Brett
>
>
>>
>> For example:
>>
>> >>> ImportError("message", name="name", path="path").args
>> ('message',)
>>
>> >>> ImportError("message", "foo", name="name", path="path").args
>> ('message', 'foo')
>>
>> For the case of str and repr, one could just call super with the
>> formatted message as the only positional argument.
>>
>>
>> I suggest taking a look at PEP 473
>>  for ideas on why having
>> structured arguments is a good idea.
>>
>>>
>>> The user-defined exceptions in the Python documentation don't pass
>>> arguments to the base class either:
>>> https://docs.python.org/3/tutorial/errors.html#user-defined-exceptions
>>>
>>>
>>> Yes they do. Try it:
>>>
>>> >>> e = InputError('[2+3)', 'mismatched brackets')
>>> >>> e.args
>>> ('[2+3)', 'mismatched brackets')
>>> >>> e
>>> InputError('[2+3)', 'mismatched brackets')
>>>
>>> If you’re wondering why this works, it’s because Error and InputError
>>> don’t override __new__. Which should make it obvious why a tutorial aimed
>>> at novices doesn’t get into the details, but that’s why Python has
>>> reference manuals instead of just a tutorial.
>>>
>>> Also, notice that the tutorial examples don’t even try to create a
>>> formatted message; they expect that the type name and the args will be
>>> enough for debugging. I’m not sure that’s a great design, but it means that
>>> your intended fix only solves a problem they didn’t even have in the first
>>> place.
>>>
>>> So let's go ahead and assume my implementation is flawed. The fact that
>>> people prefer to copy their format strings all over their projects implies
>>> that the current exception scheme is suboptimal. Can we agree on that? If
>>> not, there's no need to continue this discussion.
>>>
>>>
>>> I agree that it would be nice for more people to move their message
>>> formatting into the class, but you need a design that encourages that
>>> without fighting against the fact that exception args are positional, and
>>> I’m not sure what that looks like. And I don’t think it’s your
>>> implementation that’s bad (it seems to do what it says perfectly well), but
>>> that the design doesn’t work.
>>>
>>> Of course if you were designing a new language (or a new library from
>>> builtins up for the same language), this would be easy. Exceptions would
>>> look like (maybe be) @dataclasses, storing their arguments by name and
>>> generating repr/str/traceback that takes that into account, and all of them
>>> would actually store the relevant values instead of half of them making you
>>> parse it out of the first arg, and there would be a special message
>>> property or method instead of args[0] being sort of special but not special
>>> enough, and so on. And then, providing an easier way to create that message
>>> property would be an easy problem. But I think with the way Python is
>>> today, it’s not.
>>>
>>> Of course I’d be happy to be proven wrong on that.  :)
>>> ___
>>> Python-ideas mailing list -- python-ideas@python.org
>>> To unsubscribe send an email to python-ideas-le...@python.org
>>> https://mail.python.org/mailman3/lists/python-ideas.python.org/
>>> Message archived at
>>> https://mail.python.org/archives/list/python-ideas@python.org/message/5ULPVNH6RBFXY24P76YZCAKIKOLHWF2B/
>>> Code of Conduct: http://python.org/psf/codeofconduct/
>>>
>>
>>
>> --
>> Sebastian Kreft
>> ___
>> Python-ideas mailing list -- python-ideas@python.org
>> To unsubscribe send an email to 

[Python-ideas] Re: adding support for a "raw output" in JSON serializer

2019-08-09 Thread Richard Musil
Joao S. O. Bueno wrote:

> However, as noted, there is no way to customize Python JSON encoder
> to encode an arbitrary decimal number in JSON, even though the standard does
> allow it, and Python supports then via decimal.Decimal.
> Short of a long term solution, like a __json__ protocol, or at least special
> support in Python json module for objects of type "numbers.Number",
> the only way to go, is, as you are asking, being able to insert "raw
> strings into json".

Would the approach I outlined in my answer to Dominik be acceptable?:

1) Add keyword argument to `json.dump(s)` called `dump_float` which will act as 
a counter part to `parse_float` keyword argument in `json.load(s)`. The 
argument will accept custom type (class) for the user's "float" representation 
(for example `decimal.Decimal`).

2) If specified by the client code, JSONEncoder, when identifying object of 
that type in the input data will encode it using the special rule suggested by 
Dominik:
```
# if o is an instance of the custom float type given via `dump_float`
if isinstance(o, dump_float):
    dump_val = str(o)
    try:
        float(dump_val)
    except ValueError:
        raise TypeError(f'{o!r} is not a JSON serializable float number') from None
```
This would have following implications/consequences:
1) str(o) may return an invalid float, but the check will not let it into the 
stream.
2) the contract between the custom float class implementation and standard 
`json` module will be pretty clear - it must implement the serialization in its 
__str__ function and must return valid float.
3) the standard implementation does not need to `import decimal`. If the client 
code needs this feature, it will `import decimal` itself.
4) definition which class/type objects should be handled by this rule will be 
pretty clear, it will be the only one specified in `dump_float` argument (if 
specified at all).
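For reference, the load-side hook this proposal mirrors already exists, and the dump-side validation rule above can be exercised as a standalone function (a sketch, assuming `dump_float` ends up spelled this way):

```python
import json
from decimal import Decimal

# Existing load-side counterpart: parse_float routes number literals
# to a custom type instead of float.
doc = json.loads('{"price": 1.15}', parse_float=Decimal)
assert doc == {'price': Decimal('1.15')}

# The proposed dump-side rule, as a standalone function: accept the
# custom type's str() output only if it parses as a float.
def dump_float_repr(o):
    dump_val = str(o)
    try:
        float(dump_val)
    except ValueError:
        raise TypeError(f'{dump_val!r} is not a JSON serializable '
                        'float number') from None
    return dump_val

assert dump_float_repr(doc['price']) == '1.15'
```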
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/7HCLSKOKRRPNEBZRQGA7F3BEVLX4GXU6/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: adding support for a "raw output" in JSON serializer

2019-08-09 Thread Richard Musil
Alright, when I made the reference to the json.org I made a mistake. Should 
have made the reference to the RFC. You are right that the RFC makes it clear 
that it is aware of possible interoperability problems and that some "common 
ground" should be acceptable for the underlying representation. 
(https://tools.ietf.org/html/rfc8259#section-6).

I believe my proposal is not going against that. I have already stated in my 
previous mail to Paul that the default behavior (current implementation) which 
uses platform native float binary representation is fine.

Adding support so that a custom type may implement it is not the same as asking 
to have it implemented in the standard (default) implementation.

As it turned out during the discussion here (at least for me and a few others), 
it seems that asking for the general feature of "being able to insert custom 
(raw) output" is not really necessary; it would be sufficient to give this 
support only to the `float` type, because it is the only type which is native 
to both Python and JSON and at the same time could usefully be handled by a 
custom type.

Concerning bit-to-bit vs byte-to-byte - no obsession here. It is just that 
the byte is the usual granularity of the (text) stream processors (either in or 
out), so I felt that using byte-to-byte was more fitting.
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/US2U3GIH6Y5W2QJZH2ETNCQLM6FCKHYJ/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Proposal: Use "for x = value" to assign in scope

2019-08-09 Thread Brett Cannon
On Fri, Aug 9, 2019 at 9:03 AM Peter O'Connor 
wrote:

> Alright hear me out here:
>
> I've often found that it would be useful for the following type of
> expression to be condensed to a one-liner:
>
> def running_average(x_seq):
> averages = []
> avg = 0
> for t, x in enumerate(x_seq):
> avg =  avg*t/(t+1) + x/(t+1)
> averages.append(avg)
> return averages
>
> Because really, there's only one line doing the heavy lifting here, the
> rest is kind of boilerplate.
>

But it's boilerplate that communicates the starting state of your loop
which is useful to know and to have very clearly communicated.


>
> Then I learned about the beautiful and terrible "for x in [value]":
>
> def running_average(x_seq):
> return [avg for avg in [0] for t, x in enumerate(x_seq) for avg in
> [avg*t/(t+1) + x/(t+1)]]
>
> Many people find this objectionable because it looks like there are 3 for
> loops, but really there's only one: loops 0 and 2 are actually assignments.
>
> **My Proposal**
>
> What if we just officially bless this "using for as a temporary
> assignment" arrangement, and allow "for x=value" to mean "assign within the
> scope of this for".  It would be identical to "for x in [value]", just more
> readable.  The running average function would then be:
>
> def running_average(x_seq):
> return [avg for avg=0 for t, x in enumerate(x_seq) for avg = avg *
> t/(t+1) + x / (t+1)]
>

I personally don't find that more readable than the unrolled version you're
trying to avoid. And based on the amount of grief we got for the walrus
operator I wouldn't expect much uptake on this as being considered more
readable by others either. (And remember that "Readability counts").

-Brett


>
> -- P.S. 1
> I am aware of Python 3.8's new "walrus" operator, which would make it:
>
> def running_average(x_seq):
> avg = 0
> return [avg := avg*t/(t+1) + x / (t+1) for t, x in enumerate(x_seq)]
>
> But it seems ugly and bug-prone to be initializing a in-comprehension
> variable OUTSIDE the comprehension.
>
> -- P.S. 2
> The "for x = value" syntax can achieve things that are not nicely
> achievable using the := walrus.  Consider the following example (wherein we
> carry forward a "hidden" variable h but do not return it):
>
> y_seq = [y for h=0 for x in x_seq for y, h = update(x, h)]
>
> There's not really a nice way to do this with the walrus because you can't
> (as far as I understand) combine it with tuple-unpacking.  You'd have to do
> something awkward like:
>
> yh = None, 0
> y_seq, _ = zip(*(yh := update(x, yh[1]) for x in x_seq))
> --
>
> ___
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/RHW5AUV3C57YOF3REB2HEMYLWLLXSNQT/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/J7FQP2KPODR4LA5K7HCLYU547O7CHHEV/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Exceptions with Message Templates

2019-08-09 Thread Brett Cannon
On Thu, Aug 8, 2019 at 11:38 AM Ryan Fox  wrote:

> Thanks for the comments.
>
> I'm not really sure what you mean with regards to backwards compatibility.
> Would it suffice to have ExceptionTemplate.__init__ accept *args and pass
> that into super().__init__? I see that BaseException does something with
> args in __new__ and __init__, but I'm not very familiar with the Python
> internals. How does BaseException deal with kwargs?
>

It doesn't deal with them because it won't accept them. :)


>
> str.format_map() would make the formatting simpler, but I also use
> string.Formatter.parse() to in ExceptionTemplate._error_check() to find the
> required arguments. I don't see an alternative for this on str.
>

You can also use str.format(**kwargs) if you want to be strict about what
you accept in terms of keyword arguments while being generic in your
implementation.

I also just realized that your class won't have any helpful introspection
on the call signature either, so all users of your class will have to
document in the docstring very clearly what is expected in the constructor
call as their code editors won't be able to tell them and help()/inspect
won't be able to help either.


>
> You're right about the third thing. I hadn't even considered the books! I
> have released on PyPI as 'exception-template' and we'll see what happens.
> One person commented on Reddit that they liked the idea but would probably
> copy the code directly into their project rather than require an additional
> external dependency.
>

Yep, this is small enough to easily fit into a person's personal toolbox of
handy code.


>
> I guess I was hoping to get a feel of whether this was something that
> appealed to this audience and/or could ever be accepted, as well as to
> refine the idea and implementation.
>
> On Thu, Aug 8, 2019 at 2:03 PM Brett Cannon  wrote:
>
>> Three things. One, this isn't backwards-compatible as you are not passing
>> any details down into Exception.__init__() to make sure that
>> BaseException.args gets populated.
>>
>> Two, you can simplify your code by using str.format_map() instead of the
>> string module.
>>
>> Three, I don't see enough overhead from version 2 to your version 3 since
>> you're only saving a line of code, so I wouldn't say there's that much boilerplate. But
>> the best way to prove me wrong is to release this on PyPI and see if people
>> use it. But without community use proving people want this we won't be up
>> for adding a new built-in exception as you're asking every book on Python
>> to be rewritten to cover this which is no small thing.
>>
>> On Thu, Aug 8, 2019 at 8:56 AM Ryan Fox  wrote:
>>
>>> Exception definitions in Python are not great. There are two main ways
>>> they're used in the code that I've read:
>>>
>>> 1) By far the most common:
>>>
>>> >>> class MyException(Exception):
>>> ... pass
>>> >>> raise MyException(f'Bad thing happened during {action} in {context}')
>>>
>>> 2) Much rarer:
>>>
>>> >>> class MyException(Exception):
>>> ... def __init__(self, action, context):
>>> ... super().__init__(f'Bad thing happened during {action} in
>>> {context}')
>>> >>> raise MyException(current_action, current_context)
>>>
>>> Version 1 is quick and easy, but messages skew and become inconsistent
>>> as they're copied from place to place. The Python standard library isn't
>>> immune to this either.
>>>
>>> Version 2 looks simple enough to do, but all of the repetitious
>>> boilerplate adds up when several exception types need to be defined. (And
>>> it's even worse when you want all of your code to include typing
>>> annotations.) Most people don't bother.
>>>
>>> My proposal is a new exception class as the preferred base for
>>> user-defined exceptions:
>>>
>>> >>> class MyException(ExceptionTemplate):
>>> ...message = 'Bad thing happened during {action} in {context}'
>>> >>> raise MyException(action=current_action, context=current_context)
>>>
>>> I have a reference implementation, implemented in an almost trivial
>>> amount of pure Python code:
>>> https://github.com/rcfox/exception-template/blob/master/exception_template/exception_template.py
>>>
>>> So why am I bothering you with this? I would like to see this become the
>>> preferred method of defining new exceptions as recommended by the Python
>>> documentation. This would also require adding ExceptionTemplate as a
>>> built-in exception type.
>>>
>>> I will grant that ExceptionTemplate seems a little bit magical, but all
>>> of the required arguments are explicitly laid out to the user in the
>>> 'message' field, and format strings should be a familiar concept to most
>>> users.
>>>
>>> I've also included some tests that show that you can still treat these
>>> exceptions as regular classes:
>>> https://github.com/rcfox/exception-template/blob/master/tests/test.py#L53
>>>
>>> ___
>>> Python-ideas mailing list -- python-ideas@python.org
>>> To unsubscribe send an email to 

[Python-ideas] Re: Proposal: Use "for x = value" to assign in scope

2019-08-09 Thread Andrew Barnert via Python-ideas
On Aug 9, 2019, at 08:47, Peter O'Connor  wrote:
> 
> I've often found that it would be useful for the following type of expression 
> to be condensed to a one-liner: 
> 
> def running_average(x_seq): 
> averages = []
> avg = 0
> for t, x in enumerate(x_seq): 
> avg =  avg*t/(t+1) + x/(t+1)
> averages.append(avg)
> return averages
> 
> Because really, there's only one line doing the heavy lifting here, the rest 
> is kind of boilerplate.

It seems like the only reason you can’t write this with accumulate is that 
accumulate doesn’t take a start value like reduce does?

And I think this would be a lot clearer and more readable, especially if you’re 
doing this kind of thing more than once:

def running(xs, func, start):
yield from accumulate(enumerate(xs), lambda avg, tx: func(avg, *tx), 
start)

def running_average(xs):
yield from running(xs, lambda avg, t, x: avg*t/(t+1) + x/(t+1), 0.0)

Now the part that does the heavy lifting is all in one place and just does what 
it’s says, without being confusingly interleaved with the boilerplate. Plus, 
the parts of the boilerplate that are reusable are abstracted into functions 
(accumulate and running) that can be reused, while the rest of it has vanished.

(This is one of those rare cases where 2.x-style decomposing def/lambda was 
actually useful, but if that extra lambda in running really bothers you, that’s 
another one-liner HOF you can abstract out trivially and reuse.)
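For what it's worth, `accumulate` did grow an `initial` keyword in Python 3.8, so on current versions the whole thing collapses to something like this (a sketch; `initial` contributes an extra leading element that has to be sliced off):

```python
from itertools import accumulate

def running_average(xs):
    # accumulate(..., initial=0.0) seeds the running value; the seed
    # itself is emitted first, so drop it from the result.
    return list(accumulate(
        enumerate(xs),
        lambda avg, tx: avg * tx[0] / (tx[0] + 1) + tx[1] / (tx[0] + 1),
        initial=0.0,
    ))[1:]

assert running_average([1, 2, 3]) == [1.0, 1.5, 2.0]
```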

More generally, it’s a lot easier to use comprehensions and higher order 
functions if your algorithm can be written in terms of “generate the next 
immutable value” instead of “update the mutable variable”, and I don’t think 
that’s a limitation of the language. Comprehensions are much more readable when 
they’re declarative than when they’re for statements in disguise. 

Also, I don’t think the reason people were objecting to your four-clause 
comprehension was that it wasn’t easy enough to tell that the innermost clause 
only “loops” exactly one time, but that it’s a comprehension with four clauses 
in the first place. Changing the spelling of that clause to make the 
no-actual-looping explicit doesn’t solve that.

Finally, you can already play tricks with the walrus operator to avoid moving 
things like initialization outside the comprehension, just as you could with 
your proposed syntax. For example, “for t, x in (avg:=0) or enumerate(xs)” is a 
perfectly valid clause that assigns 0 to avg and then loops over the enumerate, 
and it doesn’t require you to turn one for clause into two. But it still adds 
just as much complexity for the reader to deal with, so I think you’re still 
better off not doing it.

(As a side note, you probably want a numerically stable average like the ones in 
statistics or numpy, rather than one that accumulates float rounding errors 
indiscriminately, but that’s another issue.)
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/3RPVRDFALMRMYHWYLCP4EPTOI4LKQIZY/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Exceptions with Message Templates

2019-08-09 Thread Brett Cannon
On Thu, Aug 8, 2019 at 5:24 PM Sebastian Kreft  wrote:

>
>
> On Thu, Aug 8, 2019 at 7:09 PM Andrew Barnert via Python-ideas <
> python-ideas@python.org> wrote:
>
>> On Aug 8, 2019, at 15:01, Ryan Fox  wrote:
>>
>> I don't see why you would want to access arguments by their position.
>>
>>
>> Because that’s the way it’s worked since Python 1.x, and there’s tons of
>> existing code that expects it, including the default __str__ and __repr__
>> for exceptions and the code that formats tracebacks.
>>
> I don't really understand what you mean here. This property was broken
> since ImportError started accepting keyword arguments.
>

The property isn't broken for ImportError, it just isn't being given the
keyword arguments because it didn't make sense to pass them down with no
information attached to it. The 'args' attribute still gets the message
which is the key detail.

-Brett


>
> For example:
>
> >>> ImportError("message", name="name", path="path").args
> ('message',)
>
> >>> ImportError("message", "foo", name="name", path="path").args
> ('message', 'foo')
>
> For the case of str and repr, one could just call super with the formatted
> message as the only positional argument.
>
>
> I suggest taking a look at PEP 473
>  for ideas on why having
> structured arguments is a good idea.
>
>>
>> The user-defined exceptions in the Python documentation don't pass
>> arguments to the base class either:
>> https://docs.python.org/3/tutorial/errors.html#user-defined-exceptions
>>
>>
>> Yes they do. Try it:
>>
>> >>> e = InputError('[2+3)', 'mismatched brackets')
>> >>> e.args
>> ('[2+3)', 'mismatched brackets')
>> >>> e
>> InputError('[2+3)', 'mismatched brackets')
>>
>> If you’re wondering why this works, it’s because Error and InputError
>> don’t override __new__. Which should make it obvious why a tutorial aimed
>> at novices doesn’t get into the details, but that’s why Python has
>> reference manuals instead of just a tutorial.
>>
>> Also, notice that the tutorial examples don’t even try to create a
>> formatted message; they expect that the type name and the args will be
>> enough for debugging. I’m not sure that’s a great design, but it means that
>> your intended fix only solves a problem they didn’t even have in the first
>> place.
>>
>> So let's go ahead and assume my implementation is flawed. The fact that
>> people prefer to copy their format strings all over their projects implies
>> that the current exception scheme is suboptimal. Can we agree on that? If
>> not, there's no need to continue this discussion.
>>
>>
>> I agree that it would be nice for more people to move their message
>> formatting into the class, but you need a design that encourages that
>> without fighting against the fact that exception args are positional, and
>> I’m not sure what that looks like. And I don’t think it’s your
>> implementation that’s bad (it seems to do what it says perfectly well), but
>> that the design doesn’t work.
>>
>> Of course if you were designing a new language (or a new library from
>> builtins up for the same language), this would be easy. Exceptions would
>> look like (maybe be) @dataclasses, storing their arguments by name and
>> generating repr/str/traceback that takes that into account, and all of them
>> would actually store the relevant values instead of half of them making you
>> parse it out of the first arg, and there would be a special message
>> property or method instead of args[0] being sort of special but not special
>> enough, and so on. And then, providing an easier way to create that message
>> property would be an easy problem. But I think with the way Python is
>> today, it’s not.
>>
>> Of course I’d be happy to be proven wrong on that.  :)
>> ___
>> Python-ideas mailing list -- python-ideas@python.org
>> To unsubscribe send an email to python-ideas-le...@python.org
>> https://mail.python.org/mailman3/lists/python-ideas.python.org/
>> Message archived at
>> https://mail.python.org/archives/list/python-ideas@python.org/message/5ULPVNH6RBFXY24P76YZCAKIKOLHWF2B/
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
>
>
> --
> Sebastian Kreft
> ___
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/ZOLTXVQPUFWFJXO66JT7JEP2BMLVC5OF/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 

[Python-ideas] Re: adding support for a "raw output" in JSON serializer

2019-08-09 Thread Andrew Barnert via Python-ideas
On Aug 9, 2019, at 07:25, Richard Musil  wrote:
> 
> There is no "normalized" representation for JSON. If you look at the 
> "standard" it is pretty simple (json.org). The JSON object is defined solely 
> by its textual representation (string of characters).

json.org is not the standard; RFC 8259 is the standard.* The description at 
json.org is not just informal and incomplete, it’s wrong about multiple things. 
For example, it assumes all Unicode characters can be escaped with \u, and 
that JSON strings are a subset of JS strings despite allowing two characters 
the JS doesn’t, and that its float format is the same as C’s except for octal 
and hex when it isn’t.

The RFC nails down all the details, fixes all of the mistakes, and, most 
relevant here, makes recommendations about what JSON implementations should do 
if they care about interoperability. None of these are "MUST" recommendations, 
so you can still call an implementation conforming if it ignores them, but it’s 
still not a good idea to ignore them.**

And it’s not just your two examples that are different representations of the 
same float value that implementations should treat as (approximately, to the 
usual limits of IEEE) equal. So are 100000000.0 and +1E8 and 1.0e+08. If you’ve 
received the number 1E8 and write it back, you’re going to get 100000000.0. 
Storing it in Decimal instead of float doesn’t magically fix that. The fact 
that it does fix the one example you’ve run into so far doesn’t mean that it 
guarantees byte-for-byte*** round-tripping, just that it happens to give you 
byte-for-byte round-tripping in that one example. Arguing that we must allow it 
because we have to allow people to guarantee byte-for-byte round-tripping is 
effectively arguing that we have to actively mislead people into thinking we’re 
making a guarantee that we can’t make.
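The point is easy to demonstrate with the stdlib encoder (numeric equality survives the round trip; the original spelling does not):

```python
import json

a = '{"x": 1E8}'
b = '{"x": 100000000.0}'
# The two documents are numerically equal...
assert json.loads(a) == json.loads(b)
# ...but round-tripping normalizes the spelling, so byte-for-byte
# fidelity is lost.
assert json.dumps(json.loads(a)) == '{"x": 100000000.0}'
```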

And even if it did solve the problem for numbers, you’d still have the problem 
of different choices for which characters to escape, different Unicode 
normalizations in strings, different order of object members, different 
handling for repeated keys, different separator whitespace in arrays and 
objects, and so on. All of these are just as much a problem for round-tripping 
a different library’s JSON as they are for generating the same thing from 
scratch. 

Other text-based formats solve the hashing problem by specifying a canonical 
form: you can’t guarantee that any implementation can round-trip another 
implementation’s output, but you can guarantee that any implementation that 
claims to support the canonical form will produce the same canonical form for 
the same inputs, so you just hash the canonical form. But JSON intentionally 
does not have a canonical form. The intended way to solve the hashing problem 
is to not use JSON. 

Finally, all of your proposals for solving this so far either allow people to 
insert arbitrary strings into JSON, or don’t work. For example, using bytes 
returned from default to signal “insert these characters into the stream instead 
of encoding this value” explicitly lets people insert anything they want, 
even as you say you don’t want to allow that. I don’t get why you aren’t just 
proposing that the stdlib adopt simplejson’s use_decimal flag (and trying to 
figure out a way to make that work) instead of trying to invent something new 
and more general and more complicated even though it can’t actually be used for 
anything but decimal. But again, use_decimal does not solve the round-tripping, 
canonicalization, or hashing problem at all, so if your argument is that we 
should include it to solve that problem, you need a new argument.

—-

* Python, and I believe simplejson, still conforms to RFC 7159, which was the 
standard until December 2017, so if you wanted to make an argument from 7159, 
that _might_ be valid, although I think if there were a significant difference, 
people would be more likely to take that as a reason to update to 8259.

** And in fact, even the informal description doesn’t say what you want. It 
defines numbers as “very much like a C or Java number”. C and Java numbers are 
explicitly representations of the underlying machine type (for C) or of IEEE 
binary64 (for Java).

*** I don’t get why you’re obsessed with the question of bit-for-bit vs. 
byte-for-byte. Are you worried about behavior on an ancient PDP-7 where you 
have 9-bit bytes but ignore the top bit for characters or something?

**** And why would bytes ever mean “exactly these Unicode characters”? That’s 
what str means; bytes doesn’t, and in fact can’t unless you have an 
out-of-band-specified encoding; that’s half the reason we have Python 3. And 
even if you did confusingly specify that bytes can be used for “exactly the 
characters these bytes decode to as UTF-8”, that still doesn’t specify what 
happens if the bytes object has, say, a newline, or a backslash followed by an a, or a 
non-BMP Unicode character. The fact that you weren’t expecting to 

[Python-ideas] Proposal: Use "for x = value" to assign in scope

2019-08-09 Thread Peter O'Connor
Alright hear me out here:

I've often found that it would be useful for the following type of
expression to be condensed to a one-liner:

def running_average(x_seq):
    averages = []
    avg = 0
    for t, x in enumerate(x_seq):
        avg = avg*t/(t+1) + x/(t+1)
        averages.append(avg)
    return averages

Because really, there's only one line doing the heavy lifting here, the
rest is kind of boilerplate.

Then I learned about the beautiful and terrible "for x in [value]":

def running_average(x_seq):
    return [avg for avg in [0] for t, x in enumerate(x_seq)
            for avg in [avg*t/(t+1) + x/(t+1)]]

Many people find this objectionable because it looks like there are 3 for
loops, but really there's only one: loops 0 and 2 are actually assignments.

**My Proposal**

What if we just officially bless this "using for as a temporary assignment"
arrangement, and allow "for x=value" to mean "assign within the scope of
this for".  It would be identical to "for x in [value]", just more
readable.  The running average function would then be:

def running_average(x_seq):
    return [avg for avg=0 for t, x in enumerate(x_seq)
            for avg = avg*t/(t+1) + x/(t+1)]

-- P.S. 1
I am aware of Python 3.8's new "walrus" operator, which would make it:

def running_average(x_seq):
    avg = 0
    return [avg := avg*t/(t+1) + x/(t+1) for t, x in enumerate(x_seq)]

But it seems ugly and bug-prone to be initializing an in-comprehension
variable OUTSIDE the comprehension.

-- P.S. 2
The "for x = value" syntax can achieve things that are not nicely
achievable using the := walrus.  Consider the following example (wherein we
carry forward a "hidden" variable h but do not return it):

y_seq = [y for h=0 for x in x_seq for y, h = update(x, h)]

There's not really a nice way to do this with the walrus because you can't
(as far as I understand) combine it with tuple-unpacking.  You'd have to do
something awkward like:

yh = None, 0
y_seq, _ = zip(*(yh := update(x, yh[1]) for x in x_seq))
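For what it's worth, that awkward workaround does run under Python 3.8+; here is a
self-contained version (update here is a hypothetical state-updating function,
supplied only to make the sketch runnable):

```python
def update(x, h):
    # hypothetical: returns (output, new_hidden_state)
    return x * 2 + h, h + 1

x_seq = [1, 2, 3]
yh = None, 0
# the walrus target yh binds in the enclosing scope, so each step
# can read the previous hidden state via yh[1]
y_seq, _ = zip(*(yh := update(x, yh[1]) for x in x_seq))
print(list(y_seq))  # [2, 5, 8]
```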
--
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/RHW5AUV3C57YOF3REB2HEMYLWLLXSNQT/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: adding support for a "raw output" in JSON serializer

2019-08-09 Thread Joao S. O. Bueno
So -
I think it is clear by now that in your case you really are not losing
any precision with these numbers.

However, as noted, there is no way to customize the Python JSON encoder
to encode an arbitrary decimal number in JSON, even though the standard does
allow it, and Python supports them via decimal.Decimal.

Short of a long-term solution, like a __json__ protocol, or at least special
support in the Python json module for objects of type "numbers.Number",
the only way to go is, as you are asking, being able to insert raw
strings into the JSON.

Given that this feature can be needed now, I fashioned a JsonEncoder class
that is able to do that - by annotating decimal.Decimal instances on
encoding, and making raw string replacements before returning the final
encoded value.

This recipe is ready to be used at
https://gist.github.com/jsbueno/5f5d200fd77dd1233c3063ad6ecb2eee
(Note that I don't consider this approach fit for the stdlib due to having
to rely on regular expressions, and having to create a copy of the whole
encoded json body - if there is demand, I might package it though).
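The annotate-then-substitute approach can be sketched roughly like this (the
sentinel string, class name, and regex below are illustrative placeholders of
mine, not the gist's actual code, and it only works when the Decimal's string
form is a valid JSON number):

```python
import decimal
import json
import re

class RawDecimalEncoder(json.JSONEncoder):
    # Tag Decimal values with a sentinel assumed not to occur in the data,
    # then strip the sentinel and its surrounding quotes from the output.
    _SENTINEL = '@@decimal@@'
    _PATTERN = re.compile('"%s(-?[0-9][^"]*)%s"' % (_SENTINEL, _SENTINEL))

    def default(self, o):
        if isinstance(o, decimal.Decimal):
            return self._SENTINEL + str(o) + self._SENTINEL
        return super().default(o)

    def encode(self, o):
        # one regex pass over the whole encoded body, as described above
        return self._PATTERN.sub(r'\1', super().encode(o))

out = json.dumps({'val': decimal.Decimal('0.6441726684570313')},
                 cls=RawDecimalEncoder)
print(out)  # {"val": 0.6441726684570313}
```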


Please enjoy.

  js
 -><-

On Thu, 8 Aug 2019 at 07:27, Richard Musil  wrote:

> I have found myself in an awkward situation with current (Python 3.7) JSON
> module. Basically it boils down to how it handles floats. I had been hit on
> this particular case:
>
> In [31]: float(0.6441726684570313)
> Out[31]: 0.6441726684570312
>
> but I guess it really does not matter.
>
> What matters is that I did not find a way how to fix it with the standard
> `json` module. I have the JSON file generated by another program (C++ code,
> which uses nlohmann/json library), which serializes one of the floats to
> the value above. Then when reading this JSON file in my Python code, I can
> get either decimal.Decimal object (when specifying
> `parse_float=decimal.Decimal`) or float. If I use the latter the least
> significant digit is lost in deserialization.
>
> If I use Decimal, the value is preserved, but there seems to be no way to
> "serialize it back". Writing a custom serializer:
>
> class DecimalEncoder(json.JSONEncoder):
>     def default(self, o):
>         if isinstance(o, decimal.Decimal):
>             return str(o) # <- This becomes quoted in the serialized output
>         return super().default(o)
>
> seems to only allow returning a "string" value, but then serializes it as a 
> string! I.e. with the double quotes. What seems to be missing is an ability
> to return a "raw textual representation" of the serialized object which
> will not get mangled further by the `json` module.
>
> I noticed that `simplejson` provides an explicit option for its standard
> serializing function, called `use_decimal`, which basically solves my
> problem., but I would just like to use the standard module, I guess.
>
> So the question is, if something like `use_decimal` has been considered
> for the standard module, and if yes, why it was not implemented, or the
> other option could be to support "raw output" in the serializer, e.g.
> something like:
> class DecimalEncoder(json.JSONEncoder):
>     def raw(self, o):
>         if isinstance(o, decimal.Decimal):
>             return str(o) # <- This is a raw representation of the object
>         return super().raw(o)
> Where the returning values will be directly passed to the output stream
> without adding any additional characters. Then I could write my own Decimal
> serializer with few lines of code above.
>
> If anyone would want to know, why the last digit matters (or why I cannot
> double quote the floats), it is because the file has a secure hash attached
> and this basically breaks it.
> ___
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/WT6Z6YJDEZXKQ6OQLGAPB3OZ4OHCTPDU/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/TY6YCFTH5IBRSVOQQX5NDOBEWJSPKAFI/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: adding support for a "raw output" in JSON serializer

2019-08-09 Thread Rhodri James

On 09/08/2019 16:05, Joao S. O. Bueno wrote:

I spent some minutes now trying to encode a Decimal as a JSON "Number" using
Python's native encoder - it really is not possible. The level of
customization for Python encoders just allows a method ("default") that has
to return a "known" object type - and if it returns a string, it is included
with quotes in the final output - which defeats writing numbers.


I still need some persuasion that this is not the right behaviour as it 
stands.  I get what you want -- "this string of digits is the 
representation I want to use, please don't put quotes around it" -- but 
I can't help but feel that it will only encourage more unrealistic 
expectations.


--
Rhodri James *-* Kynesim Ltd
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/PULMVJB2HU5KITPWFUTMFRKMLNOTCTZC/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: adding support for a "raw output" in JSON serializer

2019-08-09 Thread Joao S. O. Bueno
On Thu, 8 Aug 2019 at 18:53, Chris Angelico  wrote:

> On Fri, Aug 9, 2019 at 6:31 AM Richard Musil  wrote:
> >
> > Chris Angelico wrote:
> >
> > > 2) Should there be a protocol obj.__json__() to return a string
> > > representation of an object for direct insertion into a JSON file?
> >
> > > However, this is a much broader topic, and if you want to push for
> > > that, I would recommend starting a new thread. As Andrew pointed out,
> > > trying to get bit-for-bit identical JSON representations out of
> > > different encoders is usually a bad idea.
> >
> > I am not sure I have ever asked for bit-for-bit identical JSON
> representation. I have always only mentioned `decimal.Decimal` and the lack
> of proper way to encode it (while having the proper way of decoding it),
> and if you read the subject of the OP it is asking for "raw output" (in the
> encoder, nothing about underlying representation) which if I understand
> your two options basically corresponds to the second one and is probably
> addressed elsewhere far more thoroughly.
> >
>
> If you're checking hashes, you need it to be bit-for-bit identical.
> But if what you REALLY want is for a Decimal to be represented in JSON
> as a native number, then I think that is a very reasonable feature
> request. The question is, how should it be done? And there are
> multiple viable options.
>
> I'd recommend, rather than requesting a way to create raw output, that
> you request a way to either (a) recognize Decimals in the default JSON
> encoder, both the Python and C implementations thereof; or (b) create
> a protocol such as __json__ to allow any object to choose how it
> represents itself. You'll probably find a reasonable level of support
> for either of those suggestions.
>
>
I spent some minutes now trying to encode a Decimal as a JSON "Number" using
Python's native encoder - it really is not possible. The level of
customization for Python encoders just allows a method ("default") that has
to return a "known" object type - and if it returns a string, it is included
with quotes in the final output - which defeats writing numbers.
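The limitation is easy to reproduce; a minimal demonstration (class name mine):

```python
import decimal
import json

class NaiveDecimalEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, decimal.Decimal):
            return str(o)  # a str return value is re-encoded as a JSON string
        return super().default(o)

out = json.dumps({'val': decimal.Decimal('0.6441726684570313')},
                 cls=NaiveDecimalEncoder)
print(out)  # {"val": "0.6441726684570313"} -- quoted: a JSON string, not a number
```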

So - there is clearly the need for more customization capabilities, and a
__json__ protocol (allowing one to return an already serialized string,
with quotes, if needed, included in the serialization) seems to be a good
way to go. There is no need to change Python objects to include a
"__json__" slot - it is just Python's json encoder (and 3rd parties')
that would need to check for it.


I am hereby proposing that we settle on that (I think this would need a PEP,
right?).

   js
 -><-


> ChrisA
> ___
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/4KXUIVHWGY3MIB32ZLKFZQXNDU65OV34/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/6YKFHO624ALHJWHY5Z42CKF5G4ED27CP/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Universal parsing library in the stdlib to alleviate security issues

2019-08-09 Thread Neil Girdhar
The documentation is beautiful.

One of the features I was looking for when evaluating parsers was the 
ability to run expressions on the rules.  For example, if you match
"\begin{\w*}"
and \w* turns out to be "enumeration", then later when you match
"\end{\w*}"
you want to check that that \w* is also enumeration, or else raise an 
error.
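A regex backreference can illustrate the constraint in miniature (a real parser
library would express it as a predicate on the rule rather than a regex, and
this sketch is mine, not from any of the libraries discussed):

```python
import re

# \1 forces the end tag to repeat whatever the start tag captured
ENV = re.compile(r'\\begin\{(\w*)\}.*?\\end\{\1\}', re.S)

assert ENV.fullmatch(r'\begin{enumeration}text\end{enumeration}')
assert ENV.fullmatch(r'\begin{enumeration}text\end{itemize}') is None
```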

A similar thing happens when you're trying to parse Python source code with 
the indentation level.  You might want to check that the next indentation 
level is the same or corresponds to a dedent.

Expressions on rules should be able to control whether something matches, 
store values in the parse tree, store values that can be read by other 
expressions, and raise parsing errors.

The beauty of expressions is that you can do the parsing and build the AST 
in one shot.  If you've ever looked at the Python source code, it is 
unfortunate that those tasks have to be done separately even though most 
changes to the AST require parsing changes.

The most modern parsing algorithms have this.  The old parsing libraries 
(lex/yacc, flex/bison, antlr) were very limited.

Also, I'm not sure the separation between tokenization and parsing is 
necessary if you're not worried about efficiency.

Best,

Neil

On Monday, July 15, 2019 at 9:45:59 PM UTC-4, Nam Nguyen wrote:
>
> Hello list,
>
> I sent an email to this list two or three months ago about the same idea. 
> In that discussion, there were both skepticism and support. Since I had 
> some time during the previous long weekend, I have made my idea more 
> concrete and I thought I would try with the list again, after having run it 
> through some of you privately.
>
> GOAL: To have some parsing primitives in the stdlib so that other modules 
> in the stdlib itself can make use of. This would alleviate various security 
> issues we have seen throughout the years.
>
> With that goal in mind, I opine that any parsing library for this purpose 
> should have the following characteristics:
>
> #. Can be expressed in code. My opinion is that it is hard to review 
> generated code. Code review is even more crucial in security contexts.
>
> #. Small and verifiable. This helps build trust in the code that is meant 
> to plug security holes.
>
> #. Less evolving. Being in the stdlib has its drawback that is development 
> velocity. The library should be theoretically sound and stable from the 
> beginning.
>
> #.  Universal. Most of the times we'll parse left-factored context-free 
> grammars, but sometimes we'll also want to parse context-sensitive grammars 
> such as short XML fragments in which end tags must match start tags.
>
> I have implemented a tiny (~200 SLOCs) package at 
> https://gitlab.com/nam-nguyen/parser_compynator that demonstrates 
> something like this is possible. There are several examples for you to have 
> a feel of it, as well as some early benchmark numbers to consider. This is 
> far smaller than any of the Python parsing libraries I have looked at, yet 
> more universal than many of them. I hope that it would convert the skeptics 
> ;).
>
> Finally, my request to the list is: Please debate on: 1) whether we want a 
> small (even private, underscore prefixed) parsing library in the stdlib to 
> help with tasks that are a little too complex for regexes, and 2) if yes, 
> how should it look like?
>
> I also welcome comments (naming, uses of operator overloading, features, 
> bikeshedding, etc.) on the above package ;).
>
> Thanks!
> Nam
>
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/MVULCAFJDAY4HHT7P6X4DCTW3HHEBF6T/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: adding support for a "raw output" in JSON serializer

2019-08-09 Thread Paul Moore
On Fri, 9 Aug 2019 at 15:20, Richard Musil  wrote:
>
> JSON objects are specified by their textual representation (json.org). 
> Choosing some binary representation for them (so they can be processed 
> efficiently) which does not preserve their value is a problem of the 
> underlying binary representation, not of the JSON format per se. 

From ECMA-404, linked from that page:

The goal of this specification is only to define the syntax of valid
JSON texts. Its intent is not to provide any semantics or
interpretation of text conforming to that syntax. It also
intentionally does not define how a valid JSON text might be
internalized into the data structures of a programming language. There
are many possible semantics that could be applied to the JSON syntax
and many ways that a JSON text can be processed or mapped by a
programming language. Meaningful interchange of information using JSON
requires agreement among the involved parties on the specific
semantics to be applied. Defining specific semantic interpretations of
JSON is potentially a topic for other specifications.

So yes, how JSON is translated into language data structures is out of
the scope of the JSON spec. So you're proposing a change to the Python
language stdlib implementation of that translation. Fine. But you have
yet to provide a justification for such a change, except the original
background description, which you yourself have repeatedly claimed is
irrelevant, of getting identical output from a JSON->internal->JSON
round trip.

So as far as I can see we're left with "please change the stdlib json
module, because I think this behaviour would be better (oh, and
incidentally it would solve an issue I have that's not relevant to my
argument that the module should change)".

> From the JSON point of view 0.6441726684570313 is perfectly valid float (or 
> better say _number_ as it is what JSON uses) and 0.6441726684570312 is 
> perfectly valid _and different_ number, because it differs in the last digit.

Well, technically, JSON doesn't claim they are different, because it
doesn't define comparison between its "number" objects... As you point
out later in the same post, "3.0" and "3." are different in the
sense that you define, but how is that relevant or helpful? The JSON
spec adds no semantics, so language bindings get to do what they want.
So you're proposing a change to the Python language bindings.

We get that. (At least I think most of us do by now). But I'm not sure
you're getting the point that you need to justify your proposal, and
you can't (by your own argument) do so by reference to the JSON spec,
and you aren't (by your own admission) doing so based on round
tripping. So what is your justification for wanting this change? "It
makes more sense" doesn't seem to be getting much traction with people
here. "Decimal offers better accuracy" also seems not to be very
persuasive.

Paul
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/LUS2IPA4LTAELKGWY7RBTGXHTCST4KGP/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: adding support for a "raw output" in JSON serializer

2019-08-09 Thread Richard Musil
There is no "normalized" representation for JSON. If you look at the "standard" 
it is pretty simple (json.org). The JSON object is defined solely by its 
textual representation (string of characters).

The way different parsers choose to represent it in binary form, so 
they can process it, is an implementation detail; the JSON format neither 
stipulates any particular (binary) representation nor requires that all be the same.

From the JSON point of view 0.6441726684570313 is a perfectly valid float (or 
better say _number_, as that is what JSON uses) and 0.6441726684570312 is a 
perfectly valid _and different_ number, because it differs in the last digit.

The fact that both numbers transform into the same value when represented in 
the IEEE-754 floating point format is a feature of this particular binary 
representation, and has nothing to do with JSON itself. The underlying JSON 
parser may as well choose a different representation and preserve arbitrary 
precision (for example decimal.Decimal).

From the JSON point of view there is no ambiguity, nor doubt. Even the number 
0.64417266845703130 (note the last 0) is a different JSON object from 
0.6441726684570313 (without the last 0). Yes, both represent the same value and 
if used in calculations will give the same results, but they are different 
JSON objects.
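Incidentally, decimal.Decimal models exactly this distinction: trailing zeros
survive in the textual form even though the values compare equal (a small check
of mine, using the thread's number):

```python
import decimal

a = decimal.Decimal('0.6441726684570313')
b = decimal.Decimal('0.64417266845703130')  # note the trailing 0

assert a == b            # numerically equal...
assert str(a) != str(b)  # ...yet distinct textual (JSON) representations
print(str(a), str(b))    # 0.6441726684570313 0.64417266845703130
```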
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/QIJXFSNQR2YG7TKBURNVF5WIRDJMXNKG/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: adding support for a "raw output" in JSON serializer

2019-08-09 Thread Richard Musil
JSON objects are specified by their textual representation (json.org). Choosing 
some binary representation for them (so they can be processed efficiently) 
which does not preserve their value is a problem of the underlying binary 
representation, not of the JSON format per se.
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/XDTJSUNTLUCXHCVRB6KMGB5W34ZDPSW5/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: adding support for a "raw output" in JSON serializer

2019-08-09 Thread Rhodri James

On 08/08/2019 21:18, Richard Musil wrote:

2) Use a JSON decoder to decode it (hopefully without losing anything in the 
process) and then dump it into a "normalized" form and compute the hash over that 
one. This has the risk of conversion error, but if I could avoid that risk by using 
a custom type which does not have such an error, it would be a much easier and more 
maintainable solution.


This is the part that everyone is saying JSON itself does not guarantee 
will work.  You cannot, by its very nature, produce such a custom type.


--
Rhodri James *-* Kynesim Ltd
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/IH43GPQ26Q4MKT4A2OWDLSRCTICKDD3E/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: adding support for a "raw output" in JSON serializer

2019-08-09 Thread Stephen J. Turnbull
Richard Musil writes:

 > After some thinking it seems that the float is only case where this
 > "loss of a precision" happen.

I don't think so.  What's happening here is not restricted to "loss of
precision".

This can happen for any type that has multiple JSON representations of
the same internal value, and therefore you cannot roundtrip from JSON
representation to the same JSON representation for all of them.

This is true of Unicode normalized forms as well as floats.  According
to Unicode these are entirely interchangeable.  If it's convenient for
the Unicode process to change normalization form, according to Unicode
you will get the same *characters* out that you fed in, but you will
not necessarily get byte-for-byte equality.  (This bites me all the
time on the Mac, when I write to a file named in NFC and the Mac fs
converts to NFD.)

Python doesn't do any normalization by default, but it's an inherent
feature of Unicode, and there's no (good) way for the codec to know
that it's happening.
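The effect is easy to see with the stdlib: NFC and NFD forms are the same text
to Unicode, yet they serialize to different JSON (a small demonstration of mine):

```python
import json
import unicodedata

nfc = unicodedata.normalize('NFC', 'café')  # é as one code point
nfd = unicodedata.normalize('NFD', 'café')  # e + combining acute accent

assert nfc != nfd                          # different code point sequences...
assert json.dumps(nfc) != json.dumps(nfd)  # ...hence different JSON output
print(json.dumps(nfc), json.dumps(nfd))
```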

As far as I can see, any of the proposals requires the cooperating
systems to coordinate on a nonstandard JSON dialect[1], and for them to
be of practical use, they'll need to have similarly capable internal
representations -- which means the programs have to be designed for
this cooperation.  Pick an internal representation for float (probably
Decimal), pick an external (JSON) normalization for Unicode (probably
NFC), write the programs to ensure those representations -- using
conformant JSON, and hope there are no compound types with multiple
JSON representations.


Footnotes: 
[1]  I write "nonstandard dialect" rather than "non-conformant"
because you can serialize floats as '{"Decimal" : 1.0}', and the
internal processor just needs to know that such a dict should be
automatically converted to Decimal('1.0').
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/XRNAN23ZM5TBJCL2PBYWUE2RHNH6COW5/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: adding support for a "raw output" in JSON serializer

2019-08-09 Thread Richard Musil
On second thought, though, what you propose should work right, if:

1) json.dumps is extended to accept a `dump_float` keyword argument, which 
would accept a custom class (for example decimal.Decimal, but it could be the 
same one which was specified with `parse_float`).

2) Then the serializer, when seeing this particular type, will do the string 
conversion of the object, do the check for a valid float as you suggest (as it is 
supposed to be dumping a float after all) and then dump the string into the 
output _but without the double quotes_.
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/HDKPBLPIADDLVLNVT363QEQRHUWUGQJ4/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: adding support for a "raw output" in JSON serializer

2019-08-09 Thread Richard Musil
@Dominik, I believe your proposal (if I understand it correctly) will break it 
for people who are already serializing decimal.Decimal into a string 
(for lack of better support).
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/K3UV6U6GK3MFLTSTIZLK2TIF5M25DFBV/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: adding support for a "raw output" in JSON serializer

2019-08-09 Thread Richard Musil
Please, let me explain once more what I want.

I need a JSON codec (serializer-deserializer) which can preserve its input on 
its output. After some thought it seems to really concern only floats. (The 
attempt to generalize this should be regarded as secondary - just a suggestion 
subject to comments.)

What you, Andrew and possibly Chris have probably understood is that I was 
asking for a JSON codec which would ensure the same bit-to-bit (or 
byte-to-byte) output as any other JSON codec out there. This never was my 
point, as, in my first post in this thread, I already gave an example of two 
different serializers which give different results and directly invalidate 
such a request. And, as Chris pointed out in one of his replies, it is simply 
a matter of choice and neither is "right" or "wrong".

Second, the fact that the JSON codec I am looking for should be able to 
preserve its inputs in its outputs byte-to-byte does not necessarily mean that 
the (default) standard implementation must do that. Again using the same 
example I gave above, I am fine with the default implementation, which chooses 
the platform-native binary float for the JSON float representation and is as 
such subject to the consequences already discussed here.

The only thing my "request" requires from the standard module is to allow 
serializing decimal.Decimal or, in a broader scope, some custom type in a way 
that preserves its input in its output byte-to-byte. And this and only this is 
exactly the kind of byte-to-byte precision I need from the implementation. Or, 
better said, I need adequate support from the standard implementation to make 
that possible.

After thinking it through, it seems to me that the current implementation with 
the JSONEncoder.default override satisfies this requirement for any custom type 
except decimal.Decimal, because of how it handles the custom value provided by 
the custom serializer and because decimal.Decimal is in fact a representation of 
a native JSON type (float).

There is no way to serialize a float which was deserialized into 
decimal.Decimal as a float again and have exactly the same value (literally) in 
the output as was in the input.

Serializing it into a custom map or a string might be acceptable for some other 
application (though one would wonder why to do that to something which is by 
nature a native JSON type with a perfectly valid representation of its own), but 
it is not acceptable in my case for the reasons stated above.
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/7HTES3EHGB2NMDTSGLN5HZCBYGX5FB74/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: adding support for a "raw output" in JSON serializer

2019-08-09 Thread Dominik Vilsmeier
@Richard I suggest you open a new thread to discuss support for dumping 
`Decimal` as a number in JSON.

I agree that allowing arbitrary strings to be inserted in the JSON is not a 
good idea, but apart from the `Decimal` issue I can't think of any other case 
that can't be solved via `JSONEncoder.default`.

However, there is an asymmetry with respect to parsing / dumping numbers. The 
JSON specs don't limit numbers to any precision and the `json` module can parse 
numbers of arbitrary precision. Python further supports arbitrary precision 
through `Decimal`, but any JSON dump is limited to the underlying binary `float` 
representation. As the OP indicates, one can parse `0.6441726684570313` to 
`Decimal` (preserving the precision) but there's no way to dump it back. This 
seems like a limitation of the `json` implementation.

For dumping `Decimal` without importing the `decimal` module, since this type 
seems to be the only exception, can't we just dump any non-default type as 
`str(o)` and check whether it can be parsed back to float:

    # if o is not any of the default types
    dump_val = str(o)
    try:
        float(dump_val)
    except ValueError:
        raise TypeError('... is not JSON serializable') from None
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/3EHFAFM5CT7F42PQ35B7HZYJKA4KXVCF/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: adding support for a "raw output" in JSON serializer

2019-08-09 Thread Paul Moore
On Thu, 8 Aug 2019 at 22:31, Richard Musil  wrote:
>
> I have not asked for means to serialize invalid JSON objects. Yes, with the 
> "raw" output, you can create invalid JSON, but it does not mean you have to.

True. But my point was simply that the json module appears to be
designed in a way that protects against the possibility of ending up
with invalid JSON, and that seems like a reasonable design principle.
I'm not arguing that there should never be such a feature, just
that in the absence of a need for it (see below), designing for safety
seems like a good choice.

> Let's take a look at it from a different POV and focus on the original 
> problem. Imagine this situation, I have a JSON string with this value:
> ```
> msg2 = '{"val": 1.1}'
> ```
> This is perfectly valid JSON representation of the float. I can parse it with 
> standard module with default float handling and get this:
> ```
> json:orig = {"val": 1.1}
> json:pyth = {'val': 1.0}
> json:seri = {"val": 1.0}
> ```
> i.e. Python chooses to represent it as 1.0 (which I guess is the closest to 
> the original value) and then serializes it as such into the output. But I can 
> use the `parse_float=decimal.Decimal` option of the standard module and get this 
> (with the custom encoder encoding it into the string):
> ```
> dson:orig = {"val": 1.1}
> dson:pyth = {'val': Decimal('1.1')}
> dson:seri = {"val": "1.1"}
> ```
> There is nothing wrong with the float in the original JSON and there is 
> nothing wrong with representing that float with decimal.Decimal type either.
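
For reference, the round trip quoted above can be reproduced with the stock 
module (a sketch with an illustrative value; `default=str` is the closest the 
current encoder gets, and it quotes the number):

```python
import json
from decimal import Decimal

msg2 = '{"val": 1.1}'

# Default float handling round-trips through binary float:
print(json.dumps(json.loads(msg2)))      # {"val": 1.1}

# parse_float=Decimal keeps the digits, but the dump side has no
# "raw number" support, so default=str emits a quoted string:
obj = json.loads(msg2, parse_float=Decimal)
print(json.dumps(obj, default=str))      # {"val": "1.1"}
```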

OK, and so far there's nothing that I would describe as a "problem".

> What is missing is just corresponding encoder support. I want to be able to 
> serialize the Decimal object into its JSON float form, not into a string or 
> some custom map.

OK, so you're saying that this is "the original problem". Fine, but my
response would be that without a reasonable use case for this, it's
just how the json module works, and not a "problem". It's only a
problem if someone wants to do something specific, and can't.

The only use case you have given for needing this capability is to
produce identical output when round tripping from JSON to objects and
back to JSON. But whenever people have pushed back on the validity of
this requirement, you've said that's not the point. So OK, if it's not
the point, where's your use case that makes the lack of the encoder
support you're requesting a problem?

Basically you can't have it both ways - either persuade people that
your use case (identical output) is valid, or present a different use
case.

Paul
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/KNMM55HTFTEMOUFJU5DB26VCVFBFKURZ/