Re: [Python-ideas] Shuffled

2016-09-06 Thread Tim Peters
On Tue, Sep 6, 2016 at 2:48 PM, David Mertz  wrote:
> On Tue, Sep 6, 2016 at 1:15 PM, Sven R. Kunze  wrote:
>>
>> Their response: "Oh, I don't need it, let's close it."
>> Arek: "But I need it."
>
>
> This definitely feels like a case of "put it on PyPI."  Actually, maybe
> contribute to `boltons`, it feels like it might fit as a utility function
> there.
>
> While I wouldn't mind being able to type `from random import shuffled`, I
> don't have a big problem instead typing `from boltons import shuffled`, nor
> even `from arek_utils import shuffled`.


Re: [Python-ideas] Shuffled

2016-09-06 Thread Tim Peters
[Sven R. Kunze ]
> ...
> He already convinced some people. Just not some venerable Python devs, which
> doesn't necessarily mean anything at all.
>
> Their response: "Oh, I don't need it, let's close it."
> Arek: "But I need it."
>
> So, who's right now?

By default, the venerable Python devs ;-)

As I said in my comments on the issue report, I searched through all
my code for instances of `shuffle()`, and didn't find any that could
be improved by `shuffled()`.  Nor could I dream up a compelling use
case.  So it wasn't just a knee-jerk "oh, I don't need it", the "I
don't need it" was the outcome of some research and thought.

In contrast, the original request was a bare "But I need it", without
so much as a single concrete use case to justify it ("testing" is an
application area, not a concrete use case - I've made extensive use of
shuffling in testing too, but - as I said - found no existing code
where `shuffled()` would have helped).

In the absence of compelling concrete use cases, new ideas have
approximately no chance of being adopted.  Otherwise the language &
libraries bloat in counterproductive ways.  Just ask any venerable
Python dev ;-)


Re: [Python-ideas] Shuffled

2016-09-06 Thread Tim Peters
[Sven R. Kunze ]
>>>
>>> ...
>>> He already convinced some people. Just not some venerable Python devs,
>>> which doesn't necessarily mean anything at all.
>>>
>>> Their response: "Oh, I don't need it, let's close it."
>>> Arek: "But I need it."
>>>
>>> So, who's right now?

[Tim]
>> By default, the venerable Python devs ;-)

[Sven]
> Well, a little bit arrogant? I hope technical arguments weigh more than status
> here. ;)

I haven't seen any technical arguments here.  You snipped the parts of
my reply where I emphasized the need for concrete use cases.  In the
absence of anything approaching that, it's a matter of taste, and then
- yes - decades of Python development experience do -  and should -
outweigh a newcomer's wish list.  Especially when, as appears to be
the case here, those with truly extensive experience agree.


> ...
> But I remember using it in interactive sessions (damn, why didn't I check
> that into git?). And there I remember finding it quite irritating not having
> a return value at all.
>
> >>> random.shuffle([1,2,3,4])
>
> Just gives me: nothing.
>
> Of course, the period of irritation was short and reading __doc__ helped
> a lot, but, well, I could have saved myself some time. You get the idea.

No, I don't.  It's an obvious feature of Python's list object design
that mutating methods always return None.  Which means "obvious" after
someone has learned the language, not obvious the first time they try
it.  StackOverflow is equally full of newbie complaints that, e.g.,

some_list.sort().append(100)

gives an "incomprehensible"

AttributeError: 'NoneType' object has no attribute 'append'

error.  Once "ah, mutating methods generally return None" sinks in,
they never post any complaint like that again.

I would be far more annoyed if, e.g.,

>>> random.shuffle(some_million_element_list)

swamped my terminal with mountains of output.

You can't have both behaviors simultaneously, so the status quo wins.
Indeed, the venerable status quo ;-)


Re: [Python-ideas] Shuffled

2016-09-06 Thread Tim Peters
[David Mertz ]
> This definitely feels like a case of "put it on PyPI."  Actually, maybe
> contribute to `boltons`, it feels like it might fit as a utility function
> there.

It's trivial to write such a function if it's truly needed - it would
be easier to write it from scratch than to remember which module it's
hiding in.  There's a "clever" 1-liner, but with no imagination at all
it's still dead obvious:

def shuffled(xs):
    from random import shuffle
    xs = xs[:]
    shuffle(xs)
    return xs
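
For what it's worth, a plausible candidate for the "clever" 1-liner
(my guess - the message doesn't spell it out) builds on random.sample:

def shuffled(xs):
    from random import sample
    # sampling len(xs) items without replacement returns a
    # shuffled copy
    return sample(xs, len(xs))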

`boltons` doesn't typically bother with screamingly obvious things.


> While I wouldn't mind being able to type `from random import shuffled`, I
> don't have a bit problem instead typing `from boltons import shuffled`, nor
> even `from arek_utils import shuffled`.

But would you _use_ it?  I'm still asking for use cases.

When, e.g., I'm randomizing permutations for testing, the last thing I want is:

while whatever:
    result = function_of_xs(shuffled(xs))
    # check result and complain if it's wrong

Why not?  Because no trace remains of _which_ permutation provoked the
failure when a failure occurs.  Instead code looks like this:

while whatever:
    shuffle(xs)
    result = function_of_xs(xs)  # or xs[:] if the function mutates its arg
    # check result and complain that `xs` specifically provoked a failure
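
To make that concrete, here's a toy harness following the second
pattern (the names are made up for illustration):

def check_many(function_of_xs, result_ok, xs, trials=1000):
    from random import shuffle
    for _ in range(trials):
        shuffle(xs)
        result = function_of_xs(xs[:])  # copy, in case it mutates
        if not result_ok(result):
            raise AssertionError("provoked by permutation %r" % (xs,))

When the assertion fires, `xs` still holds the exact permutation that
provoked the failure.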

Indeed, the only clear "use case" that makes sense I've been able to
think of is:

xs = shuffled(xs)

But that's written more easily and efficiently today as

shuffle(xs)

This is in stark contrast to sorted(), where clear use cases abound.
That didn't get in because it's hard to mimic (it's basically as easy
as shuffled()), but because it's exactly what's wanted in all kinds of
contexts in all kinds of code.


Re: [Python-ideas] Shuffled

2016-09-07 Thread Tim Peters
[Sven R. Kunze ]
> I am not questioning experience which everyone in a team can benefit from.
>
>
> BUT experienced devs also need to recognize and respect the fact that
> younger/inexperienced developers are just better at detecting
> inconsistencies and bloody work-arounds. They simply haven't had to live
> with them for so long. Experienced devs just are stuck in a rut/are
> routine-blinded: "we've done that for years", "there's no better way".
>
>
> That's the way we do it in our teams. Employing the new guys as some sort of
> inconsistency detectors. This way, they learn to find their way around the
> code base and they can improve it by doing so.
>
> And I would never allow it in my team, to dismiss this kind of observation
> from new colleagues. It's invaluable as they will become routine-blinded as
> well.

I have been more than willing to discuss it, and I did not close the
issue report.  I did say I was opposed to it, but that's simply
because I am, and I explained there too _why_ I was opposed.

Do you have anything to say about the specific proposal?  I doubt
either of us has found this meta-discussion useful.  I'm still looking
for a compelling use case.  The only concrete thing anyone has noted
in `shuffled()`'s favor so far is that sometimes they're surprised by
the behavior of random.shuffle(list) returning None in an interactive
shell (noted by you, and by another, and I cheerfully own up to being
a bit surprised by that too long ago).   But that's an observation
about `random.shuffle()`, not about the proposed `shuffled()`.


>> [...] I would be far more annoyed if, e.g.,
>>
>> >>> random.shuffle(some_million_element_list)
>>
>>  swamped my terminal with mountains of output.

> But you readily accept this behavior for "sorted"? That makes no sense at
> all.

Of course it does.  The only analogy to random.shuffle(big_list)
returning None that makes a lick of sense here is that big_list.sort()
also returns None.  IF a `shuffled()` function is introduced, then of
course it should return its result - just like `sorted()` returns its
result.


>> You can't have both behaviors simultaneously, so the status quo wins.
>> Indeed, the venerable status quo ;-)

> Nobody said to change "shuffle".

A verbatim quote from the first message in this thread:

"Also shuffle() should return self so mutating methods could be chained."


Re: [Python-ideas] [Python-Dev] Drastically improving list.sort() for lists of strings/ints

2016-09-11 Thread Tim Peters
[redirected from python-dev, to python-ideas;
 please send followups only to python-ideas]

[Elliot Gorokhovsky ]
> ...
> TL;DR: Should I spend time making list.sort() detect if it is sorting
> lexicographically and, if so, use a much faster algorithm?

It will be fun to find out ;-)

As Mark, and especially Terry, pointed out, a major feature of the
current sort is that it can exploit many kinds of pre-existing order.
As the paper you referenced says, "Realistic sorting problems are
usually far from random."  But, although they did run some tests
against data with significant order, they didn't test against any
algorithms _aiming_ at exploiting uniformity.  Just against their
radix sort variants, and against a quicksort.

That's where it's likely next to impossible to guess in advance
whether radix sort _will_ have a real advantage.  All the kinds of
order the current sort can exploit are far from obvious, because the
mechanisms it employs are low-level & very general.  For example,
consider arrays created by this function, for even `n`:

def bn(n):
    r = [None] * n
    r[0::2] = range(0, n//2)
    r[1::2] = range(n//2, n)
    return r

Then, e.g.,

>>> bn(16)
[0, 8, 1, 9, 2, 10, 3, 11, 4, 12, 5, 13, 6, 14, 7, 15]

This effectively takes range(n), cuts it in half, and does a "perfect
shuffle" on the two halves.

You'll find nothing in the current code looking for this as a special
case, but it nevertheless sorts such arrays in "close to" O(n) time,
and despite that there's no natural run in the input longer than 2
elements.

That said, I'd encourage you to write your code as a new list method
at first, to make it easiest to run benchmarks.  If that proves
promising, then you can worry about how to make a single method
auto-decide which algorithm to use.

Also use the two-array version.  It's easier to understand and to
code, and stability is crucial now.  The extra memory burden doesn't
bother me - an array of C pointers consumes little memory compared to
the memory consumed by the Python objects they point at.

Most of all, have fun! :-)


Re: [Python-ideas] [Python-Dev] Drastically improving list.sort() for lists of strings/ints

2016-09-12 Thread Tim Peters
[Elliot Gorokhovsky ]
> Wow, Tim himself!

And Elliot himself!  It's a party :-)


> Regarding performance on semi-ordered data: we'll have to
> benchmark to see, but intuitively I imagine radix would meet Timsort
> because verifying that a list of strings is sorted takes Omega(nw)
> (which gives a lower bound on Timsort), where w is the word length.

Don't take that stuff so seriously.  The number of character
comparisons needed when comparing two strings is _typically_ small.
Consecutive corresponding characters are compared only until the first
non-equal pair is found.  For example, if the alphabet is 256
characters, for random strings it only requires one character
comparison 255 of 256 times on average (because there's only 1 chance
in 256 that the first characters _are_ equal).
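
A quick simulation confirms that (my sketch, not from the original
post):

from random import randrange

def avg_char_cmps(trials=100_000, alphabet=256, length=20):
    total = 0
    for _ in range(trials):
        a = [randrange(alphabet) for _ in range(length)]
        b = [randrange(alphabet) for _ in range(length)]
        k = 0  # position of first mismatch
        while k < length and a[k] == b[k]:
            k += 1
        total += min(k + 1, length)  # comparisons actually made
    return total / trials

The result hovers right around 256/255 ~ 1.004.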


> Radix sort is Theta(nw). So at least asymptotically it checks out.

Analogously, most-significant-byte-first radix sort stops on the pass
after the longest common prefix is processed.  But the real devils are
in the details.  More on that below.


> I think if one uses the two-array algorithm, other semi-sortings can also
> be exploited, since the items get placed into their respective buckets
> in the order in which they appear in the list. So, for the example you gave,
> one pass would sort it correctly

Can't know that unless you specify which algorithm you have in mind.
If you're talking about the one in the paper, _any_ sequence with
values all from range(256) will be sorted in the first pass, because
each value would get its own bucket.


> (since the list has the property if x_1 and x_2 are in bucket b, x1 comes
> before x2 in the list, so x1 will also come before x2 in the bucket. Except
> possibly for one "border bucket" that includes n/2). And then
> it would just be Theta(nw/b) in each bucket to verify sorted.

I don't think you want to go there.  If you're keen on radix sorting,
stick to radix sorting until it's as good as you can make it.  "To
verify sorted" requires the (potentially) arbitrarily expensive kinds
of comparisons you're trying to _eliminate_.  That's not saying
there's no possible advantage from doing some of each; it's warning
that pursuing every off-target idea that comes up will leave you
spinning in circles.


> I mean honestly the cool thing about radix is that the best case for
> Timsort on strings, Omega(nw), is the worst case for radix!

"Details":  I wrote a prototype radix sort along the lines of the
paper's 2-array version, in Python.  When there's "a real" O()
improvement to be had, one can usually demonstrate it by a Python
program incorporating the improvement even though it's fighting
against (unimproved) C code.  For example, I helped someone here craft
a Python radix sort for integers that runs faster than list.sort(),
although it required a list over 10,000,000 elements and a radix of at
least 4096 ;-)


http://stackoverflow.com/questions/20207791/pushing-radix-sort-and-python-to-its-limits

However, I'm not having that kind of luck with strings.  Here's a
detail that bites:  as above, string comparisons _usually_ get out
fast, after comparing just a few characters.  So to construct a bad
case for list.sort() in this respect, I want lots of strings with a
long common prefix.  For example, suppose we have a list of a million
strings each of which starts with "A"*20000 (20 thousand "A").  String
comparisons are then very expensive (each one requires at least 20001
character comparisons to resolve).

But that also makes the radix sort excruciatingly slow!  On the first
pass, it copies the million elements, puts them all in the same
bucket, then copies them all back to exactly where they started from.
Pass 2, exactly the same, but looking at the 2nd occurrence of "A" in
each string.  Ditto for pass 3, pass 4, and ... pass 20,000.  By then
we've copied a million elements 40,000 times - but still have exactly
the same array we started with.  The data-copying costs are killing it
(yes, it's only copying pointers, but 40 billion pointer copies aren't
cheap  - and on my 64-bit box, amounts to copying 320 billion bytes).

On the same data, list.sort() is doing a great many character
comparisons, but doing _far_ less pointer copying, and that leaves it
running circles around the radix sort - list.sort() can't copy
pointers more than about N*log2(N) times, and log2(1000000) is much
smaller than 40,000.
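
One way to build such an input, for anyone who wants to reproduce this
(my construction, not taken from the post):

prefix = b"A" * 20_000
xs = [prefix + str(i).encode() for i in range(1_000_000)]

Every comparison list.sort() makes wades through the 20,000-byte
common prefix, while an MSB-first radix sort makes ~20,000 copying
passes before any bucket ever splits.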

To be complete, I'll attach the code I used.  Note that the paper is
only concerned with "C strings":  0 bytes are special, acting as
string terminators.  0 bytes aren't special in Python, so we need 257
buckets.  I'm also using Python 3, where strings are always Unicode,
and code points (`ord()` results) can be as large as 1,114,111.  So
this radix sort can't work directly on Python 3 strings; instead it
expects a list of `bytes` (or `bytearray`) objects.

def rsort(xs):
    count = [0] * 257
    bindex = [None] * 257
    stack = [(0,

Re: [Python-ideas] Py_SIZE of PyLongs

2016-10-19 Thread Tim Peters
[Elliot Gorokhovsky ]
> I'm working on a special-case compare function for bounded integers for the
> sort stuff. By looking at the implementation, I figured out that Py_SIZE of
> a long is the sign times the number of digits (...right?).
> ...

Please ignore the other reply you got - they clearly aren't familiar
with the code.

The details are explained in Include/longintrepr.h.

In short, an integer _is_ based on PyVarObject.  Py_SIZE is a macro
that merely extracts (or allows to set) the ob_size member, the sign
of the int is stored as the sign of ob_size (which is really an abuse
of ob_size's intended meaning), and the number of "digits" is the
absolute value of ob_size.  And ob_size is 0 if and only if the int is
0.

Note that the number of bits per digit varies across platforms.  That
too is all explained in longintrepr.h.
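
A pure-Python mirror of that relationship, for illustration
(sys.int_info exposes the digit width):

import sys

def expected_ob_size(n):
    # What Py_SIZE reports for an int:  0 for 0, else
    # sign(n) * number-of-digits in CPython's internal base.
    if n == 0:
        return 0
    bits = sys.int_info.bits_per_digit  # typically 30
    ndigits = (abs(n).bit_length() + bits - 1) // bits
    return ndigits if n > 0 else -ndigits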


Re: [Python-ideas] Multiple level sorting in python where the order of some levels may or may not be reversed

2016-10-17 Thread Tim Peters
[Sven R. Kunze ]
> Indeed. I also didn't know about that detail of reversing. :) Amazing. (Also
> welcome to the list, Alireza.)

It follows from what the docs say, although I'd agree it may be
helpful if the docs explicitly spelled out this consequence (that
reverse=True also preserves the original order of equal elements - as
the docs say, it's not that the _list_ "is reversed", it's that "list
elements are sorted as if each comparison were reversed").


> Do you think that simple solution could have a chance to be added to stdlib
> somehow (with the possibility of speeding it up in the future)?

Well, the sorting "how to" already explains the basic idea.  The
`.sort()` docs also explain that stability "is helpful for sorting in
multiple passes (for example, sort by department, then by salary
grade)".

I suspect I know too much about this to be of any use in judging
what's missing ;-)

Speeding it up wouldn't be easy - or usually necessary.  The obvious
"improvement" would do it all in a single `.sort()` invocation.  But
calling back into Python code to do fancy, multi-step comparisons is
so expensive that I expect it would take a large N for saving some
additional worst-case O(N*log(N)) sorting steps to repay the cost.

If you'd like to play with that, here's a different `multisort()`
implementation.  Again `specs` is a list of (key_function,
True-for-reverse) tuples, most-significant key first.  And, again, no
assumptions are made about what key functions return, and the sort
continues to guarantee that only "<" comparisons are made:

def _sorter(specs):
    keyfuncs, reversers = zip(*specs)

    class Wrapper(object):
        def __init__(self, obj):
            self.keys = tuple(f(obj) for f in keyfuncs)

        def __lt__(x, y):
            for a, b, r in zip(x.keys, y.keys, reversers):
                if a < b:
                    return not r
                if b < a:
                    return r
            return False  # all the keys are equal

    return Wrapper

def multisort(xs, specs):
    xs.sort(key=_sorter(specs))


Re: [Python-ideas] INSANE FLOAT PERFORMANCE!!!

2016-10-12 Thread Tim Peters
[Paul Moore]
> My understanding is that the code does a pre-check that all the
> elements of the list are the same type (float, for example). This is a
> relatively quick test (O(n) pointer comparisons). If everything *is* a
> float, then an optimised comparison routine is used that skips all the
> type checks and goes straight to a float comparison (single machine op).

That matches my understanding.

> Because there are more than O(n) comparisons done in a typical sort,
> this is a win.

If the types are in fact all the same, it should be a win even for
n==2 (at n < 2 no comparisons are done; at n==2 exactly 1 comparison
is done):  one pointer compare + go-straight-to-C-float-"<".

> And because the type checks needed in rich comparison

And layers of function calls.

> are much more expensive than a pointer check, it's a *big* win.

Bingo :-)
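
The pre-check itself is about as simple as checks get - a pure-Python
sketch of the idea, not the actual C code:

def all_same_type(xs):
    # one cheap identity test per element; bail at first mismatch
    return all(type(x) is type(xs[0]) for x in xs[1:]) if xs else True

The C version is the moral equivalent using pointer compares.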


> What I'm *not* quite clear on is why Python 3's change to reject
> comparisons between unrelated types makes this optimisation possible.

It doesn't.  It would also apply in Python 2.  I simply expect the
optimization will pay off more frequently in Python 3 code.  For
example, in Python 2 I used to create lists with objects of wildly
mixed types, and sort them merely to bring objects of the same type
next to each other.  Things "like that" don't work at all in Python 3.


> Surely you have to check either way? It's not that it's a particularly
> important question - if the optimisation works, it's not that big a
> deal what triggered the insight. It's just that I'm not sure if
> there's some other point that I've not properly understood.

Well, either your understanding is fine, or we're both confused :-)


Re: [Python-ideas] INSANE FLOAT PERFORMANCE!!!

2016-10-12 Thread Tim Peters
[Nick Coghlan]
> It's probably more relevant that cmp() went away, since that
> simplified the comparison logic to just PyObject_RichCompareBool,
> without the custom comparison function path.

Well, the current sort is old by now, and was written for Python 2.
But it did anticipate that rich comparisons were the future, and
deliberately restricted itself to using only "<" (Py_LT) comparisons.
So, same as now, only the "<" path needed to be examined.


> It *might* have still been possible to do something like this in the
> Py2 code (since the main requirement is to do the pre-check for
> consistent types if the first object is of a known type with an
> optimised fast path),

It shouldn't really matter whether it's a known type.  For any type,
if it's known that all the objects are of that type, that type's
tp_richcompare slot can be looked up once, and if non-NULL used
throughout.  That would save several levels of function call per
comparison during the sort; although that's not factor-of-3-speedup
potential, it should still be a significant win.


> but I don't know anyone that actually *likes* adding new special cases
> to already complex code and trying to figure out how to test whether
> or not they've broken anything :)

A nice thing about this one is that special cases are a one-time thing
at the start, and don't change anything in the vast bulk of the
current sorting code.  So when it breaks, it should be pretty easy to
figure out why ;-)


Re: [Python-ideas] Multiple level sorting in python where the order of some levels may or may not be reversed

2016-10-16 Thread Tim Peters
[Alireza Rafiei ]
> I have a list called count_list which contains tuples like below:
>
> > [('bridge', 2), ('fair', 1), ('lady', 1), ('is', 2), ('down', 4),
> > ('london', 2), ('falling', 4), ('my', 1)]
>
>
> I want to sort it based on the second parameter in descending order and the
> tuples with the same second parameter must be sorted based on their first
> parameter, in alphabetically ascending order. So the ideal output is:
>
> > [('down', 4), ('falling', 4), ('bridge', 2), ('is', 2), ('london', 2),
> > ('fair', 1), ('lady', 1), ('my', 1)]

I'd like to suggest doing something simple instead, such that

data = [('bridge', 2), ('fair', 1), ('lady', 1),
        ('is', 2), ('down', 4), ('london', 2),
        ('falling', 4), ('my', 1)]

from operator import itemgetter
multisort(data, [# primary key is 2nd element, reversed
                 (itemgetter(1), True),
                 # secondary key is 1st element, natural
                 (itemgetter(0), False)])
import pprint
pprint.pprint(data)

prints the output you want.

It's actually very easy to do this, but the cost is that it requires
doing a full sort for _each_ field you want to sort on.  Because
Python's sort is stable, it's sufficient to merely sort on the
least-significant key first, and then sort again on each key in turn
through the most-significant.  There's another subtlety that makes
this work:

> ...
> The main problem is that the reverse argument takes only a boolean and applies
> to the whole list after sorting is finished.

Luckily, that's not how `reverse` actually works.  It _first_ reverses
the list, then sorts, and then reverses the list again.  The result is
that items that compare equal _retain_ their original order, where
just reversing the sorted list would invert their order.  That's why,
in your example above, after first sorting on the string component in
natural order we get (in part)

[('down', 4), ('falling', 4), ...]

and then reverse-sorting on the integer portion _leaves_ those tuples
in the same order.  That's essential for this decomposition of the
problem to work.

With that background, here's the implementation:

def multisort(xs, specs):
    for key, reverse in reversed(specs):
        xs.sort(key=key, reverse=reverse)

That's all it takes!  And it accepts any number of items in `specs`.
Before you worry that it's "too slow", time it on real test data.
`.sort()` is pretty zippy, and this simple approach allows using
simple key functions.  More importantly, it's much easier on your
brain ;-)


Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-09 Thread Tim Peters
[Tim Delaney ]
> Isn't there already always a scan of the iterable to build the keys array
> for sorting (even if no key keyword param is specified)?

No - `.sort()` is a list method, and as such has nothing to do with
arbitrary iterables, just lists (perhaps you're thinking of the
`sorted()` function?).  If no `key=` argument is passed, the list guts
itself is used (as-is) as the vector of keys.


> In which case adding the homogenity check there seems like it shouldn't
> add much overhead at all (I have to say that I was surprised with 10+%
> reductions in speed in some of the heterogeneous TimSort tests for this
> reason).

Those are worst cases where the current sort does very few compares
(like just N-1 for a list of length N).  Because they do "amazingly"
few compares, they're already "amazingly" fast.  And they also do
little data movement (e.g., none at all for /sort & =sort, and N//2
pointer swaps for \sort).  Because of that, any new O(N) overhead would
make them significantly slower - unless the new overhead pays off by
allowing a larger time saving than it costs (as it does when the list
is same-type).

There is a "natural" place to insert "same type?" checks:  the outer
loop of the sort marches over the vector once, left to right,
alternately identifying the next natural run, then possibly extending
it and/or merging it into previous runs.  The checks could be put
there instead, but the code would be ugly and more invasive - I
wouldn't even bother trying it.


> And could specific richcompares be refactored so there was a "we really know
> what the types are is, no need to check" version available to sort() (with
> the typechecking version available for general use/unoptimised sorting)?

They're not _just_ type-aware.  For example, the special function for
ints is specialized to integers that happen to fit in one (internal)
"digit", and the special function for strings is specialized to those
that happen to be stored in PyUnicode_1BYTE_KIND format.  Adding such
stuff to the public C API would be ... well, for a start, tedious ;-)


Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Tim Peters
[Elliot Gorokhovsky ]
> (Summary of results: my patch at https://bugs.python.org/issue28685 makes
> list.sort() 30-50% faster in common cases, and at most 1.5% slower in the
> uncommon worst case.)
> ...

Would someone please move the patch along?  I expect it's my fault it's
languished so long, since I'm probably the natural person to review it, but
I've been buried under other stuff.

But the patch doesn't change anything about the sorting algorithm itself -
even shallow knowledge of how timsort works is irrelevant.  It's just
plugging in a different bottom-level object comparison _function_ when that
appears valuable.

I've said from the start that it's obvious (to me ;-) ) that it's an
excellent tradeoff.  At worst it adds one simple (pre)pass over the list
doing C-level pointer equality comparisons.  That's cheap.  The worst-case
damage is obviously small, the best-case gain is obviously large, and the
best cases are almost certainly far more common than the worst cases in
most code.

In reviewing my own code, after it was fiddled to work under Python 3 there
are no mixed-type lists that are ever sorted.  There are lists with complex
objects, but whenever those are sorted there's a `key=` argument that
reduces the problem to sorting ints or tuples of builtin scalar types.

I don't care about anyone else's code ;-)

One subtle thing to look at:  thread safety.  IIRC, the patch plugged the
comparison function into a file global.  That's obviously hosed if multiple
threads sort different kinds of lists simultaneously.

Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-05 Thread Tim Peters
[Chris Angelico ]
> Arbitrary comparison functions let you do anything but whoa, I
> cannot imagine any way that this would ever happen outside of "hey
> look, here's how you can trigger a SystemError"!

CPython is full of defensive code protecting against malicious crap.
That's why it rarely crashes ;-)

def __lt__(self, other):
    return self.size < other.size

Looks harmless?  Can't tell!  For all we know, there are proxy
objects, and other.__getattr__ invokes some elaborate library to open
a socket in a new thread to fetch the value of `size` over a network.


[Python-ideas] Reducing collisions in small dicts/sets

2017-06-24 Thread Tim Peters
Short course:  the average number of probes needed when searching
small dicts/sets can be reduced, in both successful ("found") and
failing ("not found") cases.

But I'm not going to pursue this.  This is a brain dump for someone
who's willing to endure the interminable pain of arguing about
benchmarks ;-)

Background:

http://bugs.python.org/issue30671

raised some questions about how dict collisions are handled.  While
the analysis there didn't make sense to me, I wrote enough code to dig
into it.  As detailed in that bug report, the current implementation
appeared to meet the theoretical performance of "uniform hashing",
meaning there was no room left for improvement.

However, that missed something:  the simple expressions for expected
probes under uniform hashing are upper bounds, and while they're
excellent approximations for modest load factors in sizable tables,
for small tables they're significantly overstated.  For example, for a
table with 5 items in 8 slots, the load factor is a = 5/8 = 0.625, and

avg probes when found     = log(1/(1-a))/a = 1.57
avg probes when not found = 1/(1-a)        = 2.67

However, exact analysis gives 1.34 and 2.25 instead.  The current code
achieves the upper bounds, but not the exact values.  As a sanity
check, a painfully slow implementation of uniform hashing does achieve
the exact values.
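
The upper-bound formulas in code, for checking (mine, not from the
post):

from math import log

def uniform_probe_bounds(used, slots):
    a = used / slots            # load factor
    found = log(1/(1 - a)) / a  # avg probes, successful search
    fail = 1 / (1 - a)          # avg probes, failing search
    return found, fail

print(uniform_probe_bounds(5, 8))  # ~ (1.57, 2.67)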

Code for all this is attached, in a framework that allows you to
easily plug in any probe sequence strategy.  The current strategy is
implemented by generator "current".  There are also implementations of
"linear" probing, "quadratic" probing, "pre28201" probing (the current
strategy before bug 28201 repaired an error in its coding), "uniform"
probing, and ... "double".  The last is one form of "double hashing"
that gets very close to "uniform".

Its loop guts are significantly cheaper than the current scheme, just
1 addition and 1 mask.  However, it requires a non-trivial modulus to
get started, and that's expensive.

Is there a cheaper way to get close to "uniform"?  I don't know - this
was just the best I came up with so far.

Does it matter?  See above ;-)

If you want to pursue this, take these as given:

1. The first probe must be simply the last `nbits` bits of the hash
code.  The speed of the first probe is supremely important, that's the
fastest possible first probe, and it guarantees no collisions at all
for a dict indexed by a contiguous range of integers (an important use
case).

2. The probe sequence must (at least eventually) depend on every bit
in the hash code.  Else it's waaay too easy to stumble into
quadratic-time behavior for "bad" sets of keys, even by accident.

Have fun :-)
MIN_ELTS = 100_000

M64 = (1 << 64) - 1
def phash(obj, M=M64):  # hash(obj) as uint64
    return hash(obj) & M

# Probers:  generate sequence of table indices to look at,
# in table of size 2**nbits, for object with uint64 hash code h.

def linear(h, nbits):
    mask = (1 << nbits) - 1
    i = h & mask
    while True:
        yield i
        i = (i + 1) & mask

# offsets of 0, 1, 3, 6, 10, 15, ...
# this permutes the index range when the table size is a power of 2
def quadratic(h, nbits):
    mask = (1 << nbits) - 1
    i = h & mask
    inc = 1
    while True:
        yield i
        i = (i + inc) & mask
        inc += 1

def pre28201(h, nbits):
    mask = (1 << nbits) - 1
    i = h & mask
    while True:
        yield i
        i = (5*i + h + 1) & mask
        h >>= 5

def current(h, nbits):
    mask = (1 << nbits) - 1
    i = h & mask
    while True:
        yield i
        h >>= 5
        i = (5*i + h + 1) & mask

# One version of "double hashing".  The increment between probes is
# fixed, but varies across objects.  This does very well!  Note that the
# increment needs to be relatively prime to the table size so that all
# possible indices are generated.  Because our tables have power-of-2
# sizes, we merely need to ensure the increment is odd.
# Using `h % mask` is akin to "casting out 9's" in decimal:  it's as if
# we broke the hash code into nbits-wide chunks from the right, then
# added them, then repeated that procedure until only one "digit"
# remains.  All bits in the hash code affect the result.
# While mod is expensive, a successful search usually gets out on the
# first try, & then the lookup can return before the mod completes.
def double(h, nbits):
    mask = (1 << nbits) - 1
    i = h & mask
    yield i
    inc = (h % mask) | 1  # force it odd
    while True:
        i = (i + inc) & mask
        yield i

# The theoretical "gold standard":  generate a random permutation of the
# table indices for each object.  We can't actually do that, but
# Python's PRNG gets close enough that there's no practical difference.
def uniform(h, nbits):
    from random import seed, randrange
    seed(h)
    n = 1 << nbits
    seen = set()
    while True:
        assert len(seen) < n
        while True:
            i = randrange(n)
            if i not in seen:
                break
        seen.add(i)
        yield i

Re: [Python-ideas] Reducing collisions in small dicts/sets

2017-06-26 Thread Tim Peters
[MRAB ]
> If the current scheme suffers only for small tables, couldn't you use an
> alternative scheme only for small tables?

Sure.  But whether that's desirable partly depends on timing actual C
code.  Try it ;-)  For maintenance sanity, it's obviously better to
have only one scheme to wrestle with.

Note that "may visit a slot more than once" isn't the only thing in
play, just one of the seemingly important things.  For example, the
current scheme requires 3 adds, 2 shifts, and a mask on each loop
iteration (or 1 add and 1 shift of those could be replaced by 1
multiplication).  The double-hashing schemes only require 1 add and 1
mask per iteration.

In cases of collision, that difference is probably swamped by waiting
for cache misses.  But, as I said in the first msg:

"""
This is a brain dump for someone who's willing to endure the
interminable pain of arguing about
benchmarks ;-)
"""


Re: [Python-ideas] Reducing collisions in small dicts/sets

2017-06-26 Thread Tim Peters
[Tim]
>...  I haven't yet thought of a cheap way to compute an
> `inc` for double-hashing that isn't vulnerable to bad behavior for
> _some_ easily constructed set of int keys.  If you forget "cheap",
> it's easy; e.g.,
>
> random.seed(h)
> inc = random.choice(range(1, mask + 1, 2))

Heh.  I always tell people not to believe what they think:  run code
to make sure!  So I did ;-)

def duni(h, nbits):
    from random import seed, choice
    mask = (1 << nbits) - 1
    i = h & mask
    yield i
    seed(h)
    inc = choice(range(1, mask + 1, 2))
    while True:
        i = (i + inc) & mask
        yield i

On the terrible-for-good-reasons-for-failing-`double`-1023*i-searches
case, it turned out `duni` did just as bad as `dfib`:

"""
bits 20 nslots 1,048,576 dlen 699,050 alpha 0.67 # built 1
theoretical avg probes for uniform hashing when found 1.65 not found 3.00
...
 crisp ... skipping (slow!)
prober current
found min 1:100.00% max 1 mean 1.00
fail  min 1:33.33% max 34 mean 3.04
prober double
found min 1:100.00% max 1 mean 1.00
fail  min 1:33.33% max 699049 mean 1867.51
prober dfib
found min 1:100.00% max 1 mean 1.00
fail  min 1:33.33% max 427625 mean 8.09
prober duni
found min 1:100.00% max 1 mean 1.00
fail  min 1:33.33% max 433846 mean 9.84
prober uniform
found min 1:66.65% max 24 mean 1.65
fail  min 1:33.35% max 35 mean 3.00

Yikes!  That really surprised me.  So - are integers of the form
1023*i so magical they also manage to provoke random.seed() into
amazingly regular behavior?  Or is there something about the whole
idea of "double hashing" that leaves it vulnerable no matter how
"random" an odd increment it picks?

It's not the former.  More code showed that a great number of distinct
`inc` values are in fact used.

And double hashing is widely & successfully used in other contexts, so
it's not a problem with the idea _in general_.

What does appear to be the case:  it doesn't always play well with
taking the last `nbits` bits as the initial table index.  In other
double-hashing contexts, they strive to pick a pseudo-random initial
table index too.

Why this matters:  for any odd integer N,

N*i = N*j (mod 2**k)

if and only if

i = j (mod 2**k)

(because any odd N is invertible mod 2**k)

So, when integers are their own hash codes, for any `i` all the values in

range(i*N, i*N + N*2**k, N)

hash to unique slots in a table with 2**k slots (actually any table
with 2**m slots for any m >= k).  In particular, mod 2**k they map to
the slots

i*N + 0*N
i*N + 1*N
i*N + 2*N
i*N + 3*N
   ...

So, on a failing search for an integer of the form j*N, whenever
double-hashing picks an increment that happens to be a multiple of N,
it can bump into a huge chain of collisions, because double-hashing
also uses a fixed increment between probes.  And this is so for any
odd N.  For example, using a "yield i * 11" generator:

"""
bits 20 nslots 1,048,576 dlen 699,050 alpha 0.67 # built 1
theoretical avg probes for uniform hashing when found 1.65 not found 3.00
...
prober current
found min 1:100.00% max 1 mean 1.00
fail  min 1:33.33% max 34 mean 3.00
prober double
found min 1:100.00% max 1 mean 1.00
fail  min 1:33.33% max 667274 mean 34.10
prober dfib
found min 1:100.00% max 1 mean 1.00
fail  min 1:33.33% max 539865 mean 9.30
prober duni
found min 1:100.00% max 1 mean 1.00
fail  min 1:33.33% max 418562 mean 8.65
prober uniform
found min 1:66.63% max 25 mean 1.65
fail  min 1:33.31% max 32 mean 3.00
"""

All three double-hashing variants have horrible max collision chains,
for the reasons just explained.  In "normal" double-hashing contexts,
the initial table indices are scrambled; it's my fault they follow a
(mod 2**k) arithmetic progression in Python.

Which fault I gladly accept ;-)  It's valuable in practice.

But, so long as that stays, it kills the double-hashing idea for me:
while it's great for random-ish hash codes, the worst cases are
horribly bad and very easily provoked (even by accident).


Re: [Python-ideas] Reducing collisions in small dicts/sets

2017-06-25 Thread Tim Peters
Two bits of new info:  first, it's possible to get the performance of
"double" without division, at least via this way:

"""
# Double hashing using Fibonacci multiplication for the increment.  This
# does about as well as `double`, but doesn't require division.
#
# The multiplicative constant depends on the word size W, and is the
# nearest odd integer to 2**W/((1 + sqrt(5))/2).  So for a 64-bit box:
#
# >>> 2**64 / ((1 + decimal.getcontext().sqrt(5))/2)
# Decimal('11400714819323198485.95161059')
#
# For a 32-bit box, it's 2654435769.
#
# The result is the high-order `nbits` bits of the low-order W bits of
# the product.  In C, the "& M" part isn't needed (unsigned * in C
# returns only the low-order bits to begin with).
#
# Annoyance:  I don't think Python dicts store `nbits`, just 2**nbits.
def dfib(h, nbits, M=M64):
    mask = (1 << nbits) - 1
    i = h & mask
    yield i
    inc = (((h * 11400714819323198485) & M) >> (64 - nbits)) | 1
    while True:
        i = (i + inc) & mask
        yield i
"""

Second, the program I posted uses string objects as test cases.  The
current string hash acts like a good-quality pseudo-random number
generator:  change a character in the string, and the hash typically
changes "a lot".

It's also important to look at hashes with common patterns, because,
e.g., all "sufficiently small" integers are their _own_ hash codes
(e.g., hash(3) == 3).  Indeed, long ago Python changed its dict
implementation to avoid quadratic-time behavior for unfortunate sets
of real-life integer keys.

Example:  change `objgen()` to `yield i << 12` instead of `yield
str(i)`.  Then we generate integers all of whose final 12 bits are
zeroes.  For all sufficiently small tables, then, these ints _all_ map
to the same initial table slot (since the initial probe is taken from
the last `nbits` bits).  The collision resolution scheme is needed to
prevent disaster.
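
objgen() itself isn't shown in this excerpt; a plausible
reconstruction of the variant described (hypothetical, for anyone
wanting to reproduce the runs):

def objgen():
    i = 1
    while True:
        yield i << 12  # the original yielded str(i)
        i += 1

Each value has 12 trailing zero bits, so in any table with at most
2**12 slots they all hit the same initial slot.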

Here's a chunk of output from that, for dicts of size 2,730:

"""
bits 12 nslots 4,096 dlen 2,730 alpha 0.67 # built 37
theoretical avg probes for uniform hashing when found 1.65 not found 3.00
 crisp when found 1.65 not found 3.00
prober linear
found min 1:0.04% max 2730 mean 1365.50
fail  min 2731:100.00% max 2731 mean 2731.00
prober quadratic
found min 1:0.04% max 2730 mean 1365.50
fail  min 2731:100.00% max 2731 mean 2731.00
prober pre28201
found min 1:0.04% max 61 mean 6.17
fail  min 5:28.94% max 68 mean 9.78
prober current
found min 1:0.04% max 58 mean 5.17
fail  min 4:29.30% max 70 mean 8.62
prober double
found min 1:0.04% max 5 mean 2.73
fail  min 2:41.47% max 9 mean 3.94
prober dfib
found min 1:0.04% max 9 mean 2.53
fail  min 2:10.87% max 17 mean 4.48
prober uniform
found min 1:66.52% max 21 mean 1.65
fail  min 1:33.41% max 30 mean 2.99
"""

It's a worst-case disaster for linear and quadratic probing.  The
`current` scheme does fine, but `double` and `dfib` do significantly
better on all measures shown.  `uniform` is oblivious, still achieving
its theoretical average-case performance.

That last isn't surprising "in theory", since it theoretically picks a
probe sequence uniformly at random from the set of all table slot
permutations.  It's perhaps a bit surprising that this approximate
_implementation_ also achieves it.  But `random.seed(h)` does a
relatively enormous amount of work to spray the bits of `h` all over
the Twister's internal state, and then shuffle them around.

In any case, the "double hashing" variants ("double" and "dfib") look
very promising, doing better than the current scheme for both small
tables and pathological key sets.


Re: [Python-ideas] Hexadecimal floating literals

2017-09-21 Thread Tim Peters
[David Mertz <me...@gnosis.cx>]
> -1
>
> Writing a floating point literal requires A LOT more knowledge than writing
> a hex integer.

But not really more than writing a decimal float literal in
"scientific notation".  People who use floats are used to the latter.
Besides using "p" instead of "e" to mark the exponent, the only
differences are that the mantissa is expressed in hex instead of in
decimal, and the implicit base to which the exponent is applied is 2
instead of 10.

Either way it's a notation for a rational number, which may or may not
be exactly representable in the native HW float format.
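
For instance, CPython will show you the exact hex spelling of the
double that a decimal literal rounds to, and it round-trips:

>>> (0.1).hex()
'0x1.999999999999ap-4'
>>> float.fromhex('0x1.999999999999ap-4') == 0.1
True

The hex form spells the stored double exactly; the decimal literal
0.1 merely rounds to it.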


> What is the bit length of floats on your specific Python compile? What
> happens if you specify more or less precision than actually available.
> Where is the underflow to subnormal numbers?

All the same answers apply as when using decimal "scientific
notation".  When the denoted rational isn't exactly representable,
then a maze of rounding, overflow, and/or underflow rules apply.  The
base the literal is expressed in doesn't really have anything to do
with those.


> What is the bit representation of [infinity]? Nan?

Hex float literals in general have no way to spell those:  it's just a
way to spell a subset of (mathematical) rational numbers.  Python,
however, does support special cases for those:

>>> float.fromhex("inf")
inf
>>> float.fromhex("nan")
nan


> -0 vs +0?

The obvious first attempts work fine for those ;-)


> There are people who know this and need to know this. But float.fromhex() is
> already available to them. A literal is an attractive nuisance for people
> who almost-but-not-quite understand IEEE-854.

As notations for rationals, nobody needs to understand 854 at all to
use these things, so long as they stick to exactly representable
numbers.  Whether a specific literal _is_ exactly representable, and
what happens if it's not, does require understanding a whole lot - but
that's also true of decimal float literals.

> I.e. those people who are named neither Tim Peters nor Mike Cowlishaw.

Or Mark Dickinson ;-)

All that said, I'm -0 on the idea.  I doubt I (or Mark, or Mike) would
use it, because the need is rare and float.fromhex() is already
sufficient.  Indeed, `fromhex()` is less annoying, because it does
support special cases for infinities and NaNs, and doesn't _require_ a
"0x" prefix.


Re: [Python-ideas] Hexadecimal floating literals

2017-09-22 Thread Tim Peters
[Antoine Pitrou ]
> ...
> The main difference is familiarity.  "scientific" notation should be
> well-known and understood even by high school kids.  Who knows about
> hexadecimal notation for floats, apart from floating-point experts?

Here's an example:  you <0x0.2p0 wink>.  For people who understand
both hex and (decimal) scientific notation, learning what hex float
notation means is easy.


> So for someone reading code, the scientific notation poses no problem
> as they understand it intuitively (even if they may not grasp the
> difficulties of the underlying conversion to binary FP), while for
> hexadecimal float notation they have to go out of their way to
> learn about it, parse the number slowly and try to make out what its
> value is.

I've seen plenty of people on StackOverflow who (a) don't understand
hex notation for integers; and/or (b) don't understand scientific
notation for floats.  Nothing is self-evident about either; they both
have to be learned at first.  Same for hex float notation.  Of course
it's true that many (not all) people do know about hex integers and/or
decimal scientific notation from prior (to Python) experience.

My objection is that we already have a way to use hex float notation,
and the _need_ for it is rare. If someone uninitiated sees a rare:

x = 0x1.aaap-4

they're going to ask on StackOverflow what the heck it's supposed to
mean.  But if they see a rare:

x = float.fromhex("0x1.aaap-4")

they can Google for "python fromhex" and find the docs themselves at
once.  The odd method name makes it highly "discoverable", and I think
that's a feature for rare gimmicks with a small, specialized audience.
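
And if they ask anyway, the answer is short:

>>> float.fromhex("0x1.aaap-4")
0.104156494140625

0x1.aaa is 1 + 10/16 + 10/256 + 10/4096 = 6826/4096 exactly, and the
p-4 scales that by 2**-4.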


Re: [Python-ideas] Pattern Matching Syntax

2018-05-04 Thread Tim Peters
[Guido]
> Can I recommend going slow here? This is a very interesting topic where many
> languages have gone before. I liked Daniel F Moisset's analysis about the
> choices of a language designer and his conclusion that match should be a
> statement.

Just to be annoying ;-) , I liked the way he _reached_ that
conclusion:  by looking at real-life Python code that may have been
written instead to use constructs "like this".  I find such
examination far more persuasive than abstract arguments or made-up
examples.

An observation:  syntax borrowed from functional languages often fails
to work well in practice when grafted onto a language that's
statement-oriented - it only works well for the expression subset of
the language. and even then just for when that subset is being used in
a functional way (e.g., the expression `object.method(arg)` is usually
used for its side effects, not for its typically-None return value).
OTOH, syntax borrowed from a statement-oriented language usually fails
to work at all when grafted onto an "almost everything's an
expression" language.

So that's an abstract argument of my own,  but - according to me -
should be given almost no weight unless confirmed by examining
realistic code.  Daniel did some of both - great!


> ...
> A larger topic may be how to reach decisions. If I've learned one thing from
> PEP 572 it's that we need to adjust how we discuss and evaluate proposals.
> I'll think about this and start a discussion at the Language Summit about
> this.

Python needs something akin to a dictator, who tells people how things
are going to be, like it or not.  But a benevolent dictator, not an
evil one.  And to prevent palace intrigue, they should hold that
position for life.  Just thinking outside the box there ;-)


Re: [Python-ideas] Pattern Matching Syntax

2018-05-05 Thread Tim Peters
[Tim]
>> ... I liked the way he _reached_ that conclusion:  by looking at real-
>> life Python code that may have been written instead to use constructs
>> "like this".  I find such examination far more persuasive than abstract
>> arguments or made-up examples.

[Serhiy Storchaka ]
> I would like to see such examination for PEP 572. And for all other syntax
> changing ideas.

I did it myself for 572, and posted several times about what I found.
It was far more productive to me than arguing (and, indeed, I sat out
of the first several hundred msgs on python-ideas entirely because I
spent all my time looking at code instead).

Short course:  I found a small win frequently, a large win rarely, but
in most cases wouldn't use it.  In all I expect I'd use it
significantly more often than ternary "if", but far less often than
augmented assignment.  But that's me - everybody needs to look at
their own code to apply _their_ judgment.

572 is harder than a case/switch statement to consider this way,
because virtually every assignment statement binding a name could
_potentially_ be changed to a binding expression instead, and there
are gazillions of those.  For considering case/switch additions, you
can automate searches to vastly whittle down the universe of places to
look at (`elif` chains, and certain nested if/else if/else if/else ...
patterns).


> I withdrew some my ideas and patches when my examinations showed that the
> number of cases in the stdlib that will take a benefit from rewriting using
> a new feature or from applying a compiler optimization is not large enough.

Good!  I approve :-)


Re: [Python-ideas] Inline assignments using "given" clauses

2018-05-04 Thread Tim Peters
[Nick Coghlan ]
> ...
> The essence of the given clause concept would be to modify *these specific
> cases* (at least initially) to allow the condition expression to be followed
> by an inline assignment, of the form "given TARGET = EXPR".

I'm not clear on what "these specific cases" are, specifically ;-)
Conditions in "if", "elif", and "while" statement expressions?
Restricted to one "given" clause, or can they chain?  In a listcomp,
is it one "given" clause per "if", or after at most one "if"?  Or is
an "if" even needed at all in a listcomp?  For example,

[(f(x)**2, f(x)**3) for x in xs]

has no conditions, and

[((fx := f(x))**2, fx**3) for x in xs]

is one reasonable use for binding expressions.

[(fx**2, fx**3) for x in xs given fx = f(x)]

reads better, although it's initially surprising (to my eyes) to find
fx defined "at the end".  But no more surprising than the current:

[(fx**2, fx**3) for x in xs for fx in [f(x)]]

trick.


>
> While the leading keyword would allow TARGET to be an arbitrary assignment
> target without much chance for confusion, it could also be restricted to
> simple names instead (as has been done for PEP 572).

The problem with complex targets in general assignment expressions is
that, despite trying, I found no plausible use case for (at least)
unpacking syntax.  As in, e.g.,

x, y = func_returning_twople()
if x**2 + y**2 > 9:  # i.e., distance > 3, but save expensive sqrt

The names can be _unpacked_ in a general assignment expression, but
there appears to be no sane way then to _use_ the names in the test.
This may be as good as it gets:

if [(x, y := func_returning_twople()), x**2 + y**2 > 9][-1]:

That reminds me of the hideous

(condition and [T] or [F])[0]

idiom I "invented" long ago to get the effect (in all cases) of the current

T if condition else F

That was intended to be goofy fun at the time, but I was appalled to
see people later use it ;-)

It's certainly sanest as

if x**2 + y**2 > 9 given x, y = func_returning_twople():

"given" really shines there!


> With that spelling, the three examples above would become:
>
> # Exactly one branch is executed here
> if m given m = pattern.search(data):
>     ...
> elif m given m = other_pattern.search(data):
>     ...
> else:

Which is OK.  The one-letter variable name obscures that it doesn't
actually reduce _redundancy_, though.  That is, in the current

match = pattern.search(data)
if match:

it's obviously less redundant typing as:

if match := pattern.search(data):

In

if match given match = pattern.search(data):

the annoying visual redundancy (& typing) persists.


> # This name is rebound on each trip around the loop
> while m given m = pattern.search(remaining_data):

Also fine, but also doesn't reduce redundancy.


> # "f(x)" is only evaluated once on each iteration
> result = [(x, y, x/y) for x in data if y given y = f(x)]

As above, the potential usefulness of "given" in a listcomp doesn't
really depend on having a conditional.  Or on having a listcomp
either, for that matter ;-)

r2, r3 = fx**2, fx**3 given fx = f(x)


One more, a lovely (to my eyes) binding expression simplification
requiring two bindings in an `if` test, taken from real-life code I
happened to write during the PEP discussion:

diff = x - x_base
if diff:
    g = gcd(diff, n)
    if g > 1:
        return g

collapsed to the crisp & clear:

if (diff := x - x_base) and (g := gcd(diff, n)) > 1:
    return g
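
For the curious, a self-contained sketch of that collapsed form, with
made-up values standing in for the factoring loop's state (this needs
a Python implementing PEP 572's ":="):

    from math import gcd

    n = 91                  # hypothetical composite
    x, x_base = 40, 14

    if (diff := x - x_base) and (g := gcd(diff, n)) > 1:
        print(g)            # 13:  diff = 26, gcd(26, 91) = 13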

If only one trailing "given" clause can be given per `if` test
expression, presumably I couldn't do that without trickery.  If it's
more general,

if (diff given diff = x - x_base) and g > 1 given g = gcd(diff, n):

reads worse to my eyes (perhaps because of the "visual redundancy"
thing again), while

if diff and g > 1 given diff = x - x_base given g = gcd(diff, n):

has my eyes darting all over the place, and wondering which of the
trailing `given` clauses executes first.

> ...
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] A comprehension scope issue in PEP 572

2018-05-06 Thread Tim Peters
[Tim]
>> There's a difference, though:  if `y` "leaks", BFD.  Who cares? ;-)
>> If `y` remains inaccessible, there's no way around that.

[Chris]
> That's Steve D'Aprano's view - why not just let them ALL leak? I don't
> like it though.

I didn't suggest that.  I'm not suggesting changing _any_ existing
behavior (quite the contrary).  Since ":=" would be brand new, there
is no existing behavior for it.

> ...
> Sorry, I meant "local to the comprehension's scope". We can't know the
> user's intention. We have to create semantics before the user's
> intention even exists.

Exactly.  That's why I would like ":=" to be defined from the start
in a way that does least damage ;-)


>> But the point above remains:  if they don't leak, contexts that want
>> them to leak have no recourse.  If they do leak, then the other uses
>> would still work fine, but they'd possibly be annoyed by a leak they
>> didn't want.

> Then let's revert the Py3 change that put comprehensions into
> functions, and put them back to the vanilla transformation:

Again, I'm not suggesting changing any existing behavior.


> stuff = [x + 1 for x in iter if x % 3]
>
> stuff = []
> for x in iter:
>     if x % 3:
>         stuff.append(x + 1)
>
> Now 'x' leaks as well, and it's more consistent with how people
> explain comprehensions. Is that a good thing? I don't think so. Having
> the iteration variable NOT leak means it's a self-contained unit that
> simply says "that thing we're iterating over".

It's fine by me that for-target names don't leak.  I didn't suggest
changing that.


>> Part of that is because - as the existence of this thread attests to -
>> we can't even control all the scopes gimmicks Python already has.  So
>> people are understandably terrified of adding even more ;-)

> Part of it is just that people seem to be fighting for the sake of
> fighting. I'm weary of it, and I'm not going to debate this point with
> you. You want 'em to leak? No problem. Implement it that way and I'm
> not going to argue it.

I'm more interested in real-life use cases than in arguments.  My
suggestion came from staring at my real-life use cases, where binding
expressions in comprehensions would clearly be more useful if the
names bound leaked.  Nearly all (but not all) of the time, they're
quite happy that for-target names don't leak.  Those are matters of
observation rather than of argument.
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] A comprehension scope issue in PEP 572

2018-05-06 Thread Tim Peters
[Tim]
>> I have a long history of arguing that magically created lexically
>> nested anonymous functions try too hard to behave exactly like
>> explicitly typed lexically nested functions, but that's the trendy
>> thing to do so I always lose ;-)

[Nick Coghlan ]
> You have the reasoning there backwards:

That's easy to believe - I also had a long history of resisting nested
scopes at all ;-)


> implicitly nested scopes behave like explicitly nested scopes because
> that was the *easy* way for me to implement them in Python 3.0 (since
> I got to re-use all the pre-existing compile time and runtime machinery
> that was built to handle explicit lexical scopes).
> Everything else I tried (including any suggestions made by others on the
> py3k mailing list when I discussed the problems I was encountering) ran into
> weird corner cases at either compile time or run time, so I eventually gave
> up and proposed that the implicit scope used to hide the iteration variable
> name binding be a full nested closure, and we'd just live with the
> consequences of that.

It's unfortunate that there are "consequences" at all.  That kind of
thing is done all the time in Lisp-ish languages, but they require
explicitly declaring names' scopes.  Python's "infer scope instead by
looking for bindings" worked great when it had 3 scopes total, but
keeps forcing "consequences" that may or may not be desired in a
generally-nested-scopes world.


> The sublocal scoping proposal in the earlier drafts of PEP 572 was our first
> serious attempt at defining a different way of doing things that would allow
> names to be hidden from surrounding code while still being visible in nested
> suites, and it broke people's brains to the point where Guido explicitly
> asked Chris to take it out of the PEP :)

To which removal I was sympathetic, BTW.


> However, something I *have* been wondering is whether or not it might make
> sense to allow inline scoping declarations in comprehension name bindings.
> Then your example could be written:
>
> def ...:
>     p = None
>     while any(n % p == 0 for nonlocal p in small_primes):
>         # p was declared as nonlocal in the nested scope, so our p
>         # points to the last bound value

Which more directly addresses the underlying problem:  not really
"binding expressions" per se, but the lack of control over scope
decisions in comprehensions period.  It's not at all that nested
scopes are a poor model, it's that we have no control over what's
local _to_ nested scopes the language creates.  I'd say that's worth
addressing in its own right, regardless of PEP 572's fate.

BTW, the "p = None" there is annoying too ;-)


> Needing to switch from "nonlocal p" to "global p" at module level would
> likely be slightly annoying, but also a reminder that the bound name is now
> visible as a module attribute.

Or `nonlocal` could be taught that its use one level below `global`
has an obvious meaning:  global.


> If any other form of comprehension level name binding does eventually get
> accepted, then inline scope declarations could similarly be used to hoist
> values out into the surrounding scope:
>
> rem = None
> while any((nonlocal rem := n % p) for nonlocal p in small_primes):
>     # p and rem were declared as nonlocal in the nested scope, so
>     # our rem and p point to the last bound value

Right - as above, inline scope declarations would be applicable to all
forms of comprehension-generated code.  And to any other future
construct that creates lexically nested functions.
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] A comprehension scope issue in PEP 572

2018-05-06 Thread Tim Peters
[Chris Angelico ]
> ...
> You're correct. The genexp is approximately equivalent to:
>
> def genexp():
>     for p in small_primes:
>         thisp = p
>         yield n % thisp == 0
> while any(genexp()):
>     n //= thisp
>
> With generator expressions, since they won't necessarily be iterated
> over immediately, I think it's correct to create an actual nested
> function; you need the effects of closures. With comprehensions, it's
> less obvious, and what you're asking for might be more plausible. The
> question is, how important is the parallel between list(x for x in
> iter) and [x for x in iter] ? Guido likes it, and to create that
> parallel, list comps MUST be in their own functions too.

I don't care how they're implemented here; I only care here about the
visible semantics.


>> I have a long history of arguing that magically created lexically
>> nested anonymous functions try too hard to behave exactly like
>> explicitly typed lexically nested functions, but that's the trendy
>> thing to do so I always lose ;-)  The problem:  in a magically created
>> nested function, you have no possibility to say _anything_ about
>> scope; at least when you type it by hand, you can add `global` and/or
>> `nonlocal` declarations to more-or-less say what you want.

> That's a fair point. But there is another equally valid use-case for
> assignment expressions inside list comps:
>
> values = [y + 2 for x in iter if (y := f(x)) > 0]
>
> In this case, it's just as obvious that the name 'y' should be local
> to the comprehension, as 'x' is.

There's a difference, though:  if `y` "leaks", BFD.  Who cares? ;-)
If `y` remains inaccessible, there's no way around that.


> Since there's no way to declare "nonlocal y" inside the
> comprehension, you're left with a small handful of options:

I leapt straight to #3:


> 1) All names inside list comprehensions are common with their
> surrounding scope. The comprehension isn't inside a function, the
> iteration variable leaks, you can share names easily. Or if it *is*
> inside a function, all its names are implicitly "nonlocal" (in which
> case there's not much point having the function).

DOA.  Breaks old code.

> 2) All names are local to their own scope. No names leak, and that
> includes names made with ":=".

Saying "local to their own scope" _assumes_ what you're trying to
argue _for_ - it's circular.  In fact it's impossible to know what the
user intends the scope to be.


> 3) Some sort of rule like "iteration variables don't leak, but those
> used with := are implicitly nonlocal".

Explicitly, because "LHS inherits scope from its context" (whether
global, nonlocal, or local)  is part of what ":=" is defined to _mean_
then.


> Would create odd edge cases eg [x for x in iter if x := x] and that would
> probably result in x leaking.

Don't care.


> 4) A special adornment on local names if you don't want them to leak
>
> 5) A special adornment on local names if you DO want them to leak

Probably also DOA.


> 6) A combination of #3 and #4: "x := expr" will be nonlocal, ".x :=
> expr" will be local, "for x in iter" will be local. Backward
> compatible but a pain to explain.

Definitely DOA.

...

>> Since there's no way to explicitly identify the desired scope, I
>> suggest that ":=" inside magically created nested functions do the
>> more-useful-more-often thing:  treat the name being bound as if the
>> binding had been spelled in its enclosing context instead.  So, in the
>> above, if `thisp` was declared `global`, also `global` in the genexp;
>> if `nonlocal`, also `nonlocal`; else (almost always the case in real
>> life) local to the containing code (meaning it would be local to the
>> containing code, but nonlocal in the generated function).

> Is it really more useful more often?

I found no comprehensions of any kind in my code where binding
expressions would actually be of use unless the name "leaked".  Other
code bases may, of course, yield different guesses.  I'm not a "cram a
lot of stuff on each line" kind of coder.

But the point above remains:  if they don't leak, contexts that want
them to leak have no recourse.  If they do leak, then the other uses
would still work fine, but they'd possibly be annoyed by a leak they
didn't want.


>> No, I didn't have much use for `for` target names becoming magically
>> local to invisible nested functions either, but I appreciate that it's
>> less surprising overall.  Using ":=" is much more strongly screaming
>> "I'm going way out of my way to give a name to this thing, so please
>> don't fight me by assuming I need to be protected from the
>> consequences of what I explicitly asked for".

> Personally, I'd still like to go back to := creating a statement-local
> name, one that won't leak out of ANY statement. But the tide was
> against that one, so I gave up on it.

Part of that is because - as the existence of this thread attests to -
we can't even control all the scopes gimmicks Python already has. 

Re: [Python-ideas] A comprehension scope issue in PEP 572

2018-05-06 Thread Tim Peters
[Tim]
>> In a different thread I noted that I sometimes want to write code like
>> this:
>> ...
>> while any(n % (thisp := p) == 0 for p in small_primes):
>>     n //= thisp
>> ...

[Ryan Gonzalez ]
> Couldn't you just do:
>
> def first(it):
>     return next(it, None)
>
> while (item := first(p for p in small_primes if n % p == 0)):
>     # ...

In the "different thread" I mentioned above, I already noted that kind
of spelling.

I'm not at a loss to think of many ways to spell it ;-)  The point of
this thread was intended to be about the semantics of binding
expressions in comprehensions.  For that purpose, the PEP noting that

total = 0
progressive_sums = [total := total + value for value in data]

fails too is equally relevant.  Of course there are many possible ways
to rewrite that too that would work.  That doesn't change that the
failing attempts "look like they should work", but don't, but could if
the semantics of ":=" were defined differently inside
magically-created anonymous lexically nested functions
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] A comprehension scope issue in PEP 572

2018-05-06 Thread Tim Peters
In a different thread I noted that I sometimes want to write code like this:

while any(n % p == 0 for p in small_primes):
    # divide p out - but what's p?

But generator expressions hide the value of `p` that succeeded, so I
can't.  `any()` and `all()` can't address this themselves - they
merely work with an iterable of objects to evaluate for truthiness,
and know nothing about how they're computed.  If you want to identify
a witness (for `any()` succeeding) or a counterexample (for `all()`
failing), you need to write a workalike loop by hand.
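
For concreteness, a sketch of such a hand-written workalike, with
made-up values for `n` and `small_primes`:

    small_primes = [2, 3, 5, 7]
    n = 3960

    while True:
        for p in small_primes:
            if n % p == 0:   # p is the witness any() would have found
                n //= p
                break
        else:                # no witness - the any() would be False
            break

    print(n)                 # 11 - what's left after the small primes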

So would this spelling work using binding expressions?

while any(n % (thisp := p) == 0 for p in small_primes):
    n //= thisp

I'm not entirely clear from what the PEP says, but best guess is "no",
from this part of the discussion[1]:

"""
It would be convenient to use this feature to create rolling or
self-effecting data streams:

progressive_sums = [total := total + value for value in data]

This will fail with UnboundLocalError due to total not being
initialized. Simply initializing it outside of the comprehension is
insufficient - unless the comprehension is in class scope: ...
"""

So, in my example above, I expect that `thisp` is viewed as being
local to the created-by-magic lexically nested function implementing
the generator expression.  `thisp` would be bound on each iteration,
but would vanish when `any()` finished and the anonymous function
vanished with it.  I'd get a NameError on "n //= thisp" (or pick up
whatever object it was bound to before the loop).

I have a long history of arguing that magically created lexically
nested anonymous functions try too hard to behave exactly like
explicitly typed lexically nested functions, but that's the trendy
thing to do so I always lose ;-)  The problem:  in a magically created
nested function, you have no possibility to say _anything_ about
scope; at least when you type it by hand, you can add `global` and/or
`nonlocal` declarations to more-or-less say what you want.

Since there's no way to explicitly identify the desired scope, I
suggest that ":=" inside magically created nested functions do the
more-useful-more-often thing:  treat the name being bound as if the
binding had been spelled in its enclosing context instead.  So, in the
above, if `thisp` was declared `global`, also `global` in the genexp;
if `nonlocal`, also `nonlocal`; else (almost always the case in real
life) local to the containing code (meaning it would be local to the
containing code, but nonlocal in the generated function).

Then my example would work fine, and the PEP's would too just by adding

total = 0

before it.

Funny:  before `nonlocal` was added, one of the (many) alternative
suggestions was that binding a name in an enclosing scope use ":="
instead of "=".

No, I didn't have much use for `for` target names becoming magically
local to invisible nested functions either, but I appreciate that it's
less surprising overall.  Using ":=" is much more strongly screaming
"I'm going way out of my way to give a name to this thing, so please
don't fight me by assuming I need to be protected from the
consequences of what I explicitly asked for".


[1] https://www.python.org/dev/peps/pep-0572/
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Inline assignments using "given" clauses

2018-05-04 Thread Tim Peters
[Tim]
>> ...
>> It's no longer the case that Python avoided that entirely, since
>> "async def", "async for", and "async with" statements were added
>> _without_ making "async" a new reserved word.  It may require pain in
>> the parser, but it's often doable anyway.  At this stage in Python's
>> life, adding new _reserved_ words "should be" an extremely high bar -
>> but adding new non-reserved keywords (like "async") should be a much
>> lower bar.

[Guido]
> Do note that this was a temporary solution. In 3.5 we introduced this hack.
> In 3.6, other uses of `async` and `await` became deprecated (though you'd
> have to use `python -Wall` to get a warning). In 3.7, it's a syntax error.

See my "that deserves more thought" at the start, but wrt future cases
then ;-)  In 3.5 and 3.6, "everything just works" for everyone.  In
3.7 the implementation gets churned again, to go out of its way to
break the handful of code using "async" as an identifier.  It's
obvious who that hurts, but who does that really benefit?

My experience with Fortran convinces me nobody would _actually_ be
confused even if they wrote code like:

async def frobnicate(async=True):
    if async:
        async with ...

But nobody would actually do that.  Then again, "but people _could_ do
that!" barely registers with me because the nobody-actually-does-it
theoretical possibilities were so much worse in Fortran, so I tend to
tune that kind of argument out reflexively.  For example, whitespace
was also irrelevant in Fortran, and these two statements mean
radically different things:

D O1 0I=1 00,30 0
D O1 0I=1 00.30 0

The first is like:

for I in range(100, 301):
    # the block ends at the next statement with label 10

The second is like:

DO10I = 100.300

All actual Fortran code spells them like this instead:

DO 10 I = 100, 300
DO10I = 100.300

The differing intents are obvious at a glance then - although, yup, to
the compiler the difference is solely due to that one uses a comma
where the other a period.  I'm not suggesting Python go anywhere near
_that_ far ;-)  Just as far as considering that there's no actual harm
in Fortran allowing "DO" to be a variable name too.  Nobody is even
tempted to think that "DO" might mean "DO loop" in, e.g.,

DO = 4
X = FUNC(DO)
X = DO(Y)
IF (DO.OR.DONTDO) GOTO 10

etc.  People generally _don't_ use Fortran keywords as identifiers
despite that they can, but it's a real boon for the relatively rare
older code that failed to anticipate keywords added after it was
written.

> ...
> I'd also say that the difficulty of Googling for the meaning of ":="
> shouldn't be exaggerated. Currently you can search for "python operators"
> and get tons of sites that list all operators.

I've noted before that people don't seem to have trouble finding the
meaning of Python's "is", "and", and "or" either.  But Googling for
"is" (etc) on its own isn't the way to do it ;-)


> I also note that Google seems to be getting smarter about non-alphabetic
> searches -- I just searched for "python << operator" and the first hit was
> https://wiki.python.org/moin/BitwiseOperators

Ya - most arguments are crap ;-)
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Inline assignments using "given" clauses

2018-05-13 Thread Tim Peters
[Steven D'Aprano ]
>>> ...
>>> average = 0
>>> smooth_signal = [(average := (1-decay)*average + decay*x) for x in signal]
>>> assert average == smooth_signal[-1]

[Tim]
>> The scope issues are logically independent of assignment-expression
>> spelling, but it's a pretty safe guess Nick is opposed to that example
>> ever "just working" regardless of spelling, while PEP 572 doesn't
>> currently support it anyway.


[Nick Coghlan ]
> I'm personally fine with that example working

"Just working" meant "exactly as written".  "Regardless of spelling"
meant whether binding is spelled via ":=", "given", "as", "where",
"let ... in ...' "->", ..


> if there's an explicit nonlocal declaration on "average" in the nested
> scope - it's Guido that objected to requiring the explicit scoping
> declaration to access that behaviour.

I suspect, but don't know, that Guido would like that example to "just
work" because it _looks like_ it should "just work".  There's no
visible function involved, and _needing_ to add scope declarations to
make it work is pretty much inexplicable unless the user first learns
more than most users "should" need to learn about how it's
implemented.

If so, no technical argument will change his mind - and especially not
one based on "but the implementation today doesn't do that already".
Recall that Python had no lexically nested scoping at first?  If he
wanted to _make_ users learn about nested lexical scopes to use Python
features that don't _appear_ to use it, Python would have had it from
the start ;-)

I'd be happy enough with needing an explicit declaration too, but am
_happiest_ with what I'm guessing Guido's view is.


> For the implicit version, my request is that any PEP proposing the idea of
> parent local scoping be held to the standard of *actually drafting the patch
> for the language specification*,

Of course.


> rather than handwaving away the hard problems that it creates
> (i.e. what to do at class scope,

I finally looked at class scope and quickly decided it makes no sense
there (a comprehension at class scope has no access to the class
scope, so "same scope in the comprehension as in its immediately
containing block" is incoherent in that case).


> what to do when multiple generator expressions reference the same
> nonlocal name,

Then, as you said, they access the same nonlocal name.  What of it?
What if two generator functions reference (or even rebind) the same
nonlocal name?  What's unclear about that?

If you were happy with explicit scope declarations, then exactly the
same thing would happen as if the user were forced to explicitly
declare the scopes chosen for them.


> what to do with nested comprehensions,

Again, if you were happy with explicit scope declarations, then
exactly the same ...

Where are the docs explaining how nested comprehensions work today?  I
haven't seen any.  If any such exist, I'd bet nothing about them needs
to be changed.  If none such exist, I don't see a need to write any
just for this PEP.  How do nested expressions of any kind work?  Same
thing.

The only thing the suggestion changes is the scope of assignment
expression targets in synthetic functions created to implement
comprehensions.  That has nothing at all to do with the possibility of
nesting, or with the structure of nesting.  Why do you think it does -
or might?


> how to expand comprehensions using this kind of scoping to their statement
> form in a context independent way).

I've already explained why I view that as a non-issue (making a
tedious manual process a relative handful of users undertake once in
their life for self-education purposes slightly less tedious has
approximately no value to me - and, to the contrary, the "context
dependent" bits they may have to learn would make it _more_
educational).  If that's a show-stopper for you, so be it.
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Inline assignments using "given" clauses

2018-05-14 Thread Tim Peters
[Greg Ewing ]
> This whole discussion started because someone wanted a way
> to bind a temporary result for use *within* a comprehension.

It's been noted several times recently that the example PEP 572 gives
as _not_ working:

total = 0
progressive_sums = [total := total + value for value in data]

was the original use case that prompted work on the PEP.  You gotta
admit that's ironic ;-)


> Those use cases don't require leakage.

I already said that - yes - some use cases don't want the leakage.
Others do.  And the vast majority of listcomps/genexps will likely
never use an assignment expression anyway.  If "no leak" wins, "want
leak" can't do what they want at all.  If "want leak" wins, the
subset-of-a-subset "no leak" cases are inconvenienced.

As I said, it's a tradeoff.  You're giving 0 weight to one of the sides.


>> Otherwise it's essentially impossible to explain why:
>>
>> total = 0
>> sums = [total := total + value for value in data]
>> assert sums[-1] == total
>>
>> "blows up", despite that its intent is obvious,unless you first
>> explain to a user how the listcomp is implemented via an invisible
>> synthetic function created by magic, inside of which `total`  has
>> nothing to do with the `total` they see on the first line.

> It's no harder to explain that than it is to explain
> why
>
>    x = 42
>    y = [x * x for x in range(5)]
>    print(x)
>
> prints 42 rather than whatever value was last bound to
> the x in the comprehension.

You're overlooking the most relevant point:  The example blows with an
UnboundLocalError, which can't possibly be explained by looking at the
example as it stands, or by reference to shallow tricks.  You _need_
to drag in stuff about invisible (in the code) synthesized scopes in
which `total` is in fact an unbound local name.

You don't need that heavy machinery to explain why for-target names
don't leak; indeed, shallow implementation tricks were used to achieve
that well before synthetic functions _were_ used to implement
listcomps.


> Seems to me it would be easier to explain that *all* names
> bound within a comprehension are local to the comprehension,
> than to have to say that some are and some aren't.

for-targets are local, assignment expression targets aren't.  I agree
that's harder to explain, but on a scale of 1 to 10?  1.

It's a tradeoff.  And I cheerfully just gave 1 point to the side you
favor.  By my accounting, "leak" is still winning by about 136 to 17
;-)
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] A comprehension scope issue in PEP 572

2018-05-14 Thread Tim Peters
[Tim]
>> """
>> An assignment expression binds the target, except in a function F
>> synthesized to implement a list comprehension or generator expression
>> (see XXX).  In the latter case[1], the target is bound in the block
>> containing F, and errors may be detected:  If the target also appears
>> as an identifier target of a `for` loop header in F, a `SyntaxError`
>> exception is raised.  If the block containing F is a class block, a
>> `SyntaxError` exception is raised.
>>
>> Footnote:
>> [1] The intent is that runtime binding of the target occurs as if the
>> binding were performed in the block containing F.  Because that
>> necessarily makes the target not local in F, it's an error if the
>> target also appears in a `for` loop header, which is a local binding
>> for the same target.  If the containing block is a class block, F has
>> no access to that block's scope, so it doesn't make sense to consider
>> the containing block.  The target is bound in the containing block,
>> where it inherits that block's `global` or `nonlocal` declaration if
>> one exists, else establishes that the target is local to that block.
>> """

[Nick]
> This is getting pretty close to being precise enough to be at least
> potentially implementable (thanks!), but there are still two cases that
> would need to be covered:
>
> - what happens inside a lambda expression?

Section 4 of the Reference Manual doesn't contain the word "lambda",
because there's no need to.  "lambda" is just another way to create a
function, and the behavior of functions is already specified.

If you disagree, you mean something by "lambda expression" other than
what I take it to mean, and the best way to illustrate what you do
mean would be via giving a concrete example.  As far as I'm concerned,

   ... (lambda...:  expression) ...

is exactly the same as

def _hidden_unique_name(...):
    return expression
... (_hidden_unique_name) ...

Even at class scope ;-)

For example,

def f():
    g = lambda n: [(n := n+1) for i in range(1)]
    return g(10)

is the same as:

def f():
    def g(n):
        return [(n := n+1) for i in range(1)]
    return g(10)

When the listcomp synthesizes a function, g's code block will
immediately contain it.  The text at the top says `n` is bound in the
containing block - which is g's.  `n` is already local to `g`, so that
part is just redundant in this case.  The synthetic function will take
10 (via its nonlocal cell), add 1, and again via the cell rebind g's
`n` to 11.  The synthetic function returns [11] and the rebound `n`
vanishes with its execution frame.

But that's all about how functions behave; "lambda" is just incidental.


> - what happens inside another comprehension or generator expression?

I don't know what you're asking about, so I'll just copy part of a
different reply:

"""
Where are the docs explaining how nested comprehensions work today?  I
haven't seen any.  If any such exist, I'd bet nothing about them needs
to be changed.  If none such exist, I don't see a need to write any
just for this PEP.  How do nested expressions of any kind work?  Same
thing.

The only thing the suggestion changes is the scope of assignment
expression targets in synthetic functions created to implement
comprehensions.  That has nothing at all to do with the possibility of
nesting, or with the structure of nesting.  Why do you think it does -
or might?
"""

Note that a day or two ago I posted a complete expansion of what

list(i + sum((i := i+1) for j in range(i)) + i
  for i in range(5))

would do.  There the inner genexp rebinds the outer genexp's local
for-target.  Atrocious.  Here's how you can do the same:

Replace `(i := i+1)` with `(LATER)`.

Generate the complete expansion for how the assignment-expression-free
derived statement is implemented, presumably following the detailed
docs that don't appear to exist ;-) for how that's done.

In the synthetic function created for the inner genexp, add

nonlocal i

at the top and replace

yield (LATER)

with

 yield (i := i+1)

Now I didn't replace anything with "LATER", because I never thought
adding a new binary operator had anything to do with this process to
begin with ;-)  All that's needed is to add cruft _after_ it's done to
establish the intended scopes for assignment statement targets.

If you ask how the inner genexp "knows" that it needs to access the
outer genexp's `i`, it doesn't directly.  It's simply that the
synthetic inner genexp function is nested inside the synthetic outer
genexp function, and the text at top says that the inner genexp binds
`i` in its containing block - which is the block for the synthetic
outer genexp.

If the functions synthesized for nested comprehensions don't _already_
nest in this way, then they'd already screw up merely accessing outer
comprehension names from within inner comprehensions.
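
They do already nest that way, which is why an inner comprehension can
read an outer comprehension's for-target.  A one-line check:

    print([[i * 10 + j for j in range(3)] for i in range(2)])
    # [[0, 1, 2], [10, 11, 12]] - the inner synthetic function sees the
    # outer one's local `i` as an ordinary closure reference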


Is there a reason to suspect that there's anything inherently unique
to 

Re: [Python-ideas] Inline assignments using "given" clauses

2018-05-14 Thread Tim Peters
>> total = 0
>> progressive_sums = [total := total + value for value in data]

[Greg Ewing ]
> I'm skeptical that it's a good idea to encourage this kind of thing
> in the first place.

Adding a feature is creating possibility for its use, not encouraging
its use.  I'd hate to, e.g., see people encouraging use of ternary-if
either.  _Sometimes_ ternary-if is exactly right, both shortening code
and increasing clarity.  But not often.  Ditto for metaclasses, the
ability to nest blocks 20 levels deep, on & on.  That they can be
abused, and sometimes are abused, is a very weak argument against
them.  On a scale of 1 to 100?  Maybe 23 ;-)

The specific use case that grabbed Guido's attention on this one
wasn't the above, but:

while any(n % p == 0 for p in small_primes):
    # divide p out of n - but which p succeeded?

 _General_ features turn out to have multiple good uses, not all of
which are anticipated.  I think this is an example of that.  It's easy
to miss that the argument to `any()`  runs in an entirely different
scope, so that its local `p` vanishes when `any()` completes.

It's not proposed to change that either, because people overwhelmingly
seem to want `for` targets not to leak.  Fine by me.

Under the proposal, it's straightforward to use an assignment
expression to "export" p's last materialized value:

while any(n % (factor := p) == 0 for p in small_primes):
    n //= factor

To my eyes, the most surprising thing about that is why it's necessary
to "export" p's winning value to begin with.  "Because for-targets
always vanish, so there's another form of explicit binding that
doesn't" is a reasonably satisfying explanation.

Nick would be happiest, and I'd be "happy enough", if all forms of
binding remained "local" and instead we fiddled the syntax to allow
explicitly declaring the desired scope.  Then, e.g.,

while any(n % p == 0 for nonlocal p in small_primes):
    n //= p

wouldn't need an assignment expression at all.  But Guido doesn't like
that.  I think he wants to hide, so far as humanly possible, that
listcomps and genexps happen to implemented now by synthesizing
functions.
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Crazy idea: allow keywords as names in certain positions

2018-05-14 Thread Tim Peters
[Terry Reedy ]
> ...
> The impact on real-time syntax coloring should be considered.  An re-based
> colorizer, like IDLE's, tags every appearance of keywords outside of strings
> and comments, syntactically correct or not. For instance, the first two
> 'and's in "a and b.and # and error" are tagged. To not tag the second would
> require full parsing, presumably by compiling to AST, and more support code.
> I suspect this would would make coloring slower.  Whether too slow for
> existing machines, I don't know.

I can pretty much guarantee it would be slower - way slower.

But regexps could still do the bulk of it, provided the contexts
remain as simple-minded as the specific ones Guido suggested:

For example, we could allow keywords after 'def' and after a period,

So when the IDLE colorizer regexp finds a match with the "KEYWORD"
group name, don't color it at once.  Instead look back from the start
of the _possible_ keyword match, to see whether it's preceded by

 period maybe_whitespace
or
 "def"-starting-at-a-word-boundary  at_least_one_whitespace

and if so skip coloring.
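
A sketch of that look-back check in plain `re`, outside any real
colorizer (all the names here are hypothetical):

    import re

    KEYWORD = re.compile(r"\b(?:and|or|not|if|else|def|class)\b")

    def should_color(text, match):
        before = text[:match.start()]
        if re.search(r"\.\s*\Z", before):      # period maybe_whitespace
            return False
        if re.search(r"\bdef\s+\Z", before):   # "def" then whitespace
            return False
        return True

    src = "a and b.and  # and error"
    for m in KEYWORD.finditer(src):
        print(m.group(), should_color(src, m))  # the b.and hit is skipped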

There are many ways to code that, but - ya - they're annoying.  Alas,
the re module's negative lookbehind assertion isn't powerful enough
for it, else those checks could be added directly to the keyword part
of the colorizing regexp.  I expect (but don't know) that lookbehinds
in MRAB's "regex" module are strong enough for it.

Of course that's just IDLE.  It was remarkably easy to teach my
primary editor (Source Insight) all sorts of stuff about .py files,
but its "keywords" subsystem is restricted to specifying literal
strings to look for.  The kinds of tricks above probably require that
the tool have access to a real programming language (like IDLE and
Emacs enjoy).
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] A partial (wacky?) alternative to assignment expressions

2018-05-14 Thread Tim Peters
[Steven D'Aprano ]
> I'm hoping that the arguments for assignment expressions will be over by
> Christmas *wink* so as a partial (and hopefully less controversial)
> alternative, what do people think of the idea of flagging certain
> expressions as "pure functions" so the compiler can automatically cache
> results from it?
>
> Let me explain: one of the use-cases for assignment expressions is to
> reduce repetition of code which may be expensive. A toy example:
>
> func(arg) + func(arg)*2 + func(arg)**2
>
> If func() is a pure function with no side-effects, that is three times
> as costly as it ought to be:
>
> (f := func(arg)) + f*2 + f**2
>
> Functional languages like Haskell can and do make this optimization all
> the time (or so I am lead to believe), because the compiler knows that
> func must be a pure, side-effect-free function. But the Python
> interpreter cannot do this optimization for us, because it has no way of
> knowing whether func() is a pure function.
>
> Now for the wacky idea: suppose we could tell the interpreter to cache
> the result of some sub-expression, and re-use it within the current
> expression? That would satisfy one use-case for assignment operators,
> and perhaps weaken the need for := operator.
>
> Good idea? Dumb idea?

Despite that Haskell can do optimizations like this, its "let ... in
..." and "... where ..." constructs (which create names for
expressions, for use in another expression or code block) are widely
used anyway.  They don't care about the optimization (they already get
it), but about improving clarity.  In Haskell they'd spell it like,
e.g., (mixing Python with Haskell keywords in UPPERCASE)

LET fa = func(arg) IN fa + fa*2 + fa**2

which the compiler may (but almost certainly won't) optimize further to

LET fa = func(arg) IN fa * (3 + fa)

if it knows that fa is of a type for which that makes sense.

In Python today, I expect most people would do it as

t = func(arg)
t + 2*t + t*t  # or t*(3+t)

because they also know that multiplying t by itself once is usually
faster than squaring ;-)  And they wouldn't _want_ all the redundant
typing in

 func(arg) + func(arg)*2 + func(arg)**2

anyway.

So I'm not saying "good" or "bad", but that it needs a more compelling use case.


> Good idea, but you want the assignment operator regardless?

I'd probably write the example the way "I expect most people would do
it" above even if we do get assignment expressions.


> I don't have a suggestion for syntax yet, so I'm going to make up syntax
> which is *clearly and obviously rubbish*, a mere placeholder, so don't
> bother telling me all the myriad ways it sucks. I know it sucks, that's
> deliberate. Please focus on the *concept*, not the syntax.
>
> We would need to flag which expression can be cached because it is PURE,
> and tag how far the CACHE operates over:
>
> <BEGIN CACHE>
> <PURE>
> func(arg)
> <END PURE>
> + func(arg)*2 + func(arg)**2
> <END CACHE>

That syntax is clearly and obviously rubbish!  It sucks.  You're welcome ;-)


> This would tell the compiler to only evaluate the sub-expression
> "func(arg)" once, cache the result, and re-use it each other time it
> sees that same sub-expression within the surrounding expression.
>
> To be clear: it doesn't matter whether or not the sub-expression
> actually is pure. And it doesn't have to be a function call: it could be
> anything legal in an expression.
>
> If we had this, with appropriately awesome syntax, would that negate the
> usefulness of assignment expresions in your mind?

The use cases I envision for that have no intersection with use cases
I have for assignment expressions, so, no.

My first thought about where it might be handy probably has no
intersection with what you were thinking of either ;-)

<BEGIN CACHE>
<PURE>
math.ceil
math.floor
<END PURE>
def int_away_from_zero(x):
    if x >= 0:
        return math.ceil(x)
    else:
        return math.floor(x)
<END CACHE>

The body of `int_away_from_zero()` is the way I _want_ to write it.
But in heavily used functions it's expensive to look up "math", then
look up its "ceil" (or "floor") attribute, on every call.  So stuff
like this often abuses default arguments instead:

def int_away_from_zero(x, mc=math.ceil, mf=math.floor):
    if x >= 0:
        return mc(x)
    else:
        return mf(x)

As the function grows over time, the default arg abuse grows, and the
body of the function gets more obscure as more-&-more "tiny names" are
introduced to save on repeated global and module attribute lookups.

Indeed, in many cases I'd like to wrap an entire module in
<BEGIN CACHE> ... <END CACHE>, with oodles of "module.attribute"
thingies in the <PURE> block.  _Most_ of my code gets no benefit from Python's
"extremely dynamic" treatment of module.attribute.  It would be great
if Python could do those lookups once at module import time, then

Re: [Python-ideas] Inline assignments using "given" clauses

2018-05-14 Thread Tim Peters
Just noting some real code I typed today where `given` works great if
it allows unpacking syntax, and assignment expressions don't:

while True:
    head, matched, s = s.partition(sep)
    if not matched:
        break
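
Fleshed out with hypothetical data (the real loop body presumably does
something useful with each `head`):

    s, sep = "a,b,c", ","
    parts = []
    while True:
        head, matched, s = s.partition(sep)
        parts.append(head)
        if not matched:
            break
    print(parts)    # ['a', 'b', 'c']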

Using `given`:

 while matched given head, matched, s = s.partition(sep):

Typing "matched " twice still sucks, though;-)

It doesn't work as well with assignment expressions even if they were
(re)generalized to allow unpacking syntax.  In this specific case, I'd
keep the original loop-and-a-half rather than do:

while (t := s.partition(sep))[1]:
    head, matched, s = t

With unpacking syntax restored, same answer:

while (head, matched, s := s.partition(sep))[1]:

or
while [(head, matched, s := s.partition(sep)), matched][-1]:

to combine the worst annoyances of everything and add another ;-)
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Inline assignments using "given" clauses

2018-05-14 Thread Tim Peters
[Tim]
>> It's been noted several times recently that the example PEP 572 gives
>> as _not_ working:
>>
>> total = 0
>> progressive_sums = [total := total + value for value in data]
>>
>> was the original use case that prompted work on the PEP.  You gotta
>> admit that's ironic ;-)

[Nick]
> After pondering this case further, I think it's also worth noting that that
> *particular* example could also be addressed by:
>
> 1. Allowing augmented assignment *expressions*
> 2. Changing the scoping rules for augmented assignment operations in general
> such that they *don't change the scope of the referenced name*
>
> Writing "i += n" without first declaring the scope of "i" with "i = 0",
> "nonlocal i" or "global i" is one of the most common sources of
> UnboundLocalError after all, so I'd be surprised to find anyone that
> considered the current augmented assignment scoping rules to be outside the
> realm of reconsideration.

Yes, it's worth considering.  In my experience, I don't believe I've
ever had an UnboundLocalError for which a correct fix was to add
`nonlocal`.  Usually the correct fix was to add `global`, but that's
mostly due to an old habit of using piles of globals to count trips
through various code paths, used by a watchdog thread that
periodically wakes up to display (the global) counts.


> The accumulation example would then be written:
>
> total = 0
> progressive_sums = [total += value for value in data]
> if progressive_sums:
>     assert total == progressive_sums[-1]

And the divide-out-small-primes example could be

factor = -42
while any(n % (factor += p - factor) == 0 for p in small_primes):
    n //= factor

Heh ;-)


> The question would then turn to "What if you just want to bind the target
> name, without considering the old value?". And then *that's* where "NAME :=
> EXPR" would come in: as an augmented assignment operator that used augmented
> assignment scoping semantics, rather than regular local name binding
> semantics.

Plain old ":=" would somehow be viewed as being an augmented
assignment operator too? ... OK, the meaning is that augmented
assignment _and_ ":=" would resolve the target's scope in the way the
containing block resolves it.


> That would mean *directly* overturning PEP 3099's rejection of the idea of
> using "NAME := EXPR" to imply "nonlocal NAME" at function scope, but that's
> effectively on the table for implicit functions anyway (and I'd prefer to
> have ":=" be consistent everywhere, rather than having to special case the
> implicit scopes).

Creating a local in a containing scope by magic is never done by
Python today.  Extending that beyond "need" seems potentially
perilous.  For example, it can already be tedious to figure out which
names _are_ local to a function by staring at the function's code, but
people quickly get better at that over time; change the rules so that
they _also_ have to stare at all immediately contained  functions too
to figure it out, and it may become significantly harder (OK, I didn't
declare `x`, and a contained function did `x := 3.14` but `x` isn't
declared there either - I guess it's my `x` now).  Then again, if
they're doing that much function nesting they deserve whatever they
get ;-)

Restrict it to that only synthetically generated functions can pull
off this trick by magic (where there are real use cases to motivate
it), and they still don't have to look outside the body of a
function's text to figure it out.  Visually, there's no distinction
between the code running in the function's scope and in scopes
synthesized to implement comprehensions appearing in the function's
text.  The comprehensions aren't even indented more.

So, offhand, I'm not sure that the right way to address something you
view as a wart is to vastly expand its reach to 12 operators that
impose it on everyone everywhere every time they're used ;-)

Seriously, I do suspect that in

def f(...):
    ... no instances of `s` ...
    s += f"START {time.time():.2f}"

it's overwhelmingly more likely that they simply forgot to do

s = ""

earlier in `f` than they actually wanted to append to whatever `s`
means in f's parent block.  That's a radical change to what people
have come to expect `NAME +=` to do.
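
The failure mode in question, as a minimal runnable sketch:

    import time

    def f():
        # ... no instances of `s` ...
        s += f"START {time.time():.2f}"   # += makes `s` local to f

    try:
        f()
    except UnboundLocalError as e:
        print(e)    # local variable 's' referenced before assignment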

OTOH, I don't (yet?) see a way the change could break code that
currently works, so it remains worth thinking about.

BTW, would

def f():
    x := 3.14
    x = 3.14

be a compile-time error?  Everyone agreed the analogous case would be
in synthetic functions.  Fine by me!
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] A comprehension scope issue in PEP 572

2018-05-07 Thread Tim Peters
[Tim]
> While I don't have real use cases beyond that, given that much,
> "consistency" kicks in to suggest that:
>
> def f():
>  [x := 42 for x in range(1)]
>
> makes `x` local to `f` despite that x wasn't bound elsewhere in f's body.
>
> def f():
> global x
>  [x := 42 for x in range(1)]
>
> binds the global `x`.
>
> def f():
> nonlocal x
>  [x := 42 for x in range(1)]
>
> binds `x` in the closest-containing scope in which `x` is local.  The
> last two act as if the declaration of `x` in `f` were duplicated at
> the start of the synthetic function.
>
> More succinctly, that `x := RHS` in a synthetic function "act the
> same" as `x = RHS` appearing in the scope directly containing the
> synthetic function.

Oh, fudge - I wasn't trying to make a silly subtle point by reusing
`x` as the `for` variable too.  Pretend those all said "for i in
range(1)" instead.  Of course what happens if `x` is used in both
places needs to be defined, but that's entirely beside the intended
point _here_.
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] A comprehension scope issue in PEP 572

2018-05-07 Thread Tim Peters
[Terry Reedy ]
> ...
> If I am understanding correctly, this would also let one *intentionally
> 'leak' (export) the last value of the loop variable when wanted.
>
> [math.log(xlast:=x) for x in it if x > 0]
> print(xlast)

Yup!  You can do that today by emulating a nonlocal "cell" reference,
but I don't recommend it ;-)

xlast_cell = [None]
[math.log(x) for x in it if x > 0
 for xlast_cell[0] in [x]]
print(xlast_cell[0])
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] A cute Python implementation of itertools.tee

2018-05-07 Thread Tim Peters
I posted this several weeks ago, just for fun - an obscure but
surprisingly brief Python implementation of itertools.tee, sharing a
single stream (as a singly linked list) among all the returned
iterables.

Didn't think about it again until today, when recent discussions of
lexical scoping made me wonder "hmm - when's the last time I even used
nonlocal?".  Well, this was.  And shortly thereafter, "but there's no
need to keep track of `last` at all!"  I have no idea where that
thought came from :-)

> def mytee(xs, n):
>     last = [None, None]
>
>     def gen(it, mylast):
>         nonlocal last
>         while True:
>             mylast = mylast[1]
>             if not mylast:
>                 mylast = last[1] = last = [next(it), None]
>             yield mylast[0]
>
>     it = iter(xs)
>     return tuple(gen(it, last) for _ in range(n))

So, in fact, the inner function there can be replaced by the even briefer:

def gen(it, mylast):
    while True:
        if mylast[1] is None:
            mylast[1] = [next(it), None]
        mylast = mylast[1]
        yield mylast[0]

To make it more relevant to current discussions, collapsing the last
two lines using a binding expression:

yield (mylast := mylast[1])[0]

isn't even slightly tempting.  Most of the cases where it would be
_possible_ to use binding expressions don't strike me as being
_sensible_ places to use them - slamming together conceptually
different tasks just because it's possible to do so.  But because name
bindings are so very common, that still leaves plenty where the
transformation leaves the code clearer to my eyes (usually just a
little, rarely a lot).  There are no such cases in the code above.
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] A comprehension scope issue in PEP 572

2018-05-07 Thread Tim Peters
[Tim]
> ... The ":= is treated specially in comprehensions" idea is aimed
> more at them than at people who think invoking a synthetic
> anonymous lambda "is obvious".

It occurs to me that, while suggestive, this is an unhelpful way to
express it.  It's not at all that the semantics of ":=" change inside
a listcomp/genexp, it's that the latter's idea of intended _scopes_
for names is made more nuanced (inside a synthetic function created to
implement a listcomp/genexp, names bound by "=" are local; names bound
by ":=" are nonlocal; names bound by both are "who cares?"-
compiler-time error would be fine by me, or the first person to show a
real use case wins).

Regardless, the runtime implementation of ":=" remains the same everywhere.
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] A comprehension scope issue in PEP 572

2018-05-09 Thread Tim Peters
...

[Guido]
>> We should probably define what happens when you write [p := p for p in
>> range(10)]. I propose that this overwrites the loop control variable rather
>> than creating a second p in the containing scope -- either way it's probably
>> a typo anyway.

[Jacco van Dorp ]
> My naive assumption would be both.

Since this is all about scope, while I'm not 100% sure of what Guido
meant, I assumed he was saying "p can only have one scope in the
synthetic function:  local or non-local, not both, and local is what I
propose".  For example, let's flesh out his example a bit more:

p = 42
[p := p for p in range(10) if p == 3]
print(p) # 42?  3?  9?

If `p` is local to the listcomp, it must print 42.  If `p` is
not-local, it must print 9.  If it's some weird mixture of both, 3
makes most sense (the only time `p := p` is executed is when the `for`
target `p` is 3).

> If it's just the insertion of a nonlocal statement like Tim suggested,

Then all occurrences of `p` in the listcomp are not-local, and the
example above prints 9.

> wouldn't the comprehension blow up to:
>
> def implicitfunc():
>     nonlocal p
>     templist = []
>     for p in range(10):
>         p = p
>         templist.append(p)
>     return templist
>
> ?

Yes.


> If it were [q := p for p in range(10)], it would be:
>
> def implicitfunc():
>     nonlocal q
>     templist = []
>     for p in range(10):
>         q = p
>         templist.append(q)
>     return templist

There's no question about that one, because `q` isn't _also_ used as a
`for` target.  There are two "rules" here:

1. A name appearing as a `for` target is local.  That's already the case.

2. All other names (including a name appearing as a binding-expression
target) are not local.

Clearer?  If a name appears as both, which rule applies?  "Both" is
likely the worst possible answer, since it's incoherent ;-)  If a name
appears as both a `for` target and as a binding-expression target,
that particular way of phrasing "the rules" suggests #1 (it's local,
period) is the more natural choice.  And, whether Guido consciously
knows it or not, that's why he suggested it ;-)


> Why would it need to be treated differently ?

Because it's incoherent.  It's impossible to make the example above
print 3 _merely_ by fiddling the scope of `p`.  Under the covers, two
distinct variables would need to be created, both of which are named
`p` as far as the user can see.  For my extension of Guido's example:

def implicitfunc():
    nonlocal p
    templist = []
    for hidden_loop_p in range(10):
        if hidden_loop_p == 3:
            p = hidden_loop_p
        templist.append(hidden_loop_p)
    return templist


[Tim]
>> A compile-time error would be fine by me too.  Creating two meanings
>> for `p` is nuts - pick one in case of conflict.  I suggested before
>> that the first person with a real use case for this silliness should
>> get the meaning their use case needs, but nobody bit, so "it's local
>> then" is fine.

> x = x is legal. Why wouldn't p := p be ?

It's easy to make it "legal":  just say `p is local, period` or `p is
not local, period`.  The former will confuse people who think "but
names appearing as binding-expression targets are not local", and the
latter will confuse people who think "but names appearing as `for`
targets are local".

Why bother?  In the absence of an actual use case (still conspicuous
by absence), I'd be happiest refusing to compile such pathological
code.  Else `p is local, period` is the best pointless choice ;-)
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] A "local" pseudo-function

2018-04-28 Thread Tim Peters
[Steven D'Aprano ]
> Chris' PEP 572 started off with the concept that binding expressions
> would create a "sub-local" scope, below function locals. After some
> debate on Python-Ideas, Chris, Nick and Guido took the discussion off
> list and decided to drop the sub-local scope idea as confusing and hard
> to implement.

Enormously harder to implement than binding expressions, and the
latter (to my eyes) capture many high-value use cases "good enough".

I'm not concerned about "confusing".  "Sub-local" scopes are
ubiquitous in modern languages, and they're aimed at experienced
programmers.

It was also the case that nesting scopes _at all_ was very
controversial in Python's earliest years, and Guido resisted it
mightily (with my full support).  The only scopes at first were
function-local, module-global, and builtin, and while functions could
_textually_ nest, they had no access to enclosing local scopes.  Cute:
 to write a recursive nested function, you needed to pass it its own
name (because its own name isn't in its _own_ local scope, but in the
local scope of the function that contains it).
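
For instance, the old workaround looked something like this (a
reconstructed sketch - the names here are made up for illustration):

    def outer():
        def fact(me, n):
            # fact's own name isn't visible inside its own body back
            # then, so recurse through the explicitly passed-in "me".
            return 1 if n <= 1 else n * me(me, n - 1)
        return fact(fact, 5)   # 120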

Adding nested local scopes was also "confusing" at the time, and
indeed made the scoping rules far harder to explain to newbies, and
complicated the implementation.  Then again, experienced programmers
overwhelmingly (unanimously?) welcomed the change after it was done.

Since then, Python has gone down a pretty bizarre path, inventing
sublocal scopes on an ad hoc basis by "pure magic" when their absence
in some specific context seemed just too unbearable to live with
(e.g., in comprehensions).  So we already have sublocal scopes, but in
no explicit form that can be either exploited or explained.


> But the biggest problem is that this re-introduces exactly the same
> awful C mistake that := was chosen to avoid. Which of the following two
> contains the typo?
>
> local(spam=expression, eggs=expression, cheese = spam+eggs)
>
> local(spam=expression, eggs=expression, cheese == spam+eggs)

Neither :-)  I don't expect that to be a real problem.  In C I'm
_thinking_ "if a equals b" and type "if (a=b)" by mistake in haste.
In a "local" I'm _ thinking_ "I want to create these names with these
values" in the former case, and in the latter case also "and I want to
to test whether cheese equals spam + eggs".  But having already typed
"=" to mean "binding" twice in the same line, "but the third time I
type it it will mean equality instead" just doesn't seem likely.

The original C mistake is exceedingly unlikely on the face of it:  if
what I'm thinking is "if a equals b", or "while a equals b", I'm not
going to use "local()" _at all_.  Forcing the programmer to be
explicit that they're trying to create a new scope limits the
only possible confusions to cases where they _are_ going out of their
way to use a "local()" construct, in which case binding behavior is
very much at the top of their mind.  Plain old "if a=b" remains a
SyntaxError regardless.

Still, if people are scared of that, a variation of Yury's alternative
avoids it:  the last "argument" must be an expression (not a binding).
In that case your first line above is a compile-time error.

I didn't like that because I really dislike the textual redundancy in the common

if local(matchobject=re.match(regexp, line), matchobject):

compared to

if local(matchobject=re.match(regexp, line)):

But I could compromise ;-)

- There must be at least one argument.
- The first argument must be a binding.
- All but the last argument must also be bindings.
- If there's more than one argument, the last argument must be an expression.

Then your first line above is a compile-time error, the common "just
name a result and test its truthiness" doesn't require repetition, and
"local(a==b)" is also a compile-time error.


> I have other objections, but I'll leave them for now, since I think
> these two alone are fatal.

I don't.


> Once you drop those two flaws, you're basically left with PEP 572 :-)

Which is fine by me, but do realize that since PEP 572 dropped any
notion of sublocal scopes, that recurring issue remains wholly
unaddressed regardless.


[Python-ideas] A "local" pseudo-function

2018-04-27 Thread Tim Peters
A brain dump, inspired by various use cases that came up during the
binding expression discussions.

Idea:  introduce a "local" pseudo-function to capture the idea of
initialized names with limited scope.

As an expression, it's

"local" "(" arguments ")"

- Because it "looks like" a function call, nobody will expect the targets
  of named arguments to be fancier than plain names.

- `a=12` in the "argument" list will (& helpfully so) mean pretty much the
  same as "a=12" in a "def" statement.

- In a "local call" on its own, the scope of a named argument begins at the
  start of the next (if any) argument, and ends at the closing ")".  For the
  duration, any variable of the same name in an enclosing scope is shadowed.

- The parentheses allow for extending over multiple lines without needing
  to teach editors (etc) any new tricks (they already know how to format
  function calls with arglists spilling over multiple lines).

- The _value_ of a local "call" is the value of its last "argument".  In
  part, this is a way to sneak in C's comma operator without adding cryptic
  new line noise syntax.

Time for an example.  First a useless one:

a = 1
b = 2
c = local(a=3) * local(b=4)

Then `c` is 12, but `a` is still 1 and `b` is still 2.  Same thing in the end:

c = local(a=3, b=4, a*b)

And just to be obscure, also the same:

c = local(a=3, b=local(a=2, a*a), a*b)

There the inner `a=2` temporarily shadows the outer `a=3` just long
enough to compute `a*a` (4).

This is one that little else really handled nicely:

r1, r2 = local(D = b**2 - 4*a*c,
               sqrtD = math.sqrt(D),
               twoa = 2*a,
               ((-b + sqrtD)/twoa, (-b - sqrtD)/twoa))
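
For contrast, a present-day equivalent needs plain statements, and it
leaks the temporary names into the enclosing scope - exactly what
"local()" is meant to prevent.  A sketch, assuming `math` is imported,
`a`, `b`, `c` are bound, and the discriminant is nonnegative:

    D = b**2 - 4*a*c
    sqrtD = math.sqrt(D)
    twoa = 2*a
    r1, r2 = (-b + sqrtD)/twoa, (-b - sqrtD)/twoa
    # D, sqrtD and twoa are all still bound here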

Everyone's favorite:

if local(m = re.match(regexp, line)):
    print(m.group(0))

Here's where it's truly essential that the compiler know everything
about "local", because in _that_ context it's required that the new
scope extend through the end of the entire block construct (exactly
what that means TBD - certainly through the end of the `if` block, but
possibly also through the end of its associated (if any) `elif` and
`else` blocks - and similarly for while/else constructs).

Of course that example could also be written as:

if local(m = re.match(regexp, line), m):
    print(m.group(0))

or more specifically:

if local(m = re.match(regexp, line), m is not None):
    print(m.group(0))

or even:

if local(m = re.match(regexp, line)) is not None:
    print(m.group(0))

A listcomp example, building the squares of integers from an iterable
but only when the square is a multiple of 18:

squares18 = [i2 for i in iterable if local(i2=i*i) % 18 == 0]

That's a bit mind-bending, but becomes clear if you picture the
kinda-equivalent nest:

for i in iterable:
    if local(i2=i*i) % 18 == 0:
        append i2 to the output list

That should also make clear that if `iterable` or `i` had been named
`i2` instead, no problem.  The `i2` created by `local()` is in a
wholly enclosed scope.
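
(For the record, the assignment-expression spelling of that listcomp is
legal in the form PEP 572 later shipped with in Python 3.8 - though
there `i2` leaks into the containing block instead of living in a
wholly enclosed scope:)

    squares18 = [i2 for i in iterable if (i2 := i*i) % 18 == 0]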

Drawbacks:  since this is just a brain dump, absolutely none ;-)

Q:  Some of those would be clearer if it were the more Haskell-like

local(...) "in" expression

A: Yup, but for some of the others needing to add "in m" would be
annoyingly redundant noise.  Making an "in" clause optional doesn't
really fly either, because then

local(a='z') in 'xyz'

would be ambiguous.  Is it meant to return `'xyz'`, or evaluate `'z'
in 'xyz'`?  And any connector other than "in" would make the loose
resemblance to Haskell purely imaginary ;-)

Q: Didn't you drone on about how assignment expressions with complex
targets seemed essentially useless without also introducing a "comma
operator" - and now you're sneaking the latter in but _still_ avoiding
complex targets?!

A. Yes, and yes :-)  The syntactic complexity of the fully general
assignment statement is just too crushing to _sanely_ shoehorn into
any "expression-like" context.

Q:  What's the value of this?   local(a=7, local(a=a+1, a*2))

A: 16.  Obviously.

Q:  Wow - that _is_ obvious!  OK, what about this, where there is no
`a` in any enclosing scope:  local(a)

A: I think it should raise NameError, just like a function call would.
There is no _intent_ here to allow merely declaring a local variable
without supplying an initial value.

Q: What about local(2, 4, 5)?

A: It should return 5, and introduce no names.  I don't see a point to
trying to outlaw stupidity ;-)  Then again, it would be consistent
with the _intent_ to require that all but the last "argument" be of
the `name=expression` form.

Q: Isn't changing the meaning of scope depending on context waaay magical?

A: Yup!  But in a language with such a strong distinction between
statements and expressions, without a bit of deep magic there's no
single syntax I can dream up that could work well for both that didn't
require _some_ deep magic.  The gimmick here is something I expect
will be surprising the first time it's seen, less so the second, and
then 

Re: [Python-ideas] PEP 572: about the operator precedence of :=

2018-05-09 Thread Tim Peters
[Guido]
> (I vaguely recall this has been brought up before, but I'm too lazy to find
> the subthread. So it goes.)
>
> PEP 572 currently seems to specify that when used in expressions, the
> precedence of `:=` is lower (i.e. it binds more tightly)

Umm ... that's the opposite of what the Reference Manual says "lower" means:

"""
6.16. Operator precedence

The following table summarizes the operator precedence in Python, from
lowest precedence (least binding) to highest precedence (most
binding).
"""


> than all operators except for the comma.

Which gets potentially more confusing because the comma isn't listed at
all as "an operator".  Instead the meaning of commas for building
tuples is captured by the higher-level `expression-list` grammar
production:

expression_list ::=  expression ( "," expression )* [","]

So _every_ mere operator "binds more tightly" (if you view it in those
terms) than a comma in an expression list.

What was mostly discussed before - after giving up on fully generally
"assignment expressions" - was whether ":=" should get a precedence
between comparison and bitwise OR operators.  So that, e.g., parens
wouldn't be needed in

if x := match() is not None:

to get the intended

if (x := match()) is not None:

But that never gained much traction, and it was dropped quickly.  It
was left with a (possibly unwritten!) consensus that ":=" should bind
very weakly as an operator.  Of the binary infix operators, the most
weakly binding (the weakest of which now is Boolean OR).
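
And that's how it came out in Python 3.8:  `:=` binds more weakly than
comparisons, so the parentheses really are required.  A runnable sketch
(`match` here is a made-up stand-in returning an object or None):

    import re

    def match():
        return re.match(r"\d+", "123abc")

    if (x := match()) is not None:
        print(x.group())   # 123

    if x := match() is not None:
        print(x)           # True - x was bound to the comparison's result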


> I derive this from the single example
> `stuff = [[y := f(x), x/y] for x in range(5)]`.

As above, the part you're looking at there falls under the
expression_list part of the grammar, and no binary operator could
group as

y OP (f(x), x/y)

instead.  All binary operators group as

(y OP f(x)), (x/y)


> From this it would follow that `f(a := 1, a)`

And now you're throwing in the entirely different meaning of commas in
argument lists ;-)  But same thing:  no operator can cross comma
boundaries in argument lists, because the grammar of argument lists
doesn't allow for that either.  Even, e.g.,

f(lambda x: x, 2)

groups as

f((lambda x: x), 2)


> is equivalent to `a = 1; f(1, 1)`,

Yup.
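
A tiny runnable check of that, against the `:=` that shipped in 3.8
(`f` here is just a stand-in):

    def f(x, y):
        return x, y

    print(f(a := 1, a))   # (1, 1)
    print(a)              # 1 - the binding persists after the call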

> and also that `(a := 1, a)` is equivalent to `a = 1; (1, 1)`. (Although
> M.A.L. objected to this.)

That's just another instance of expression_list.

Nothing so far has surprised me, because I have been viewing ":=" as
an operator for some time now.


> But what should `a := 1, 1` at the top level (as a statement) do? On the one
> hand, analogy with the above suggest that it is equivalent to
>`a = 1; (1, 1)`.

If it's an operator, there's really no choice about that.


> But on the other hand, it would be really strange if the following two
> lines had different meanings:
>
> a = 1, 1   # a = (1, 1)
> a := 1, 1  # a = 1; (1, 1)

Agreed.


> I now think that the best way out is to rule `:=` in the top level
> expression of an expression statement completely (it would still be okay
> inside parentheses, where it would bind tighter than comma).

Probably prudent, and no objections here.


> An alternative would be to make `:=` bind less tight than comma (like `=`)
> everywhere, so that `a := 1, 1` indeed meant the same as `a = 1, 1`. But
> that feels very wrong at least for the case `f(a := 1, 1)` -- I believe Tim
> already mentioned that we've been conditioned by keyword arguments to parse
> this as `f((a := 1), 1)`. (I could add to this that we have done various
> things to generator expression syntax to avoid ever having to deal with
> ambiguities like `a, a+1 for a in range(10)` or `a for a in x, y`.)

Offhand, since comma is not formally "an operator" now, I expect it
would require major surgery to the grammar to have

a := 1, 1

group as

a := (1, 1)

in any context.  At least if ":=" is treated as "just another
operator" and doesn't grow its own unique-to-it pile of grammar rules.


> Another alternative would be to always require parentheses around `:=`, so
> that we would have to write `f((a := 1), 1)`. That's unambiguous, but
> otherwise just gets on the nerves.

I hoped that rigidly calling these "binding expressions" would break
the connection in peoples' minds that these somehow "should behave"
like assignment statements, but that seems futile.  There's really
nothing surprising here _if_ it's viewed as just another operator.
Nobody would be tempted to, e.g., add parens to f(a + 1, 1), or if `+`
were any other binary operator either.  So, over time, it would be
just as annoying to need to type them for the `:=` operator.


Re: [Python-ideas] A comprehension scope issue in PEP 572

2018-05-10 Thread Tim Peters
[Nick Coghlan]
> How would you expect this to work in cases where the generator expression
> isn't immediately consumed? If "p" is nonlocal (or global) by default, then
> that opens up the opportunity for it to be rebound between generator steps.
> That gets especially confusing if you have multiple generator expressions in
> the same scope iterating in parallel using the same binding target:

I'm most interested in what sensible programmers can do easily that's
of use, not really about pathologies that can be contrived.

> # This is fine
> gen1 = (p for p in range(10))
> gen2 = (p for p in gen1)
> print(list(gen2))

Sure.

>
> # This is not (given the "let's reintroduce leaking from comprehensions" proposal)

Be fair:  it's not _re_introducing anything.  It's brand new syntax
for which "it's a very much intended feature" that a not-local name
can be bound.  You have to go out of your way to use it.  Where it
doesn't do what you want, don't use it.

> p = 0

I'm not sure of the intent of that line.  If `p` is otherwise unknown
in this block, its appearance as a binding operator target in an
immediately contained genexp establishes that `p` is local to this
block.  So `p = 0` here just establishes that directly.  Best I can
guess, the 0 value is never used below.

> gen1 = (p := q for q in range(10))

I expect that's a compile time error, grouping as

gen1 = (p := (q for q in range(10)))

but without those explicit parentheses delimiting the "genexp part" it
may not be _recognized_ as being a genexp.  With the extra parens, it
binds both `gen1` and `p` to the genexp, and `p` doesn't appear in the
body of the genexp at all.  Or did you intend

gen1 = ((p := q) for q in range(10))

?  I'll assume that's so.


> gen2 = (p, p := q for q in gen1)

OK, I really have no guess about the intent there.  Note that

gen2 = (p, q for q in gen1)

is a syntax error today, while

gen2 = (p, (q for q in gen1))

builds a 2-tuple.  Perhaps

gen2 = ((p, p := q) for q in gen1)

was intended?

Summarizing:

gen1 = ((p := q) for q in range(10))
gen2 = ((p, p := q) for q in gen1)

is my best guess.

> print(list(gen2))

[(0, 0), (1, 1), (2, 2), ..., (9, 9)]

But let's not pretend it's impossible to do that today; e.g., this
code produces the same:

class Cell:
    def __init__(self, value=None):
        self.bind(value)
    def bind(self, value):
        self.value = value
        return value

p = Cell()
gen1 = (p.bind(q) for q in range(10))
gen2 = ((p.value, p.bind(q)) for q in gen1)
print(list(gen2))

Someone using ":=" INTENDS to bind the name, just as much as someone
deliberately using that `Cell` class.

> It also reintroduces the original problem that comprehension scopes solved,
> just in a slightly different form:
>
> # This is fine
> for x in range(10):
> for y in range(10):
> transposed_related_coords = [y, x for x, y in related_coords(x, y)]

I'm not clear on what "This is fine" means, other than that the code
does whatever it does.  That's part of why I so strongly prefer
real-life use cases.  In the code above, I can't imagine what the
intent of the code might be _unless_ they're running tons of
otherwise-useless code for _side effects_ performed by calling
`related_coords()`.  If "it's functional", they could do the same via

x = y = 9
transposed_related_coords = [y, x for x, y in related_coords(x, y)]

except that's a syntax error ;-)  I assume

transposed_related_coords = [(y, x) for x, y in related_coords(x, y)]

was intended.

BTW, I'd shoot anyone who tried to check in that code today ;-)  It
inherently relies on that the name `x` inside the listcomp refers to
two entirely different scopes, and that's Poor Practice (the `x` in
the `related_coords()` call refers to the `x` in `for x in range(10)`,
but all other instances of `x` refer to the listcomp-local `x`).


> # This is not (given the "let's reintroduce leaking from comprehensions" proposal)
> for x in range(10):
> for y in range(10):
> related_interesting_coords = [x, y for x in related_x_coord(x, y)
>   if is_interesting(y := f(x))]

Same syntax error there (you need parens around "x, y" at the start of
the listcomp).

Presumably they _intended_ to build (x, f(x)) pairs when and only when
`f(x)` "is interesting".  In what specific way does the code fail to
do that?  Yes, the outer `y` is rebound, but what of it?  When the
statement completes, `y` will be rebound to the next value from the
inner range(10), and that's the value of `y` seen by
`related_x_coord(x, y)` the next time the loop body runs.  The binding
done by `:=` is irrelevant to that.

So I don't see your point in that specific example, although - sure! -
of course it's possible to contrive examples where it really would
matter.  

Re: [Python-ideas] A comprehension scope issue in PEP 572

2018-05-10 Thread Tim Peters
Just a quickie - I'm out of time for now.

[Guido]
>> That's just one of several "don't do that" situations. *What will happen*
>> is perhaps hard to see at a glance, but it's perfectly well specified. Not
>> all legal code does something useful though, and in this case the obvious
>> advice should be to use different variables.

[Nick]
> I can use that *exact same argument* to justify the Python 2 comprehension
> variable leaking behaviour. We decided that was a bad idea based on ~18
> years of experience with it,

Here's the practical difference:  you can't write a listcomp or genexp
AT ALL without a "for" clause, so whether "for" target names leak is
an issue in virtually every listcomp or genexp ever written.  Here's
one where it isn't:

[None for somelist[12] in range(10)]

Which nobody has ever seen in real life ;-)
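
(It really is legal, though:  subscription targets aren't made
comprehension-local - only plain names are.  A runnable curiosity:)

    somelist = list(range(20))
    [None for somelist[12] in range(10)]
    print(somelist[12])   # 9 - the loop stored right into somelist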

But ":=" is never required to write one - you only use it when you go
out of your way to use it.  I expect that will be relatively rare in
real life listcomps and genexps.


> and there hasn't been a clear justification presented for going back on that
> decision

Nobody is suggesting going back on "all and only `for` target names
are local to the genexp/listcomp".  To the contrary, the proposal
preserves that verbatim:  It's not _adding_ "oh, ya, and binding
operator targets are local too".  Just about everything here follows
from _not_ adding that.


> presented beyond "Tim would like using it sometimes".

So long as I'm the only one looking at real-life use cases, mine is
the only evidence I care about ;-)  I don't really care about
contrived examples, unless they illustrate that a proposal is
ill-defined, impossible to implement as intended, or likely to have
malignant unintended consequences out-weighing their benefits.


Re: [Python-ideas] A comprehension scope issue in PEP 572

2018-05-10 Thread Tim Peters
[Tim]
>> ...
>> So long as I'm the only one looking at real-life use cases, mine is
>> the only evidence I care about ;-)  I don't really care about
>> contrived examples, unless they illustrate that a proposal is
>> ill-defined, impossible to implement as intended, or likely to have
>> malignant unintended consequences out-weighing their benefits.

[Brendan Barnwell]
> You keep saying things like this with a smiley, and I realize you
> know what you're talking about (much more than I do), but I'd just like to
> push back a bit against that entire concept.

I'm not so keen on meta-discussions either ;-)


> Number one, I think many people have been bringing in real life uses
> cases.

Keep in mind the context here:  _this_ thread is specifically about
listcomps and genexps.  I agree there have been tons of use cases
presented for statement-oriented applications (some positive for the
feature, some negative), but not so much for listcomps and genexps.

It's worth noting again that "the" use case that started all this long
ago was a listcomp that the current PEP points out still "won't work":

total = 0
progressive_sums = [total := total + value for value in data]

It's obvious what that's intended to do.  It's not obvious why it
blows up.  It's a question of scope, and the scopes of names in
synthesized functions is a thoroughly legitimate thing to question.
The suggestion made in the first message of this thread was the
obvious scope change needed to make that example work, although I was
motivated by looking at _other_ listcomp/genexp use cases.  They
wanted the same scope decision as the example above.  But I didn't
realize that the example above was essentially the same thing until
after I made the suggestion.
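
(And that's how the accepted PEP settled it - in Python 3.8 the example
runs, with `total` landing in the containing block:)

    data = [1, 2, 3, 4]
    total = 0
    progressive_sums = [total := total + value for value in data]
    print(progressive_sums)   # [1, 3, 6, 10]
    print(total)              # 10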


>  Number two, I disagree with the idea that looking at individual use
> cases and ignoring logical argumentation is the way to go.

Fine, then you argue, and I'll look at use cases ;-)  Seriously, I
don't at all ignore argument - but, yes, arguments are secondary to
me.  I don't give a rip about how elegant something is if it turns out
to be unusable.  Conversely, I don't _much_ care about how "usable"
something is if the mental model for how it works is inexplicable.


> The problem with it is that a lot of the thorny issues arise in
> unanticipated interactions between constructs that were
> designed to handle separate use cases.

Sure.


> I also do not think it's appropriate to say "if it turns out there's
> a weird interaction between two features, then just don't use those two
> things together".

Sometimes it is, sometimes it isn't.  For example, code using threads
has to be aware of literal mountains of other features that may not
work well (or at all) in a multi-threaded environment without major
rewriting.  Something as simple as "count += 1" may fail in mysterious
ways otherwise.  So it goes.  But note that this is easily
demonstrated by realistic code.
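
For instance, this sketch can lose updates - how many depends on
thread-switch timing, so the final count varies from run to run:

    import threading

    count = 0

    def bump(n):
        global count
        for _ in range(n):
            count += 1   # load, add, store - a switch in between loses updates

    threads = [threading.Thread(target=bump, args=(100_000,))
               for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(count)   # frequently less than 400000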


> One of the great things about Python's design is that it doesn't just
> make it easy for us to write good code, but in many ways makes
> it difficult for us to write bad code.

That one I disagree with.  It's very easy to write bad code in every
language I'm aware of.  It's just that Python programmers are too
enlightened to approve of doing so ;-)


>  It is absolutely a good idea to think of the broad range of wacky things that
> COULD be done with a feature,

So present some!

> not just the small range of things in the focal area of its intended use.
> We may indeed decide that some of the wacky cases are so unlikely that we're
> willing to accept them, but we can only decide that after we consider them.
> You seem to be suggesting that we shouldn't even bother thinking about such
> corner cases at all, which I think is a dangerous mistake.

To the contrary, bring 'em on.  But there is no feature in Python you
can't make "look bad" by contriving examples, from two-page regular
expressions to `if` statements nested 16 deep.  "But no sane person
would do that" is usually - but not always - "refutation" enough for
such stuff.


> Taking the approach of "this individual use case justifies this
> individual feature", leads to things like JavaScript, a hellhole of special
> cases, unintended consequences, and incoherence between different corners of
> the language.  There are real cognitive benefits to having language features
> make logical and conceptual sense IN ADDITION TO having practical utility,
> and fit together into a unified whole.

I haven't ignored that here.  The scope rule for synthesized functions
implementing genexps and listcomps _today_ is:

 The names local to that function are the names appearing as `for` targets.
 All other names resolve to the same scopes they resolve to in the block
 containing the synthesized function.

The scope rule if the suggestion is adopted?  The same, along with
that a name appearing as a ":=" target 

Re: [Python-ideas] A comprehension scope issue in PEP 572

2018-05-12 Thread Tim Peters
[Tim]
> ...
> But an assignment expression target name always has the
> same scope within a comprehension.  [which is a consequence
> of the rules - not a rule of its own]

Something related to ponder:  what's the meaning of the following
_without_ the proposed scope change?  So the Golden Binding Rule (GBR)
applies then:

GBR:  binding a name by any means always makes the name
local to the block the binding appears in, unless the name is
declared "global" or "nonlocal" in the block.

def f():
    ys = [y for _ in range(y := 5)]

The second instance of `y` is local - but local to what?  Since the
range is evaluated _in_ f's scope, presumably that instance of `y` is
local to `f`.  What about the first instance of `y`?  Is that _not_
local to the comprehension despite that the GBR insists it must be
local to the comprehension?

Or does it raise UnboundLocalError for consistency with the GBR, and
"well, so just don't use any name in a comprehension that appears as
an assignment expression target in the expression defining the
iterable for the outermost `for` ".

Or is it that despite that `range(y := 5)` is executed in f's scope,
the _binding_ is actually performed in the comprehension's scope to a
comprehension-local `y`, to both preserve GBR and avoid the
UnboundLocalError?  But then what if `print(y)` is added after?  If
`range(y := 5)` really was executed in f's scope, surely that must
print 5.

Then what about

[y for y in range(y := 5)]

?  Now that there's another binding inside the comprehension
establishing that `y` is local to the comprehension "for real", does
that work fine and the rule changes to

well, so just don't use any name in a comprehension
that appears as an assignment expression target in the
expression E defining the iterable for the outermost `for`
- unless the name is _also_ used in a binding context
in the comprehension outside of E too

?  Or is that a compile-time error despite that the first 2 y's are
now obviously comprehension-local and the final y obviously f-local?

Or are assignment expressions disallowed in the expression defining
the iterable for the outermost `for`, and both examples are
compile-time errors?

Talk about incoherent ;-)

Under the proposed change,  all instances of `y` are local to `f` in
the first example, and the second example is a compile-time error for
a _coherent_ reason (the ":=" binding implies "not local" for `y` -
which has nothing to do with that it's in the outermost `for` -, but
the "for y in" binding implies "local" for `y`).


Re: [Python-ideas] A comprehension scope issue in PEP 572

2018-05-11 Thread Tim Peters
[Tim]
>> ...
>> ":=" target names in a genexp/listcmp are treated exactly the same as
>> any other non-for-target name:  they resolve to the same scope as they
>> resolve to in the block that contains them.  The only twist is that if
>> such a name `x` isn't otherwise known in the block, then `x` is
>> established as being local to the block (which incidentally also
>> covers the case when the genexp/listcomp is at module level, where
>> "local to the block" and "global to the block" mean the same thing).
>> Class scope may be an exception (I cheerfully never learned anything
>> about how class scope works, because I don't write insane code ;-) ).

[Nick]
> That's all well and good, but it is *completely insufficient for the
> language specification*.

I haven't been trying to write reference docs here, but so far as
supplying a rigorous specification goes, I maintain the above gets
"pretty close".  It needs more words, and certainly isn't in the
_style_ of Python's current reference docs, but that's all repairable.
Don't dismiss it just because it's brief.  Comprehensions already
exist in the language, and so do nested scopes, so it's not necessary
for this PEP to repeat any of the stuff that goes into those.  Mostly
it needs to specify the scopes of assignment expression target names -
and the _intent_ here is really quite simple.

Here with more words, restricted to the case of assignment expressions
in comprehensions (the only case with any subtleties):

Consider a name `y` appearing in the top level of a comprehension as
an assignment expression target, where the comprehension is
immediately contained in scope C, and the names belonging to scopes
containing C have already been determined:

... (y := expression) ...

We can ignore that `y` also appears as a `for` target at the
comprehension's top level, because it was already decided that's a
compile-time error.

Consider what the scope of `y` would be if `(y := expression)` were
textually replaced by `(y)`.  Then what would the scope of `y` be?
The answer relies solely on what the docs _already_ specify.  There
are three possible answers:

1. The docs say `y` belongs to scope S (which may be C itself, or a
scope containing C).  Then y's scope in the original comprehension is
S.

2. The docs say name `y` is unknown.  Then y's scope in the original
comprehension is C.

3. The docs are unclear about whether #1 or #2 applies.  Then the
language is _already_ ill-defined.

It doesn't matter to this whether the assignment expression is, or is
not, in the expression that defines the iterable for the outermost
`for`.

What about that is hand-wavy?  Defining semantics clearly and
unambiguously doesn't require specifying a concrete implementation
(the latter is one possible way to achieve the goal - but _here_ it's
a convoluted PITA because Python has no way to explicitly declare
intended scopes).  Since all questions about scope are reduced by the
above to questions about Python's _current_ scope rules, it's as clear
and unambiguous as Python's current scope rules.

Now those may not be the _intended_ rules in all cases.  That deserves
deep scrutiny.  But claiming it's too vague to scrutinize doesn't fly
with me.  If there's a scope question you suspect can't be answered by
the above, or that the above gives an unintended answer to, by all
means bring that up!  If your question isn't about scope, then I'd
probably view it as being irrelevant to the current PEP (e.g., what
`locals()` returns depends on how the relevant code object attributes
are set, which are in turn determined by which scopes names belong to
relative to the code block's local scope, and it's certainly not
_this_ PEP's job to redefine what `locals()` does with that info).

Something to note:  for-target names appearing in the outermost `for`
_may_ have different scopes in different parts of the comprehension.

y = 12
[y for y in range(y)]

There the first two `y`'s have scope local to the comprehension, but
the last `y` is local to the containing block.  But an assignment
expression target name always has the same scope within a
comprehension.  In that specific sense, their scope rules are "more
elegant" than for-target names.  This isn't a new rule, but a logical
consequence of the scope-determining algorithm given above.  It's a
_conceptual_ consequence of that assignment statement targets are
"intended to act like" the bindings are performed _in_ scope C rather
than in the comprehension's scope.  And that's no conceptually weirder
than that it's _already_ the case that the expression defining the
iterable of the outermost `for` _is_ evaluated in scope C (which I'm
not a fan of, but which is rhetorically convenient to mention here ;-)
).

As I've said more than once already, I don't know whether this should
apply to comprehensions at class scope too - I've never used a
comprehension in class scope, and doubt I ever will.  Without use
cases I'm familiar with, I have no idea what 

Re: [Python-ideas] A comprehension scope issue in PEP 572

2018-05-12 Thread Tim Peters
Ah, fudge - I pasted in the wrong "high-level" code.  Sorry!  The code
that's actually being emulated is not

> list(i + sum((i := i+1) + i for j in range(i))
>   for i in range(5))

but

 list(i + sum((i := i+1) for j in range(i)) + i
  for i in range(5))

> ...

I have piles of these, but they're all equally tedious so I'll stop
with this one ;-)


Re: [Python-ideas] Inline assignments using "given" clauses

2018-05-12 Thread Tim Peters
[attributions lost - sorry, but I can't get 'em back]
...

>>> Similar to import statements, optional parentheses could be included in
>>> the
>>> grammar, allowing the name bindings to be split across multiple lines:
>>>
>>>if diff and g > 1 given (
>>>    diff = x - x_base,
>>>    g = gcd(diff, n),
>>>):
>>>    return g

>> I'm well behind, but... this! This turns "given" into a +0.8 for me.
>>
>> That's really nice. It reads clearly too.
>>

> That's longer than this:
>
> diff = x - x_base
> g = gcd(diff, n)
> if diff and g > 1:
>     return g
>
> which is already valid.

Since that was my example to begin with, I think it's fair to point
out that they all miss a key part of the original example:  this code
is working with multi-thousand bit integers, and calling gcd() is
expensive.  It was a key point that

if (diff := x - x_base) and (g := gcd(diff, n)) > 1:
    return g

didn't call gcd() _at all_ unless `diff` was non-zero.  The original
real-life code was:

diff = x - x_base
if diff:
    g = gcd(diff, n)
    if g > 1:
        return g


Re: [Python-ideas] A comprehension scope issue in PEP 572

2018-05-12 Thread Tim Peters
[Nick]
>> That's all well and good, but it is *completely insufficient for the
>> language specification*.

> I haven't been trying to write reference docs here, but so far as
> supplying a rigorous specification goes, I maintain the above gets
> "pretty close".  It needs more words, and certainly isn't in the
> _style_ of Python's current reference docs, but that's all repairable.
> Don't dismiss it just because it's brief.  Comprehensions already
> exist in the language, and so do nested scopes, so it's not necessary
> for this PEP to repeat any of the stuff that goes into those.  Mostly
> it needs to specify the scopes of assignment expression target names -
> and the _intent_ here is really quite simple.
>
> Here with more words, restricted to the case of assignment expressions
> in comprehensions (the only case with any subtleties):
> ...

And if you didn't like those words, you're _really_ gonna hate this
;-)  I don't believe more than just the following is actually
necessary, although much more than this would be helpful.  I spent the
first 15-plus years of my career writing compilers for a living, so am
sadly resigned to the "say the minimum necessary for a long argument
to conclude that it really was the minimum necessary" style of
language specs.  That's why I exult in giving explanations and
examples that might actually be illuminating - it's not because I
_can't_ be cryptically terse ;-)

Section 4.2.1 (Binding of names) of the Language Reference Manual has
a paragraph starting with "The following constructs bind names:".  It
really only needs another two-sentence paragraph after that to capture
all of the PEP's intended scope semantics (including my suggestion):

"""
An assignment expression binds the target, except in a function F
synthesized to implement a list comprehension or generator expression
(see XXX).  In the latter case, if the target is not in F's
environment (see section 4.2.2) , the target is bound in the block
containing F.
"""

That explicitly restates my earlier "rule #2" in the language already
used by the manual.  My "rule #1" essentially vanishes as such,
because it's subsumed by what the manual already means by "F's
environment".

This may also be the best place to add another new sentence.:

"""
Regardless, if the target also appears as an identifier target of a
`for` loop header in F, a `SyntaxError` exception is raised.
"""

Earlier, for now-necessary disambiguation, I expect that in

... targets that are identifiers if occurring in an assignment, ...

" statement" should be inserted before the comma.


Re: [Python-ideas] A comprehension scope issue in PEP 572

2018-05-13 Thread Tim Peters
[Tim, suggests changes to the Reference Manual's 4.2.1]
> """
> An assignment expression binds the target, except in a function F
> synthesized to implement a list comprehension or generator expression
> (see XXX).  In the latter case, if the target is not in F's
> environment (see section 4.2.2) , the target is bound in the block
> containing F.
> """

Let me try that again ;-)  The notion of "environment" includes the
global scope, but that's not really wanted here.   "Environment" has
more of a runtime flavor anyway.  And since nobody will tell me
anything about class scope, I read the docs myself ;-)

And that's a problem, I think!  If a comprehension C is in class scope
S, apparently the class locals are _not_ in C's environment.  Since C
doesn't even have read access to S's locals, it seems to me bizarre
that ":=" could _create_ a local in S.

Since I personally couldn't care less about running comprehensions of
any kind at class scope, I propose to make `:=` a SyntaxError if
someone tries to use a comprehension with ':=' at class scope (of
course they may be able to use ":=" in nested comprehensions anyway -
not that anyone would).

If someone objects to that, fine, you figure it out ;-)  So here's another stab.

"""
An assignment expression binds the target, except in a function F
synthesized to implement a list comprehension or generator expression
(see XXX).  In the latter case[1]:

- If the target also appears as an identifier target of a `for` loop
header in F, a `SyntaxError` exception is raised.

- If the block containing F is a class block, a `SyntaxError`
exception is raised.

- If the target is not local to any function enclosing F, and is not
declared `global` in the block containing F, then the target is bound
in the block containing F.

Footnote:
[1] The intent is that runtime binding of the target occurs as if the
binding were performed in the block containing F.  Because that
necessarily makes the target not local in F, it's an error if the
target also appears in a `for` loop header, which is a local binding
for the same target.  If the containing block is a class block, F has
no access to that block's scope, so it doesn't make sense to consider
the containing block.  If the target is already known to the
containing block, the target inherits its scope resolution from the
containing block.  Else the target is established as local to the
containing block.
"""

I realize the docs don't generally use bullet lists.  Convert to
WallOfText if you must.  The material in the footnote would usually go
in a "Rationale" doc instead, but we don't have one of those, and I
think the intent is too hard to deduce without that info.


And repeating the other point, to keep a self-contained account:

> Earlier, for now-necessary disambiguation, I expect that in
>
> ... targets that are identifiers if occurring in an assignment, ...
>
> " statement" should be inserted before the comma.


Re: [Python-ideas] A comprehension scope issue in PEP 572

2018-05-13 Thread Tim Peters
[Tim]
>> ... If a comprehension C is in class scope S, apparently the
>> class locals are _not_ in C's environment.  Since C doesn't
>>> even have read access to S's locals, it seems to me bizarre
>> that ":=" could _create_ a local in S.

[Ethan Furman]
> Python 3.7.0b3+ (heads/bpo-33217-dirty:28c1790, Apr  5 2018, 13:10:10)
> [GCC 4.8.2] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> --> class C:
> ...   huh = 7
> ...   hah = [i for i in range(huh)]
> ...
> --> C.hah
> [0, 1, 2, 3, 4, 5, 6]
>
> Same results clear back to 3.3 (the oldest version of 3 I have).  Are the
> docs wrong?
>
> Or maybe they just refer to functions:
>
> --> class C:
> ...   huh = 7
> ...   hah = [i for i in range(huh)]
> ...   heh = lambda: [i for i in range(huh)]
> ...
> --> C.hah
> [0, 1, 2, 3, 4, 5, 6]
> --> C.heh()
> Traceback (most recent call last):
>   File "test_class_comp.py", line 7, in 
> print(C.heh())
>   File "test_class_comp.py", line 4, in 
> heh = lambda: [i for i in range(huh)]
> NameError: global name 'huh' is not defined

As Chris already explained (thanks!), the expression defining the
iterable for the outermost `for` (which, perhaps confusingly, is the
_leftmost_ `for`) is treated specially in a comprehension (or genexp),
evaluated at once _in_ the scope containing the comprehension, not in
the comprehension's own scope.  Everything else in the comprehension
is evaluated in the comprehension's scope.

I just want to add that it's really the same thing as your lambda
example.  Comprehensions are also implemented as lambdas (functions),
but invisible functions created by magic.  The synthesized function
takes one argument, which is the expression defining the iterable for
the outermost `for`.  So, skipping irrelevant-to-the-point details,
your original example is more like:

class C:
    huh = 7

    def _magic(it):
        return [i for i in it]

    hah = _magic(range(huh))

Since the `range(huh)` part is evaluated _in_ C's scope, no problem.
For a case that blows up, as Chris did you can add another `for` as
"outermost", or just try to reference a class local in the body of the
comprehension:

class C2:
    huh = 7
    hah = [huh for i in range(5)]

That blows up (NameError on `huh`) for the same reason your lambda
example blows up, because it's implemented like:

class C:
    huh = 7

    def _magic(it):
        return [huh for i in it]

    hah = _magic(range(5))

and C's locals are not in the environment seen by any function called
from C's scope.

A primary intent of the proposed ":= in comprehensions" change is that
you _don't_ have to learn this much about implementation cruft to
guess what a comprehension will do when it contains an assignment
expression.  The intent of

total = 0
sums = [total := total + value for value in data]

is obvious - until you think too much about it ;-)  Because there's no
function in sight, there's no reason to guess that the `total` in
`total = 0` has nothing to do with the instances of `total` inside the
comprehension.  The point of the change is to make them all refer to
the same thing, as they already do in (the syntactically similar, but
useless):

total = 0
sums = [total == total + value for value in data]

Except even _that_ doesn't work "as visually expected" in class scope
today.  The `total` inside the comprehension refers to the closest (if
any) scope (_containing_ the `class` statement) in which `total` is
local (usually the module scope, but may be a function scope if the
`class` is inside nested functions).
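
A concrete sketch of that class-scope wrinkle - the `total` inside the
comprehension resolves to the module's `total`, not to `K`'s:

    total = 0

    class K:
        total = 99
        sums = [total == total + 1 for _ in range(2)]

    print(K.sums)   # [False, False] - it compared the module's 0, not K's 99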

In function and module scopes, the second `total` example does work in
"the obvious" way, so in those scopes I'd like to see the first
`total` example do so too.


Re: [Python-ideas] Inline assignments using "given" clauses

2018-05-11 Thread Tim Peters
[Matt Arcidy]
>> Note Tim came up with a real metric:
>> 2 * count(":=")/len(statement).
>> It's objective.  it's just unclear if a higher score is better or worse.
>> However, one could say "a Tim of .3 is considered too high" as a guideline.

[Steven D'Aprano]
> I think Tim was making a joke about demanding objective measurements of
> subjective things.
>
> Certainly he hasn't done any research or study to justify that metric.
> He just plucked the formula out of thin air.

It was the outcome of an intense 17-year research project.


> Or at least no peer reviewed research.

Au contraire!  My peers are here, and that message was reviewed by at
least 3 people on this list.

That said, I am a fan of objectively measuring subjective things, just
not of taking the measurements seriously ;-)

If people do want to take it seriously, check out prior Python art first:

http://radon.readthedocs.io/en/latest/intro.html


Re: [Python-ideas] A comprehension scope issue in PEP 572

2018-05-13 Thread Tim Peters
[Tim]
>> - If the target is not local to any function enclosing F, and is not
>> declared `global` in the block containing F, then the target is bound
>> in the block containing F.

[also Tim]
> FYI, that's still not right, ...
> I suspect that the above should be reworded to the simpler:
>
> - If the target is not declared `global` or `nonlocal` in the block
>   containing F, then the target is bound in the block containing F.
> ...

I'm satisfied that captures the intent - but now it's misleadingly
wordy.  It should be the briefer:

- The target is bound in the block containing F.

Other text (in section 4.2.2) already covers the intended meanings for
when a `global` or `nonlocal` declaration appears in the block too.

And then it's short enough again that the bullet list isn't really
helpful anymore.  So, putting that all together:

"""
An assignment expression binds the target, except in a function F
synthesized to implement a list comprehension or generator expression
(see XXX).  In the latter case[1], the target is bound in the block
containing F, and errors may be detected:  If the target also appears
as an identifier target of a `for` loop header in F, a `SyntaxError`
exception is raised.  If the block containing F is a class block, a
`SyntaxError` exception is raised.

Footnote:
[1] The intent is that runtime binding of the target occurs as if the
binding were performed in the block containing F.  Because that
necessarily makes the target not local in F, it's an error if the
target also appears in a `for` loop header, which is a local binding
for the same target.  If the containing block is a class block, F has
no access to that block's scope, so it doesn't make sense to consider
the containing block.  The target is bound in the containing block,
where it inherits that block's `global` or `nonlocal` declaration if
one exists, else establishes that the target is local to that block.
"""

Along with the previous:

> ... targets that are identifiers if occurring in an assignment, ...
>
> " statement" should be inserted before the comma.


Of course there are other docs that need to change too.  I viewed this
as the hardest part (witness how long it took to reduce it to
near-triviality ;-) ), so wanted to get it out of the way first.


Re: [Python-ideas] A comprehension scope issue in PEP 572

2018-05-13 Thread Tim Peters
[Tim]
> ...
> - If the target is not local to any function enclosing F, and is not
> declared `global` in the block containing F, then the target is bound
> in the block containing F.

FYI, that's still not right, but I've been distracted by trying to
convince myself that the manual actually defines what happens when
absurdly deeply nested functions mix local values for a name at some
levels with a `global` declaration of the name at other levels.

I suspect that the above should be reworded to the simpler:

- If the target is not declared `global` or `nonlocal` in the block
  containing F, then the target is bound in the block containing F.

That makes "intuitive sense" because if the target is declared
`global` or `nonlocal` the meaning of binding in the block is already
defined to affect a not-local scope, while if it's not declared at all
then binding in the block "should" establish that it's local.to the
block (regardless of how containing scopes treat the same name)

But whether that all follows from what the manual already says
requires more staring at it ;-)

Regardless, if anyone were to point it out, I'd agree that it _should_
count against this that establishing which names are local to a block
may require searching top-level comprehensions in the block for
assignment expressions.  On a scale of minus a million to plus a
million, I'd only weight that in the negative thousands, though ;-)


Re: [Python-ideas] A comprehension scope issue in PEP 572

2018-05-11 Thread Tim Peters
[Tim]
>> Since this is all about scope, while I'm not 100% sure of what Guido
>> meant, I assumed he was saying "p can only have one scope in the
>> synthetic function:  local or non-local, not both, and local is what I
>> propose".  For example, let's flesh out his example a bit more:
>>
>> p = 42
>> [p := p for p in range(10) if p == 3]
>> print(p) # 42?  3?  9?
>>
>> If `p` is local to the listcomp, it must print 42.  If `p` is
>> not-local, it must print 9.  If it's some weird mixture of both, 3
>> makes most sense (the only time `p := p` is executed is when the `for`
>> target `p` is 3).

[Jacco van Dorp]
> With my limited experience, I'd consider 3 to make most sense, but 9
> when thinking about it in the expanded form.
>
> If it's not 3 tho, then the following would make most sense:
>
> SyntaxError("Cannot re-bind for target name in a list comprehension")
> # Or something more clear.
>
> And the rest of that mail that convinces me even more that an error
> would be the correct solution here.

Good news, then:  Nick & Guido recently agreed that it would be a
compile-time error.  Assuming it's added to the language at all, of
course.
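
(That's also how it shipped - in CPython 3.8 and later:)

    >>> [p := p for p in range(10)]
    SyntaxError: assignment expression cannot rebind comprehension iteration variable 'p'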


> Before I got on this mailinglist, i never even knew comprehensions
> introduced a new scope. I'm really that new.

They didn't, at first.  That changed over time.  The real reason was
so that `for` variables - which people typically give little thought
to naming - didn't accidentally overwrite local variables that
happened to share the same name.  Like:

>>> i = -42
>>> [i+1 for i in range(3)]
[1, 2, 3]
>>> i  # unchanged!
-42

But you can productively use list comprehensions without knowing
anything about how they're implemented, and just think "ha!  Python
does some happy magic for me there :-)".


> Two years ago I'd look up stackoverflow to check the difference between
> overriding and extending a method and to verify whether I made my
> super() calls the right way.
>
> If something goes too weird, I think just throwing exceptions is a
> sensible solution that keeps the language simple, rather than making
> that much of a headache of something so trivially avoided.

Since Guido agreed with you in this case, that proves you're a true
Pythonista - or maybe just that you're both Dutch ;-)


Re: [Python-ideas] PEP 572: about the operator precedence of :=

2018-05-11 Thread Tim Peters
[Guido]
> ...
> OT about the name: despite Tim's relentless pushing of "binding expressions"
> in the end I think they should be called "assignment expressions" just like
> in C.

Ha!  I already gave up on "binding expressions".  For nearly a full
day, I've been rigidly calling them "binding operators" instead.
That's why everyone has warmed up to them so dramatically since
yesterday.

Oh.  They haven't?  Oh well - since verbal misdirection failed, I'll
go back to "assignment expressions'.  Chris always liked that better
anyway too.


Re: [Python-ideas] A comprehension scope issue in PEP 572

2018-05-10 Thread Tim Peters
...

[Guido]
>> You should really read Tim's initial post in this thread, where he
>> explains his motivation.

[Nick]
> I did, and then I talked him out of it by pointing out how confusing it
> would be to have the binding semantics of "x := y" be context dependent.

Ya, that was an effective Jedi mind trick when I was overdue to go to sleep ;-)

To a plain user, there's nothing about a listcomp or genexp that says
"new function introduced here".  It looks like, for all the world,
that it's running _in_ the block that contains it.  It's magical
enough that `for` targets magically become local.  But that's almost
never harmful magic, and often helpful, so worth it.


>...
> It *is* different, because ":=" normally binds the same as any other name
> binding operation including "for x in y:" (i.e. it creates a local
> variable), while at comprehension scope, the proposal has now become for "x
> := y" to create a local variable in the containing scope, while "for x in y"
> doesn't.

":=" target names in a genexp/listcmp are treated exactly the same as
any other non-for-target name:  they resolve to the same scope as they
resolve to in the block that contains them.  The only twist is that if
such a name `x` isn't otherwise known in the block, then `x` is
established as being local to the block (which incidentally also
covers the case when the genexp/listcomp is at module level, where
"local to the block" and "global to the block" mean the same thing).
Class scope may be an exception (I cheerfully never learned anything
about how class scope works, because I don't write insane code ;-) ).


> Comprehension scoping is already hard to explain when its just a
> regular nested function that accepts a single argument, so I'm not looking
> forward to having to explain that "x := y" implies "nonlocal x" at
> comprehension scope

It doesn't, necessarily.  If `x` is already known as `global` in the
block, then there's an implied `global x` at comprehension scope.

> (except that unlike a regular nonlocal declaration, it also implicitly
> makes it a local in the immediately surrounding scope).

Only if `x` is otherwise _unknown_ in the block.  If, e.g., `x` is
already known in an enclosing scope E, then `x` also resolves to scope
E in the comprehension.  It is not made local to the enclosing scope
in that case.

I think it's more fruitful to explain the semantics than try to
explain a concrete implementation.  Python's has a "lumpy" scope
system now, with hard breaks among global scopes, class scopes, and
all other lexical scopes.  That makes implementations artificially
noisy to specify.  "resolve to the same scope as they resolve to in
the block that contains them, with a twist ..." avoids that noise
(e.g.,  the words "global" and "nonlocal" don't even occur), and gets
directly to the point:  in which scope does a name live?  If you think
it's already clear enough which scope `y` resolves to in

z = (x+y for x in range(10))

then it's exactly as clear which scope `y` resolves to in

z = (x + (y := 7) for x in range(10))

with the twist that if `y` is otherwise unknown in the containing
block, `y` becomes local to the block.
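
That's just what 3.8 ended up doing.  A quick runnable check:

    def f():
        squares_plus = list(x + (y := 7) for x in range(10))
        return y   # bound by iterating the genexp - local to f, as described

    print(f())   # 7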


> It isn't reasonable to wave this away as "It's only confusing to Nick
> because he's intimately familiar with how comprehensions are implemented",

As above, though, I'm gently suggesting that being so intimately
familiar with implementation details may be interfering with seeing
how all those details can _obscure_ rather than illuminate.  Whenever
you think you need to distinguish between, e.g., "nonlocal" and
"global", you're too deep in the detail weeds.


> as I also wrote some of the language reference docs for the current (already
> complicated) comprehension scoping semantics, and I can't figure out how
> we're going to document the proposed semantics in a way that will actually
> be reasonably easy for readers to follow.

Where are those docs?  I expect to find such stuff in section 4
("Execution model") of the Language Reference Manual, but listcomps
and genexps are only mentioned in passing once in the 3.6.5 section 4
docs, just noting that they don't always play well at class scope.


> ...
> - for comprehensions at class scope, the class scope is ignored for purposes
> of determining the target binding scope (and hence will implicitly create a
> new global variable when used in a top level class definition, and new
> function local when used in a class definition nested inside a function)

Isn't all of that too covered by "resolve to the same scope as they
resolve to in the block that contains them .."?  For example, in

class K:
print(g)

at module level, `g` obviously refers to the global `g`.  Therefore
any `g` appearing as a ":=" target in an immediately contained
comprehension also refers to the global `g`, exactly the same as if
`g` were any other non-for-target name in the comprehension.  That's
not a new rule:  it's a consequence of how class scopes work.

Re: [Python-ideas] Inline assignments using "given" clauses

2018-05-11 Thread Tim Peters
[Brendan Barnwell]
>>
>> . . . and it's true the latter is a bit more verbose in that case for
>> little extra benefit.  But when the locally-defined value is used within
>> a more complicated expression (like the quadratic formula example), I
>> think readability goes down significantly.  To appease Tim, instead of
>> using the quadratic formula, though, I will use a more realistic example
>> that comes up fairly often for me: wanting to do some kind of
>> normalization on a piece of data for a comparison, while keeping the
>> unnormalized data for use within the block:
>>
>> if some_condition and (stuff:=
>> get_user_input()).lower().strip().replace('-', ''):
>>
>> versus
>>
>> if some_condition and stuff.lower().strip().replace('-', '') given
>> stuff = get_user_input():

[also Brendan]
> Ironically I weakened my argument by forgetting to finish my
> expression there.  I intended that chain of method calls to be used in a
> comparison to make the surrounding expression more complex.  So revise the
> above to
>
> if some_condition and (stuff :=
> get_user_input()).lower().strip().replace('-', '') == existing_value:
>
> versus
>
> if some_condition and stuff.lower().strip().replace('-', '') ==
> existing_value given stuff = get_user_input():

Even more ironically, to my eyes the original more strongly supported
your view than the rewrite ;-)

"given stuff =" stuck out in the original because it was preceded by
punctuation (a right parenthesis).

I had to read the rewrite 3 times before I realized you were even
_using_ "given", because there it's buried between two other names -
"existing_value given stuff" -  and visually looks more like it's
actually the 3rd of 4 words (because of the underscore in
"existing_value").

Of course that would have been obvious in a Python-aware editor that
colored "given" differently, but as-is I found the original easy to
read but the rewrite a puzzle to decode.

Similarly, in the rewritten assignment expression spelling, it's
obvious at a glance that the test is of the form

some_condition and  some_messy_expression == existing_value

but in the rewritten "given" sample that's obscured because
"existing_value" not only doesn't end the statement, it's not even
followed by punctuation.  Of course coloring "given" differently would
remove that visual uncertainty too.  For a dumb display, I'd write it

    if some_condition and (
            stuff.lower().strip().replace('-', '') == existing_value) given
            stuff = get_user_input():

instead (added parens so that "existing_value" and "given" are
separated by punctuation).
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Inline assignments using "given" clauses

2018-05-11 Thread Tim Peters
[Gustavo Carneiro]
>>> IMHO, all these toy examples don't translate well to the real world
>>> because they tend to use very short variable names while in real world [good
>>> written code] tends to select longer more descriptive variable names.

[Greg Ewing]
>> I don't believe that's always true. It depends on the context.
>> Sometimes, using long variable names can make code *harder*
>> to read.
>>
>> I don't think there's anything unrealistic about this
>> example:
>>
>>if m given m = pattern.match(the_string):
>>   nugget = m.group(2)
>>
>> Most people's short-term memory is good enough to remember
>> that "m" refers to the match object while they read the
>> next couple of lines. IMO, using a longer name would serve
>> no purpose and would just clutter things up.

[Nick Coghlan]
> I've been thinking about this problem, and I think for the If/elif/while
> cases it's actually possible to allow the "binding is the same as the
> condition" case to be simplified to:
>
> if command =  pattern.match(the_string):
> ...
> elif command =  other_pattern.match(the_string):
> ...
>
> while data = read_data():

Unless there's some weird font problem on my machine, that looks like
a single "equals sign".  In which case we'd be reproducing C's
miserable confusion about whether:

if (i = 1)

was a too-hastily-typed spelling of the intended:

if (i == 1)

or whether they were thinking "equals" and typed "=" by mistake.

If so, that would get an instant -1 from any number of core devs, who
have vivid painful memories of being burned by that in C.  That's not
just speculation - it came up a number of times in the PEP 572
threads.


> Allowing this would be part of the definition of the if/elif/while statement
> headers, rather than a general purpose assignment expression.
>
> The restriction of the LHS to a simple name target would need to be in the
> AST generator rather than in the grammar, but it's hardly the only case
> where we do that kind of thing. Switching to the given expression form would
> then only be necessary in cases where the condition *wasn't* the same as the
> binding target.
>
> A similar enhancement could be made to conditional expressions (adjusting
> their grammar to permit "EXPR if NAME = EXPR else EXPR") and filter clauses
> in comprehensions (allowing "EXPR for TARGET in EXPR if NAME = EXPR"). In
> essence, "if", "elif", and "while" would all allow for an "implied given"
> clause in order to simplify the 90% case where the desired condition and the
> bound expression are the same.

Spell it ":=" (colon equals) instead, and a few core devs would stop
objecting ;-)
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] A comprehension scope issue in PEP 572

2018-05-07 Thread Tim Peters
[Guido]
> I am convinced by Tim's motivation. I hadn't thought of this use case before
> -- I had mostly thought "local scope in a comprehension or generator
> expression is the locals of the synthetic function". But Tim's reasoning
> feels right.

I'm trying very hard _not_ to reason.  That is, I'm looking at code
and trying to figure out what would actually work well, logic &
consistency & preconceptions be damned.  "Reasons" can be made up
after the fact - which is how the world actually works regardless ;-)


> The only solution that makes sense to me is Steven's (2). (1) is what the
> PEP currently says and what Tim doesn't want; (3) has no precedent (function
> defaults don't really work like this) and just gets my hackles all up. (I
> can't even tell if Steven meant it as a serious proposal.)

He doesn't want (3) either.  I can channel him on that.


> So let's see if we can change PEP 572 so that := inside a comprehension or
> generator expression always assigns to a variable in the containing scope.

While I don't have real use cases beyond that, given that much,
"consistency" kicks in to suggest that:

def f():
    [x := 42 for x in range(1)]

makes `x` local to `f` despite that x wasn't bound elsewhere in f's body.

def f():
    global x
    [x := 42 for x in range(1)]

binds the global `x`.

def f():
    nonlocal x
    [x := 42 for x in range(1)]

binds `x` in the closest-containing scope in which `x` is local.  The
last two act as if the declaration of `x` in `f` were duplicated at
the start of the synthetic function.

More succinctly, that `x := RHS` in a synthetic function "act the
same" as `x = RHS` appearing in the scope directly containing the
synthetic function.

Does that generalize to class scope too?  I don't know.  I never use
fancy stuff in class scopes, and have no idea how they work anymore.
So long as "def method(self, ...)" continues to work, I'm happy ;-)


> It may be inconsistent with the scope of the loop control variable, but it's
> consistent with uses of := outside comprehensions:
>
>   [x := 0]
>   [x := i for i in range(1)]
>
> both have the side effect of setting x to zero. I like that.

And is what anyone would expect if they didn't think too much about it.

... [snipping stuff about class scope - nothing to add] ...


> PS1. The various proposals that add random extra keywords to the syntax
> (like 'for nonlocal i') don't appeal to me at all.

They're appealing to the extent that "explicit is better than
implicit" for people who actually understand how all this stuff is
implemented.  I don't know what percentage of Python programmers that
is, though.  I've certainly, e.g., seen many on Stackoverflow who can
make a list comprehension work who couldn't distinguish a lexical
scope from an avocado.  The ":= is treated specially in
comprehensions" idea is aimed more at them than at people who think
invoking a synthetic anonymous lambda "is obvious".


> PS2. IIRC the reason we gave loop control variables their own scope was the
> poor imagination of many people when it comes to choosing loop control
> variable names. We had seen just too many examples of
>
>   for x in something:
>   ...lots of code using x...
>   blah blah [x+1 for x in something else]
>   ...some more code using x, broken...
>
> It's true that this can also happen with a for-loop statement nested inside
> the outer loop (and it does) but the case of a comprehension was easier to
> miss. I've never looked back.

I don't want to change anything about any of that - already believe
Python struck the best balance possible.


> PS3. Comprehensions and generator expressions should be interchangeable.
> They just look too similar to have different semantics (and the possibly
> delayed execution of generator expressions is not an issue -- it's rare, and
> closure semantics make it work just fine).

Wholly agreed.
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] A comprehension scope issue in PEP 572

2018-05-07 Thread Tim Peters
[Tim]
>> In a different thread I noted that I sometimes want to write code like
>> this:
>>
>>  while any(n % p == 0 for p in small_primes):
>>  # divide p out - but what's p?
>>
>> But generator expressions hide the value of `p` that succeeded, so I
>> can't.  `any()` and `all()` can't address this themselves - they
>> merely work with an iterable of objects to evaluate for truthiness,
>> and know nothing about how they're computed.  If you want to identify
>> a witness (for `any()` succeeding) or a counterexample (for `all()`
>> failing), you need to write a workalike loop by hand.

[Brendan Barnwell ]
 > I agree that is a limitation, and I see from a later message in the
> thread that Guido finds it compelling, but personally I don't find that that
> particular case such a showstopper that it would tip the scales for me
> either way.  If you have to write the workalike look that iterates and
> returns the missing value, so be it.  That's not a big deal.

Guido didn't find it compelling:  for that specific example to show
`p` would require that for-loop targets "leak", and he remains opposed
to that.  I don't want that changed either.

The issue instead is what the brand-new proposed ":=" should do, which
isn't used in that example at all.  Whether that specific example can
be written in 500 other ways (of course it can) isn't really relevant.

One of the ironies already noted is that PEP 572 gives an example of
something that _won't_ work ("progressive_sums") which happens to be
the very use case that started the current debate about assignment
expressions to begin with.  That raises the very same issue about ":="
that "the obvious" rewrite of my example at the top raises.  Which
suggests to me (& apparently to Guido too) that there may be a real
issue here worth addressing.

There are many use cases for binding expressions outside of
synthetically generated functions.  For PEP 572, it's the totality
that will be judged, not just how they might work inside list
comprehensions and generator expressions (the only topics in _this_
particular thread), let alone how they work in one specific example.
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] A comprehension scope issue in PEP 572

2018-05-08 Thread Tim Peters
[Tim]
>> (inside a synthetic function created to
>> implement a listcomp/genexp, names bound by "=" are local; names bound
>> by ":=" are nonlocal; names bound by both are "who cares?"-
>> compiler-time error would be fine by me, or the first person to show a
>> real use case wins).

[Jacco van Dorp ]
> Wait, you can't use = in a listcomp, right ? Or are you talking about
> the implementation hidden to the casual user ?

Sorry, I was too obscure there - I intended "=" to mean "name binding
by any means other than :=".  Off the top of my head, I believe that -
today - the only "any means other than :=" possible in a
listcomp/genexp is appearing as a target in a `for` clause (like the
`i` in `[i+1 for i in iterable]`).  If there's some other way I
missed, I meant to cover that too.

But, yes, you're right, `names bound by "="` makes no literal sense at
all there.


> I thought letting := bind to the surrounding scope was fine basically
> because it's currently not possible, so therefore there would be no
> syntactic ambiguity, and it'd actually do what people would expect.

It's not really about the semantics of `:=` so much as about how
synthetic functions are defined.  In most cases, it amounts to saying
"in the nested function synthesized for a listcomp/genexp, if a name
`x` appears as the target of a binding expression in the body, a
`nonlocal x` declaration is generated near the top of the synthetic
function".

For example, if this block appears inside a function:

it = (i for i in range(10))
total = 0
for psum in (total := total + value for value in it):
    print(psum)

under the current PEP meaning it blows up in the same way this code
blows up today:

it = (i for i in range(10))
total = 0
def _func(it):
    for value in it:
        total = total + value  # blows up here
        yield total
for psum in _func(it):
    print(psum)

with

UnboundLocalError: local variable 'total' referenced before assignment

But add

nonlocal total

at the top of `_func()` and it works fine (displays 0, 1, 3, 6, 10, 15, ...).
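Concretely, wrapping the whole thing in a function so that `nonlocal`
is legal, the repaired version runs as claimed:

    def demo():
        it = (i for i in range(10))
        total = 0
        def _func(it):
            nonlocal total
            for value in it:
                total = total + value
                yield total
        for psum in _func(it):
            print(psum)  # 0 1 3 6 10 15 21 28 36 45

    demo()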

So it's not really about what ":=" does, but about how ":=" affects
scope in synthesized nested functions.

But if you wrote a nested function yourself?  There's no suggestion
here that ":=" have any effect on scope decisions in explicitly given
nested functions (same as for "=", it would imply "the target is
local"), just on those generated "by magic" for listcomps/genexps.

Maybe there should be, though.  My initial thought was "no, because
the user has total control over scope decisions in explicitly given
functions today, but if something was magically made nonlocal they
would have no way to override that".
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] A comprehension scope issue in PEP 572

2018-05-08 Thread Tim Peters
[Guido]
>> So the way I envision it is that *in the absence of a nonlocal or global
>> declaration in the containing scope*, := inside a comprehension or genexpr
>> causes the compiler to assign to a local in the containing scope, which is
>> elevated to a cell (if it isn't already). If there is an explicit nonlocal
>> or global declaration in the containing scope, that is honored.

[Juancarlo Añez ]
> This seems to be getting awfully complicated. Proof? Try to write the docs
> for the proposed semantics.

Implementation details - even just partial sketches - are always
"busy".  Think of it this way instead:  it's _currently_ the case that
listcomps & genexps run in a scope S that's the same as the scope C
that contains them, _except_ that names appearing as `for` targets are
local to S.  All other names in S resolve to exactly the same scopes
they resolved to in C (local in C, global in C, nonlocal in C -
doesn't matter).

What changes now?  Nothing in that high-level description, except that
a name appearing as a binding expression target in S that's otherwise
unknown in C establishes that the name is local to C.  That's nothing
essentially new, though - bindings _always_ establish scopes for
otherwise-unknown names in Python.


> I don't understand why we went so astray from the original requirements,
> which could all be met by having `if` and `while` accept `as` to bind an
> expression to a variable that would be local to the structured statement.

"Original" depends on when you first jumped into this ;-)
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] A comprehension scope issue in PEP 572

2018-05-08 Thread Tim Peters
These all match my expectations.  Some glosses:

[Guido]
> So the way I envision it is that *in the absence of a nonlocal or global
> declaration in the containing scope*, := inside a comprehension or genexpr
> causes the compiler to assign to a local in the containing scope, which is
> elevated to a cell (if it isn't already). If there is an explicit nonlocal
> or global declaration in the containing scope, that is honored.

If the genexp/listcomp is at module level, then "assign to a local in
the containing scope" still makes sense ("locals" and "globals" mean
the same thing at module level), but "elevated to a cell" doesn't then
- it's just a plain global.

In absolutely all cases, what I expect is that

NAME := EXPR

in a genexp/listcomp do the binding _as if_

NAME = object_EXPR_evaluates_to

were executed in the immediately containing scope.  Describing the
goal instead of part of the implementation may be easier to grasp ;-)


> Examples:
>
>   # Simplest case, neither nonlocal nor global declaration
>   def foo():
>   [p := q for q in range(10)]  # Creates foo-local variable p
>   print(p)  # Prints 9
>
>   # There's a nonlocal declaration
>   def bar():
>   p = 42  # Needed to determine its scope
>   def inner():
>   nonlocal p
>   [p := q for q in range(10)]  # Assigns to p in bar's scope
>   inner()
>   print(p)  # Prints 9
>
>   # There's a global declaration
>   def baz():
>   global p
>   [p := q for q in range(10)]
>   baz()
>   print(p)  # Prints 9
>
> All these would work the same way if you wrote list(p := q for q in
> range(10)) instead of the comprehension.

100% agreed.  Add at module scope:

[p := q for q in range(10)]
print(p) # Prints 9

But you're on your own for class scope, because I never did anything
fancy enough at class scope to need to learn how it works ;-)


> We should probably define what happens when you write [p := p for p in
> range(10)]. I propose that this overwrites the loop control variable rather
> than creating a second p in the containing scope -- either way it's probably
> a typo anyway.

A compile-time error would be fine by me too.  Creating two meanings
for `p` is nuts - pick one in case of conflict.  I suggested before
that the first person with a real use case for this silliness should
get the meaning their use case needs, but nobody bit, so "it's local
then" is fine.


> := outside a comprehension/genexpr is treated just like any other assignment
> (other than in-place assignment), i.e. it creates a local unless a nonlocal
> or global declaration exists.

Also agreed.  People have total control over scopes in explicitly
given functions now, and if the compiler magically made anything
nonlocal they would have no way to stop it.  Well, I suppose we could
add a "non_nonlocal" declaration, but I'd rather not ;-)
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Inline assignments using "given" clauses

2018-05-05 Thread Tim Peters
[Nick]
>...
> There were a couple key reasons I left the "for x in y" case out of the
> initial proposal:
>
> 1. The "for x in y" header is already quite busy, especially when tuple
> unpacking is used in the assignment target
> 2. Putting the "given" clause at the end would make it ambiguous as to
> whether it's executed once when setting up the iterator, or on every
> iteration
> 3. You can stick in an explicit "if True" if you don't need the given
> variable in the filter condition
>
> [(fx**2, fx**3) for x in xs if True given fx = f(x)]
>
> And then once you've had an entire release where the filter condition was
> mandatory for the comprehension form, allowing the "if True" in "[(fx**2,
> fx**3) for x in xs given fx = f(x)]" to be implicit would be less ambiguous.

And some people claim ":=" would make Python harder to teach ;-)


[Tim]
>> ...
>> It's certainly sanest as
>>
>> if x**2 + y**2 > 9 given x, y = func_returning_twople():
>>
>> "given" really shines there!

> Yep, that's why I don't have the same immediate reaction of "It would need
> to be limited to simple names as targets" reaction as I do for assignment
> expressions. It might still be a good restriction to start out with, though

I contrived that specific "use case", of course - I actually didn't
stumble into any real code where multiple targets would benefit to my
eyes.  Perhaps because, as you noted above of `"for x in y" headers`,
multiple-target assignment statements are often quite busy already too
(I have no interest in cramming as much logic as possible into each
line - but "sparse is better than dense" doesn't also mean "almost
empty is better than sparse" ;-) ).


> (especially if we wanted to allow multiple name bindings in a single given
> clause).

>> ...
>> The one-letter variable name obscures that it doesn't
>> actually reduce _redundancy_, though.  That is, in the current
>>
>> match = pattern.search(data)
>> if match:
>>
>> it's obviously less redundant typing as:
>>
>> if match := pattern.search(data):
>>
>> In
>>
>> if match given match = pattern.search(data):
>>
>> the annoying visual redundancy (& typing) persists.

> Right, but that's specific to the case where the desired condition really is
> just "bool(target)".

Not only.  If the result _needs_ to be used N times in total in the
test, binding expressions allow for that, but `given` requires N+1
instances of the name (the "extra one" to establish the name to begin
with).

For example, where `probable_prime()` returns `True` or `False`, and
`bool(candidate)` is irrelevant:

# highbit is a power of 2 >= 2; create a random prime
# whose highest bit is highbit

while (not probable_prime(candidate) given candidate =
           highbit | randrange(1, highbit, 2)):
    pass

versus

while not probable_prime(candidate :=
          highbit | randrange(1, highbit, 2)):
    pass

There I picked a "long" name to make the redundancy visually annoying ;-)


> That's certainly likely to be a *common* use case,

In all the code I looked at where I believe a gimmick like this would
actually help, it was indeed by far _most_ common that the result only
needed to be used once in the test.  In all such cases, the binding
expression spelling of the test requires one instance of the name, and
the `given` spelling two.


> but if we decide that it's *that* particular flavour of redundancy that really
> bothers us, then there's always the "if expr as name:" spelling (similar to
> the way that Python had "a and b" and "a or b" logical control flow
> operators long before it got "a if c else b").

Reducing each redundancy is a small win to me, but reaches
"importance" because it's so frequent.  Binding expressions have more
uses than _just_ that, though.

But I'm sure I don't know what they all are.  When a _general_ feature
is added, people find surprising uses for it.

For example, at times I'd love to write code like this, but can't:

while any(n % p == 0 for p in small_primes):
# divide p out - but what is p?

Generator expressions prevent me from seeing which value of `p`
succeeded.  While that's often "a feature", sometimes it's a PITA.  I
don't know whether this binding-expression stab would work instead
(I'm not sure the PEP realized there's "an issue" here, about the
intended scope for `thisp`):

while any(n % (thisp := p) == 0 for p in small_primes):
n //= thisp

If that is made to work, I think that counts as "a surprising use"
(capturing a witness for `any(genexp)` and a counterexample for
`all(genexp)`, both of which are wanted at times, but neither of which
`any()`/`all()` will ever support on their own)..

I suppose I could do it with `given` like so:

while p is not None given p = next(
        (p for p in small_primes if n % p == 0),
        None):
    n //= p

but at that point I'd pay to go back to the original loop-and-a-half ;-)
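For what it's worth, under the comprehension scoping that PEP 572
ultimately adopted, the binding-expression stab above does work - a
runnable check:

    small_primes = [2, 3, 5, 7]
    n = 360
    while any(n % (thisp := p) == 0 for p in small_primes):
        n //= thisp
    print(n)  # 1 - `thisp` held the successful divisor on each pass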


>> One more, 

Re: [Python-ideas] Pattern Matching Syntax

2018-05-05 Thread Tim Peters
[Tim]
>>>> ... I liked the way he _reached_ that conclusion:  by looking at real-
>>>> life Python code that may have been written instead to use constructs
>>>> "like this".  I find such examination far more persuasive than abstract
>>>> arguments or made-up examples.

[Serhiy]
>>> I would like to see such examination for PEP 572. And for all other
>>> syntax changing ideas.

[Tim]
>> I did it myself for 572, and posted several times about what I found.

[Serhiy]
> Could you please give links to these results? It is hard to find something
> in hundreds of messages.

It's no easier for me to find old messages, and you'd just ask for
more & more anyway ;-)  The "short course" I already gave didn't skip
anything vital:

Short course:  I found a small win frequently, a large win rarely, but
in most cases wouldn't use it.  In all I expect I'd use it significantly
more often than ternary "if", but far less often than augmented assignment.

More importantly:

But that's me - everybody needs to look at their own code to apply
_their_ judgment.

It's _applying_ the approach I find persuasive & productive, not
someone else writing up the results of _their_ taking the approach.
I'm not trying to change peoples' minds - just suggesting a more
fruitful way (than abstract arguments, fashion, ...) to make up their
minds to begin with.


> I withdrew some my ideas and patches when my examinations showed that the
> number of cases in the stdlib that will take a benefit from rewriting using
> a new feature or from applying a compiler optimization is not large enough.

Bingo!  Note your "my examinations" in that.  Someone who hasn't done
their own examination is basically guessing.  They may or may not
reach the same conclusions if they did the work, but neither eloquence
nor confidence is a reliable predictor of whether they would.  Passion
may even be negatively correlated ;-)
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Modern language design survey for "assign and compare" statements

2018-05-19 Thread Tim Peters
[Steven D'Aprano ]
> ...
> Red (2011)
> ...
> ... claims to be nearly identical to Rebol.
>
> Wikipedia describes Rebol (and presumably Red) as not having either
> expressions or statements in usual sense. Based on my reading, it is
> kinda-sorta like Forth except without the stack or the postfix syntax.
> The creator of Red describes it:
>
> "About the syntax and semantics, in a nutshell, it's a Lisp without
> parentheses and with infix operator support."
>
> which suggests that assignment could be an expression. There's also a
> "set" function which can assign values, but frankly the Rebol
> programming model confuses me and so I'll count this as a "Possibly, but
> I can't tell for sure so let's say No" for the question of assignment
> expressions.

I was an early REBOL user, and my head still hurts ;-)  It was ...
different, for sure.  The syntax is in some sense extremely simple,
hoping to make it easy to define problem-specific layers of syntax on
top of it.  But I gave up before they got that far.

In any case, everything is "an expression" there, including
assignment.  That's spelled:

WORD ":"  WHITESPACE+  EXPRESSION

although that's lying a bit.

Here from a Red shell:

>> i: 1
== 1
>> j: k: 2
== 2
>> print [i j k]
1 2 2
>> i: 12 j: 13
== 13
>> print [i j]
12 13
>> x: 12 print add y: 88 x
100
>> y
== 88

That last one shows one of the challenges for people coming from
"normal" languages:  this does the same:

>> (x: 12) (print (add (y: 88) x))
100

showing that it really is a lot like "Lisp without parentheses".
Without the parens, it's impossible to tell where function calls begin
and end without knowing which names denote functions and how many
arguments each function requires!  That's "a feature", they insist ;-)
 To be fair, you really do get used to it quickly for the heavily used
builtin functions.

One more:

>> i1+2=3*88: 6
== 6
>> i1+2=3*88
== 6

Yup!  "i1+2=3*88" is a variable name there :-)

Short course:  definitely "yes" on assignment expressions for Red.
But I'm going to go out on a limb and guess that the typical Python
programmer wouldn't find "but Red does it!" persuasive ;-)
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Modern language design survey for "assign and compare" statements

2018-05-20 Thread Tim Peters
[Tim]
>> I was an early REBOL user, and my head still hurts ;-)  It was ...
>> different, for sure.

[Steven D'Aprano ]
> Yeah, to me it looks more like a prefix version of Forth than Lisp.
> Complete with "anything can be a name":

The example I gave just strung "words" together, but just as
fundamental is the notion of "block":  a series of words and/or blocks
enclosed in square brackets.  That's akin to Lisp's "S expressions",
but Rebol/Red _don't_ "evaluate" blocks by default.  Blocks can hold
data or code, or a mix of both, and the syntax doesn't care.  That's
the real reason "(almost) anything can be a name (word)".

Here I'll build to a simple example where _some_ Pythoneers will
experience envy rather than revulsion ;-)

>> [x y]
== [x y]

The block isn't evaluated.  Depending on how it's _used_, it may be
treated as data (perhaps you want to think of it as being a sequence
of two symbols, or strings).  If you evaluate it, it blows up (because
I don't happen to have any variables with those names defined):

>> do [x y]
*** Script Error: x has no value

Give the names some values, and _then_ it can be evaluated; and
evaluating a block returns the value of the last expression in the
block:

>> x: 12 y: 13
== 13
>> [x y]
== [x y]
>> do [x y]
== 13

If you want a block with all the expressions' values, use `reduce` instead:

>> reduce [x y]
== [12 13]

Code?  Data?  No difference.

Here's the part where some Pythoneers will get jealous:

>> sum: func [x y] [x + y]
== func [x y][x + y]
>> sum 8 2
== 10

`func` is just another function that happens to build an anonymous
function.  It takes two blocks as arguments:  a block containing the
formal argument names, and a block with the code to execute.  Both
blocks are essentially "data" to `func`.  It doesn't look much like
Forth anymore ;-)

Note that the following does exactly the same:

>> arglist: [x y]
== [x y]
>> body: [x + y]
== [x + y]
>> sum2: func arglist body
== func [x y][x + y]
>> sum2 8 2
== 10

In practice, slinging Rebol/Red most feels like working in a
functional language.  One of their real goals, though, is to enable
writing functions that can be called like so:

schedule "Change furnace filter" [
starting today
then every 3 months
]

There's an elaborate `parse` function built in that supports many ways
of applying user-supplied rules to parse blocks like the one in that
example.

Of course you can emulate much the same in Python by, e.g., passing
triple-quoted strings instead.  In that specific example.  Rebol/Red
provide a minimum of syntax shared by all such applications, and when
the sub-language gets elaborate enough that _nesting_ blocks makes
good sense, nesting triple-quoted strings sucks ;-)

All languages have some good things going for them.  But I'm not sure
I've ever seen Rebol/Red code that _used_ the value of an assignment
expression; that it has them at all seems much more to follow from
that everything is an expression.  As in most functional languages, if
you want initialized local variables, you're more likely to invoke an
anonymous spelled-inline function instead.  Which is what, e.g.,
Haskell's "let PILE_OF_BINDINGS in EXPRESSION" is syntactic sugar for
doing.
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Modern language design survey for "assign and compare" statements

2018-05-20 Thread Tim Peters
[Tim, on Rebol/Red]
>> >> x: 12 y: 13
>> == 13
>> >> [x y]
>> == [x y]
>> >>
>> >> do [x y]
>> == 13

[Greg Ewing ]
> How does scoping work? If you pass a block to a function
> which evaluates it, how are names in the block resolved?

Too involved, but basically a form of lexical scoping.  Rebol
struggled with this, and details vary by version.  Here are the last
docs for version 2:

 http://www.rebol.com/r3/docs/concepts/funcs-scope.html

Details changed again for version 3.  Best I can tell, Red hasn't
written up its own rules yet.

The model did _not_ support closures naturally (in the sense of local
bindings persisting beyond a function's return).  Version 3 added an
alternative to `func`, named `closure`, which could be used when you
wanted a closure:

http://www.rebol.com/r3/docs/datatypes/closure.html

But I noticed just now that Red doesn't have `closure` built in, and
its funcs don't support closures naturally either:

>> adder: func [x] [func [y] [x + y]]
== func [x][func [y] [x + y]]

>> add3: adder 3
== func [y][x + y]

>> add3 12
*** Script Error: x is not in the specified context
*** Where: add3
*** Stack: add3

That's not dynamic scoping, either - it's still unhappy if `x` is
defined at top level:

>> x: 18
== 18
>> add3 12
*** Script Error: x is not in the specified context
*** Where: add3
*** Stack: add3
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Proposal: A Reduce-Map Comprehension and a "last" builtin

2018-05-25 Thread Tim Peters
[Peter O'Connor]
>> ...
>> We could use given for both the in-loop variable update and the variable
>> initialization:
>>     smooth_signal = [average given average=(1-decay)*average + decay*x
>>                      for x in signal] given average=0.

[Steven D'Aprano ]
> I don't think that will work under Nick's proposal, as Nick does not
> want assignments inside the comprehension to be local to the surrounding
> scope. (Nick, please correct me if I'm wrong.)

Nick appears to have moved on from "given" to more-general augmented
assignment expressions.  See PEP 577, but note that it's still a
work-in-progress:

https://github.com/python/peps/pull/665


Under that PEP,

average = 0
smooth_signal = [(average := (1-decay)*average + decay*x)
                 for x in signal]

Or, for the running sums example:

total = 0
sums = [(total += x) for x in data]

I'm not entirely clear on whether the "extra" parens are needed, so
added 'em anyway to make grouping clear.
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] High Precision datetime

2018-05-17 Thread Tim Peters
[Chris Barker]
> Does that support the other way -- or do we never lose a leap second anyway?
> (showing ignorance here)

Alexander covered the Python part of this, so I'll answer the possible
higher-level question:  we haven't yet needed a "negative" leap
second, and it's considered unlikely (but not impossible) that we ever
will.  That's because the Earth's rotation is inexorably slowing, so
the mean solar day inexorably lengthens when measured by SI seconds.

Other things can cause the Earth's rotation to speed up temporarily
(like some major geological events), but they've only been able to
overcome factors acting to slow rotation for brief periods, and never
yet got near to overcoming them by a full second.
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Verbatim names (allowing keywords as names)

2018-05-15 Thread Tim Peters
[Terry Reedy]
> ...
> I believe avoiding tagging raw names as keywords could be done by adjusting
> the re for keywords

Yup - it should just require adding a negative lookbehind assertion; e.g.,

>>> import re
>>> keypat = r"(?<!\\)\bwhile\b"
>>> re.search(keypat, r"yup! while")
<_sre.SRE_Match object; span=(5, 10), match='while'>
>>> re.search(keypat, r"nope! \while") # None
>>>
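The same trick scales to the full keyword set; a sketch, assuming (as
in the `\while` example above) that backslash is the escape character:

    import keyword, re

    # One alternation branch per keyword, skipped when a "\" precedes it.
    keypat = r"(?<!\\)\b(?:" + "|".join(keyword.kwlist) + r")\b"
    print(re.search(keypat, r"x = \while + while"))  # finds only the 2nd "while"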

The "(?<!\\)" is a negative lookbehind assertion: match only when not
preceded by a backslash.
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Fwd: Trigonometry in degrees

2018-06-11 Thread Tim Peters
[Steven D'Aprano]

> ...

> The initial proposal is fine: a separate set of trig functions that take
> their arguments in degrees would have no unexpected surprises (only the
> expected ones). With a decent implementation i.e. not this one:
>
> # don't do this
> def sind(angle):
>     return math.sin(math.radians(angle))
>

But that's good enough for almost all real purposes.  Indeed, except for
argument reduction, it's essentially how scipy's sindg and cosdg _are_
implemented:

https://github.com/scipy/scipy/blob/master/scipy/special/cephes/sindg.c


> and equivalent for cos, we ought to be able to get correctly rounded
> values for nearly all the "interesting" angles of the circle, without
> those pesky rounding issues caused by π not being exactly representable
> as a float.
>

If people are overly ;-) worried about tiny rounding errors, just compute
things with some extra bits of precision to absorb them.  For example,
install `mpmath` and use this:

def sindg(d):
    import math, mpmath
    d = math.fmod(d, 360.0)
    if abs(d) == 180.0:
        return 0.0
    with mpmath.extraprec(12):
        return float(mpmath.sin(mpmath.radians(d)))

Then, e.g,

>>> for x in (0, 30, 90, 150, 180, 210, 270, 330, 360):
... print(x, sindg(x))
0 0.0
30 0.5
90 1.0
150 0.5
180 0.0
210 -0.5
270 -1.0
330 -0.5
360 0.0

Notes:

1. Python's float "%" is unsuitable for argument reduction; e.g.,

>>> -1e-14 % 360.0
360.0

`math.fmod` is suitable, because it's exact:

>>> math.fmod(-1e-14, 360.0)
-1e-14

2. Using a dozen extra bits of precision makes it very likely you'll get the
correctly rounded 53-bit result; it will almost certainly (barring bugs in
`mpmath`) always be good to less than 1 ULP.

3. Except for +-180.  No matter how many bits of float precision (including
the number of bits used to approximate pi) are used, converting that to
radians can never yield the mathematical `pi`; and sin(pi+x) is
approximately equal to -x for tiny |x|; e.g., here with a thousand bits:

>>> mpmath.mp.prec = 1000
>>> float(mpmath.sin(mpmath.radians(180)))
1.2515440597544546e-301

So +-180 is special-cased.  For cosdg, +-{90, 270} would need to be
special-cased for the same reason.
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Fwd: Trigonometry in degrees

2018-06-12 Thread Tim Peters
[Tim]
>> 1. Python's float "%" is unsuitable for argument reduction; e.g.,
>>
>> >>> -1e-14 % 360.0
>> 360.0
>>
>> `math.fmod` is suitable, because it's exact:
>>
>> >>> math.fmod(-1e-14, 360.0)
>> -1e-14

[Greg Ewing]
> So why doesn't float % use math.fmod?

[Chris Angelico]
>
> https://docs.python.org/3/reference/expressions.html#binary-arithmetic-operations
> https://docs.python.org/3/reference/expressions.html#id17
> https://docs.python.org/3/reference/expressions.html#id18
>
> (the latter two being footnotes from the section in the first link)
>
> With real numbers, divmod (and thus the // and % operators) would
> always return values such that:
>
> div, mod = divmod(x, y):
> 1) div*y + mod == x
> 2) sign(mod) == sign(y)
> 3) 0 <= abs(mod) < abs(y)
>
> But with floats, you can't guarantee all three of these. The divmod
> function focuses on the first, guaranteeing the fundamental arithmetic
> equality, but to do so, it sometimes has to bend the third one and
> return mod==y.
>

It's more that #2 is viewed as fundamental (because that's most useful for
positive integer y), and _given that_ sometimes results are fiddled to keep
#1 approximately true, and strict inequality in #3 may be sacrificed.  For
`fmod`, sign(mod) == sign(x) instead.

>>> -2 % 3
1
>>> -2.0 % 3.0
1.0
>>> math.fmod(-2.0, 3.0)
-2.0

All mod functions, m(x, y), strive to return a result that's mathematically
exactly equal to x-n*y (for some mathematical integer `n` that may not even
be representable in the programming language).  `fmod()` is exact in that
sense, but Python's floating "%" may not be, and no float scheme such that
sign(m(x, y)) = sign(y) can be (see the original example at the top:  the
only mathematical integer `n` such that the mathematical -1e-14 - n*360.0 is
exactly representable as a double is n == 0).

The most useful mod function for floats _as floats_ would actually satisfy

abs(m(x, y)) <= abs(y) / 2

That can be done exactly too - but then the sign of the result has
approximately nothing to do with the signs of the arguments.
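That flavor is exactly IEEE-754's remainder operation, which Python
exposes (since 3.7) as math.remainder() - and it's always exact:

    import math

    print(math.remainder(-1e-14, 360.0))  # -1e-14: exact, unlike -1e-14 % 360.0
    print(math.remainder(270.0, 360.0))   # -90.0: the sign tracks neither input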
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Fwd: Trigonometry in degrees

2018-06-13 Thread Tim Peters
[Richard Damon]

> My first comment is that special casing values like this can lead to
> some very undesirable properties when you use the function for numerical
> analysis. Suddenly your sind is no longer continuous (sind(x) is no
> longer the limit of sind(x+d) as d goes to 0).
>
> As I stated in my initial comment on this, if you are going to create a
> sind function with the idea that you want 'nice' angles to return
> 'exact' results, then what you need to do is have the degree based trig
> routines do the angle reduction in degrees, and only when you have a
> small enough angle, either use the radians version on the small angle or
> directly include an expansion in degrees.
> ...
>

Either way, it's necessary to get the effect of working in greater than
output precision, if it's desired that the best possible result be returned
for cases well beyond just the handful of "nice integer inputs" people
happen to be focused on today.

So I'll say again that the easiest way to do that is to use `mpmath` to get
extra precision directly.

The following does that for sindg, cosdg, and tandg.

- There are no special cases.  Although tandg(90 + i*180) dies with
ZeroDivisionError inside mpmath, and that could/should be fiddled to return
an infinity instead.

- Apart from that, all functions appear to give the best possible
double-precision result for all representable-as-a-double integer degree
inputs (sindg(30), cosdg(-10), doesn't matter).

- And for all representable inputs of the form `integer + j/32` for j in
range(32).

- But not for all of the form `integer + j/64` for j in range(1, 64, 2).  A
few of those suffer greater than 1/2 ULP error.  Setting EXTRAPREC to 16 is
enough to repair those - but why bother? ;-)

- Consider the largest representable double less than 90:

>>> x
89.99999999999999
>>> x.hex()
'0x1.67fffffffffffp+6'

The code below gives the best possible tangent:

>>> tandg(x)
4031832051015932.0

Native precision is way off:

>>> math.tan(math.radians(x))
3530114321217157.5

It's not really the extra precision that saves the code below, but allowing
argument reduction to reduce to the range [-pi/4, pi/4] radians, followed
by exploiting trigonometric identities.  In this case, exploiting that
tan(pi/2 + z) = -1/tan(z).  Then even native precision is good enough:

>>> -1 / math.tan(math.radians(x - 90))
4031832051015932.0

Here's the code:

import mpmath
from math import fmod
# Return (n, x) such that:
# 1. d degrees is equivalent to x + n*(pi/2) radians.
# 2. x is an mpmath float in [-pi/4, pi/4].
# 3. n is an integer in range(4).
# There is one potential rounding error, when mpmath.radians() is
# used to convert a number of degrees between -45 and 45.  This is
# done using the current mpmath precision.
def treduce(d):
    d = fmod(d, 360.0)
    n = round(d / 90.0)
    assert -4 <= n <= 4
    d -= n * 90.0
    assert -45.0 <= d <= 45.0
    return n & 3, mpmath.radians(d)

EXTRAPREC = 14

def sindg(d):
    with mpmath.extraprec(EXTRAPREC):
        n, x = treduce(d)
        if n & 1:
            x = mpmath.cos(x)
        else:
            x = mpmath.sin(x)
        if n >= 2:
            x = -x
        return float(x)

def cosdg(d):
    with mpmath.extraprec(EXTRAPREC):
        n, x = treduce(d)
        if n & 1:
            x = mpmath.sin(x)
        else:
            x = mpmath.cos(x)
        if 1 <= n <= 2:
            x = -x
        return float(x)

def tandg(d):
    with mpmath.extraprec(EXTRAPREC):
        n, x = treduce(d)
        x = mpmath.tan(x)
        if n & 1:
            x = -1.0 / x
        return float(x)
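A quick sanity check on some "nice" inputs - these are the outputs I'd
expect from the code above (with EXTRAPREC = 14):

    >>> sindg(30), cosdg(60), tandg(45)
    (0.5, 0.5, 1.0)
    >>> tandg(89.99999999999999)  # the delicate near-90 case from above
    4031832051015932.0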
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Fwd: Trigonometry in degrees

2018-06-14 Thread Tim Peters
I should note that numeric code "that works" is often much subtler than it
appears at first glance.  So, for educational purposes, I'll point out some
of what _wasn't_ said about this crucial function:

[Tim]

> import mpmath
> from math import fmod
> # Return (n, x) such that:
> # 1. d degrees is equivalent to x + n*(pi/2) radians.
> # 2. x is an mpmath float in [-pi/4, pi/4].
> # 3. n is an integer in range(4).
> # There is one potential rounding error, when mpmath.radians() is
> # used to convert a number of degrees between -45 and 45.  This is
> # done using the current mpmath precision.
> def treduce(d):
>     d = fmod(d, 360.0)
>     n = round(d / 90.0)
>     assert -4 <= n <= 4
>     d -= n * 90.0
>     assert -45.0 <= d <= 45.0
>     return n & 3, mpmath.radians(d)
>

How do we know there is at most one rounding error in that?  No, it's not
obvious.

That `fmod` is exact is guaranteed by the relevant standards, but most
people who write a libm don't get it right at first.  There is no "cheap"
way to implement it correctly.  It requires getting the effect of doing
exact integer division on, potentially, multi-thousand bit integers.
Assuming x > y > 0, a correct implementation of fmod(x, y) ends up in a
loop that goes around a number of times roughly equal to log2(x/y),
simulating one-bit-at-a-time long division.  For example, here's glibc's
implementation:

https://github.com/bminor/glibc/blob/master/sysdeps/ieee754/dbl-64/e_fmod.c

Don't expect that to be easy to follow either ;-)

Then how do we know that `d -= n * 90.0` is exact?  That's not obvious
either.  It follows from the "Sterbenz lemma", one version of which:  if x
and y are non-zero floats of the same sign within a factor of 2 of each
other,

1/2 <= x/y <= 2  (mathematically)

then x-y is exactly representable as a float too.  This is true regardless
of the floating-point base, or of rounding mode in use.  In IEEE-754, it
doesn't even need weasel words to exempt underflowing cases.

That lemma needs to be applied by cases, for each of the possible values of
(the integer) `n`.  It gets closest to failing for |n| = 1.  For example,
if d is a tiny bit larger than 45, n is 1, and then d/90 is
(mathematically) very close to 1/2.
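A concrete check of the lemma at the touchiest point (d just barely
above 45, so n is 1):

    d = 45.0 + 2**-47                # the smallest double > 45
    diff = d - 90.0                  # Sterbenz applies: 1/2 <= d/90 <= 2
    print(diff == -(45.0 - 2**-47))  # True - the subtraction was exact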

Which is another thing that needs to be shown:  "if d is a tiny bit larger
than 45, n is 1".  Why?  It's certainly true if we were using infinite
precision, but we're not.  The smallest representable double > 45 is 45 +
2**-47:

>>> d = 45 + 2**-47
>>> d
45.00000000000001
>>> _.hex()
'0x1.6800000000001p+5'

Then d/90.0 (the argument to round()) is, with infinite precision,

(45 + 2**-47)/90 =
0.5 + 2**-47/90

1 ULP with respect to 0.5 is 2**-53, so that in turn is equal to

0.5 + 2**-53/(90/64) =
0.5 + (64/90)*2**-53 =
0.5 + 0.711... * 2**-53

Because the tail (0.711...) is greater than 0.5 ULP, it rounds up under
nearest-even rounding, to

0.5 + 2**-53

>>> d / 90
0.5000000000000001
>>> 0.5 + 2**-53
0.5000000000000001

and so Python's round() rounds it up to 1:

>>> round(_)
1

Note that it would _not_ be true if truncating "rounding" were done, so
round-nearest is a hidden assumption in the code.

Similar analysis needs to be done at values near the boundaries around all
possible values of `n`.

That `assert -45.0 <= d <= 45.0` can't fail then follows from all of that.

In all, a proof that the code is correct is much longer than the code
itself.  That's typical.  Alas, it's also typical that math library sources
rarely point out the subtleties.
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Fwd: Trigonometry in degrees

2018-06-16 Thread Tim Peters
[Steven D'Aprano ]

> Thanks Tim!
>

You're welcome ;-)


> Reading your digressions on the minutia of floating point maths is
> certainly an education. It makes algebra and real-valued mathematics
> seem easy in comparison.
>

Hard to say, really.  The problem with floating point is that it's so
God-awful lumpy - special cases all over the place.  Signaling and quiet
NaNs; signed infinities; signed zeroes; normal finites all with the same
number of bits, but where the gap between numbers changes abruptly at
power-of-2 boundaries; subnormals where the gap remains the same across
power-of-2 boundaries, but the number of _bits_ changes abruptly; all "the
rules" break down when you get too close to overflow or underflow; four
rounding modes to worry about; and a whole pile of technically defined
exceptional conditions and related traps & flags.

Ignoring all that, though, it's pretty easy ;-)  754 was dead serious about
requiring results act as if a single rounding is done to the infinitely
precise result, and that actually allows great simplification in reasoning.

The trend these days appears to be using automated theorem-proving systems
to keep track of the mountain of interacting special cases.  Those have
advanced enough that we may even be on the edge of getting
provably-correctly-rounded transcendental functions with reasonable speed.
Although it's not clear people will be able to understand the proofs ;-)

> I still haven't got over Mark Dickinson's demonstration a few years
> back that under Decimal floating point, but not binary, it is possible
> for the ordinary arithmetic average (x+y)/2 to be outside of the
> range [x, y]:
>
> py> from decimal import getcontext, Decimal
> py> getcontext().prec = 3
> py> x = Decimal('0.516')
> py> y = Decimal('0.518')
> py> (x + y) / 2
> Decimal('0.515')
>

Ya, decimal fp doesn't really solve anything except the shallow surprise
that decimal fractions generally aren't exactly representable as binary
fractions.  Which is worth a whole lot for casual users, but doesn't
address any of the deep problems (to the contrary, it makes those a bit
worse).

I like to illustrate the above with 1-digit decimal fp, because it makes it
more apparent at once that - unlike as in binary fp - multiplication and
division by 2 may _not_ be exact in decimal fp.  We can't even average a
number "with itself" reliably:

>>> import decimal
>>> decimal.getcontext().prec = 1
>>> x = y = decimal.Decimal(8); (x+y)/2 # 10 is much bigger than 8
Decimal('1E+1')
>>> x = y = decimal.Decimal(7); (x+y)/2 # 5 is much smaller than 7
Decimal('5')

But related things _can_ happen in binary fp too!  You have to be near the
edge of representable non-zero finites though:

>>> x = y = 1e308
>>> x
1e+308
>>> (x+y)/2
inf

Oops.  So rewrite it:

>>> x/2 + y/2
1e+308

Better!  But then:

>>> x = y = float.fromhex("3p-1074")
>>> x
1.5e-323
>>> x/2 + y/2
2e-323

Oops.  A math library has to deal with everything "correctly".  Believe it
or not, this paper

"How do you compute the midpoint of an interval?"
https://hal.archives-ouvertes.fr/hal-00576641v1/document

is solely concerned with computing the average of two IEEE doubles, yet
runs to 29(!) pages.  Almost everything you try fails for _some_ goofy
cases.

I personally write it as (x+y)/2 anyway ;-)
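For reference, the usual two-branch compromise handles both failure
modes shown above - though it's still far short of the paper's
bulletproof version (rounding modes, NaNs, and the like are where the
other 28 pages go):

    import math

    def midpoint(x, y):
        mid = (x + y) / 2
        if math.isinf(mid):  # x + y overflowed; rescale before adding instead
            mid = x / 2 + y / 2
        return mid

    print(midpoint(1e308, 1e308))  # 1e+308, not inf
    print(midpoint(float.fromhex("3p-1074"),
                   float.fromhex("3p-1074")))  # 1.5e-323, not 2e-323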
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Fwd: collections.Counter should implement fromkeys

2018-06-29 Thread Tim Peters
[
Note to repliers to Abe (and others recently):  replies to Google Groups
posts are broken for now on this list, so be sure to replace

python-id...@googlegroups.com

with

python-ideas@python.org

in your reply.  Else the mailing list (neither Google Groups nor the
python.org archive) won't get it.
]

[Tim]
>> So, e.g., someone will be unpleasantly surprised no matter what

[Abe Dillon]
> Sure, but in Hettinger's own words "whenever you have a constructor war,
> everyone should get their wish". People that want a counting constructor
> have that, people that want the ability to initialize values don't have
> that.

I think the missing bit here is that there weren't any "constructor wars"
for Counter.  In all this time, I don't believe I've heard anyone say they
wanted a Counter.fromkeys() before.  For that matter, I'd bet a dollar that
most Python programmers don't know that dict.fromkeys() exists, despite
that it was added in Python 2.3.

As I recall, the primary motivation for adding dict.fromkeys() was to make
using dicts to mimic sets a little easier, by providing a constructor that
threw away duplicates and didn't really care about the values (so no value
was required, and nobody cared that it defaulted to the _seemingly_ insane
`None` - using `None` values for sets-implemented-as-dicts was a de facto
informal standard at the time).

But one release later (2.4) a set type was added too, so the primary
motivation for fromkeys() went away.

15 years later you're jumping up & down about Counter.fromkeys() not being
there, and that's why nobody much cares ;-)


> ...
> I'm tempted to indulge in the meta argument which you're obviously
> striving to avoid,

And succeeding!  I can't be sucked into it :-)  FWIW, fine by me if
Counter.fromkeys() is added, doing exactly what you want.  Raymond may have
a different judgment about that, though.  I don't believe he reads
python-ideas anymore, so opening an enhancement request on bugs.python.org
is the way to get his attention.
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Fwd: collections.Counter should implement fromkeys

2018-06-29 Thread Tim Peters
Reposting because the original got bounced from Google Groups.

-- Forwarded message -
From: Tim Peters 
Date: Fri, Jun 29, 2018 at 5:54 PM
Subject: Re: [Python-ideas] collections.Counter should implement fromkeys
To: 
Cc: python-ideas 


[Abe Dillon ]
> ...
> I'm using the copy-constructor because I know Counter is a subclass of
> dict.
> I'm using fromkeys because I know how that class method works.
> So why does the subclass lack functionality that the superclass has?
> Because programmers wouldn't be able to wrap their heads around it?
> I don't buy it. This feels like nanny-design trumping SOLID design
> <https://en.wikipedia.org/wiki/SOLID>.

More because Counter.fromkeys() could be incoherent.  From the
implementation (in your Lib/collections/__init__.py):

    @classmethod
    def fromkeys(cls, iterable, v=None):
        # There is no equivalent method for counters because setting v=1
        # means that no element can have a count greater than one.
        raise NotImplementedError(
            'Counter.fromkeys() is undefined.  Use Counter(iterable) instead.')

For a dict, a value appearing multiple times in the iterable doesn't
matter.  But a fundamental use case for Counters is to tally the _number_
of times duplicate keys appear.  So, e.g., someone will be unpleasantly
surprised no matter what:

    Counter.fromkeys("aaaaa", 2)

returned.  "It should set key 'a' to 2!  that's what I said it should do!"
"No!  It should set key 'a' to 10!  that's what a Counter _always_ does -
sums the values associated with duplicate keys!" "You're both right - and
wrong!  It should raise an exception if there's a duplicate key, because
there's no compelling answer to what it should do!"
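
To see the two mental models collide in code that exists today:

>>> from collections import Counter
>>> Counter("aaaaa")            # tallying: duplicate keys sum
Counter({'a': 5})
>>> dict.fromkeys("aaaaa", 2)   # fromkeys: duplicate keys collapse
{'a': 2}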

I expect Raymond called it NotImplementedError instead so he could release
the code instead of waiting 3 years for that debate to end ;-)
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] random.sample should work better with iterators

2018-06-27 Thread Tim Peters
[Tim]
>> In Python today, the easiest way to spell Abe's intent is, e.g.,
>>
>> >>> from heapq import nlargest # or nsmallest - doesn't matter
>> >>> from random import random
>> >>> nlargest(4, (i for i in range(100000)), key=lambda x: random())
>> [75260, 45880, 99486, 13478]
>> >>> nlargest(4, (i for i in range(100000)), key=lambda x: random())
>> [31732, 72288, 26584, 72672]
>> >>> nlargest(4, (i for i in range(100000)), key=lambda x: random())
>> [14180, 86084, 22639, 2004]
>>
>> That also arranges to preserve `sample()`'s promise that all sub-slices of
>> the result are valid random samples too (because `nlargest` sorts by the
>> randomly generated keys before returning the list).

[Antoine Pitrou]
> How could slicing return an invalid random sample?

For example, consider random.sample(range(2), 2).  As a set, there is only
one possible output, {0, 1}.  But it doesn't return a set, it returns a
list.  So there are two possible outputs:

[0, 1]
[1, 0]

random.sample() promises to return each of those about equally often, so
that, e.g., result[0:1] and result[1:2] are also random 1-samples.

If it always returned, say, [0, 1], that's "a random" 2-sample, but its
1-slices are as far from random 1-samples as is possible to get.
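
A quick sanity check - the exact counts here are illustrative only, and
will of course vary from run to run:

>>> from collections import Counter
>>> from random import sample
>>> Counter(tuple(sample(range(2), 2)) for _ in range(10000))
Counter({(1, 0): 5041, (0, 1): 4959})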
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] random.sample should work better with iterators

2018-06-26 Thread Tim Peters
[Abe Dillon]

> Randomly sampling from some population is often done because the entire
> population is impractically large, which is also a motivation for using
> iterators, so it seems natural that one would be able to sample from an
> iterator. A naive implementation could use a heap queue:
>
> import heapq
> import random
>
> def stream():
>     while True: yield random.random()
>
> def sample(population, size):
>     q = [tuple()]*size
>     for el in zip(stream(), population):
>         if el > q[0]: heapq.heapreplace(q, el)
>     return [el[1] for el in q if el]

[Steven D'Aprano]
> Is that an improvement over:
>
>     sample(list(itertools.islice(population, size)))
>
> and if so, please explain.

Different things entirely.  Your spelling is missing sample's required
second argument, and the difference should be clear if it's supplied:

    sample(list(itertools.islice(population, size)), size)

That is, it merely returns some permutation of the _initial_ `size` items
in the iterable.  The rest of the population is ignored.

In Python today, the easiest way to spell Abe's intent is, e.g.,

>>> from heapq import nlargest # or nsmallest - doesn't matter
>>> from random import random
>>> nlargest(4, (i for i in range(100000)), key=lambda x: random())
[75260, 45880, 99486, 13478]
>>> nlargest(4, (i for i in range(100000)), key=lambda x: random())
[31732, 72288, 26584, 72672]
>>> nlargest(4, (i for i in range(100000)), key=lambda x: random())
[14180, 86084, 22639, 2004]

That also arranges to preserve `sample()`'s promise that all sub-slices of
the result are valid random samples too (because `nlargest` sorts by the
randomly generated keys before returning the list).

However, it does _not_ preserve - and nothing can preserve for arbitrary
iterables - `sample()`'s promise to "[leave] the original population
unchanged".  We can't examine an arbitrary iterable's population at all
without exhausting the iterable, and that can be destructive.

So while this can indeed be useful, it would require changing `sample()` to
break that promise in some cases.

BTW, using a heap for this is uncommon.  Search on "reservoir sampling"
for more-common ways.  Most common is probably Vitter's "Algorithm R",
which runs in O(len(iterable)) time (no additional log factor for a
heap - it doesn't use a heap).
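
For reference, here's a bare-bones sketch of Algorithm R.  Note that
while it gives each size-k subset an equal chance of being selected, it
makes no promise that the _order_ within the returned list is a uniform
random permutation - `sample()` promises more:

    import random

    def reservoir_sample(iterable, k):
        # Keep the first k items, then give the i'th item (0-based)
        # a k/(i+1) chance of replacing a uniformly chosen slot.
        reservoir = []
        for i, x in enumerate(iterable):
            if i < k:
                reservoir.append(x)
            else:
                j = random.randrange(i + 1)
                if j < k:
                    reservoir[j] = x
        return reservoir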

I'd prefer to leave `sample()` alone, and introduce some spelling of
`possibly_destructive_sample()` for arbitrary iterables - if that's wanted
enough for someone to do the work ;-)
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Fwd: collections.Counter should implement fromkeys

2018-06-30 Thread Tim Peters
[Abe Dillon]
> I haven't been part of the conversation for 15 years, but most of the
> arguments against the idea (yours especially) seem to focus on the
> prospect of a constructor war and imply that was the original motivation
> behind actively disabling the fromkeys method in Counters.

I quoted the source code verbatim - its comment said fromkeys() didn't make
sense for Counters.  From which it's an easy inference that it makes more
than one _kind_ of sense, hence "constructor wars".  Not that it matters.

Giving some of the history was more a matter of giving a plausible reason
for why you weren't getting all that much feedback:  it's quite possible
that most readers of this list didn't even remember that `dict.fromkeys()`
is a thing.

> I don't mean to give the impression that I'm fanatical about this. It
> really is a minor inconvenience. It doesn't irk me nearly as much as
> other minor things, like the fact that all the functions in the heapq
> package begin with the redundant word 'heap'.

You have to blame Guido for that one, which is even more futile than
arguing with Raymond ;-)  It never much bothered me, but I do recall doing
this once:

from heapq import heappush as push, heappop as pop # etc


>> Raymond may have a different judgment about that, though.  I don't
>> believe he reads python-ideas anymore

> He actually did reply a few comments back!

Ya, I saw that!  He's always trying to make me look bad ;-)

> I think I'm having more fun chatting with people that I deeply respect
> than "jumping up and down". I'm sorry if I'm coming off as an asshole.

Not at all!  I've enjoyed your messages.  They have tended to be more on
the side of forceful advocacy than questioning, though, which may grate after a
few more years.  As to my "jumping up and down", I do a lot of
leg-pulling.  I'm old.  It's not meant to offend, but I'm too old to care
if it does :-)

> We can kill this thread if everyone thinks I'm wasting their time. It
> doesn't look like anyone else shares my minor annoyance. Thanks for
> indulging me!

Raymond's reply didn't leave any hope for adding Counter.fromkeys(), so in
the absence of a killer argument that hasn't yet been made, ya, it would be
prudent to move on.

Unless people want to keep talking about it, knowing that Raymond won't buy
it in the end.  Decisions, decisions ;-)
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] A "local" pseudo-function

2018-05-01 Thread Tim Peters
[MRAB]
> By "inject" I mean putting a name into a namespace:
>
> import my_module
> my_module.foo = 'FOO'
>
> You can't insert a name into a function's body to make a new local variable.

So you do mean at runtime, I think.  Then as before, you can do that
with module and the builtin namespaces now, but all function locals
need to be identifiable at compile time now.  People often get
confused about that when they read that "it's a binding" that makes a
name local to a scope, but it's not "_doing_ a binding" that makes it
local, merely that the name _syntactically_ appears as a target in a
binding construct.  That analysis is entirely done at compile time.

In Python's very early days, all namespaces could be dynamically
altered at any time in any way.  As time went on, more & more
restrictions were placed on runtime alteration of local scopes.  I
don't believe that, in current Python 3, dynamic alteration of local
scopes is supported in any case anymore.  Perhaps the last to go was
runtime alteration of locals via `exec`.  This still works in Python
2:

>>> def f():
...     exec 'i = 1+2'
...     print i
...
>>> f()
3
>>> i  # showing that it was function-local `i`, not global
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'i' is not defined

`exec` was a statement type in Python 2, so the compiler could
recognize that and generate radically different name-lookup code in a
scope containing an `exec` statement.  But in Python 3, `exec` is just
another builtin function, and the compiler knows nothing about what it
does.  Similar code run in Python 3 has no effect on f's locals.
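
For concreteness, here's the same attempt in Python 3 - assuming no
global `i` exists, the lookup falls all the way through and fails:

>>> def f():
...     exec("i = 1+2")
...     print(i)
...
>>> f()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in f
NameError: name 'i' is not defined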
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] A "local" pseudo-function

2018-04-30 Thread Tim Peters
[MRAB ]
> I think it should be lexically scoped.

That's certainly arguable, but that's why I like real-code driven
design:  abstract arguments never end, and often yield a dubious
in-real-life outcome after one side is worn out and the other side
"wins" by attrition ;-)


> The purpose of 'local' would be to allow you to use a name that _might_ be
> used elsewhere.
>
> The problem with a dynamic scope is that you might call some global function
> from within the local scope, but find that it's "not working correctly"
> because you've inadvertently shadowed a name that the function refers to.

Already explained at excessive length that there's nothing akin to
"dynamic scopes" here, except that both happen to restore a previous
binding at times.  That's a shallow coincidence.  It's no more
"dynamic scope" than that

    savea = a
    try:
        a += 1
        f(a)
    finally:
        a = savea

is "dynamic scoping".  It's merely saving/restoring a binding across a
block of code.


> Imagine, in a local scope, that you call a global function that calls 'len',
> but you've shadowed 'len'...

I'm not clear on whether you picked the name of a builtin to make a
subtle point not spelled out, but I don't think it matters.
Regardless of whether `len` refers to a builtin or a module global
inside your global function now, the _current_

def f():
    len = 12
    global_function()

has no effect at all on the binding of `len` seen inside
`global_function`.  Because my understanding of "local:" changes
absolutely nothing about Python's current scope rules, it's
necessarily the case that the same would be true in:

def f():
    local len:
        len = 12
        call_something()

The only difference from current semantics is that if

print(len)

were added after the `local:` block, UnboundLocalError would be raised
(restoring the state of the function-local-with-or-without-'local:'
`len` to what it was before the block).

To have "local:" mean "new nested lexical scope" instead requires
specifying a world of semantics that haven't even been mentioned yet.

In Python today, in the absence of `global` and `nonlocal`
declarations, the names local to a given lexical scope are determined
entirely by analyzing binding sites.  If you intend something other
than that, then it needs to be spelled out.  But if you intend to keep
"and names appearing in binding sites are also local to the new
lexical scope", I expect that's pretty much useless.   For example,

def f():
    ...
    local x, y:
        x = a*b
        y = a/b
        r1, r2 = x+y, x-y

That is, the programmer surely doesn't _intend_ to throw away r1 and
r2 when the block ends.  If they have to add a

nonlocal r1, r2

declaration at the top of the block, maybe it would work as intended.
But it still wouldn't work unless `r1` and `r2` _also_ appeared in
binding sites in an enclosing lexical scope.  If they don't, you'd get
a compile-time error like

SyntaxError: no binding for nonlocal 'r1' found

To be more accurate, the message should really say "sorry, but I have
no idea in which scope you _intend_ 'r1' to live, because the only way
I could know that is to find a binding site for 'r1', and I can't find
any except inside _this_ scope containing the 'nonlocal'".  But that's
kind of wordy ;-)

If you agree that makes the feature probably unusable, you don't get
off the hook by saying "no, unlike current Python scopes, binding
sites have nothing to do with what's local to a new lexical scope
introduced by 'local:'".  The same question raised in the example
above doesn't go away:  in which scope(s) are 'r1' and 'r2' to be
bound?

There's more than one plausible answer to that, but in the absence of
real use cases how can they be judged?

Under "'local:' changes nothing at all about Python's scopes", the
answer is obvious:  `r1` and `r2` are function locals (exactly the
same as if "local:" hadn't been used).  There's nothing new about
scope to learn, and the code works as intended on the first try ;-)
Of course "local:" would be a misleading name for the construct,
though.

Going back to your original example, where a global (not builtin)
"len" was intended:

def f():
    global len  # LINE ADDED HERE
    local len:
        len = 12
        global_function()

yes, in _that_ case the global-or-builtin "len" seen inside
`global_function` would change under my "nothing about scoping
changes" reading, but would not under your reading.

That's worth _something_ ;-)  But without fleshing out the rules for
all the other stuff (like which scope(s) own r1 and r2 in the example
above) I can't judge whether it's worth enough to care.  All the
plausibly realistic use cases I've considered don't _really_ want a
full-blown new scope (just robust save/restore for a handful of
explicitly given names), and the example just above is contrived in
comparison.  Nobody types "global len" unless they 

Re: [Python-ideas] A "local" pseudo-function

2018-05-01 Thread Tim Peters
[MRAB]
>>> Imagine renaming the specified names that are declared 'local'
>>> throughout the nested portion:
>>>
>>> def f():
>>> def f():
>>>     a = 10
>>>     local local_a:
>>>         def showa():
>>>             print("a is", local_a)
>>>         showa() # Raises NameError in showa because nothing bound to
>>>                 # local_a yet.
>>>         local_a = 20
>>>         showa() # 20
>>>         local_a = 30
>>>     showa() # Raises NameError in f because local_a not defined.

[Tim]
>> In a later message I showed executable-today Python code that I
>> believe models your intent.  That code does indeed display "30" for
>> the last call, and while I myself don't see that it's _useful_ yet,
>> the model is incomprehensible if it doesn't.
>>
>> It doesn't matter that local_a isn't defined at function scope,
>> because showa() refers to the locals _in_ the "local a:" scope.  The
>> last value bound to local_a was 30, so that's what showa() needs to
>> show.  That the name `showa` is _bound_ in f's locals has nothing to
>> do with whose locals showa() _uses_.  Other current messages about
>> "closures" go on more about that.

[MRAB]
> Ah, yes. I see what you mean.
>
> There's another question that hasn't been asked yet: what should locals()
> and globals() return?

Under my meaning, the same answer as always:  exactly the same as if
"local a:' were replaced by "if True:" ;-)

Under yours, you're not going to like the answer.  It seems obvious
that since globals() refers to the module dict, nothing about that
would change.  You can add prints to my
workalike-code-under-current-Python snippet to see what locals()
_does_ return in various places.

It's not what you'd expect (that locals() inside the `local a:` block
is a dict with only key "local_a"), because of what I believe are
undocumented implementation details.  This is much easier to see in a
bare-bones example:

def f():
    b = 12
    def g():
        nonlocal b
        b = 17
        a = 42
        print("inner", locals())
    g()
    print("outer", locals())
f()

The output:

inner {'a': 42, 'b': 17}
    outer {'g': <function f.<locals>.g at 0x024E28C44AE8>, 'b': 17}

That is, despite that `b` is explicitly declared as `nonlocal` in the
inner function, and does in fact belong to the outer scope, it
nevertheless shows up inside the inner function's locals().

The compiler isn't confused, but `locals()` existed long before nested
scopes were added, and, e.g., no `nonlocals()` builtin was added
later.  PEP 227 (which introduced nested scopes) explicitly declined
to add one, and said "it will not be possible to gain dictionary-style
access to all visible scopes".  I was quite surprised to see "b" as a
key in the inner locals() above!

But so long as it is, any new gimmick building on the current
implementation would inherit that behavior.
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] A "local" pseudo-function

2018-05-01 Thread Tim Peters
[MRAB]
>>> Would/should it be possible to inject a name into a local scope? You can't
>>> inject into a function scope, and names in a function scope can be
>>> determined statically (they are allocated slots), so could the same kind of
>>> thing be done for names in a local scope?

...

[Steven D'Aprano ]
> I don't know what MRAB means by "inject", but I know what *I* mean, and
> I have a real use-case for it.
>
> There is a long-running micro-optimization, often championed by Raymond,
> for avoiding slow global lookups (and even slower builtin lookups, since
> they require a global lookup to fail first) by turning them in local
> lookups at function-definition time.

Before nested scopes, it was also used to help nested functions call
each other; e.g.,

def f():
    def g():
        ...

    # h needs to call g, but can't see f's locals
    def h(g=g):  # but default arg expressions can see f's locals
        g()  # works great :-)


> E.g. some methods from the random.Random class:
>
> def randrange(self, start, stop=None, step=1, _int=int):
> ...
> def _randbelow(self, n, int=int, maxsize=1<<BPF, type=type,
>                Method=_MethodType, BuiltinMethod=_BuiltinMethodType):
> ...
>
> (copied from 3.5, I don't know what the most recent version looks like)

Doesn't matter; code like that has been in the std lib since the
start, although Raymond is inordinately fond of it ;-)


> That's a nice way to make the binding:
>
> _int = int
>
> occur once only, instead of putting it inside the function body which
> then needs to be executed on ever call. Effectively it's a static
> variable for the method, one which persists from one call to the next
> without requiring re-initialisation.
>
> But it's ugly :-(

Error-prone too, because absolutely nothing stops a user from calling
these things with more positional arguments than are intended.  I
don't want to bother looking now, but there was _some_ "bug report"
complaining that one of these "default arguments" went away somewhere,
which a user was specifying intentionally to override some accidental
implementation detail.

However, while it's dead obviously "error prone" in theory, it's
remarkable how few times I've heard of anyone getting burned by it in
practice.


> The function signature is full of *implementation details* instead of
> the parameters needed for the method's interface. Look at _randbelow
> which takes one actual parameter, n, plus FIVE fake parameters, int,
> maxsize, type, Method and BuiltinMethod, none of which should ever be
> passed arguments.
>
> So when I talk about injecting values into a function, that is the sort
> of thing I'm referring to: at function definition time, push or inject a
> reference to a known value (say, the builtin int) into something which
> behaves as a static variable.
>
> It would be nice if we could do that without polluting the function
> signature.

I'm surprised that hasn't already been done.
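
For fun, here's a rough sketch (mine - nothing like it ships in the std
lib) of one way an "inject" decorator could be spelled today:  rebuild
the function with a private globals mapping that has the injected names
added.  That buys the namespace isolation, but not the speed of true
locals - lookups of the injected names are still dict lookups:

    import types

    def inject(**bindings):
        def decorate(func):
            # copy the module globals and overlay the injected names
            namespace = dict(func.__globals__)
            namespace.update(bindings)
            new = types.FunctionType(func.__code__, namespace,
                                     func.__name__, func.__defaults__,
                                     func.__closure__)
            new.__kwdefaults__ = func.__kwdefaults__
            new.__dict__.update(func.__dict__)
            return new
        return decorate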


> I'll admit that the name "inject" came to me when I was
> thinking of some hypothetical decorator:
>
> @inject(int=int, maxsize=1<<BPF)
> def _randbelow(self, n):
> ...
>
> that somehow pushed, or *injected*, those bindings into the function,
> turning them into locals, but in an alternative universe where Guido
> loved making new keywords, I'd use a static initialisation block and
> stick it inside the def:
>
> def _randbelow(self, n):
>     static:
>         # this gets executed once only, at function definition time
>         int = int
>         maxsize = 1<<BPF
>         type = type
>     # body of _randbelow

Blame computer scientists.  I started my paying career working on Cray
Research's Fortran compiler, a language in which NO words were
reserved.  The syntax was such that it was always possible to
determine whether a language keyword or a user-supplied name was
intended.  That worked out great.

But it seemed to require ad hoc parsing tricks.  Computer scientists
invented "general" parser generators, and then every new language
started declaring that some words were reserved for the language's
use.  That was just to make things easier for parser-generator
authors, not for users ;-)

Maybe we could steal a trick from the way the async statements ("async
def", "async for", "async with") were added without making "async" a
reserved word?  Even if user code already uses the name "static", just
as in Fortran the only way to parse

static:

that makes sense is that it's introducing a block;

USER_NAME:

as a statement has no meaning.  In any other context, the
block-opening meaning of "static" makes no sense, so it must mean a
user-defined name then.  We could also add, e.g.,

MRAB local a, b, c:

and

 Uncle Timmy not really local at all a, b, c:

without introducing any actual ambiguity in code already using any of
the leading words on those lines.

It would be fun to 

Re: [Python-ideas] A "local" pseudo-function

2018-05-02 Thread Tim Peters
[MRAB]
>> There's another question that hasn't been asked yet: what should locals()
>> and globals() return?

[Tim, "globals()" is obvious, "locals()" can be surprising now]
> ...

And here, recording the results of some code spelunking.  Dicts don't
really have anything to do with how locals are implemented anymore;
calling "locals()" now inside a function _constructs_ a dict on the
fly out of lower-level implementation gimmicks, to be kinda-compatible
with what locals() returned before nested scopes were introduced.

The real distinctions of interest are recorded in the code object now,
under these attributes, where I'm giving a fuller explanation than can
be found in the docs, and where "referenced" doesn't distinguish
between "binding merely retrieved" and "binding is changed":

1. co_varnames:  tuple of names of locals not referenced in an
enclosed local scope

2. co_cellvars:  tuple of names of locals that are referenced in an
enclosed local scope

3 .co_freevars:  tuple of names local to an enclosing local scope and
referenced in this enclosed code

"locals()" now generally builds a dict out of all three of those name sources.

#1 is the only one that made sense before nested scopes were introduced.

The union of #1 and #2 is the set of names local to the code's scope;
CPython implements their bindings in different ways, so needs to
distinguish them.

The names in #3 aren't local to the code block at all (they are local
to some enclosing local scope), but for whatever reason are included
in the code's "locals()" dict anyway.  I would not have included them.

For concreteness:

def disp(func):
    print("for", func.__name__)
    code = func.__code__
    for attr in "co_varnames", "co_cellvars", "co_freevars":
        print("   ", attr, getattr(code, attr))

def outer():
    outer_only = 1
    outer_used_inner = 2
    outer_bound_by_inner = 3
    def inner():
        nonlocal outer_bound_by_inner
        inner_only = 4
        outer_bound_by_inner = 5
        inner_only = outer_used_inner
    inner()
    disp(outer)
    disp(inner)
outer()

And its output:

for outer
    co_varnames ('outer_only', 'inner')
    co_cellvars ('outer_bound_by_inner', 'outer_used_inner')
    co_freevars ()
for inner
    co_varnames ('inner_only',)
    co_cellvars ()
    co_freevars ('outer_bound_by_inner', 'outer_used_inner')
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] A "local" pseudo-function

2018-04-30 Thread Tim Peters
[MRAB ]
> ...
> The intention is that only the specified names are local.
>
> After all, what's the point of specifying names after the 'local' if _any_
> binding in the local scope was local?

Don't look at me ;-)  In the absence of use cases, I don't know which
problem(s) you're trying to solve.  All the use cases I've looked at
are adequately addressed by having some spelling of "local:" change
nothing at all about Python's current scope rules.  If you have uses
in mind that require more than just that, I'd need to see them.

>> ...
>> If you agree that makes the feature probably unusable, you don't get
>> off the hook by saying "no, unlike current Python scopes, binding
>> sites have nothing to do with what's local to a new lexical scope
>> introduced by 'local:'".  The same question raised in the example
>> above doesn't go away:  in which scope(s) are 'r1' and 'r2' to be
>> bound?

> Any binding that's not specified as local is bound in the parent scope:

Reverse-engineering the example following, is this a fair way of
making that more precise?

Given a binding-target name N in scope S, N is bound in scope T, where
T is the closest-containing scope (which may be S itself) for which T
is either

1. established by a "local:" block that declares name N

or

2. not established by a "local:" block


> local b:
>     local c:
>         c = 0 # Bound in the "local c" scope.

By clause #1 above, "c" is declared in the starting "local:" scope.

>         b = 0 # Bound in the "local b" scope.

By clause #1 above, after searching one scope up to find `b` declared
in a "local:" scope.

>         a = 0 # Bound in the main scope (function, global, whatever)

By clause #2 above, after searching two scopes up and not finding any
"local:" scope declaring name "a".  By your original "the parent
scope", I would have expected this be bound in the "local b:" scope
(which is the parent scope of the "local c:" scope).

So that's _a_ possible answer.  It's not like the scoping rules in any
other language I'm aware of, but then Python's current scoping rules
are unique too.

Are those useful rules?  Optimal?  The first thing that popped into
your head?  The third?  Again I'd need to see use cases to even begin
to guess.

I agree it's well defined, though, and so miles ahead of most ideas ;-)

...

>> Note:  most of this doesn't come up in most other languages because
>> they require explicitly declaring in which scope a name lives.
>> Python's "infer that in almost all cases instead from examining
>> binding sites" has consequences.

> Would/should it be possible to inject a name into a local scope? You can't
> inject into a function scope, and names in a function scope can be
> determined statically (they are allocated slots), so could the same kind of
> thing be done for names in a local scope?

Sorry, I'm unclear on what "inject a name into a local scope" means.
Do you mean at runtime?

In Python's very early days, all scope namespaces were implemented as
Python dicts, and you could mutate those at runtime any way you liked.
Name lookup first tried the "local" dict ("the" because local scopes
didn't nest), then the "global" dict, then the "builtin" dict.  Names
could be added or removed from any of those at will.

People had a lot of fun playing with that, but nobody seriously
complained as that extreme flexibility was incrementally traded away
for faster runtime.

So now take it as given that the full set of names in a local scope
must be determinable at compile-time (modulo whatever hacks may still
exist to keep old "from module import *" code working - if any still
do exist).  I don't believe CPython has grown any optimizations
preventing free runtime mutation of global (module-level) or builtin
namespace mappings, but I may be wrong about that.
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/



Re: [Python-ideas] A "local" pseudo-function

2018-04-30 Thread Tim Peters
[MRAB]
>> Any binding that's not specified as local is bound in the parent scope:

[Tim]
> Reverse-engineering the example following, is this a fair way of
> making that more precise?
>
> Given a binding-target name N in scope S, N is bound in scope T, where
> T is the closest-containing scope (which may be S itself) for which T
> is either
>
> 1. established by a "local:" block that declares name N
>
> or
>
2. not established by a "local:" block

Here's an example where I don't know what the consequences of "the
rules" should be:

def f():
    a = 10
    local a:
        def showa():
            print("a is", a)
        showa() # 10
        a = 20
        showa() # 20
        a = 30
    showa() # 10

The comments show what the output would be under the "nothing about
scope rules change" meaning.  They're all obvious (since there is
no new scope then - it's all function-local).

But under the other meaning ...?

The twist here is that `def` is an executable statement in Python, and
is a "binding site" for the name of the function being defined.  So
despite that `showa` appears to be defined in a new nested lexical
scope, it's _actually_ bound as a function-local name.  That's bound
to be surprising to people from other languages:  "I defined it in a
nested lexical scope, but the name is still visible after that scope
ends?".

I don't know what the first `showa()` is intended to do.  Presumably
`a` is unbound at the start of the new nested scope?  So raises
NameError?  If so, comment that line out so we can make progress ;-)

It seems clear that the second `showa()` will display 20 under any reading.

But the third?  Now we're out of the `local a:` scope, but call a
function whose textual definition was inside that scope.  What does
`showa()` do now to find a's value?  f's local `a` had nothing to do
with the `a` in the nested scope, so presumably it shouldn't display
10 now.  What should it do?
Does the final state of the nested scope's locals need to be preserved
so that showa() can display 30 instead?  Or ...?

Not necessarily complaining  - just fleshing out a bit my earlier
claim that a world of semantics need to be defined if anything akin to
a "real scope" is desired.
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] A "local" pseudo-function

2018-04-29 Thread Tim Peters
[Ethan Furman ]
> If we need a sublocal scope, I think the most Pythonic* route to have it
> would be:
>
> with sublocal():
>     blah blah
>
> which would act just like local/global does now:
>
>   - any assignment creates a new variable
>     - unless that variable has been declared global/nonlocal
>   - plain reads (no assignment ever happens) refer to
>     nonlocal/global/built-in names
> ...

As covered most recently in an exchange with Tim Delaney, best I can
tell absolutely nobody has wanted that.  By "sublocal scope" they
don't mean a full-fledged new scope at all, but a kind of limited
"shadowing" of a handful of specific, explicitly given names.  It acts
like a context manager, if there were a way to clearly spell

save the current state of these specific identifiers at the start (& I
couldn't care less whether they're local, nonlocal, or global - I
don't know & don't care)

then execute the code exactly as if this gimmick had never been used

then, at the end, restore the specific identifier states we saved
at the start

It's the same kind of shadowing Python already does by magic for, e.g., `i`, in

[i for i in range(3)]
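
A quick session shows that magic at work (Python 3, where a
comprehension gets its own scope):

>>> i = "unchanged"
>>> [i for i in range(3)]
[0, 1, 2]
>>> i
'unchanged'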

So, e.g.,

"""
a = 42

def showa():
print(a)

def run():
global a

local a: # assuming this existed
a = 43
showa()
showa()
"""

would print 43 and then 42 when run() is called.  Which makes "local a:"
sound senseless on the face of it ;-)  "shadow" would be a more
descriptive name for what it actually does.

> ...
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] A "local" pseudo-function

2018-04-29 Thread Tim Peters
[Tim]
>> Then `c` is 12, but `a` is still 1 and `b` is still 2.  Same thing in the 
>> end:
>>
>> c = local(a=3, b=4, a*b)

[Nikolaus Rath ]
> I think this can be done already with slighly different syntax:
>
> c = (lambda a=3, b=4: a*b)()
>
> The trailing () is a little ugly, but the semantics are much more
> obvious.

But also broken, in a way that can't be sanely fixed.  Covered before
in other messages.  Short course:

>>> a = 10
>>> b = 20
>>> (lambda a=3, b=a+1: (a, b))()
(3, 11)

This context really demands (3, 4) instead.  In Scheme terms, Python's
lambda default arguments do "let" binding ("all at once"), but "let*"
binding is what's needed ("one at a time, left to right, with bindings
already done visible to later bindings").

Of course in Scheme you explicitly type either "let" or "let*" (or
"letrec", or ...) depending on what you want at the time, but "let*"
is overwhelmingly what's wanted when it makes a difference (in the
example at the top, it makes no difference at all).  Otherwise you
can't build up a complex result from little named pieces that may
depend on pieces already defined.  See, e.g., the quadratic equation
example in the original post, where this was implicit.

> ...
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] A "local" pseudo-function

2018-04-29 Thread Tim Peters
[Tim]
>> ...
>> This is the kind of code about which there have been background
>> complaints "forever":
>>
>>  m1 = regexp1.match(line)
>>  m2 = regexp2.match(line)
>>  if m1 and m2:
>>      do all sorts of stuff with m1 and/or m2,
>>      including perhaps modifying local variables
>>      and/or global variables
>>      and/or nonlocal variables
>>
>> The complaints are of two distinct kinds:
>>
>> 1. "I want to compute m1 and m2 _in_ the `if` test".
>>
>> 2. "I don't want these temp names (m1 and m2) accidentally
>> conflicting with local names already in scope - if these names
>> already exist, I want the temp names to shadow their
>> current bindings until the `if` structure is done".
>>
>> So,
>>
>>  if local(m1=regexp1.match(line),
>>           m2=regexp2.match(line),
>>           m1 and m2):
>>
>> intends to address both complaints via means embarrassingly obvious to
>> the most casual observer ;-)


[MRAB ]
> How about these:
>
> local m1, m2:
>     m1 = regexp1.match(line)
>     m2 = regexp2.match(line)
>     if m1 and m2:
>         ...
>
> local m1, m2:
>     if (m1 := regexp1.match(line)) and (m2 := regexp2.match(line)):
>         ...
>
> local m1=regexp1.match(line), m2=regexp2.match(line):
>     if m1 and m2:
>         ...


They address complaint #2 in what seems to me a thoroughly Pythonic
(direct, transparent, no more magical than necessary, easy to read)
way.  They don't address complaint #1 at all, but as you've shown (in
the 2nd spelling)  that isn't _inherently_ tied to complaint #2
(complaint #1 is what PEP 572 addresses).

So _if_ PEP 572 is accepted, adding this form of a compound `local`
statement too would address both of the listed complaints, at the
"cost" of a bit more typing and adding a level of indentation.
Neither of which bother me ;-)

`local()` itself was also intended to address the
even-more-in-the-background recurring desires for an expression (as
opposed to statement) oriented way to use throwaway bindings; e.g.,
instead of

temp = x + y - z + 1
r = temp**2 - 1/temp

this instead:

r = local(t=x + y - z + 1, t**2 - 1/t)

It's not an accident that the latter uses the shorter `t` where the
former used `temp`:  when people are wary of clobbering names by
accident, they tend to use longer names that say "I'm just a temp -
please don't _expect_ my binding to persist beyond the immediate uses
on the next few lines".

Anyway, that kind of thing is common in functional languages, where

"let" pile-of-bindings "in" expression

kinds of constructs are widely used _as_ (sub)expressions themselves.

    local t = x + y - z + 1:
        r = t**2 - 1/t

would be the same semantically, but they'd still complain about the
"extra" typing and the visual "heaviness" of introducing a block for
what they _think_ of as being "just another kind of expression".

The `local()` I brought up was, I think, far too biased _toward_ that
use.  It didn't "play nice" with block-oriented uses short of
excruciatingly deep magic.  Your `local` statement is biased in the
other direction, but that's a Good Thing :-)
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] A "local" pseudo-function

2018-04-29 Thread Tim Peters
[Tim]
>>>> Then `c` is 12, but `a` is still 1 and `b` is still 2.  Same thing
>>>> in the end:
>>>>
>>>> c = local(a=3, b=4, a*b)

[Nikolaus Rath ]
>>> I think this can be done already with slighly different syntax:
>>>
>>> c = (lambda a=3, b=4: a*b)()
>>>
>>> The trailing () is a little ugly, but the semantics are much more
>>> obvious.

[Tim]
>> But also broken, in a way that can't be sanely fixed.  Covered before
>> in other messages.  Short course:
>>
>> >>> a = 10
>> >>> b = 20
>> >>> (lambda a=3, b=a+1: (a, b))()
>> (3, 11)
>>
>> This context really demands (3, 4) instead.  In Scheme terms, Python's
>> lambda default arguments do "let" binding ("all at once"), but "let*"
>> binding is what's needed ("one at a time, left to right, with bindings
>> already done visible to later bindings").

[Chris Angelico ]
> So maybe the effective semantics should be:
>
> >>> (lambda a=3: (lambda b=a+1: (a, b))())()
> (3, 4)

Almost, but by that point the idea that this is already "easily
spelled" via lambdas has become ludicrously difficult to argue with a
straight face ;-)

By "almost", I mean there are other cases where even nesting Python
lambdas doesn't capture the intent.  In these cases, not only does the
expression defining b refer to a, but _also_ the expression defining a
refers to b.

You can play, if you like, with trying to define the `iseven` lambda
here in one line by nesting lambdas to define `even` and `odd` as
default arguments:

even = (lambda n: n == 0 or odd(n-1))
odd = (lambda n: False if n == 0 else even(n-1))
iseven = lambda n: even(n)

Scheme supplies `letrec` for when "mutually recursive" bindings are
needed.  In Python that distinction isn't nearly as evidently needed,
because Python's idea of closures doesn't capture all the bindings
currently in effect.  For example, when `odd` above is defined,
Python has no idea at all what the then-current binding for `even` is
- it doesn't even look for "even" until the lambda is _executed_.
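
To make that concrete:  defined sequentially at top level, the pair
works fine, precisely because `odd` isn't looked up until `even` is
actually called:

>>> even = lambda n: n == 0 or odd(n-1)
>>> odd = lambda n: False if n == 0 else even(n-1)
>>> even(10), odd(10)
(True, False)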

But, to be fair, I'm not sure:

    iseven = local(
        even = (lambda n: n == 0 or odd(n-1)),
        odd = (lambda n: False if n == 0 else even(n-1)),
        lambda n: even(n))

would have worked either.  At the moment I'm certain it wouldn't.
Last night I was pretty sure it would ;-)
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] A "local" pseudo-function

2018-05-01 Thread Tim Peters
[Tim]
> ...
> Here's an example where I don't know what the consequences of "the
> rules" should be:
>
> def f():
>     a = 10
>     local a:
>         def showa():
>             print("a is", a)
>         showa() # 10
>         a = 20
>         showa() # 20
>         a = 30
>     showa() # 10
>
> The comments show what the output would be under the "nothing about
> scope rules change" meaning.  They're all obvious (since there is
> no new scope then - it's all function-local).
>
> But under the other meaning ...?
>
> The twist here is that `def` is an executable statement in Python, and
> is a "binding site" for the name of the function being defined.  So
> despite that `showa` appears to be defined in a new nested lexical
> scope, it's _actually_ bound as a function-local name.  That's bound
> to be surprising to people from other languages:  "I defined it in a
> nested lexical scope, but the name is still visible after that scope
> ends?".
>
> I don't know what the first `showa()` is intended to do.  Presumably
> `a` is unbound at the start of the new nested scope?  So raises
> NameError?  If so, comment that line out so we can make progress ;-)
>
> It seems clear that the second `showa()` will display 20 under any reading.
>
> But the third?  Now we're out of the `local a:` scope, but call a
> function whose textual definition was inside that scope.  What does
> `showa()` do now to find a's value?  f's local `a` had nothing to do
> with the `a` in the nested scope, so presumably it shouldn't display
> 10 now.  What should it do?
> Does the final state of the nested scope's locals need to be preserved
> so that showa() can display 30 instead?  Or ...?

Just noting that a sane way to answer such questions is to model the
intent with nested functions in current Python.  It's annoying because
you have to explicitly declare the _non_-local names instead, and also
make dummy assignments to those names to tell Python in which scope
they _should_ be bound.

So, e.g., for the above you can write this:


def f():
a = 10

# Have to give `showa` _some_ binding in its intended scope,
# else the "nonlocal showa" below is a compile-time error
showa = None
def local_a():
nonlocal showa
def showa():
print("a", a)
#showa() # raises NameError
a = 20
showa() # 20
a = 30
local_a()
del local_a

showa() # 30
print("but f's a is", a) # 10

The comments show what happens when you call `f()`, matching the
guesses in the original message.

One thing to note:  if this does model the intent, it's essentially
proof that it's implementable without intractable effort; e.g., the
machinery needed to implement funky closures already exists.

But another:  since I've never seen code anything like this before,
that casts doubt on whether the semantics are actually useful in real
life.

That's why I always ask for real use cases ;-)
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] A "local" pseudo-function

2018-04-29 Thread Tim Peters
[David Mertz ]
> Ooops. My proof of anti-concept has a flaw.  It only "shadows" names that
> already exist.  Presumably that's the wrong idea, but it's easy enough to
> change if desired.

Even in the very early days when Python's runtime was more
relentlessly simple-minded than it is now, these kinds of things were
always hard to get right in all cases.  But it's heartening to see
_someone_ still has courage enough to leap into the void ;-)

I see that the docs for `locals()` now say:

The contents of this dictionary should not be modified; changes
may not affect the values of local and free variables used by
the interpreter.

I thought that explained something I was seeing, but the problem
turned out to be more obvious:  the "locals()" inside your
"sublocal()" context manager refer to _sublocal_'s own locals, not to
the locals of the code invoking `sublocal()`..  So, e.g., run this:

def run():
    a = 1
    b = 2

    def g(tag):
        print(f"{tag}: a {a} b {b}")

    with sublocal(a=6):
        g("first in block")
        a = 5
        g("set a to 5")
        b = 19
        g("set b to 19")
    g("after")

Here's output:

first in block: a 1 b 2  # the `a=6` had no visible effect
set a to 5: a 5 b 2  # golden
set b to 19: a 5 b 19  # also golden
after: a 5 b 19  # `a` wasn't restored to 1

To be very clear, the output is the same as if the `with` statement
were replaced with `if True:`.

But even if we crawled up the call stack to get the right locals()
dict, looks like `exec` is no longer enough (in Python 3) to badger
the runtime into making it work anyway:

https://bugs.python.org/issue4831

"""
> Specifically, what is the approved way to have exec() modify the local
> environment of a function?

There is none.  To modify the locals of a function on the fly is not
possible without several consequences: normally, function locals are not
stored in a dictionary, but an array, whose indices are determined at
compile time from the known locals.  This collides at least with new
locals added by exec.  The old exec statement circumvented this, because
the compiler knew that if an exec without globals/locals args occurred
in a function, that namespace would be "unoptimized", i.e. not using the
locals array.  Since exec() is now a normal function, the compiler does
not know what "exec" may be bound to, and therefore can not treat is
specially.
"""

Worm around that (offhand I don't know how), and there are nonlocal
names too.  I don't know whether Python's current introspection
features are enough to even find out which nonlocals have been
declared, let alone to _which_ scope each nonlocal belongs.

Worm around that too, then going back to the example at the top, if
the manager's

locals().update(_locals)

had the intended effect, it would end up restoring `b` to 2 too, yes?
The only names that "should be" restored are the names in the `kws`
dict.

So, in all, this may be a case where it's easier to implement in the
compiler than to get working at runtime via ever-more-tortured Python
code.

And when that's all fixed, "a" can appear in both locals() and
globals() (not to mention also in enclosing scopes), in which case
what to do is unclear regardless how this is implemented.  Which
"a"(s) did the user _intend_ to shadow?

The fun never ends ;-)
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

