[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)

2020-05-16 Thread Andrew Barnert via Python-ideas
On May 15, 2020, at 21:35, Steven D'Aprano  wrote:
> 
> On Fri, May 15, 2020 at 05:44:59PM -0700, Andrew Barnert wrote:
> 
>> Once you go with a separate view-creating function (or type), do we even 
>> need the dunder?
> 
> Possibly not. But the beauty of a protocol is that it can work even if 
> the object doesn't define a `__view__` dunder.

Sure, but if there’s no good reason for any class to provide a __view__ dunder, 
it’s better not to call one.

Which is why I asked—in the message you’re replying to—a bunch of questions to 
try to determine whether there’s any reason for a class to want to provide an 
override. I’m not going to repeat the whole thing here; it’s all still in that 
same message you replied to.

> - If the object defines `__view__`, call it; this allows objects to 
> return an optimized view, if it makes sense to them; e.g. bytes 
> might simply return a memoryview.

Not if memoryview doesn’t have the right API, as we discussed earlier in this 
thread.

But more importantly, if it’s only builtins that will likely ever need an 
optimization, we can do that inside the functions. That’s exactly what we do in 
hundreds of places already. Even the one optimization that’s exposed as part of 
the public C API, PySequence_Fast, isn’t hookable, much less all the functions 
that fast-path directly on the array in list/tuple or on the split hash table 
in set/dict/dict_keys and so on. It seems to work well enough in practice, and 
it’s simpler, and faster for the builtins, and it means we don’t have hundreds 
of extra dunders (and type slots in CPython) that will almost never be used, 
and PyPy doesn’t need to write hooks that are actually pessimizations just 
because they’re optimizations in CPython, and so on.

Of course there might be a reason that doesn’t apply in this case (there 
obviously is a good reason for non-builtin types to optimize __contains__, for 
example), but “there might be” isn’t an answer to YAGNI. Especially if we can 
add the dunder later if someone later finds a need for it.

And honestly, I’m not sure even list and tuple are worth optimizing here. After 
all, you can’t do the index arithmetic and call to sq_item significantly faster 
than a generic C function; it only helps if you can avoid the call to sq_item, 
and I think we can’t do that in any of the most useful cases (at least not 
without patching up a whole lot more code than we want). But I’ll try it and 
see if I’m wrong.

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/JWBWCVKBBZMKGGMR6UQDP5ZII4NN6IWM/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)

2020-05-16 Thread Andrew Barnert via Python-ideas
On May 15, 2020, at 21:25, Steven D'Aprano  wrote:
> 
> On Fri, May 15, 2020 at 01:00:09PM -0700, Christopher Barker wrote:
> 
>> I know you winked there, but frankly, there isn't a clear most Pythonic API
>> here. Surely you don't think Python should have no methods?
> 
> That's not what I said. Of course Python should have methods -- it's an 
> OOP language after all, and it's pretty hard to have objects unless they 
> have behaviour (methods). Objects with no behaviour are just structs.
> 
> But seriously, and this time no winking, Python's design philosophy is 
> very different from that of Java and even Ruby and protocols are a 
> hugely important part of that. Python without protocols wouldn't be 
> Python, and it would be a much lesser language.
> 
> [Aside: despite what the Zen says, I think *protocols* are far more 
> important to Python than *namespaces*.]

I agree up to this point. But what you’re missing is that Python (even with 
stdlib stuff like pickle/copy and math.floor) has only a couple dozen 
protocols, and hundreds and hundreds of methods.

Some things should be protocols, but not everything should, or even close. Very 
few things should be protocols. More to the point, things should be protocols 
if and only if they have a specific reason to be a protocol. For example:

1. You need something more complicated than just a single straightforward call, 
like the fallback behavior for __contains__ and __iter__ with “old-style 
sequences” (sketched just below this list), or the whole pickle 
__getnewargs_ex__-and-friends machinery, or __add__ vs. __radd__.

2. Syntax, especially operator overloading, like __contains__ and __add__.

3. The function is so ubiquitously important that you don’t want anything else 
using the same name for different meanings, like __len__.

(There are probably other good reasons.)
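
For instance, the old-sequence fallback from item 1: iter() and the in operator 
both fall back to integer indexing when a class defines only __getitem__. A 
quick illustration:

    class OldStyle:
        # No __iter__ or __contains__ here; iter() and `in` fall back to
        # calling __getitem__ with 0, 1, 2, ... until IndexError.
        def __getitem__(self, i):
            if i < 3:
                return i * 10
            raise IndexError(i)

    print(list(OldStyle()))   # [0, 10, 20]
    print(10 in OldStyle())   # True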

When you have a reason like this, you should design a protocol. But when you 
don’t, dot syntax is the default. And it’s not just complexity, or “too many 
builtins” (after all, pickle.dump and math.ceil aren’t builtins). It’s that dot 
syntax gives you built-in disambiguation that function call syntax doesn’t. If 
I have a sequence, xs.index(x) has an obvious meaning. But index(xs, x) would 
not, because index means too many different things (in fact, we already have an 
__index__ protocol that does one of those different things), and it’s not like 
len where one of those meanings is so fundamental that we actually want to 
discourage all the others.

As I said elsewhere, I think we probably can’t have dot syntax in this case for 
other reasons. But that _still_ doesn’t necessarily mean we need a protocol. If 
we need to be able to override behavior but we can’t have dot syntax, *that* 
might be a good reason for a protocol, but either of those on its own is not a 
good reason, only the combination.

It’s worth comparing C++, where “free functions are part of a class’s 
interface”. They don’t spell their protocols with underscores, or call them 
protocols, but the idea is all over the place. x+y tries x.operator+(y) plus 
various fallbacks. The way you get an iterator is begin(xs) which by default 
calls xs.begin() so that’s the standard place to customize it but there are 
fallbacks. Converting a C to a D tries (among other things) both C::operator 
D() and D::D(C). And so on. But, unlike Python, they don’t try to distinguish 
what is and isn’t a protocol; the dogma is basically that everything should be 
a protocol if it possibly can be. Which doesn’t work. They keep trying to solve 
the compiler-ambiguity problem by adding features like argument-dependent 
lookup, and almost adding D’s uniform call syntax every 3 years, but none of 
that will ever solve the human-ambiguity problem. Things like + and begin and 
swap belong at the top level because they should always mean the same thing 
even if they have to be implemented differently, but things like draw should be 
methods because they mean totally different things on different types, and even 
if the compiler can tell which one is meant, even if an IDE
can help you, deck.draw(5) vs. shape.draw(ctx) is still more readable than 
draw(deck, 5) vs. draw(shape, ctx). Ultimately, it’s just as bad as Java; it 
just goes too far in the opposite direction, which is still too far, and that’s 
what always happens when you’re looking for a perfect and simple dogma that 
applies to both iter and index so you never have to think about design.

> Python tends to use protocol-based top-level functions:
> 
>   len, int, str, repr, bool, iter, list
> 
> etc are all based on *protocols*, not inheritance.

> The most notable 
> counter-example to that was `iterator.next` which turned out to be a 
> mistake and was changed in Python 3 to become a protocol based on a 
> dunder.

No, the most notable counterexamples are things like insert, extend, index, 
count, etc. on sequences; keys, items, update, setdefault, etc. on mappings; 
add, isdisjoint, etc. on sets; real, imag, etc. on numbers; 

[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)

2020-05-15 Thread Andrew Barnert via Python-ideas
On May 15, 2020, at 18:21, Christopher Barker  wrote:
> 
> Hmm, more thought needed.

Speaking of “more thought needed”, I took a first pass over cleaning up my 
quick slice view class and adding the slicer class, and found some 
bikesheddable options. I think in most cases the best answer is obvious, but 
I’ve been wrong before. :)

Assume s and t are Sequences of the same type, u is a Sequence of a different 
type, and vs, vt, and vu are view slices on those sequences. Also assume that 
we called the view slicer type vslice, and the view slice type SliceView, 
although obviously those are up for bikeshedding.
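
For concreteness, here’s a minimal sketch of the pair I have in mind (not the 
exact code posted earlier; it leans on a range object to do all of the index 
arithmetic):

    from collections.abc import Sequence

    class SliceView(Sequence):
        """Immutable view onto a slice of an underlying sequence."""
        def __init__(self, seq, indices):
            self.seq = seq           # the underlying sequence, kept public
            self._indices = indices  # range mapping view positions to seq indexes

        def __len__(self):
            return len(self._indices)

        def __getitem__(self, i):
            if isinstance(i, slice):
                # Slicing a view composes the index arithmetic and returns
                # a new view of the same underlying sequence.
                return SliceView(self.seq, self._indices[i])
            return self.seq[self._indices[i]]

        def __repr__(self):
            r = self._indices
            return f'vslice({self.seq!r})[{r.start}:{r.stop}:{r.step}]'

    class vslice:
        """Slicer: vslice(s)[a:b:c] returns a SliceView of s."""
        def __init__(self, seq):
            self.seq = seq

        def __getitem__(self, i):
            if not isinstance(i, slice):
                raise TypeError('not a slice')
            return SliceView(self.seq, range(len(self.seq))[i])

    v = vslice(list(range(100)))[2::2][1::3]
    assert list(v) == list(range(100))[4::6]  # views compose like ranges do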

When s==t is allowed, is vs==vt? What about vs==t? Same for <, etc.? I think 
yes, yes, yes.

When s is hashable, is vs hashable? If so, is it the same hash an equivalent 
copy-slice would have? The answer to == constrains the answer here, of course. 
I think they can just not be hashable, but it’s a bit weird to have an 
immutable builtin sequence that isn’t. (Maybe hash could be left out but then 
added in a future version if there’s a need?)

When s+t is allowed, is vs+t? vs+vt? (Similarly when s+u is allowed, but that 
usually isn’t.) vs*3? I think all yes, but I’m not sure. (Imagine you create a 
million view slices but filter them down to just 2, and then concatenate those 
two. That makes sense, I think.)

Should there be a way to ask vs for the corresponding regular copy slice? Like 
vslice(s)[10:].strictify() == s[10:]? I’m not sure what it’s good for, but 
either __hash__ or __add__ seems to imply a private method for this, and then I 
can’t see any reason to prevent people from calling it. (Except that I can’t 
think of a good name.)

Should the underlying sequence be a public attribute? It seems easy and 
harmless and potentially useful, and memoryview has .obj (although dict views 
don’t have a public reference to the dict).

What about the original slice object? This seems less useful, since you don’t 
pass around slice objects that often. And we may not actually be storing it. 
(The simplest solution is to store slice.indices(len(seq)) instead of slice.) 
So I think no.
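
(For reference, slice.indices is what does that normalization; with a length-10 
sequence:

    >>> slice(None, None, 2).indices(10)
    (0, 10, 2)
    >>> slice(-3, None).indices(10)
    (7, 10, 1)

so what gets stored are concrete integers with the length already baked in.)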

If s isn’t a Sequence, should vslice(s) be a TypeError? I think we want the C 
API sequence check, but not the full ABC check.

What does vslice(s)[1] do? I think TypeError('not a slice').

Does the vslice type need any other methods besides __new__ and __getitem__? I 
don’t think so. The only use for vslice(s) besides slicing it is stashing it to 
be sliced later, just like the only use for a method besides calling it is 
stashing it to be called later. But it should have the sequence as a public 
attribute for debugging/introspection, just like methods make their self and 
function attributes public. 

Is the SliceView type public? (Only in types?) Or is “what the vslice slicer 
factory creates” an implementation detail, like list_iter. I think the latter.

What’s the repr for a SliceView? Something like vslice([1, 2, 10, 20])[::2] 
seems most useful, since that’s the way you construct it, even if it is a bit 
unusual. Although a tiny slice of a giant sequence would then have a giant repr.

What’s the str? I think same as the repr, but will people expect a view of a 
list/tuple/etc. to look “nice” like list/tuple/etc. do?

Does vs[:] return self? (And, presumably, vs[0:len(s)+100] and so on.) I think 
so, but that doesn’t need to be guaranteed (just like tuple, range, etc.).

If vs is an instance of a subclass of SliceView, is vs[10:20] a SliceView, or 
an instance of the subclass? I think the base class, just like tuple, etc.

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/X45QVVPMB5JOQDKI7OEV4JAQ7WMA4XHO/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)

2020-05-15 Thread Andrew Barnert via Python-ideas
On May 15, 2020, at 18:21, Christopher Barker  wrote:
> 
>> On Fri, May 15, 2020 at 5:45 PM Andrew Barnert  wrote:
> 
>> On May 15, 2020, at 13:03, Christopher Barker  wrote:
>> > 
>> > Taking all that into account, if we want to add "something" to Sequence 
>> > behavior (in this case a sequence_view object), then adding a dunder is 
>> > really the only option -- you'd need a really compelling reason to add a 
>> > Sequence method, and since there are quite a few folks that think that's 
>> > the wrong approach anyway, we don't have a compelling reason.
>> > 
>> > So IF a sequence_view is to be added, then a dunder is really the only 
>> > option.
>> 
>> Once you go with a separate view-creating function (or type), do we even 
>> need the dunder?
> 
> Indeed -- maybe not. We'd need a dunder if we wanted to make it an "official" 
> part of the Sequence protocol/ABC, but as you point out there may be no need 
> to do that at all.

That’s actually what triggered this thought. We need collections.abc.Sequence 
to support the dunder with a default implementation so code using it as a mixin 
works. What would that default implementation be? Basically just a class whose 
__getitem__ constructs the thing I posted earlier and that does nothing else. 
And why would anyone want to override that default?
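
Concretely, the mixin default would presumably be little more than this 
(hypothetical, with vslice standing in for the generic slicer sketched earlier 
in this digest):

    class Sequence:
        def __view__(self):
            # Nothing type-specific happens here, which is the point:
            # what would a subclass usefully override?
            return vslice(self)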

Being able to override dunders like __contains__ and regular methods like count is 
useful for multiple reasons: a string-like class needs to extend their behavior 
for substring searching, a range-like class can implement them without 
searching at all, etc. But none of those seemed to apply to overriding 
__viewslice__ (or whatever we’d call it).

> Hmm, more thought needed.

Yeah, certainly just because I couldn’t think of a use doesn’t mean there isn’t 
one.

But if I’m right that the dunder could be retrofitted in later (I want to try 
building an implementation without the dunder and then retrofitting one in 
along with a class that overrides it, if I get the time this weekend, to verify 
that it really isn’t a problem), that seems like a much better case for leaving 
it out.

Another point: now that we’re thinking generic function (albeit maybe a C 
builtin with fast-path code for list/tuple), maybe it’s worth putting an 
implementation on PyPI as soon as possible, so we can get some experience using 
it and make sure the design doesn’t have any unexpected holes and, if we’re 
lucky, get some uptake from people outside this thread.
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/VSGQLYF6B25BB6KLZALMYST7IQWMVI3I/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Improve handling of Unicode quotes and hyphens

2020-05-15 Thread Andrew Barnert via Python-ideas
On May 14, 2020, at 20:01, Stephen J. Turnbull 
 wrote:
> 
> Executive summary:
> 
> AFAICT, my guess at what's going on in the C tokenizer was exactly
> right.  It greedily consumes as many non-operator, non-whitespace
> characters as possible, then validates.  

Well, it looks like it’s not quite “non-operator, non-whitespace characters”, 
but rather “ASCII identifier or non-ASCII characters”:

>  (c >= 'a' && c <= 'z')\
>   || (c >= 'A' && c <= 'Z')\
>   || c == '_'\
>   || (c >= 128))

(That’s the initial char rule; the continuing char rule is similar but of 
course allows digits.)
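
In Python terms, the quoted test amounts to something like this (my paraphrase 
of the C, not code from CPython):

    def is_potential_identifier_start(c):
        return ('a' <= c <= 'z' or
                'A' <= c <= 'Z' or
                c == '_' or
                ord(c) >= 128)  # every non-ASCII character is accepted here

with validation as a real identifier happening only afterward.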

So it won’t treat a $ or a ^G as potentially part of an identifier, so the 
caret will show up in the right place for one of those, but it will treat an 
emoji as potentially part of an identifier, so (if that emoji is immediately 
followed by legal identifier characters, ASCII or otherwise) the caret will 
show up too far to the right.

I’m still glad the Python tokenizer doesn’t do this (because, as I said, I’ve 
relied on the documented behavior in import hooks for playing around with 
Python, and they use the Python tokenizer), but that doesn’t matter for the C 
tokenizer, because its output is not public, it’s only seen by the parser. And 
I think you can prove that the error caret placement is the only thing that 
could be affected by this shortcut.[1] And if it makes the tokenizer faster, or 
just simpler to maintain, that could easily be worth it.

(At least until one of those periodic “Python should add this Unicode operator” 
proposals actually gets some traction, but I don’t see that as likely any time 
soon.)

—-

[1] Python only allows non-ASCII characters in identifiers, strings, and 
comments. Therefore, any string of characters that should be tokenized as a 
sequence of 1 ERRORTOKEN followed by 0 or more NAME and ERRORTOKEN tokens by 
the documented rule (and the Python code) will still give you a sequence of 1 
ERRORTOKEN followed by 0 or more NAME and ERRORTOKEN tokens by the C code, just 
not necessarily the same such sequence. And any such sequence will be parsed as 
a SyntaxError pointing at the end of the initial ERRORTOKEN. So, the caret 
might be somewhere else within that block of identifier and non-ASCII 
characters, but it will be somewhere within that block.
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/LPLKLECRRW2UEONMN6RAROU5HKKQC6XO/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)

2020-05-15 Thread Andrew Barnert via Python-ideas
On May 15, 2020, at 13:03, Christopher Barker  wrote:
> 
> Taking all that into account, if we want to add "something" to Sequence 
> behavior (in this case a sequence_view object), then adding a dunder is 
> really the only option -- you'd need a really compelling reason to add a 
> Sequence method, and since there are quite a few folks that think that's the 
> wrong approach anyway, we don't have a compelling reason.
> 
> So IF a sequence_view is to be added, then a dunder is really the only option.

Once you go with a separate view-creating function (or type), do we even need 
the dunder?

I’m pretty sure a generic slice-view-wrapper (that just does index arithmetic 
and delegates) will work correctly on every sequence type. I won’t promise that 
the one I posted early in this thread does, of course, and obviously we need a 
bit more proof than “I’m pretty sure…”, but can anyone think of a way a 
Sequence could legally work that would break this?

And I can’t think of any custom features a Sequence might want add to its view 
slices (or its view-slice-making wrapper).

I can definitely see how a custom wrapper for list and tuple could be faster, 
and imagine how real life code could use it often enough that this matters. But 
if it’s just list and tuple, CPython’s already full of builtins that fast-path 
on list and tuple, and there’s no reason this one can’t do the same thing.

So, it seems like it only needs a dunder if there are likely to be third-party 
classes that can do view-slicing significantly faster than a generic 
view-slicer, and are used in code where it’s likely to matter. Can anyone think 
of such a case? (At first numpy seems like an obvious answer. Arrays aren’t 
Sequences, but I think as long as the wrapper doesn’t actually type-check that 
at __new__ time they’d work anyway. But why would anyone, especially when they 
care about speed, use a generic viewslice function on a numpy array instead of 
just using numpy’s own view slicing?)

It seems like a dunder is something that could be added as a refinement in the 
next Python version, if it turns out to be needed. If so, then, unless we have 
an example in advance to disprove the YAGNI presumption, why not just do it 
without the dunder?

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/G3L6NP4PWPR2O2VSVXGGJNALYECKDG5G/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)

2020-05-15 Thread Andrew Barnert via Python-ideas
On May 15, 2020, at 03:50, Steven D'Aprano  wrote:
> 
> On Thu, May 14, 2020 at 09:47:36AM -0700, Andrew Barnert wrote:
>>> On May 14, 2020, at 03:01, Steven D'Aprano  wrote:
>>> 
>> Which is exactly why Christopher said from the start of this thread, 
>> and everyone else has agreed at every step of the way, that we can’t 
>> change the default behavior of slicing, we have to instead add some 
>> new way to specifically ask for something different.
> 
> Which is why I was so surprised that you suddenly started talking about 
> not being able to insert into a slice of a list rather than a view.

We’re talking about slice views. The sentence you quoted and responded to was 
about the difference between a slice view from a list and a slice view from a 
string. A slice view from a list may or may not be the same type as a slice 
view from a tuple (I don’t think there’s a reason to care whether they are or 
not), but either way, it being immutable will, I think, not surprise anyone. By 
contrast, a slice view from a string being not stringy _might_ surprise someone.

>> Not only that, but whatever gives 
>> you view-slicing must look sufficiently different that you notice the 
>> difference—and ideally that gives you something you can look up if you 
>> don’t know what it means. I think lst.view[10:20] fits that bill.
> 
> Have we forgotten how to look at prior art all of a sudden? Suddenly 
> been possessed by the spirits of deceased Java and Ruby programmers 
> intent on changing the look and feel of Python to make it "real object 
> oriented"? *wink*

No, we have remembered that language design is not made up of trivial rules 
like “functions good, methods bad”, but of understanding the tradeoffs and how 
they apply in each case. 

> We have prior art here:
> 
>b'abcd'.memoryview  # No, not this.
>memoryview(b'abcd')  # That's the one.

>'abcd'.iter  # No, not that either.
>iter('abcd')  # That's it
> 
> In fairness, I do have to point out that dict views do use a method 
> interface,

This is a secondary issue that I’ll come back to, but first: the whole thing 
that this started off with is being able to use slicing syntax even when you 
don’t want a copy.

The parallel to the prior art is obvious:

itertools.islice(seq, 10, 20) # if you don’t care about iterator or view
sliceviews.slice(seq, 10, 20) # if you do

The first one already exists. The second one takes 15 lines of code, which I 
slapped together and posted near the start of the thread.
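
The practical difference between the two, with sliceviews.slice as the 
hypothetical view-returning function from the parallel above:

    from itertools import islice

    seq = list(range(100))
    it = islice(seq, 10, 20)           # one-shot iterator
    v = sliceviews.slice(seq, 10, 20)  # hypothetical: returns a view

    list(it)      # [10, 11, ..., 19]; a second list(it) would be []
    v[3], len(v)  # (13, 10): the view stays indexable and re-iterable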

The only problem is that they don’t solve the problem of “use slicing syntax”. 
But if that’s the entire point of the proposal (at least for Chris), that’s a 
pretty big problem.

Now, as we’d already been discussing (and as you quoted), you _could_ have a 
callable like this:

viewslice(seq)[10:20]

I can write that in only a few more lines than what I posted before, and it 
works. But it’s no longer parallel to the prior art. It’s not a function that 
returns a view, it’s a wrapper object that can be sliced to provide a view. 
There are pros and cons of this wrapper object vs. the property, but a false 
parallel with other functions is not one of them.

> 1. Dict views came with a lot of backwards-compatibility baggage; 
> they were initially methods that returned lists; then methods 
> that returned iterators were added, then methods that returned 
> views were added, and finally in 3.x the view methods were renamed and 
> the other six methods were removed.

This is, if anything, a reason they _shouldn’t_ have been methods. Changing the 
methods from 2.6 to 2.7 to 3.x, and in a way that tools like six couldn’t even 
help without making all of your code a bit uglier, was bad, and wouldn’t have 
been nearly as much of a problem if we’d just made them all functions in 2.6.

And yet, the reasons for them being methods were compelling enough that they 
remain methods in 3.x, despite that problem. That’s how tradeoffs work.

> 2. There is only a single builtin mapping object, dict, not like 
> sequences where there are lists, tuples, range objects, strings, byte 
> strings and bytearrays.

Well, there’s also mappingproxy, which is a builtin even if its name is only 
visible in types. And there are other mappings in the stdlib, as well as 
popular third-party libraries like SortedContainers. And they all support these 
methods. There are some legacy third-party libraries never fully updated for 
3.x still out there, but they don’t meet the Mapping protocol or its ABC.

So, how does this distinction matter?

Note that there is a nearly opposite argument for the wrapper object that 
someone already made that seems a lot more compelling to me: third-party types. 
We can’t change them overnight. And some of them might already have an 
attribute named view, or anything else we might come up with. Those are real 
negatives with the property design, in a way that “more of the code we _can_ 
easily change is in the Objects rather than Lib 

[Python-ideas] Re: Documenting iterators vs. iterables [was: Adding slice Iterator ...]

2020-05-15 Thread Andrew Barnert via Python-ideas
On May 14, 2020, at 20:17, Stephen J. Turnbull 
 wrote:
> 
> Andrew Barnert writes:
> 
>> Students often want to know why this doesn’t work:
>>   with open("file") as f:
>>       for line in file:
>>           do_stuff(line)
>>       for line in file:
>>           do_other_stuff(line)
> 
> Sure.  *Some* students do.  I've never gotten that question from mine,
> though I do occasionally see
> 
>   with open("file") as f:
>       for line in f:    # ;-)
>           do_stuff(line)
>   with open("file") as f:
>       for line in f:
>           do_other_stuff(line)
> 
> I don't know, maybe they asked the student next to them. :-)

Or they got it off StackOverflow or Python-list or Quora or wherever. Those 
resources really do occasionally work as intended, providing answers to people 
who search without them having to ask a duplicate question. :)

>> The answer is that files are iterators, while lists are… well,
>> there is no word.
> 
> As Chris B said, sure there are words:  File objects are *already*
> iterators, while lists are *not*.  My question is, "why isn't that
> instructive?"

Well, it’s not _completely_ not instructive, it’s just not _sufficiently_ 
instructive.

Language is more useful when the concepts it names carve up the world in the 
same way you usually think about it.

Yes, it’s true that we can talk about “iterables that are not iterators”. But 
that doesn’t mean there’s no need for a word. We don’t technically need the 
word “liquid” because we could always talk about “compressibles that are not 
solid” (or “fluids that are not gas”); we don’t need the word “bird” because we 
could always talk about “diapsids that are not reptiles”; etc. Theoretically, 
English could express all the same propositions and questions and so on that it 
does today without those words. But practically, it would be harder to 
communicate with. And that’s why we have the words “bird” and “liquid”. And the 
reason we don’t have a word for all diapsids except birds and turtles is that 
we don’t need to communicate about that category. 

Natural languages get there naturally; jargon sometimes needs help.

>> We shouldn’t define everything up front, just the most important
>> things. But this is one of the most important things. People need
>> to understand this distinction very early on to use Python,
> 
> No, they don't.  They neither understand, nor (to a large extent) do
> they *need* to.

> ISTM that all we need to say is that
> 
> 1.  An *iterator* is a Python object whose only necessary function is
>   to return an object when next is applied to it.  Its purpose is to
>   keep track of "next" for *for*.  (It might do other useful things
>   for the user, eg, file objects.)
> 
> 2.  The *for* statement and the *next* builtin require an iterator
>   object to work.  Since for *always* needs an iterator object, it
>   automatically converts the "in" object to an iterator implicitly.
>   (Technical note: for the convenience of implementors of 'for',
>   when iter is applied to an iterator, it always returns the
>   iterator itself.)

I think this is more complicated than people need to know, or usually learn. 
People use for loops almost from the start, but many people get by with never 
calling next. All you need is the concept “thing that can be used in a for 
loop”, which we call “iterable”. Once you know that, everything else in Python 
that loops is the same as a for loop—the inputs to zip and enumerate are 
iterables, because they get looped over.

“Iterable” is the fundamental concept. Yeah, it sucks that it has such a clumsy 
word, but at least it has a word.

You don’t need the concept “iterator” here, much less need to know that looping 
uses iterables by calling iter() to get an iterator and then calling next() 
until StopIteration, until you get to the point of needing to read or write 
some code that iterates manually.
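
For instance, the difference only shows up once you loop twice or call next() 
yourself:

    xs = [1, 2, 3]     # an iterable that is not an iterator
    it = iter(xs)      # an iterator over it

    sum(xs), sum(xs)   # (6, 6): a list can be looped over repeatedly
    sum(it), sum(it)   # (6, 0): the iterator is consumed by the first sum
    next(it, 'done')   # 'done': and it stays exhausted
    # next(xs)         # TypeError: 'list' object is not an iterator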

Of course you will need to learn the concept “iterator” pretty soon anyway, but 
only because Python actually gives you iterators all over the place. In a 
language (like Swift) where zip and enumerate were views, files weren’t 
iterable at all, etc., you wouldn’t need the concept “iterator” until very 
late, but in Python it shows up early. But you still don’t need to learn about 
next(); that’s as much a technical detail as the fact that they return self 
from iter(). You want to know whether they can be used in for loops—and they 
can, because (unlike in Swift) iterators are iterable, and you already 
understand that.

> 3.  When a "generic" iterator "runs out", it's exhausted, it's truly
>   done.  It is no longer useful, and there's nothing you can do but
>   throw it away.  Generic iterators do not have a reset method.
>   Specialized iterators may provide one, but most do not.

Yes, this is the next thing you need to know about iterators.

But you also need to know that many iterables don’t get consumed in this way. 
Lists, ranges, dicts, etc. do _not_ run out when you use 

[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)

2020-05-14 Thread Andrew Barnert via Python-ideas
On May 14, 2020, at 11:53, Ricky Teachey  wrote:
> 
>> So that means a view() function (with maybe a different name) -- however, 
>> that brings up the issue of where to put it. I'm not sure that it warrants 
>> being in builtins, but where does it belong? Maybe the collections module? 
>> And I really think the extra import would be a barrier.
>> 
> 
> It occurs to me-- and please quickly shut me down if this is a really dumb 
> idea, I won't be offended-- `memoryview` is already a top-level built-in. I 
> know it has a near completely different meaning with regards to bytes objects 
> than we are talking about with a sequence view object. But could it do double 
> duty as a creator of views for sequences, too?

But bytes and bytearray are Sequences, and maybe other things that support the 
buffer protocol are too.

At first glance, it sounds terrible that the same function gives you a locking 
buffer view for some sequences and an indirect regular sequence view for 
others, and that there’s no way to get the latter for bytes even when you 
explicitly want that. But maybe in practice it wouldn’t be nearly as bad as it 
sounds? I don’t know. It sounds terrible in theory that NumPy arrays are almost 
but not quite Sequences, but in practice I rarely get confused by that. Maybe 
the same would be true here?

There’s also the problem that “memoryview” is kind of a misleading name if you 
apply it to, say, a range instead of a list. But again, I’m not sure how bad 
that would be in practice.
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/BMB5DAW67NRODTH46NXIZ55D4VDRBO2Y/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)

2020-05-14 Thread Andrew Barnert via Python-ideas
On May 14, 2020, at 10:45, Rhodri James  wrote:
> 
> On 14/05/2020 17:47, Andrew Barnert via Python-ideas wrote:
>> Which is exactly why Christopher said from the start of this thread,
>> and everyone else has agreed at every step of the way, that we can’t
>> change the default behavior of slicing, we have to instead add some
>> new way to specifically ask for something different.
> 
> Erm, did someone actually ask for something different?  As far as I can tell 
> the original thread OP was asking for islice-maker objects, which don't 
> require the behaviour of slicing to change at all.  Quite where the demand 
> for slice views has come from I'm not at all clear.

That doesn’t make any difference here.

If you want slicing sequences to return iterators rather than copies, that 
would break way too much code, so it’s not going to happen. A different 
method/property/class/function that gives you iterators would be fine.

If you want slicing sequences to return views rather than copies, that would 
break way too much code, so it’s not going to happen. A different 
method/property/class/function that gives you views would be fine.

Which is why nobody has proposed changing what list.__getitem__, etc. will do.

As for where views came from: because they do everything iterators do plus 
things they don’t, and in this case they’re about as easy to implement.

It’s really the same thing as dict.items. People wanted a dict.items that 
didn’t copy the whole thing into a giant list. The first suggestion was for an 
iterator. But that would break too much code, so it couldn’t be done until 3.0. 
But it was still so useful that it was worth having before 3.x, so it was added 
to 2.2 with a distinct name, iteritems. But then people realized they could 
have a view just as easily as an iterator, and it would do more, so that’s what 
actually went into 3.0. And that turned out to be so useful that it was worth 
having before 3.x, so, even though iteritems had already been added in 2.2, it 
was phased out for viewitems in 2.7. 

I’m just trying to jump to the end here. Some of the issues aren’t the same 
(should it be a function or an attribute, is it worth having custom 
implementations for some builtin types, …), but some of them are, so we can 
learn from the past instead of repeating the same process. We can just build 
the equivalent of viewitems right off the bat, and not even think about 
changing plain slicing (because we never want another 3.0 break).

(Of course there may still be good arguments for why this isn’t the same, or 
for why it should end up differently even if it _is_ the same.)
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/TLTTZXWFP3QM6WRKEGF246RK6WYJSEG7/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)

2020-05-14 Thread Andrew Barnert via Python-ideas
On May 14, 2020, at 03:35, Steven D'Aprano  wrote:
> 
> On Sun, May 10, 2020 at 09:36:14PM -0700, Andrew Barnert via Python-ideas 
> wrote:
> 
> 
>>> for i in itertools.seq_view(a_list)[::2]:
>>>...
>>> 
>>> I still think I prefer this though:
>>> 
>>> for i in a_list.view[::2]:
>>>...
> 
>> Agreed. A property on sequences would be best,
> 
> Why?

Because the whole point of this is for something to apply slicing syntax to. 
And compare:

lst.view[10:20]
view(lst)[10:20]
view(lst, 10, 20)

The last one is clearly the worst, because it doesn’t let you use slicing 
syntax.

The others are both OK, but the first seems the most readable. I’ll give more 
detailed reasons below. (There may be reasons why it can’t or shouldn’t be 
done, which is why I ranked all of the options in order rather than just 
insisting that we must have the first one or I hate the whole idea.)

> This leads to the same problem that len() solves by being a function, 
> not a method on list and tuple and str and bytes and dict and deque and so 
> on. Making views a method or property means that every sequence type 
> needs to implement its own method, or inherit from the same base class, 

But len doesn’t solve that problem at all, and isn’t meant to. It just means 
that every sequence type has to implement __len__ instead of every sequence 
type having to implement len.

Protocols often provide some added functionality. iter() doesn’t just call 
__iter__, it can also fall back to old-style sequence methods, and it has the 
2-arg form. Similarly, str() falls back to __repr__, and has other parameter 
forms, and doubles as the constructor for the string type. And next() even 
changed from being a normal method to a protocol and function, breaking 
backward compatibility, specifically to make it easier to do the 2-arg form.
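
(Both 2-arg forms in one quick illustration:

    import random

    next(iter([]), 'empty')  # 'empty' returned instead of raising StopIteration

    roll = lambda: random.randint(1, 6)
    list(iter(roll, 6))      # calls roll() until it returns the sentinel 6

The sentinel itself is not yielded, so the list holds the rolls before the 
first 6.)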

But len() isn’t like that. There is no fallback, no added behavior, nothing. It 
doesn’t add anything. So why do we have it? Guido’s argument is in the FAQ. It 
starts off with “For some operations, prefix notation just reads better than 
postfix”. He then backs up the general principle that this is sometimes true by 
appeal to math. And then he explains the reasons this is one of those 
operations by arguing that “len” is the most important piece of information 
here, so it belongs first.

It’s the same principle here, but the specific answer is different. View-ness 
is not more important than the sequence and the slicing, so it doesn’t call out 
to be fronted. In fact, view-ness is (at least in the user’s mind) strongly 
tied to the slicing, so it calls out to be near the slice.

And it’s not like this is some unprecedented thing. Most of the collection 
types, and corresponding ABCs, have regular methods as well as protocol 
dunders. Is anyone ever confused by having to write xs.index(x) instead of 
index(xs, x)? I don’t think so. In fact, I think the latter would be _more_ 
confusing, because “index” has so many different meanings that “list.index” is 
useful to nail it down. (Notice that we already _have_ a dunder named 
__index__, and it does something totally different…) And the same is true for 
“view”. In fact, everything in your argument is so generic that it acts as an 
argument against not just .index() but against any public methods or attributes 
on anything. Obviously you didn’t intend it that way, but once you actually 
target it so that it argues against .len() but not .index(), I don’t think 
there’s any argument against .view left.

> and that's why in the Java world nobody agrees what method to call to 
> get the length of an object.

Nobody can agree on what function to call in C or PHP even though they’re 
functions rather than methods in those languages.

Everyone can agree on what method to use in C++ and Smalltalk even though 
they’re methods in those languages, just like Java. (In fact, C++ even loosely 
enforces consistency the same way Python loosely does, except at compile time 
instead of run time—if your class doesn’t have a size() method, it doesn’t duck 
type as a collection and therefore can’t be used in templates that want a 
collection.)

Or just look at Python: nobody is confused about how to spell the .index method 
even though it’s a method.

So the problem in Java has nothing to do with methods. (We don’t have to get 
into what’s wrong with Java here; it’s not relevant.)

> So if we are to have a generic view proxy object, as opposed to the very 
> much non-generic dict views, then it ought to be a callable function 

We don’t actually _know_ how generic it can/should be yet. That’s something 
we’ve been discussing in this thread. It might well be a 
quality-of-implementation issue that has different best answers in different 
Pythons. Or it might not. It’s not obvious. Which implies that whatever the 
answer is, it’s not something that peopl

[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)

2020-05-14 Thread Andrew Barnert via Python-ideas
On May 14, 2020, at 03:01, Steven D'Aprano  wrote:
> 
> On Mon, May 11, 2020 at 10:41:06AM -0700, Andrew Barnert via Python-ideas 
> wrote:
> 
>> I think in general people will expect that a slice view on a sequence 
>> acts like “some kind of sequence”, not like the same kind they’re 
>> viewing—again, they won’t be surprised if you can’t insert into a 
>> slice of a list.
> 
> o_O
> 
> For nearly 30 years, We've been able to insert into a slice of a list. 
> I'm going to be *really* surprise if that stops working

Which is exactly why Christopher said from the start of this thread, and 
everyone else has agreed at every step of the way, that we can’t change the 
default behavior of slicing, we have to instead add some new way to 
specifically ask for something different.

Well, not _just_ this. There’s also the fact that for 30 years people have been 
using [:] to mean copy, and the fact that for 30 years people have taken small 
slices of giant lists and then expected the giant lists to get collected, and 
so on. But any one of these is enough reason on its own that copy-slicing must 
remain the default, behavior you get from lst[10:20]. Not only that, but 
whatever gives you view-slicing must look sufficiently different that you 
notice the difference—and ideally that gives you something you can look up if 
you don’t know what it means. I think lst.view[10:20] fits that bill.

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/FSAZWPEV3LA3K2CP46GMLABDIOCM7FSL/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: [Suspected Spam]Re: Adding slice Iterator to Sequences (was: islice with actual slices)

2020-05-13 Thread Andrew Barnert via Python-ideas
On May 13, 2020, at 20:49, Christopher Barker  wrote:
> 
> OK, now for:
> 
>> On Wed, May 13, 2020 at 7:50 PM Andrew Barnert  wrote:
> 
>> But that’s the wrong generalization. Because sets also work the same way, 
>> and they aren’t Sequences. Nor are dict views, or many of the other kinds of 
>> things-that-can-be-iterated-over-and-over-independently.
> 
> 
> But file.readlines() does not return any of those objects. It returns a list. 
> If you see this as an opportunity to teach about the iteration protocol, then 
> sure, you'd want to make that distinction. But I think the file object is the 
> wrong first example -- it's an oddball, having both the iteration protocol, 
> AND methods for doing most of the same things.

Agreed, it’s not an ideal first example, and zip or map would be much better. 
Unfortunately, files seem to be the example that many people run into first. 
(Or maybe lots of people do run into map first, but fewer of them get confused 
and need to go ask for help?) When you’re teaching a class, you can guide 
people to hit the things you want them to think about, but the intern, or C# 
guru who only touches Python once a year, or random person on StackOverflow 
that I’m dealing with apparently didn’t take your class. This is where they got 
confused, so this is what they ask about.

> Most iterables don't have the equivalent of readlines() or readline() -- and 
> in this case, I think THAT's the main source of confusion, rather than the 
> iterable vs iterator distinction.

But notice that they’re already writing `for line in f:`. That means they *do* 
understand that files are iterables.

Sure, they probably don’t know the word “iterable”, but they understand that 
files are things you can use in a for loop (and that’s all “iterable” really 
means, unless you’re trying to implement rather than use them).

And honestly, if Python didn’t make iteration so central, I’m not sure as many 
novices would get that far that fast in the first place. Imagine if, instead of 
just calling open and then doing `for line in f:`, you had to call an opener 
factory to get a filesystem opener, call that to get a file object, bind a 
line-buffered read stream to it, then call a method on that read stream with a 
callback function that processes the line and makes the next async read call. 
Anyone who gets that far is probably already a lot more experienced with 
JavaScript than someone who iterates their first file is with Python.

> > > You can explain it anyway. In fact, you _have_ to give an explanation 
> > > with analogies and examples and so on, and that would be true even if 
> > > there were a word for what lists are. But it would be easier to explain 
> > > if there were such a word, and if you could link that word to something 
> > > in the glossary, and a chapter in the tutorial.
> 
> OK -- time for someone to come up with a word for "Iterable that isn't an 
> Iterator" -- I'd start using it :-)

People used to loosely use “collection” for this, back before it was defined to 
mean “sized container”, but that no longer works.

Maybe we need to come up with a word that can’t possibly have any existing 
meaning to anyone, like Standard Oil did with “Exxon”.
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/UUQJSBVB6PXPIV5QID76SOHOOUGF6FN2/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: [Suspected Spam]Re: Adding slice Iterator to Sequences (was: islice with actual slices)

2020-05-13 Thread Andrew Barnert via Python-ideas
> On May 13, 2020, at 20:32, Christopher Barker  wrote:
> 
>>> On Wed, May 13, 2020 at 7:50 PM Andrew Barnert  wrote:
>> On May 13, 2020, at 12:40, Christopher Barker  wrote:
>> 
 Back to the Sequence View idea, I need to write this up properly, but I'm 
 thinking something like:
>>> 
>>> Can we just say that it returns an immutable sequence that blah blah, 
>>> without defining or naming the type of that sequence?
> 
> Sure -- but it ends up getting a lot more wordy if you don't have a name for 
> a thing. 

You’re right. Looking at the dict and similar docs, what they mostly do is to 
talk about “the key view”, and sometimes even “the key view type”, etc., in 
plain English, while being careful not to say anything that implies it has any 
particular name or identity. (In particular, “key view type” obviously can’t be 
the name of an actual type, because it has a space in it.)

Anyway, if the proposal gets far enough to need docstrings and documentation, I 
guess you can worry about getting it right then, but until then you don’t have 
to be that careful; as long as we all know that list_view isn’t meant to name a 
specific type (and to be guaranteed distinct from tuple_view), I think we’ll 
all be fine.

>> Python doesn’t define the types of most things you never construct directly.
> 
> No, but there are ABCs so that might be the way to talk about this.

That’s a good point. Does a sequence slice view (or a more general sequence 
view?) need an ABC beyond just being a Sequence?

I wasn’t expecting that to be needed, but now that you bring it up… if there’s, 
say, a public attribute/property or method to get the underlying object, 
presumably it should be the same name on all such views, and maybe that’s 
something you’d want to be documented, and maybe even testable, by an ABC after 
all.

>> And nobody even notices that list and tuple use the same type for their 
>> __iter__ in some Python implementations but not others.
> 
> I sure haven't noticed that :-)

It’s actually a bit surprising what tuple and list share under the covers in 
CPython, even at the public C API level.

>> > calling .view on a list_view is another trick -- does it reference the host 
>> > view? or go straight back to the original sequence?
> 
> > I think it’s the same answer again. In fact, I think .view on any slice 
> > view should just return self.
> 
> Hmm -- this makes me nervous, but as long as it's immutable, why not?

Exactly. The same as these:

>>> import copy, random, string
>>> s = ''.join(random.choices(string.ascii_lowercase, k=10))
>>> s[:] is s
True
>>> str(s) is s
True
>>> copy.copy(s) is s
True
>>> t = tuple(s)
>>> t[:] is t
True

etc.

But all of those are just allowed, and implemented that way by CPython, not 
guaranteed by the language. So maybe the same should be true here. You can 
implement .view as just self, but if other implementations want to do something 
different they can, as long as it meets the same documented behavior (which 
could just be something like “view-slicing a view slice has the same effect as 
slicing a view slice” or something?).

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/Q4PVK6ZBH4MCJMIHYEYA5MHYFHTPCUMN/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: [Suspected Spam]Re: Adding slice Iterator to Sequences (was: islice with actual slices)

2020-05-13 Thread Andrew Barnert via Python-ideas
On May 13, 2020, at 12:40, Christopher Barker  wrote:

I hope you don’t mind, but I’m going to take your reply out of order to get the 
most important stuff first, in case anyone else is still reading. :)

>> Back to the Sequence View idea, I need to write this up properly, but I'm 
>> thinking something like:
> 
> (using a concrete example or list)
> 
> list.view is a read-only property that returns an indexable object.
> indexing that object with a slice returns a list_view object
> 
> a_view = list.view[a:b:c]
> 
> a_view is a list_ view object
> 
> a list_view object is an immutable sequence. indexing it returns elements from 
> the original list.

Can we just say that it returns an immutable sequence that blah blah, without 
defining or naming the type of that sequence?

Python doesn’t define the types of most things you never construct directly. 
(Sometimes there is a public name for it buried away in the types module, but 
it’s not mentioned anywhere else.) Even the dict view objects, which need a 
whole docs section to describe them, never say what type they are.

And I think this is intentional. For example, nowhere does it say what type 
function.__get__ returns, only what behavior that object has—and that allowed 
Python 3 to get rid of unbound methods, because a function already has the 
right behavior. And nobody even notices that list and tuple use the same type 
for their __iter__ in some Python implementations but not others. Similarly, I 
think dict.__iter__() used to return a different type from 
dict.keys().__iter__() in CPython but now they share a type, and that didn’t 
break any backward compatibility guarantees.

And it seems there’s no reason you couldn’t use the same generic sequence view 
type on all sequences, but also it’s possible that a custom one for list and 
tuple might allow some optimization (and even more likely so for range, 
although it may be less important). So if you don’t specify the type, that can 
be left up to each version of each implementation to decide.

> slicing a list view returns... I'm not sure what here -- it should probably 
> be a copy, so a new list_view object referencing the same list? (That will 
> need to be thought out carefully.)

Good question. I suppose there are three choices: (1) a list (or, in general, 
whatever the original object returns from slicing), (2) a new view of the same 
list, or (3) a view of the view of the list.

I think I agree with you here that (2) is the best option. In other words, 
lst.view[2::2][1::3] gives you the exact same thing as lst.view[4::6].

At first that sounds weird because if you can inspect the attributes of the 
view object, there’s no way to see that you did a [1::3] anywhere.

But that’s exactly the same thing that happens with, e.g., 
range(100)[2::2][1::3]. You just get range(4, 100, 6), and there’s no way to 
see that you did a [1::3] anywhere.
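
That behavior is easy to check:

    >>> range(100)[2::2][1::3]
    range(4, 100, 6)
    >>> list(range(100)[2::2][1::3]) == list(range(100))[2::2][1::3]
    True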

And the same is true for memoryview, and for numpy arrays and bintrees tree 
slices—despite them being radically different things in lots of other ways, 
they all made the same choice here. And even beyond Python, it’s what slicing a 
slice view does in Swift (even though other kinds of views of views don’t 
“flatten out” like this, slice views of slice views do), and in Go. (Although 
C++20 is a counterexample here.)

> calling .view on a list_view is another trick -- does it reference the host 
> view? or go straight back to the original sequence?

I think it’s the same answer again. In fact, I think .view on any slice view 
should just return self.

Think about it: whether you decided that lst.view[2::2][1::3] gives 
lst.view[4::6] or a nested view-of-a-view-of-a-list, it would be confusing if 
lst.view[2::2].view[1::3] gave you the other one, and what other options would 
make sense? And, unless there’s some other behavior besides slicing on view 
properties, if self.view slices the same as self, it might as well just be self.

> iter(a_list_view) returns a list_viewiterator.

Here, it seems even more useful to leave the type unspecified. For list (and 
tuple) in CPython, I’m not sure if you can get away with using the special 
list_iterator type used by list and tuple (which accesses the underlying array 
directly), or, if not that, the PySeqIter type used for old-style 
iter-by-indexing, but if you can, it would be both simpler and more efficient. 
And similarly, range.view might be able to use the range_iterator type. Or, if 
you can’t do that, a generic PyIter around tp_next would be less efficient than 
a custom type, but again simpler, and the efficiency might not matter. Or, if 
you just had a single sequence view type rather than custom ones for each 
sequence type, that would obviously mean a single iterator type. And so on. 
That all seems like quality-of-implementation stuff that should be left open to 
whatever turns out to be best.

> iterating that gets you items from the "host" on the fly.
> 
> All this is a fair bit more complicated than 

[Python-ideas] Re: Improve handling of Unicode quotes and hyphens

2020-05-13 Thread Andrew Barnert via Python-ideas
On May 13, 2020, at 05:31, Richard Damon  wrote:
> 
> On 5/13/20 2:22 AM, Stephen J. Turnbull wrote:
>> MRAB writes:
>>> 
>>> This isn't a parsing problem as such.  I am not an expert on the
>>> parser, but what's going is something like this: the parser
>>> (tokenizer) sees the character "=" and expects an operator.  Next, it
>>> sees something that is not "=" and not whitespace, so it expects a
>>> literal or an identifier.  " “" is not parsable as the start of a
>>> literal, so the parser consumes up to the next boundary character
>>> (whitespace or operator).  Now it checks for the different types of
>>> barewords: keywords and identifiers, and neither one works.
>>> 
>>> Here's the critical point: identifier fails because the tokenizer
>>> tries to match a sequence of Unicode word constitituents, and " “"
>>> isn't one.  So it fails the sequence of non-whitespace characters, and
>>> points to the end of the last thing it saw.
>> But that is the problem, identifier fails too late, it should have seen
>> at the start that the first character wasn't valid in an identifier, and
>> failed THERE, pointing at the bad character. There shouldn't be a
>> post-hoc test for bad characters in the identifier, it should be a
>> pre-test in the tokenizer.
>> 
>> So I see no reason why we need to transition to the new parser to fix
>> this.  (And the new parser (as of the last comment I saw from Guido)
>> probably doesn't help: he kept the tokenizer.)  We just need to make a
>> second pass over the invalid identifier and identify the invalid
>> characters it contains and their positions.
> There is no need to rescan/reparse, the tokenizer shouldn't treat
> illegal characters as possibly part of a token.

Isn’t this what already happens?

>>> import tokenize, io
>>> def tok(s):
...     return list(tokenize.tokenize(io.BytesIO(s.encode()).readline))
>>> tok('spam(“Abc”)')

When I run this in 3.7, the fourth token is an ERRORTOKEN with string “, then 
there’s a NAME with Abc, then another ERRORTOKEN with ”.

And reading the Lexical Analysis chapter of the docs, this seems correct. The 
smart quote is not a possible xid_start, or any other start of any token 
terminal, so it should immediately fail as an error. (The fact that the 
tokenizer eats it, generates an ERRORTOKEN, and then lexes the Abc as a NAME, 
rather than throwing an exception or otherwise punting, is a pretty nice 
error-recovery attempt, and seems perfectly reasonable.)

Is that not true for the internal C tokenizer? Or is it true, but the parser or 
the error generating code isn’t taking advantage of it?

(By the way. I’m pretty sure this behavior isn’t specific to 3.7, but has been 
that way back into the mists of whenever you could first write old-style import 
hooks, even up to the way error recovery works. I’ve taken advantage of this 
behavior in experimenting with new syntax. If your new syntax is not just 
unambiguous at the parser level, but even at the lexical level, you can just 
scan the token stream for your matching ERRORTOKEN.)
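For anyone who wants to try that trick, a minimal sketch using the same 
approach as above (3.7 behavior; the details are illustrative only):

import io, tokenize

def error_tokens(source):
    toks = tokenize.tokenize(io.BytesIO(source.encode()).readline)
    return [t for t in toks if t.type == tokenize.ERRORTOKEN]

print(error_tokens('spam(“Abc”)'))   # the two smart-quote ERRORTOKENs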

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/OZ5N3NJIGQCO7Q645IDX4IWA45GAMEI6/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: [Suspected Spam]Re: Adding slice Iterator to Sequences (was: islice with actual slices)

2020-05-13 Thread Andrew Barnert via Python-ideas
On May 12, 2020, at 23:29, Stephen J. Turnbull 
 wrote:
> 
> Andrew Barnert writes:
>>> On May 10, 2020, at 22:36, Stephen J. Turnbull 
>>>  wrote:
>>> 
>>> Andrew Barnert via Python-ideas writes:
>>> 
>>>> A lot of people get this confused. I think the problem is that we
>>>> don’t have a word for “iterable that’s not an iterator”,
>>> 
>>> I think part of the problem is that people rarely see explicit
>>> iterator objects in the wild.  Most of the time we encounter iterator
>>> objects only implicitly.
>> 
>> We encounter iterators in the wild all the time, we just don’t
>> usually _care_ that they’re iterators instead of “some kind of
>> iterable”, and I think that’s the key distinction you’re looking
>> for.
> 
> It *is* the distinction I'm making with the word "explicit".  I never
> use "next" on an open file.  I'm not sure your more precise statement
> is better.
> 
> I think the real difference is that I'm thinking of "people" as
> including my students who have no clue what an iterator does and don't
> care what an iterable is, they just cargo cult
> 
>with open("file") as f:
>for line in f:
>do_stuff(line)
> 
> while as you point out (and I think is appropriate in this discussion)
> some people who are discussing proposed changes are using the available
> terminology incorrectly, and that's not good.

Students often want to know why this doesn’t work:

with open("file") as f:
for line in file:
do_stuff(line)
for line in file:
do_other_stuff(line)

… when this works fine:

with open("file") as f:
lines = file.readlines()
for line in lines:
do_stuff(line)
for line in lines:
do_other_stuff(line)

This question (or a variation on it) gets asked by novices every few days on 
StackOverflow; it’s one of the top common duplicates.

The answer is that files are iterators, while lists are… well, there is no 
word. You can explain it anyway. In fact, you _have_ to give an explanation 
with analogies and examples and so on, and that would be true even if there 
were a word for what lists are. But it would be easier to explain if there were 
such a word, and if you could link that word to something in the glossary, and 
a chapter in the tutorial.
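The underlying behavior, stripped of files entirely:

>>> lines = ["a", "b"]
>>> it = iter(lines)
>>> list(it)
['a', 'b']
>>> list(it)       # the iterator is exhausted
[]
>>> list(lines)    # the list hands out a fresh iterator every time
['a', 'b']

A file object is its own iterator, like it here; a list is the other kind of 
thing, the kind with no name.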

>> Still, having clear names with simple definitions would help that
>> problem without watering down the benefits.
> 
> I disagree.  I agree there's "amortized zero" cost to the crowd who
> would use those names fairly frequently in design discussions, but
> there is a cost to the "lazy in the technical sense" programmer, who
> might want to read the documentation if it gave "simple answers to
> simple questions",
> but not if they have to wade through a thicket of
> "twisty subtle definitions all alike" to get to the simple answer, and
> especially not if it's not obvious after all that what the answer is.

We shouldn’t define everything up front, just the most important things. But 
this is one of the most important things. People need to understand this 
distinction very early on to use Python, and many of them don’t get it, hence 
all the StackOverflow duplicates. People run into this problem well before they 
run into a problem that requires them to understand the distinction between 
arguments and parameters, or protocols and ABCs, or Mapping and dict.

> It also makes conversations with experts fraught, as those experts
> will tend to provide more detail and precision than the questioner
> wants (speaking for myself, anyway!)  "Not every one-sentence
> explanation needs terminology in the documentation."

I think it’s the opposite. 

I can teach a child why a glass will break permanently when you hit it while a 
lake won’t by using the words “solid” and “liquid”. I don’t have to give them 
the scientific definitions and all the equations. I might not even know them. 
And in the same way, I can teach novices why the x after x=y+1 doesn’t change 
when y changes by teaching them about variables without having to explain 
__getattr__ and fast locals and the import system and so on.

Knowing all the subtleties of shear force or __getattribute__ or whatever 
doesn’t prevent me from teaching a kid without getting into those subtleties. 
The better I understand “solid” or “variable”, the easier it is for me to teach 
it. That’s how words work, or how the human mind works, or whatever, and that’s 
why language is useful for teaching.

>>>> But that last thing is exactly the behavior you expect from “things
>>>> like list, dict, etc.”, and it’s hard to explain, and therefore
>>>> ha

[Python-ideas] Re: Sanitize filename (path part)

2020-05-12 Thread Andrew Barnert via Python-ideas
On May 12, 2020, at 01:32, Barry Scott  wrote:
> 
> 
>> On 11 May 2020, at 23:24, Andrew Barnert  wrote:
>> 
>>> On May 11, 2020, at 13:31, Barry Scott  wrote:
>>> 
>>> macOS and Unix version (I only use Unicode input so avoid the random bytes 
>>> problems):
>> 
>> But that doesn’t avoid the problem. If someone gives you a character whose 
>> encoding on the target filesystem includes a null or pathsep byte, your 
>> sanitizer will pass it as safe, when it shouldn’t.
> 
> Do you have a example that shows an encoding that produces a NUL or pathsep? 
> I'm not aware of any.

UTF-1 encodes U+D7FF to the bytes F7 2F C3. BOCU has similar examples. In the 
other direction, MUTF-8 decodes the bytes C0 80 to U+0000. There were a number 
of cross-site scripting and misleading-link attacks abusing (mostly) BOCU in 
this way, which is part of the reason WHATWG banned them as charsets. Although 
there were other reasons (they banned stuff like SCSU and CESU-8 and UTF-7 at 
the same time, and I don’t think any of them have the same problem). And if 
there were widespread legitimate uses of these codecs, they probably wouldn’t 
have been banned (see UTF-16LE, which is even easier to exploit this way, but 
unfortunately way too common).

I don’t think Python comes with codecs for any of these encodings. And I don’t 
know of anyone who ever used them for filenames. (SCSU was the default fs 
encoding on Symbian flash memory drives, but again, I don’t think it has this 
problem.) So this may well not be a practical problem.

>> Is it still a realistic problem today? I don’t know. I’m pretty sure the 
>> modern versions of Shift-JIS, EUC-*, Big5, and GB can never have 
>> continuation bytes below 0x30, but even if I’m right, are these (and UTF-8, 
>> of course) the only multi-byte encodings anyone ever uses on Unix 
>> filesystems?
> 
> I suspect that legacy encodings are used in organisations with old data, but 
> I don't have direct experience of this.

I have direct experience of some of those East Asian codecs, albeit 15 or so 
years ago. I’m pretty sure the only ones they used were all safe.

I also have experience even further back of mounting drives from Ataris and 
classic Macs and IBM mainframes and all kinds of other crazy things under Unix, 
but the filesystem drivers recoded filenames on the fly, along with providing a 
Unix-style hierarchical filesystem, so user-level code didn’t have to worry 
about MacKorean or EBCDIC or whatever any more than it had to worry about : as 
a pathsep and absolute paths being the ones that _don’t_ start with a pathsep 
and so on.

So, based on my experience, it doesn’t seem likely to come up even in shops 
full of old data. But that experience isn’t worth much…


___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/7L466KEUYZ3ZA2IUBUD2L7UONQFPSECM/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Sanitize filename (path part)

2020-05-11 Thread Andrew Barnert via Python-ideas
On May 11, 2020, at 13:31, Barry Scott  wrote:
> 
> macOS and Unix version (I only use Unicode input so avoid the random bytes 
> problems):

But that doesn’t avoid the problem. If someone gives you a character whose 
encoding on the target filesystem includes a null or pathsep byte, your 
sanitizer will pass it as safe, when it shouldn’t.

This isn’t possible on macOS because the OS won’t let you mount any filesystem 
whose encoding isn’t UTF-8, but it is possible on most other *nixes, and it has 
been used as an attack in the past.

Is it still a realistic problem today? I don’t know. I’m pretty sure the modern 
versions of Shift-JIS, EUC-*, Big5, and GB can never have continuation bytes 
below 0x30, but even if I’m right, are these (and UTF-8, of course) the only 
multi-byte encodings anyone ever uses on Unix filesystems?
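For what it’s worth, the check has to happen at the byte level to be safe. A 
minimal sketch, assuming you know the target filesystem’s encoding (the 
function name is made up):

def is_safe_component(name: str, fs_encoding: str) -> bool:
    # NUL and the path separator are the only bytes illegal in a POSIX
    # filename component, but they have to be checked in the encoded
    # bytes, not in the str.
    encoded = name.encode(fs_encoding)
    return b"\x00" not in encoded and b"/" not in encoded

With a codec like the UTF-1 example elsewhere in this thread, a 
character-level check passes a name that this byte-level check rejects.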
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/KPEEFJHXFH26EMLYRPAG27MQD2LJHCHG/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Sanitize filename (path part) 2nd try

2020-05-11 Thread Andrew Barnert via Python-ideas
> On May 11, 2020, at 14:18, Steve Jorgensen  wrote:
> 
> Andrew Barnert wrote:
>>> On May 11, 2020, at 00:40, Steve Jorgensen ste...@stevej.name wrote:
>>> Proposal:
>>> Add a new function (possibly os.path.sanitizepart) to sanitize a value for
>>> use as a single component of a path. In the default case, the value must 
>>> also not be a
>>> reference to the current or parent directory ("." or "..") and must not 
>>> contain control
>>> characters.
> 
>> If not: the result can contain the path separator, illegal characters that 
>> aren’t
>> control characters, nonprinting characters that aren’t control characters, 
>> and characters
>> whose bytes (in the filesystem’s encoding) are ASCII control characters?
>> And it can be a reserved name, or even something like C:; as long as it’s 
>> not the Unix
>> . or ..?
> 
> Are there non-printing characters outside of those in the Unicode general 
> category of "C" that make sense to omit?

Off the top of my head, everything in the Z category (like U+2029 PARAGRAPH 
SEPARATOR) is non-printable, and makes sense to sanitize.

Meanwhile, what about invalid characters being smuggled through str by 
surrogateescape? I don’t know if those are printable, or what category they 
are… or whether you want to sanitize them, for that matter, so I have no idea 
if this rule does the right thing or not.
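For the record, as far as I can tell:

>>> import unicodedata
>>> unicodedata.category('\u2029')   # PARAGRAPH SEPARATOR
'Zp'
>>> unicodedata.category('\udc80')   # a byte smuggled in by surrogateescape
'Cs'

So a rule that rejects everything in category C would catch the smuggled 
surrogates too, but whether that’s the right thing is exactly the question.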

More generally, we shouldn’t be relying on what respondents know off the top of 
their heads in the first place for something that people are going to rely on 
for security/safety purposes.

> Regarding names like "C:", you are absolutely right to point that out. When 
> the platform is Windows, certainly, ":" should not be allowed, and 
> perhaps colon should not be allowed at all. I'll need to research that a bit. 
> This matters because if the path part is used without explicit "./" prefixed 
> to it, then it will refer to a root path,

The name `C:spam` means spam in the current directory for the C drive—which 
isn’t the same as the current working directory unless C is the current working 
drive, but it’s definitely not (in general) the same as the root.

And what about all the other questions I asked?

Most importantly, you need to clarify what the use case is, and why this 
proposal meets it. Otherwise, it sounds more like a trap to make people think 
their code is safe when it isn’t, not a fix for the real problem.

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/XRFT77TXLE7MNAP2MV2IC57NG4EWQIGP/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Sanitize filename (path part) 2nd try

2020-05-11 Thread Andrew Barnert via Python-ideas
On May 11, 2020, at 12:54, Wes Turner  wrote:
> 
> 
> What does sanitizepart do with newlines \n \r \r\n in filenames? Are these 
> control characters?

>>> import unicodedata
>>> unicodedata.category('\n')
'Cc'

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/D6HRO6UIEXK56KV6NMR676CJCZKMKZJV/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Sanitize filename (path part) 2nd try

2020-05-11 Thread Andrew Barnert via Python-ideas
On May 11, 2020, at 12:59, Barry Scott  wrote:
> 
> 
>> On 11 May 2020, at 18:09, Andrew Barnert via Python-ideas 
>>  wrote:
>> 
>> More generally, what’s the use case for %-encoding filenames like this? Are 
>> people expecting it to interact transparently with URLs, so if I save a file 
>> “spam\0eggs” in a Python script and then try to browse to 
>> file:///spam\0eggs” in a browser, the browser will convert the \0 character 
>> to %00 the same way my Python script did and therefore find the file? 
> 
> No.
> 
> The \0 can never be part of a valid file in Unix, macOS or Windows.

Of course. Which is exactly the kind of thing this sanitize function is meant 
for.

Hence my question: if my Python script is sanitizing all filenames with this 
function with escape='%', is the expectation that it’ll actually give me 
something that can be used if I paste the same thing into a browser and let it 
url-escape a file URL? If so, will that actually work? If not, what _is_ the 
intended use for this option?

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/CMYTQZQHSALH4ZREIMTDMFLYMJXWSPG3/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)

2020-05-11 Thread Andrew Barnert via Python-ideas
On May 11, 2020, at 10:57, Alex Hall  wrote:
> 
> 
>> On Mon, May 11, 2020 at 12:50 AM Christopher Barker  
>> wrote:
> 
>  
>> Though it is heading in a different direction that where Andrew was 
>> proposing, that this would be about making and using views on sequences, 
>> which really wouldn't make sense for any iterator.
> 
> The idea is that islice would be the default behaviour and classes could 
> override that to return views if they want.

It is possible to get both, but I don’t think it’s easy.

I think the ultimate unification of these ideas is the “views everywhere” 
design of Swift. Whether you have a sequence or just a collection or just a 
one-shot forward-only iterable, you use the same syntax and the same functions 
to do everything—copy-slicing, view-slicing, chaining, mapping, zipping, etc. 
And the result is always a view with as much functionality as makes sense (so 
filtering a sequence gives you a view that’s a reversible collection, not a 
sequence). So you can view-slice the result of a genexpr the same way you would 
a list, and you just get a forward-only iterable view instead of a full-fledged 
sequence view. I’ve started designing such a thing multiple times, every couple 
years or so, and always realize it’s even more work than I thought and harder 
to fit into Python than i thought and give up.

But maybe doing it _just_ for view slicing, rather than for everything, and 
requiring a wrapper object to use it, is a lot simpler, and useful enough on 
its own.

And that would fit well into the Python way of growing by adding stuff as 
needed, and only trying to come up with a complete and perfect general design 
up front when absolutely necessary.
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/4CQE7Q4TRJTQF66ZHMCPJMCLCUEXHEAT/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: [Suspected Spam]Re: Adding slice Iterator to Sequences (was: islice with actual slices)

2020-05-11 Thread Andrew Barnert via Python-ideas
On May 10, 2020, at 22:36, Stephen J. Turnbull 
 wrote:
> 
> Andrew Barnert via Python-ideas writes:
> 
>> A lot of people get this confused. I think the problem is that we
>> don’t have a word for “iterable that’s not an iterator”,
> 
> I think part of the problem is that people rarely see explicit
> iterator objects in the wild.  Most of the time we encounter iterator
> objects only implicitly.

We encounter iterators in the wild all the time, we just don’t usually _care_ 
that they’re iterators instead of “some kind of iterable”, and I think that’s 
the key distinction you’re looking for.

Certainly when you open a file, you usually deal with the file object. And 
whenever you feed the result of one genexpr into another, or into a map call, 
you are using an iterator. You often even store those iterators in variables.

But if you change that first genexpr to a listcomp (say, because you want to be 
able to breakpoint there and print it to the debugger, or dump it to a log), 
nothing changes except performance. And people know this and take advantage of 
it without even thinking. And that’s true of the majority of places you use 
iterators. Code that explicitly needs an iterator (like the grouper idiom where 
you zip an iterator with itself) certainly does exist, but it’s nowhere near as 
common as code that can use any iterable and only uses an iterator because 
that’s the easiest thing to write or the most efficient thing.

This is a big part of what I meant about the concepts being so nice that people 
manage to use them despite not being able to talk about them.

> Nomenclature *is* a problem (I still don't
> know what a "generator" is: a function that contains "yield" in its
> def, or the result of invoking such a function), but part of the
> reason for that is that Python successfully hides objects like
> iterators and generator objects much of the time (I use generator
> expressions a lot, "yield" rarely).

You’re right. The fact that the concept (and the implementation of those 
concepts) is so nice that we rarely have to think about these things explicit 
is actually part of the reason it’s hard to do so on the rare occasions we need 
to. And put that way, it’s a pretty good tradeoff.

Still, having clear names with simple definitions would help that problem 
without watering down the benefits.

>> or for the refinement “iterable that’s not an iterator and is
>> reusable”, much less the further refinement “iterable that’s
>> reusable, providing a distinct iterator that starts from the head
>> each time, and allows multiple such iterators in parallel”.
> 
> Aside: Does "multiple parallel iterators" add anything to "distinct
> iterator that starts from the head each time"?  Or did you mean what I
> would express as "and *so* it allows multiple parallel iterators"?

I’m being redundant here to make sure I’m understood, because just saying it 
the second way apparently didn’t get the idea across the first time. 

>> But that last thing is exactly the behavior you expect from “things
>> like list, dict, etc.”, and it’s hard to explain, and therefore
>> hard to document.
> 
> Um, you just did *explain* it, quite well IMHO, you just didn't *name*
> it. ;-)

Well, it was a long, and redundant, explanation, not something you’d want to 
see in the docs or even a PEP.

>> The closest word for that is “collection”, but Collection is also a
>> protocol that adds being a Container and being Sized on top of
>> being Iterable, so it’s misleading unless you’re really careful. So
>> the docs don’t clearly tell people that range, dict_keys, etc. are
>> exactly that “like list, dict, etc.” thing, so people are confused
>> about what they are. People know they’re lazy, they know iterators
>> are lazy,
> 
> I'm not sure what "lazy" means here.  range is lazy: the index it
> reports doesn't exist anywhere in the program's data until it computes
> it.  But I wouldn't call a dict view "lazy" any more than I'd call the
> underlying dict "lazy".  Views are references, or alternative access
> interfaces if you like.  But the data for the view already exists.

“lazy” as in it creates something that acts like a list or a set, but hasn’t 
actually stored a list or set or other data structure in memory or done a bunch 
of up-front CPU work. You’re right that a more precise definition would 
probably include range but not dict_keys, but I think people do use it in a way 
that includes both, and that’s part of the reason they’re equally confused into 
thinking both are iterators.

>> so they think they’re a kind of iterator, and the docs don’t ever
>> make it clear why that’s wrong.
> 
> I don't think the problem is in the docs.  Iterators and vie

[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)

2020-05-11 Thread Andrew Barnert via Python-ideas
On May 10, 2020, at 21:51, Christopher Barker  wrote:
> 
> 
> On Sun, May 10, 2020 at 9:36 PM Andrew Barnert  wrote:
> 
>> However, there is one potential problem with the property I hadn’t thought 
>> of until just now: I think people will understand that mylist.view[2:] is 
>> not mutable, but will they understand that mystr.view[2:] is not a string? 
>> I’m pretty sure that isn’t a problem for seqview(mystr)[2:], but I’m not 
>> sure about mystr.view[2:].
> 
> One more issue around the whole "a string is sequence of strings" thing :-) 
> Of course, it *could* be a string -- not much difference with immutables.
> Though I suppose if you took a large slice of a large string, you probably 
> don't want the copy. But what *would* you want to do with it.

That “string is a sequence of strings” issue, plus the “nothing can duck type 
as a string“ issue.

Here’s an example that I can write in, say, Swift or Rust or even C++, but not 
in Python: I mmap a giant mailbox file, and I can treat that as a string 
without copying it anywhere. I split it into a string for each message—I don’t 
want to copy them all into a list of strings, and ideally I don’t even want to 
copy one at a time into an iterator or strings because some of them can be 
pretty huge; I want a list or iterator of views into substrings of the mmap. 
(This isn’t actually a great example, because even with substring views, the 
mmap can’t be used as a str in the first place, but it has the virtue of being 
a real example of code I’ve actually written.)
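The closest you can get in Python today is bytes-level, via memoryview. A 
sketch (filename and offsets invented, parsing hand-waved):

import mmap

with open("mbox", "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    view = memoryview(mm)
    msg = view[100:5000]    # a zero-copy view of one message's bytes

But msg is bytes-like, not str-like, which is exactly the limitation described 
above.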

> but if you had a view of a slice, and it was a proper view, it might be 
> pretty poky for many string operations, so probably just as well not to have 
> them.

I think in general people will expect that a slice view on a sequence acts like 
“some kind of sequence”, not like the same kind they’re viewing—again, they 
won’t be surprised if you can’t insert into a slice of a list. It’s only with 
str that I’m worried they might expect more than we can provide, which sucks 
because str is the one place we _couldn’t_ provide it even if we wanted to.

But maybe I’m wrong and people won’t have this assumption, or will be easily 
cured of it.

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/CMHJKKVH2TQLED2W3KICEIQY43SBX27S/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Sanitize filename (path part) 2nd try

2020-05-11 Thread Andrew Barnert via Python-ideas
On May 11, 2020, at 00:40, Steve Jorgensen  wrote:
> 
> Proposal:
> 
> Add a new function (possibly `os.path.sanitizepart`) to sanitize a value for 
> use as a single component of a path. In the default case, the value must also 
> not be a reference to the current or parent directory ("." or "..") and must 
> not contain control characters.

“Also” in addition to what? Are there other requirements enforced besides these 
two that aren’t specified anywhere?

If not: the result can contain the path separator, illegal characters that 
aren’t control characters, nonprinting characters that aren’t control 
characters, and characters whose bytes (in the filesystem’s encoding) are ASCII 
control characters?

And it can be a reserved name, or even something like C:; as long as it’s not 
the Unix . or ..?

What’s the use case where you need to sanitize these things but nothing else? 
As I said on the previous proposal, I have had a variety of times where I 
needed to sanitize filenames, but I don’t think this would have been what I 
wanted for _any_ of them, much less for most.

Are there existing tools, libraries, recommendations, etc. that this is based 
on, or is it just an educated guess at what’s important? For something that’s 
meant to go into the stdlib with a name that strongly implies  “if you use 
this, you’re safe from stupid or malicious filenames”, it would be misleading, 
and possibly dangerous, if it didn’t actually make you safe because it didn’t 
catch common mistakes/exploits that everyone else considers important to catch. 
And without any cites to what everyone else considers important, why 
should anyone trust that this proposal isn’t missing, or getting wrong, 
anything critical?

Why isn’t this also available in pathlib? Is it the kind of thing you don’t 
envision high-level pathlib-style code ever needing to do, only low-level 
os-style code?

> When `replace` is supplied, it is used as a replacement for any invalid 
> characters or for the first character of an invalid name. When `prefix` is 
> not also supplied, this is also used as the replacement for the first 
> character of the name if it is invalid, not simply due to containing invalid 
> characters.

What’s the use case for separate prefix and replace? Or just for prefix in the 
first place?

> When `escape` is supplied (typically "%") it is used as the escape character 
> in the same way that "%" is used in URL encoding.

Why allow other escape strings? Has anyone ever wanted URL-encoding but with 
some other string in place or %, in this or any other context?

The escape character is not itself escaped?

More generally, what’s the use case for %-encoding filenames like this? Are 
people expecting it to interact transparently with URLs, so if I save a file 
“spam\0eggs” in a Python script and then try to browse to file:///spam\0eggs” 
in a browser, the browser will convert the \0 character to %00 the same way my 
Python script did and therefore find the file? If so, doesn’t it need to escape 
all the same characters that URLs do, not a different set? If not, isn’t using 
something similar to URL-encoding but not identical just going to confuse 
people rather than help then?

What happens if you supply a string longer than one character as escape? Or 
replace or prefix, for that matter?

Overall, it seems like there is a problem to be solved, but I don’t see any 
reason to be confident that this is the solution for anyone, and if it’s not 
the solution for _most_ people, adding it to the stdlib will just mean people 
don’t search for and find the right one, all the while misleading themselves 
into thinking they’re safe when they’re not, which will make the overall 
problem worse, not better.

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/ZBBMQ34OHSR3RYKVUFLNUIM34WG3R2N7/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)

2020-05-10 Thread Andrew Barnert via Python-ideas
On May 10, 2020, at 15:39, Christopher Barker  wrote:
> 
> 
>> On Sun, May 10, 2020 at 12:48 PM Andrew Barnert  wrote:
> 
>> Is there any way you can fix the reply quoting on your mail client, or 
>> manually work around it?
> 
> I'm trying -- sorry I've missed a few. It seems more and more "modern" email 
> clients make "interspersed" posting really hard. But I hate bottom posting 
> maybe even more than top posting :-( (gmail seems to have magically gotten 
> worse in this regard recently)

It seems like the one place Google still sees (the remnants of) Yahoo as a 
competitor is who can screw up mailing lists worse.

> It's also interesting to note (from another part of this thread) that slicing 
> isn't part of the Sequence ABC, or any? "official" protocol?

If we still had separate __getitem__ and __getslice__ when ABCs and the idea of 
being clearer about protocols had come along, I’ll bet __getslice__ would have 
been made part of the protocol. But I suppose it’s a little too late for me to 
complain about a change that I think went in even before new-style classes. :)

> I do see this, though not entirely sure what to make of it:
> 
> https://docs.python.org/3/c-api/sequence.html?highlight=sequence

Yeah, the fact that sequences and mappings have identical methods means that 
from Python those two protocols are opt-in rather than automatic, while from C 
you have to be more prepared for errors after checking than with other 
protocols. Annoying, but not using the same syntax and dunders for indexing and 
keying would be a lot more annoying.

> > Also, notice that this is true for all of the existing views, and none of 
> > them try to be un-featureful to avoid it.
> 
> But there is no full featured mapping-view that otherwise acts much like a 
> mapping.

types.MappingProxyType. In most cases, type(self).__dict__ will get you one of 
these.

But of course this is a view of the whole dict, not a subset.
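For anyone who hasn’t run into it:

>>> import types
>>> d = {'a': 1}
>>> proxy = types.MappingProxyType(d)
>>> d['b'] = 2
>>> proxy['b']    # dynamic, read-only view of the whole dict
2
>>> proxy['c'] = 3
Traceback (most recent call last):
  ...
TypeError: 'mappingproxy' object does not support item assignment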

> in theory, there *could* be -- if there was some nice way to specify a subset 
> of a mapping without copying the whole thing -- I can't think of one at the 
> moment.

Not in the stdlib, but for a SortedDict type, key-slicing makes total sense, 
and many of them do it—although coming up with a nice API is hard enough that 
they all seem to do it differently. (Obviously d[lo:hi] should be some iterable 
of the values from the keys lo<=key<hi…)

>> I think the biggest question is actually the API. Making this a function (or 
>> a class that most people think of as a function, like most of itertools) is 
>> easy, but as soon as you say it should be a method or property of sequences, 
>> that’s trickier. You can add it to all the builtin sequence types, but 
>> should other sequences in the stdlib have it? Should Sequence provide it as 
>> a mixin? Should it be part of the sequence protocol, and therefore checked 
>> by Sequence as an ABC (even though that could be a breaking change)?
> 
> Here is where I think you (Andrew) and I (Chris B.) differ in our goals. My 
> goal here is to have an easily accessible way to use the slice syntax to get 
> an iterable that does not make a copy.

It’s just a small difference in emphasis. I want a way to get a non-copying 
slice, and I’d really like it to be easily accessible—I‘d grumble if you didn’t 
make it a member, but I’d still use it.

> While we're at it, getting a sequence view that can provide an iterator, and 
> all sorts of other nifty features, is great. But making it a callable in 
> itertools (or any other module) wouldn't accomplish that goal.
> 
> Hmm, but maybe not that bad: 
> 
> for i in itertools.seq_view(a_list)[::2]:
> ...
> 
> I still think I prefer this though:
> 
> for i in a_list.view[::2]:
> ...

Agreed. A property on sequences would be best, a wrapper object that takes 
slice syntax clearly back in second, and a callable that takes only islice 
syntax a very distant third. So if the first one is possible, I’m all for it.

My slices repo provides the islice API just because it’s easier for slapping 
together a proof of concept of the slicing part, definitely not because I’d 
want that added to the stdlib as-is.

However, there is one potential problem with the property I hadn’t thought of 
until just now: I think people will understand that mylist.view[2:] is not 
mutable, but will they understand that mystr.view[2:] is not a string? I’m 
pretty sure that isn’t a problem for seqview(mystr)[2:], but I’m not sure about 
mystr.view[2:].

> So to all those questions: I say "yes" except maybe: 
> 
> "checked by Sequence as an ABC (even though that could be a breaking change)" 
> -- because, well, breaking changes are "Not good".
> 
> I wonder if there is a way to make something standard, but not quite break 
> things -- hmm.
> 
> For instance: It seems to be possible to have Sequence provide it as a mixin, 
> but not have it checked by Sequence as an ABC?

Actually, now that I think about it, Sequence _never_ checks methods. Most of 
the 

[Python-ideas] Re: Improve handling of Unicode quotes and hyphens

2020-05-10 Thread Andrew Barnert via Python-ideas
On May 10, 2020, at 14:33, Christopher Barker  wrote:
> 
> Having a "tabnanny-like" function / module in the stdlib would be nice, 
> though I'd think a stand alone module in PyPi would be almost as good, and a 
> good way to see if it gains traction.

Good point.

Plus, it might well turn out that, say, the right thing for most Windows users 
and the right thing for most iOS Pythonista users is sufficiently different 
that two separate defancier packages are better than a one-size-fits-all could 
be, which we’d find out a lot more easily if people go out and use it in the 
field than if we try to design it here.

> BTW -- there are a whole lot of Syntax Errors that a semi smart algorithm 
> could provide meaningful suggestions about. I'm pretty sure that's come 
> up before on this list, but maybe a "helpful" mode you could run Python in 
> that would do that for all Syntax errors that it could. We could even have a 
> way for folks to extend it with additional checks.

This already exists on PyPI. Actually, there are a few different ones.

One of them (I think friendly-tracebacks?) is very detailed. One of the authors 
sometimes posts about it here, when we’re talking about how some exception 
should be improved, with an example showing that they’ve already thought of it 
and done something better than is being proposed in the list. :)

That one may already be a category killer. I looked over some of the others and 
the only thing that jumped out at me was that one of them (better-errors?) 
integrates really nicely into iPython and Jupyter (using iPython’s syntax 
coloring settings, making more use of line-drawing and box characters, etc.)

But that doesn’t mean the category killer should be in the stdlib; I suspect 
they’re still improving it at a much faster pace than the stdlib could handle. 
But maybe the docs should link to it. The only problem is that the obvious 
places (like Interface Options section in the Usage docs) are things almost 
nobody reads…

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/WWPXZFCCFH7UF3Q52EONRASPQOQQD2OH/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Improve handling of Unicode quotes and hyphens

2020-05-10 Thread Andrew Barnert via Python-ideas
On May 10, 2020, at 00:11, Steve Barnes  wrote:
> 
> What can be done?

I think there’s another option (in addition to improving SyntaxError, not 
instead of it):

Add a defancier module to the stdlib. It has functions that take some text and 
turn smart quotes into plain ASCII quotes, dashes and minuses into ASCII 
hyphens, etc., or just detect them and produce useful objects and/or text. And 
it’s a runnable module that can either lint or fix source code.

Then instead of telling people who get this SyntaxError “Use a proper editor, 
and all the code you wrote so far has to be rewritten or fixed manually, and 
that’ll show you”, we can tell them “Use a proper editor in the future, but 
meanwhile you can fix your existing script with `python -m defancier -f 
script.py`“.

And a simple IDE or editor mode that doesn’t want to come up with something 
better could run defancier on SyntaxError or on open or whenever and show the 
output in a nice way and offer a single-click fix.

There’s nothing in the stdlib quite like this, but textwrap, tabnanny, 2to3, 
etc. are vaguely similar precedents.

And it seems like the kind of thing that will evolve on about the right scale 
for the stdlib—new problems to add to the list come up about once a decade, not 
every few months or anything.

The place I’d _really_ like this is Pythonista, which does an admirable job 
fighting iOS text input for me, but it’s not so helpful for fixing up pasted 
code. (And needless to say, I can’t just get a better editor/IDE; it’s by far 
the best option for the platform.)

(By the way, the reason I used -f rather than —fix is that I can’t figure out 
how to get the iPhone Mail.app to not replace double hyphens with an em-dash, 
or even how to fix it when it does. All of the other fancifier stuff can be 
worked around pretty easily, but apparently not that one…)

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/T6JPAQWP3P3IJSGGZWMDPBKPFUE6LQJ2/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Improve handling of Unicode quotes and hyphens

2020-05-10 Thread Andrew Barnert via Python-ideas
On May 10, 2020, at 03:47, Ned Batchelder  wrote:
> 
> On 5/10/20 3:09 AM, Steve Barnes wrote:
>> Change the error message “SyntaxError: invalid character in identifier” to 
>> include which character and its Unicode value so that it becomes 
>> “SyntaxError: invalid character 0x201c “ in identifier” – this is almost 
>> certainly the easiest change and fits well with explicit is better than 
>> implicit but still leaves it to the user to correct the erroneous input 
>> (which could be argued is both good and bad).
> 
> Or change it to, "SyntaxError, only plain quotes can be used: you have 0x201c 
> which is a fancy quote" (or something).  We have a specific SyntaxError 
> message for print-without-parens, we should be able to do this also.

Can the error message actually include the Unicode character itself? A novice 
isn’t going to know what U+201c means, they may not be entirely sure what fancy 
quote means or how to search for it, but they will know what “ means and can 
search for it by just copying and pasting from the error message to the Find 
box in their editor.

(I think we do include Unicode characters in other error messages when they 
come directly from the user’s input text. For example, if I try to 2+Spám(), 
the error message will have á in the string.)

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/K5QQ64GF2YVUHWRQ5LNCDRCN5VA6OZOZ/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)

2020-05-10 Thread Andrew Barnert via Python-ideas
On May 10, 2020, at 11:09, Christopher Barker  wrote:

Is there any way you can fix the reply quoting on your mail client, or manually 
work around it? I keep reading paragraphs and saying “why is he saying the same 
thing I said” only to realize that you’re not, that’s just a quote from me that 
isn’t marked, up until the last line where it isn’t…

> On Sat, May 9, 2020 at 9:11 PM Andrew Barnert  wrote:
> 
> > That’s no more of a problem for a list slice view than for any of the 
> > existing views. The simplest way to implement a view is to keep a reference 
> > to the underlying object and delegate to it, which is effectively what the 
> > dict views do.
> 
> Fair enough. Though you still could get potentially surprising behavior if 
> the original sequence's length is changed.

I don’t think it’s surprising. When you go out of your way to ask for a dynamic 
view instead of the default snapshot copy, and then you change the list, you’d 
expect the view to change.

If you don’t keep views around, because you’re only using them for more 
efficient one-shot iteration, you might never think about that, but then you’ll 
never notice it to be surprised by it. The dynamic behavior of dict views 
presumably hasn’t ever surprised you in the 12 years it’s worked that way.
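That dynamic behavior, concretely:

>>> d = {1: 'a'}
>>> k = d.keys()
>>> d[2] = 'b'
>>> k
dict_keys([1, 2])

A list slice view would work the same way: mutate the list, and the view 
reflects it.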

> And you probably don't want to lock the "host" anyway -- that could be very 
> confusing if the view is kept all be somewhere far from the code trying to 
> change the sequence. 

Yes. I think memoryview’s locking behavior is a special case, not something 
we’d want to emulate here. I’m guessing many people just never use memoryview 
at all, but when you do, you’re generally thinking about raw buffers rather 
than abstract behavior. (It’s right there in the name…) And when you need 
something more featureful than an invisible hard lock on the host, it’s time 
for numpy. :)

> I'm still a bit confused about what a dict.* view actually is

The docs explain it reasonably well. See 
https://docs.python.org/3/glossary.html#term-dictionary-view for the basic 
idea,  https://docs.python.org/3/library/stdtypes.html#dict-views for the 
details on the concrete types, and I think the relevant ABCs and data model 
entries are linked from there.

> -- for instance, a dict_keys object pretty much acts like a set, but it isn't 
> a subclass of set, and it has an isdisjoint() method, but not .union or any 
> of the other set methods. But it does have what at a glance looks like pretty 
> complete set of dunders

The point of collections.abc.Set, and ABCs in general, and the whole concept of 
protocols, is that the set protocol can be implemented by different concrete 
types—set, frozenset, dict_keys, third-party types like 
sortedcontainers.SortedSet or pyobjc.Foundation.NSSet, etc.—that are generally 
completely unrelated to each other, and implemented in different ways—a 
dict_keys is a link to the keys table in a dict somewhere, a set or frozenset 
has its own hash table, a SortedSet has a wide-B-tree-like structure, an NSSet 
is a proxy to an ObjC object, etc. If they all had to be subclasses of set, 
they’d be carrying around a set’s hash table but never using it; they’d have to 
be careful to override every method to make sure it never accidentally got used 
(and what would frozenset or dict_keys override add with?), etc.

And if you look at the ABC, union isn’t part of the protocol, but __or__ is, 
and so on.

> Anyway, a Sequence view is simpler, because it could probably simply be an 
> immutable sequence -- not much need for contemplating every bit of the API.

It’s really the same thing, it’s just the Sequence protocol rather than the Set 
protocol.

If anything, it’s _less_ simple, because for sequences you have to decide 
whether indexing should work with negative indices, extended slices, etc., 
which the protocol is silent about. But the answer there is pretty easy—unless 
there’s a good reason not to support those things, you want to support them. 
(The only open question is when you’re designing a sequence that you expect to 
be subclassed, but I don’t think we’re designing for subclassing here.)

> I do see a possible objection here though. Making a small view of a large 
> sequence would keep that sequence alive, which could be a memory issue. Which 
> is one reason why sliced don't do that by default.

Yes. When you just want to iterate something once, non-lazily, you don’t care 
whether it’s a view of a snapshot, but when you want to keep it around, you do 
care, and you have to decide which one you want. So we certainly can’t change 
the default; that would be a huge but subtle change that would break all kinds 
of code.

But I don’t think it’s a problem for offering an alternative that people have 
to explicitly ask for.

Also, notice that this is true for all of the existing views, and none of them 
try to be un-featureful to avoid it.

> And it could simply be a buyer beware issue. But the more featureful you make 
> a view, the 

[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)

2020-05-10 Thread Andrew Barnert via Python-ideas
On May 10, 2020, at 02:42, Alex Hall  wrote:
> 
> - Handling negative indices for sequences (is there any reason we don't have 
> that now?)

Presumably partly just to keep it minimal and simple. Itertools is all about 
transforming iterables into other iterables in as generic a way as possible. 
None of the other functions do anything special if given a more fully-featured 
iterable.

But also, negative indexing isn’t actually part of the Sequence protocol. (You 
don’t get negative indexes for free by inheriting Sequence as a mixin, nor is 
it ensured by testing isinstance with Sequence as an ABC.) It’s part of the 
extra stuff that list and the other builtin sequences happen to do. You didn’t 
suggest allowing negative islicing on set even though it could just as easily 
be implemented there, because you don’t expect negative indexing as part of the 
Set protocol (or the Sized Iterable protocol); you did expect it as part of the 
Sequence protocol, but Python’s model disagrees.
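A quick demonstration that the mixin doesn’t supply negative indexing (Digits 
is a made-up toy):

from collections.abc import Sequence

class Digits(Sequence):
    # ints only; a real sequence would also handle slices
    def __getitem__(self, i):
        if 0 <= i < 10:
            return i
        raise IndexError(i)
    def __len__(self):
        return 10

d = Digits()
print(d[3])      # 3
print(3 in d)    # True: __contains__ comes free from the mixin
print(d[-1])     # IndexError: negative indices are up to your __getitem__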

Maybe practicality beats purity here, and islice should take negative indices 
on any Sequence, or even Sized, input, even though that makes it different from 
other itertools functions, and ignores the fact that it could be simulating 
negative indexing on some types where it’s meaningless. But how often have you 
wanted to call islice with a negative index? How horrible is the workaround you 
had to write instead? I suspect that it’s already rare enough of a problem that 
it’s not worth it, and that any form of this proposal would make it even rarer, 
but I could be wrong.
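FWIW, the workaround when the input is Sized isn’t horrible, just easy to get 
the edge cases wrong:

>>> from itertools import islice
>>> seq = list(range(10))
>>> list(islice(seq, max(len(seq) - 3, 0), None))   # seq[-3:], but lazy
[7, 8, 9]

islice itself rejects negative indices with ValueError.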

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/ZGHJJJP43VZI4ZG7PRTIH3GJGTXANJK6/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Sanitize filename (path part)

2020-05-09 Thread Andrew Barnert via Python-ideas
On May 9, 2020, at 17:35, Steve Jorgensen  wrote:
> 
> I believe the Python standard library should include a means of sanitizing a 
> filesystem entry, and this should not be something requiring a 3rd party 
> package.
> 
> One of reasons I think this should be in the standard lib is because that 
> provides a common, simple means for code reviewers and static analysis 
> services such as Veracode to recognize that a value is sanitized in an 
> accepted manner.

This does seem like a good idea. People who do this themselves get it wrong all 
the time, occasionally with disastrous consequences, so if Python can solve 
that, that would be great.

But, at least historically, this has been more complicated than what you’re 
suggesting here. For example, don’t you have to catch things like directories 
named “Con” or files whose 8.3 representation has “CON” as the 8 part? I don’t 
think you can hang an entire Windows system by abusing those anymore, but you 
can still produce filenames that some APIs, and some tools (possibly including 
Explorer, cmd, powershell, Cygwin, mingw/native shells, Python itself…) can’t 
access (or can only access if the user manually specified a \\.\ absolute path, 
or whatever).

Is there an established algorithm/rule that lots of people in the industry 
trust that Python can just reference, instead of having to research or invent 
it? Because otherwise, we run the risk of making things worse instead of better.

> What I am envisioning is a function (presumably in `os.path` with a signature 
> roughly like
> {{{
> sanitizepart(name, permissive=False, mode=ESCAPE, system=None)
> }}}

Maybe it would make more sense to put this in pathlib. Then you construct a 
PurePath of the appropriate type, and call sanitize() on it (maybe with a flag 
that ensures that it’s a single path component if you expected it to be one).

I think some, but not all, of this logic already exists in pathlib.

> When `permissive` is `False`, characters that are generally unsafe are 
> rejected. When `permissive` is `True`, only path separator characters are 
> rejected. Generally unsafe characters besides path separators would include 
> things like a leading ".", any non-printing character, any wildcard, piping 
> and redirection characters, etc.

I think neither of these is what I’d usually want.

I never want to sanitize just pathsep characters without sanitizing all illegal 
characters.

I do often want to sanitize all illegal characters (just \0 and the path sep on 
POSIX, a larger set that I don’t know by heart on Windows).

I don’t think I’ve ever wanted to sanitize the set of potentially-unsafe 
characters you’re proposing here.

I have wanted to sanitize (or pop up an “are you sure?” dialog, etc.) a wider 
range of potentially confusing characters. For example, newlines or Unicode 
separators can be very confusing in filenames. I’ve used one of those 
“potentially misleading URL” libs for this even though files and URLs aren’t 
quite the same and it was definitely overzealous, but if I’m not really 
confident that someone has thought through the details and widely vetted them, 
I’d rather have overzealous than underzealous for something like this.

Meanwhile, on POSIX, it’s actually bytes rather than characters that are 
illegal. Any character that, in the filesystem’s encoding, would have a \0 or 
\x2f is therefore illegal. Of course in UTF-8, the only such characters are NUL 
and /, so in scripts I write for my own use on my own systems where I know all 
the filesystems are UTF-8 I don’t worry about this. But something meant for 
hardening/verification tools seems like it needs to meet a higher standard and 
work on more varied systems. And I don’t know how you could even apply the 
right rule without knowing what the file system encoding is (which means you 
need the full path, not just the component to be checked) or requiring bytes 
rather than str (but then it doesn’t work for Windows, and resolving that whole 
mess gets extra fun, and even on POSIX it’s a lot less common to use).

Speaking of encodings and Windows, isn’t any character not in the user’s OEM 
code page likely to be confusing? Sure, it’ll work with other Python 3.8 
scripts, but it’ll crash or do the wrong thing or display mojibake when used 
with lots of other tools.

> The `mode` argument indicates what to do with unacceptable characters. Escape 
> them (`ESCAPE`), omit them (`OMIT`) or raise an exception (`RAISE`).

What’s the exception, and what attributes does it have? Usually I don’t care 
too much as long as the traceback/log entry/whatever is good enough for 
debugging, but for this function, I think I’d often want to be able to 
programmatically access the character(s) that triggered the error so I can tell 
the user. Especially if the rule isn’t a fixed, well-known one that you can 
describe the way Windows Explorer does when you try to use an illegal character.

> This could also double as an escape character argument 

[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)

2020-05-09 Thread Andrew Barnert via Python-ideas
On May 9, 2020, at 19:43, Christopher Barker  wrote:
> 
> On Sat, May 9, 2020 at 1:03 PM Andrew Barnert  wrote:
> > https://github.com/PythonCHB/islice-pep/blob/master/pep-xxx-islice.rst
> 
> I haven’t read the whole thing yet, but one thing immediately jumped out at 
> me:
> 
> > and methods on containers, such as dict.keys return iterators in Python 3, 
> 
> No they don’t. They return views—objects that are collections in their own 
> right (in particular, they’re not one-shot; they can be iterated over and 
> over) but just delegate to another object rather than storing the data.
> 
> Thanks -- that's that kind of thing that led me to say that this is probably 
> not ready for a PEP.
> 
> but I don't think that invalidates the idea at all -- there is debate about 
> what an "islice" should return, but an iterable view would be a good option.

I don’t think it invalidates the basic idea at all, just that it suggests the 
design should be different.

Originally, dict returned lists for keys, values, and items. In 2.2, iterator 
variants were added. In 3.0, the list and iterator variants were both replaced 
with view versions, which were enough of an improvement that they were 
backported to 2.x. Because a view does cover almost all of the uses of both a 
sequence copy and an iterator. And I think the same is true here.

> I'm inclined to think that it would be a bad idea to have it return a full 
> sequence view object, and not sure it should do anything other than be 
> iterable.

Why? What’s the downside to being able to do more with them for the same 
performance cost and only a little more up-front design work?

> > And this is important here, because a view is what you ideally _want_. The 
> > reason range, key view, etc. are views rather than iterators isn’t that 
> > it’s easier to implement or explain or anything, it’s that it’s a little 
> > harder to implement and explain but so much more useful that it’s worth it. 
> > It’s something people take advantage of all the time in real code.
> 
> Maybe -- but "all the time?" I'd vernture to say that absolutiely the most 
> comon thing done with, e.g. dict.keys() is to iterate over it.

Really? When I just want to iterate over a dict’s keys, I iterate the dict 
itself. 

> > For prior art specifically on slicing as a view, rather than just views in 
> > general, see memoryview (which only works on buffers, not all sequences) 
> > and NumPy (which is weird in many ways, but people rely on slicing giving 
> > you a storage-sharing view)
> 
> I am a long-time numpy user, and yes, I very much take advantage of the 
> memory sharing view.
> 
> But I do not think that that would be a good idea for the standard library. 
> numpy slices return a full-fledged numpy array, which shares a data view with 
> its "host" -- this is really helpful for performance reasons -- moving 
> large blocks of data around is expensive, but it's also pretty confusing. And 
> it would be a lot more problematic with, e.g. lists, as the underlying buffer 
> can be reallocated -- numpy arrays are mutable, but not re-sizable, once 
> you've made one its data buffer does not change.

That’s no more of a problem for a list slice view than for any of the existing 
views. The simplest way to implement a view is to keep a reference to the 
underlying object and delegate to it, which is effectively what the dict views 
do.

(Well, did from 2.x to 3.5. The dict improvements in 3.6 opened up an 
optimization opportunity, because in the split layout a dict is effectively a 
wrapper around a keys view and a separate table, so the keys view can refer 
directly to that thing that already exists. But that isn’t relevant here.)

(You _could_ instead refuse to allow expanding a sequence when there’s a live 
view, as bytearray does with memoryview, but I don’t think that’s necessary 
here. It’s only needed there as a consequence of the fact that the buffer protocol 
is provided in C rather than in Python. For a slice view, it would just make 
things more complicated and less functional for no good reason.)
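To make the delegation strategy concrete, here's a minimal read-only sketch 
(my illustration here, not the code from the repo linked elsewhere in this 
thread; note that it freezes the bounds at construction time, which is one of 
those design decisions you'd have to make):

    class SliceView:
        def __init__(self, seq, slc):
            # keep a reference and delegate; nothing is copied, so later
            # mutations of seq's elements show through
            self.seq = seq
            # resolve None/negative bounds against the current length
            self.range = range(*slc.indices(len(seq)))
        def __len__(self):
            return len(self.range)
        def __getitem__(self, i):
            # range does all the start/stop/step index arithmetic for us
            return self.seq[self.range[i]]

A mutable version just adds a __setitem__ that assigns through the same 
arithmetic.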

> > But just replacing islice is a much simpler task (mainly because the input 
> > has to be a sequence and the output is always a sequence, so the only 
> > complexity that arises is whether you want to allow mutable views into 
> > mutable sequences), and it may well be useful on its own.
> 
> Agreed. And while yes, dict_keys and friends are not JUST iterators, they 
> also aren't very functional views, either. They are not sequences, 

That’s not true. They are very functional—as functional as reasonably makes 
sense. The only reason they’re not Sequences is that they’re views on dicts, so 
indexing makes little sense, but set operations do—and they are in fact Sets. 
(Except for values.)
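For instance (illustration mine):

    >>> d = {'a': 1, 'b': 2}
    >>> d.keys() & {'b', 'c'}   # keys views really are Sets
    {'b'}
    >>> d.keys() - {'a'}
    {'b'}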

> certainly not mutable sequences.

Well, yes, but mutating a dict through its views wouldn’t make sense in the 
first place:

>>> d = {1: 2}
>>> k = d.keys()
>>> k |= 3

You’ve told it to 

[Python-ideas] Re: Equality between some of the indexed collections

2020-05-09 Thread Andrew Barnert via Python-ideas
On May 9, 2020, at 13:24, Dominik Vilsmeier  wrote:
> 
> 
>> On 09.05.20 22:16, Andrew Barnert wrote:
>>> 
>> There’s an obvious use for the .all, but do you ever have a use for the 
>> elementwise itself? When do you need to iterate all the individual 
>> comparisons? (In numpy, an array of bools has all kinds of uses, starting 
>> with indexing or selecting with it, but I don’t think any of them are doable 
>> here.)
> I probably took too much inspiration from Numpy :-) Also I thought it
> would nicely fit with the builtin `all` and `any`, but you are right,
> there's probably not much use for the elementwise iterator itself. So
> one could use `elementwise` as a namespace for `elementwise.all(chars)
> == string` and `elementwise.any(chars) == string` which automatically
> reduce the elementwise comparisons and the former also performs a length
> check prior to that. This would still leave the option of having
> `elementwise(x) == y` return an iterator without reducing (if desired).

But do you have any use for the .any? Again, it’s useful in NumPy, but would 
any of those uses translate?

If you’re never going to use elementwise.any, and you’re never going to use 
elementwise itself, then having elementwise.all rather than making that the 
callable itself just makes the useful bit a little harder to access. And it’s 
definitely complicating the implementation, too. If you have a use for the 
other features, that may easily be worth it, but if you don’t, why bother?

I took my lexicompare, stripped out the dependency on other helpers in my 
toolbox (which meant rewriting < in a way that might be a little slower; I 
haven’t tested) and the YAGNI stuff (like trying to be “view-ready” even though 
I never finished my views library), and posted it at 
https://github.com/abarnert/lexicompare (no promises that it’s stdlib-ready 
as-is, of course, but I think it’s at least a useful comparison point here). 
It’s pretty hard to beat this for simplicity:
    from functools import total_ordering
    from itertools import zip_longest

    @total_ordering
    class _Smallest:
        def __lt__(self, other):
            return True

    @total_ordering
    class lexicompare:
        def __new__(cls, it):
            self = super(lexicompare, cls).__new__(cls)
            self.it = it
            return self
        def __eq__(self, other):
            # a fresh object() fillvalue means sequences of unequal
            # length can never compare equal
            return all(x == y for x, y in
                       zip_longest(self.it, other, fillvalue=object()))
        def __lt__(self, other):
            for x, y in zip_longest(self.it, other, fillvalue=_Smallest()):
                if x < y: return True
                elif y < x: return False
            return False
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/JUG6XZMEGTRYWBUKUVAOXN64FTAJGTX7/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Equality between some of the indexed collections

2020-05-09 Thread Andrew Barnert via Python-ideas
On May 9, 2020, at 02:58, Dominik Vilsmeier  wrote:
> 
> 
> Initially I assumed that the reason for this new functionality was
> concerned with cases where the types of two objects are not precisely
> known and hence instead of converting them to a common type such as
> list, a direct elementwise comparison is preferable (that's probably
> uncommon though). Instead in the case where two objects are known to
> have different types but nevertheless need to be compared
> element-by-element, the performance argument makes sense of course.
> 
> So as a practical step forward, what about providing a wrapper type
> which performs all operations elementwise on the operands. So for example:
> 
> if all(elementwise(chars) == string):
>     ...
> 
> Here the `elementwise(chars) == string` part returns a generator which
> performs the `==` comparison element-by-element.
> 
> This doesn't perform any length checks yet, so as a bonus one could add
> an `all` property:
> 
> if elementwise(chars).all == string:
>     ...

There’s an obvious use for the .all, but do you ever have a use for the 
elementwise itself? When do you need to iterate all the individual comparisons? 
(In numpy, an array of bools has all kinds of uses, starting with indexing or 
selecting with it, but I don’t think any of them are doable here.)

And obviously this would be a lot simpler if it was just the all object rather 
than the elementwise object—and even a little simpler to use:

element_compare(chars) == string

(In fact, I think someone submitted effectively that under a different name for 
more-itertools, and it was rejected because, while it seemed really useful, 
more-itertools didn’t seem like the right place for it. I have a similar 
“lexicompare” in my toolbox, but it has extra options that YAGNI. Anyway, even 
if I’m remembering right, you probably don’t need to dig up the more-itertools 
PR because it’s easy enough to redo from scratch.)

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/AF3Z63YYQQVWCV3DZQJMKFNKO2G5AXKG/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)

2020-05-09 Thread Andrew Barnert via Python-ideas
On May 9, 2020, at 12:38, Christopher Barker  wrote:
> 
> https://github.com/PythonCHB/islice-pep/blob/master/pep-xxx-islice.rst

I haven’t read the whole thing yet, but one thing immediately jumped out at me:

> and methods on containers, such as dict.keys return iterators in Python 3, 

No they don’t. They return views—objects that are collections in their own 
right (in particular, they’re not one-shot; they can be iterated over and over) 
but just delegate to another object rather than storing the data.

People also commonly say that in Python 3 range went from a function that 
returns a list to an iterator, and that’s wrong for the same reason.

And this is important here, because a view is what you ideally _want_. The 
reason range, key view, etc. are views rather than iterators isn’t that it’s 
easier to implement or explain or anything, it’s that it’s a little harder to 
implement and explain but so much more useful that it’s worth it. It’s 
something people take advantage of all the time in real code.

And this is pretty easy to implement. I have a quick and dirty version at 
https://github.com/abarnert/slices, but I think I may have a better version 
somewhere with more unit tests.

For prior art specifically on slicing as a view, rather than just views in 
general, see memoryview (which only works on buffers, not all sequences) and 
NumPy (which is weird in many ways, but people rely on slicing giving you a 
storage-sharing view)

The reason I never proposed this for the stdlib (even though that would allow 
adding methods directly onto the builtin container types, as your proposal 
does) is that I always want to build a _complete_ view library, with 
replacements for map, zip, enumerate, all of itertools, etc., and with enough 
cleverness to present exactly as much functionality as is possible. But just 
replacing islice is a much simpler task (mainly because the input has to be a 
sequence and the output is always a sequence, so the only complexity that 
arises is whether you want to allow mutable views into mutable sequences), and 
it may well be useful on its own.

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/YQKKS4RADWU3QOFWFUU6PHS3ZU523T7P/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: islice with actual slices

2020-05-09 Thread Andrew Barnert via Python-ideas
On May 9, 2020, at 02:12, Ram Rachum  wrote:
> 
> 
> Here's an idea I've had. How about instead of this: 
> 
> itertools.islice(iterable, 7, 20)
> 
> We'll just have: 
> 
> itertools.islice(iterable)[7:20]

I’ve actually built this.[1] From my experience, it feels clever at first, but 
it can get confusing.

The problem is that if you slice twice, or slice after nexting, you can’t get a 
feel for what the remaining values should be unless you work it through. Of 
course the exactly same thing is true with using islice twice today, but you 
don’t _expect_ that to be comprehensible in terms of slicing the original 
iterable twice, while with slice notation, you do. Or at least I do; maybe 
that’s just me.

And meanwhile, even though the simple uses aren’t confusing, I’ve never had any 
code where it made things nicer enough that it seemed worth reaching into the 
toolbox. But again, maybe that’s just me.

If you want to play with this and can’t implement it yourself easily, I could 
dig up my implementation. But it’s pretty easy (especially if you don’t try to 
optimize and just have __getitem__ return a new islice around self).
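Roughly this (a from-memory sketch, not my actual implementation):

    from itertools import islice

    class vslice:
        def __init__(self, iterable):
            self._it = iter(iterable)
        def __iter__(self):
            return self._it
        def __getitem__(self, s):
            # slicing just wraps a new islice around the same iterator
            return vslice(islice(self._it, s.start, s.stop, s.step))

So list(vslice(it)[7:20]) does what you'd hope, but vslice(it)[7:20][2:5] 
slices relative to the first slice: exactly the confusion described above.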

—-

[1] Actually, I built an incomplete viewtools (a replacement for itertools plus 
zip, map, etc. that gives you views that are reusable iterables and forward as 
much input behavior as possible—so map(lambda i: i*2, range(10)) is a sequence, 
while filter(lambda i: i%2, range(10)) is not a sequence but it is reversible, 
and so on) and then extracted and simplified the vslice because I thought it 
might be useful without the views stuff. (I also extracted and simplified it in 
a different way, as view slices that only work on sequences, and that actually 
did turn out to be occasionally useful.)
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/CQ43LF5UICMYNNB43JJM2CXOILUOCSPC/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: zip() as a class constructor (meta) [was: ... Length-Checking To zip]

2020-05-09 Thread Andrew Barnert via Python-ideas
On May 9, 2020, at 03:46, Chris Angelico  wrote:
> 
> But ultimately, a generator function is very similar to a class with a
> __next__ method. When you call it, you get back a state object that
> you can ping for the next value. That's really all that matters.

Well, it’s very similar to a class with __next__, send, throw, and close 
methods. But that doesn’t really change your point.

For a different angle on this: What if Python 3.10 changed things so that every 
generator function actually did define a new class (maybe even accessible as a 
member of the function)? What would break? You could make inspect.isgenerator() 
continue to work, and provide the same internal attributes documented in the 
inspect module. So only code that depends on type(gen()) is types.GeneratorType 
would break (and there probably is very little of that—not even throwaway REPL 
code).

Also: a generator isn’t actually a way of defining a class, but it’s a way of 
defining a factory for objects that meet a certain API, and Python goes out of 
its way to hide that distinction wherever possible (not just for generators, 
but in general). The only meaningful thing that’s different between a generator 
function and a generator class is that the author of the function doesn’t 
directly write the __next__ (and send, close, etc.) code, but instead writes 
code that defines their behavior implicitly. And that’s obviously just an 
implementation detail, and it isn’t that much different from the fact that the 
author of a @dataclass doesn’t directly write the __init__, __repr__, etc.

So you’re right, from outside, it really doesn’t matter.

> I
> think the C implementations tend to be classes but the Python ones
> tend to be generators - possibly because a generator function is way
> easier to write in Python, but maybe the advantage isn't as strong in
> C.

It’s not just not as strong, it runs in the opposite direction. In fact, it’s 
impossible to write generator functions in C. There’s no way to yield control 
in a C function. (Even if you build a coro library around setjmp, or use C++20 
coros, it wouldn’t help you yield back into CPython’s ceval loop.) A generator 
object is basically just a wrapper around an interpreter frame and its 
bytecode; there’s no way to exploit that from C. There are a few shortcuts to 
writing an iterator (e.g., when you have a raw array, implement the old-style 
sequence protocol, want to delegate to a member, or can steal another type’s 
implementation as frozenset does with set), but a generator function isn’t one 
of them.

(If you’re curious how Cython compiles generators, it’s worth looking at what 
it produces—but doing the same thing in raw C would not be a shortcut to 
writing a generator class.)
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/OGNVZK6SUCE2YUMM4IUHHD4TG76A7CYX/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-09 Thread Andrew Barnert via Python-ideas
> On May 9, 2020, at 04:30, Alex Hall  wrote:
> 
>> On Fri, May 8, 2020 at 11:22 PM Andrew Barnert via Python-ideas 
>>  wrote:
> 
>> Trying to make it a flag (which will always be passed a constant value) is a 
>> clever way to try to get the best of both worlds—and so is the 
>> chain.from_iterable style.
> 
> At this point it sounds like you're saying that zip(..., strict=True) and 
> zip.strict(...) are equally bad.

You’re right, it did sound like that, and I don’t mean that. Sorry.

zip.strict has _some_ of the same problems as zip(strict=True), but definitely 
not _all_ of them. And I definitely prefer zip.strict to the flag.

At the time I wrote this (I don’t know why it took a few days to get 
delivered…), zip.strict had come up the first time and been roundly shouted 
down, and it seemed like nobody but me (and the proposer, of course) had found 
it at all acceptable, and I was trying to make the point that if people don’t 
like zip.strict, the same things and more apply to passing an always-constant 
flag, so it should be even more acceptable.

Then, over the last few days, a bunch of people came around on zip.strict. And 
that seems to be at least in part because people came up with better arguments 
than the first time around. (For example, I forget who it was that pointed out 
that you don’t really have to start thinking of zip as a class and zip.strict 
as an alternate constructor, because plenty of people don’t realize that’s true 
for chain.from_iterable and they still have no more problem using it than they 
do for datetime.now.)

So now, rather than it being a +0 for me and a distant second choice behind an 
itertools function, I think I’m pretty close to evenly torn between the two.

I do think that if we add zip.strict, we should also probably add zip.longest, 
not just think about maybe adding it some day. And it might even be worth 
adding zip.shortest, even if we have no intention of ever eliminating zip() 
itself or changing it to mean zip.strict. But I don’t have good arguments for 
these; I’ll have to think about it a bit more to explain why I think 
consistency easily trumps the costs for this variant of the proposal but 
probably fails for other variants.

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/RAFDWYYUIDOLCQ4M7HS35DZL56LR32YX/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Equality between some of the indexed collections

2020-05-08 Thread Andrew Barnert via Python-ideas
On May 8, 2020, at 20:36, Dan Sommers <2qdxy4rzwzuui...@potatochowder.com> 
wrote:
> 
> On Fri, 8 May 2020 17:40:31 -0700
> Andrew Barnert via Python-ideas  wrote:
> 
>> So, the OP is right that (1,2,3)==[1,2,3] would sometimes be handy,
>> the opponents are right that it would often be misleading, and the
>> question isn’t which one is right ...
> 
> That's a good summary.  Thank you.  :-)
> 
>> [1] If anyone still wants to argue that using a tuple as a hashable
>> sequence instead of an anonymous struct is wrong, how would you change
>> this excerpt of code:
>> 
>>    memomean = memoize(mean, key=tuple)
>>    def player_stats(player):
>>        # …
>>        … = memomean(player.scores) …
>>        # …
>> 
>> Player.scores is a list of ints, and a new one is appended after each
>> match, so a list is clearly the right thing. But you can’t use a list
>> as a cache key. You need a hashable sequence of the same values. And
>> the way to spell that in Python is tuple.
> 
> Very clever.  

I don’t think it’s particularly clever. And that’s fine—using common idioms 
usually is one of the least clever ways to do something out of the infinite 
number of possible ways. Because being intuitively the one obvious way tends to 
be important to becoming an idiom, and it tends to run counter to being clever. 
(Being concise, using well-tested code, and being efficient are also often 
important, but being clever doesn’t automatically give you any of those.)

> Then again, it wouldn't be python-ideas if it were that
> simple!  "hashable sequence of the same values" is too strict.  I think
> all memoize needs is a key function such that if x != y, then key(x) !=
> key(y).

Well, it does have to be hashable. (Unless you’re proposing to also replace the 
dict with an alist or something?) I suppose it only needs to be a hashable 
_encoding_ of a sequence of the same values, but surely the simplest encoding 
of a sequence is the sequence itself, so, unless “hashable sequence” is 
impossible (which it obviously isn’t), who cares?

>    def key(scores):
>        return ','.join(str(-score * 42) for score in scores)

This is still a sequence. If you really want to get clever, why not:

    def key(scores):
        return sum(prime**score for prime, score in zip(calcprimes(), scores))

But this just demonstrates why you don’t really want to get clever. It’s more 
code to write, read, and debug than tuple, easier to get wrong, harder to 
understand, and almost certainly slower, and the only advantage is that it 
deliberately avoids meeting a requirement that we technically didn’t need but 
got for free.

> Oh, wait, even that's too strict.  All memoize really needs is if
> mean(x) != mean(y), then key(x) != key(y):
> 
>    memomean = memoize(mean, key=mean)
>    def player_stats(player):
>        # …
>        … = memomean(player.scores) …
>        # …

Well, it seems pretty unlikely that calculating the mean to use it as a cache 
key will be more efficient than just calculating the mean, but hey, if you’ve 
got benchmarks, benchmarks always win. :)

(In fact, I predicted that memoizing here would be a waste of time in the first 
place, because the only players likely to have equal score lists to earlier 
players would be the ones with really short lists—but someone wanted to try it 
anyway, and he was able to show that it did speed up the script on our test 
data set by something like 10%. Not nearly as much as he’d hoped, but still 
enough that it was hard to argue against keeping it.)
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/QLEDJ7XBE3EHG2C3J2QEFOWROSTMSH4C/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: PEP 618: Add Optional Length-Checking To zip

2020-05-08 Thread Andrew Barnert via Python-ideas
On May 4, 2020, at 10:44, Steve Barnes  wrote:
> And "equal" doesn't say what it's equal.
> 
> What we need is a word that means "same length", much as "shorter" and 
> "longer" are about length.
> 
> There's "coextensive", but that'll probably get a -1.

If “equal” is bad, “coextensive” is much worse. “Equal” is arguably ambiguous 
between “same length” and “same values”, but “coextensive” usually means “same 
values”.

“The County shall be coextensive with the City of San Francisco” doesn’t mean 
that it’s 49.81 square miles, it means it consists of the exact same 49.81 
square miles as the city. “The golden age of Dutch culture was roughly 
coextensive with the Netherlands’ reign as a world power…” doesn’t mean it was 
roughly 67 years, it means it was roughly the same 67 years from 1585 to 
1652.[1] “Consciousness and knowledge are coextensive” means that you know the 
things you’re conscious of. And in math[2], a popular example in undergrad 
textbooks[3] is that (Z/7Z, +) and (Z/7Z, *) are coextensive but still distinct 
groups. The most popular formulation of the axiom of reducibility in early 
predicative set theory was “to each propositional function there corresponds a 
coextensive predicative function”. Even in measure theory, it seems to always 
mean “same extension”, not “same extent”.

So, this would be a great name for the function in the other thread about 
comparing lists and tuples as equal, but it’s not a great name here.

Some dictionaries do give “commensurate” or similar as a secondary[4] meaning, 
but at best that would mean it’s still ambiguous.

—-

[1] And here I thought it was 1989 until whenever Guido left.

[2] I didn’t even remember that it was used in math until I used the word in 
its normal English sense and one of the other Steves accused me or resorting to 
mathematical jargon—but after that, I did some searching, and I was wrong, and 
it actually is reasonably common.

[3] Seriously, I found the exact same example in three apparently unrelated 
textbooks. Which is pretty odd. 

[4] Or even later, after giving the same spatial boundaries, then the same 
temporal boundaries, then the math/logic definition, but I’m lumping those all 
together as one sense because they’re coextensive if spacetime Is topological. 
:)
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/W3RLUQ2GUQX4I5GV6X7UUTLQ7QPJ6ZA2/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: General methods

2020-05-08 Thread Andrew Barnert via Python-ideas
On May 8, 2020, at 15:44, Steven D'Aprano  wrote:
> 
> On Fri, May 08, 2020 at 10:46:45PM +0300, Serhiy Storchaka wrote:
> 
>> I propose to add the METH_GENERAL flag, which is applicable to methods 
>> as METH_CLASS and METH_STATIC (and is mutually incompatible with them). 
>> If it is set, the check for the type of self will be omitted, and you 
>> can pass an arbitrary object as the first argument of the unbound method.
> 
> Does this effect code written in Python? As I understand, in Python 
> code, unbound methods are just plain old functions, and there is no 
> type-checking done on `self`.
> 
>    py> class C:
>    ...     def method(self, arg):
>    ...         return (self,)
>    ...
>    py> C.method(999, None)
>    (999,)
> 
> So I think your proposal will only affect builtin methods written in C. 
> Is that correct?

Maybe the best way to see it is this:

For classes implemented in Python, you have to go out of your way to typecheck 
self. 

For classes implemented in C, you have to go out of your way to _not_ typecheck 
self.

It’s probably way too big of a change to make them consistent at this point, so 
Serhiy is just proposing a way to make it a lot easier for C methods to act 
like Python ones when you need them to.

And, given that he has some solid use cases, it’s hard to see any problem with 
that.
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/JKAYR4SWRM7HTJS3RN775R2I4K3B75XQ/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Equality between some of the indexed collections

2020-05-08 Thread Andrew Barnert via Python-ideas
On May 6, 2020, at 05:22, Richard Damon  wrote:
> 
> In my mind, tuples and lists seem very different concepts, that just
> happen to work similarly at a low level (and because of that, are
> sometimes 'misused' as each other because it happens to 'work').

I think this thread has gotten off track, and this is really the key issue here.

If someone wants this proposal, it’s because they believe it’s _not_ a misuse 
to use a tuple as a frozen list (or a list as a mutable tuple).

If someone doesn’t want this proposal, the most likely reason (although 
admittedly there are others) is because they believe it _is_ a misuse to use a 
tuple as a frozen list.

It’s not always a misuse; it’s sometimes perfectly idiomatic to use a tuple as 
an immutable hashable sequence. It doesn’t just happen to 'work', it works, for 
principled reasons (tuple is a Sequence), and this is a good thing.[1]

It’s just that it’s _also_ common (probably a lot more common, but even that 
isn’t necessary) to use it as an anonymous struct.

So, the OP is right that (1,2,3)==[1,2,3] would sometimes be handy, the 
opponents are right that it would often be misleading, and the question isn’t 
which one is right, it’s just how often is often. And the answer is obviously: 
often enough that it can’t be ignored. And that’s all that matters here.

And that’s why tuple is different from frozenset. Very few uses of frozenset 
are as something other than a frozen set, so it’s almost never misleading that 
frozensets equal sets; plenty of tuples aren’t frozen lists, so it would often 
be misleading if tuples equaled lists.

—-

[1] If anyone still wants to argue that using a tuple as a hashable sequence 
instead of an anonymous struct is wrong, how would you change this excerpt of 
code:

    memomean = memoize(mean, key=tuple)
    def player_stats(player):
        # …
        … = memomean(player.scores) …
        # …

Player.scores is a list of ints, and a new one is appended after each match, so 
a list is clearly the right thing. But you can’t use a list as a cache key. You 
need a hashable sequence of the same values. And the way to spell that in 
Python is tuple.

And that’s not a design flaw in Python, it’s a feature. (Shimmer is a floor wax 
_and_ a dessert topping!) Sure, when you see a tuple, the default first guess 
is that it’s an anonymous struct—but when it isn’t, it’s usually so obvious 
from context that you don’t even have to think about it. It’s confusing a lot 
less often than, say, str, and it’s helpful a lot more often.
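If you want to actually run that excerpt: memoize isn't a stdlib function; a 
minimal sketch of the helper it assumes (details invented) would be:

    def memoize(func, *, key):
        # cache on key(arg) rather than arg itself, so that unhashable
        # arguments like lists can still be memoized
        cache = {}
        def wrapper(arg):
            k = key(arg)
            if k not in cache:
                cache[k] = func(arg)
            return cache[k]
        return wrapper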
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/F65FI2QMUOUCD2RVW4APQMNAFALQZFXB/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Auto-assign attributes from __init__ arguments

2020-05-08 Thread Andrew Barnert via Python-ideas
On May 4, 2020, at 17:26, Steven D'Aprano  wrote:
> 
> Proposal:
> 
> We should have a mechanism that collects the current function or 
> method's parameters into a dict, similar to the way locals() returns all 
> local variables.
> 
> This mechanism could be a new function,or it could even be a magic local 
> variable inside each function, similar to what is done to make super() 
> work. But for the sake of this discussion, I'll assume it is a function, 
> `parameters()`, without worrying about whether it is a built-in or 
> imported from the `inspect` module.

Some other popular languages have something pretty similar. (And they’re not 
all as horrible as perl $*.) For example, in JavaScript, there’s a magic local 
variable named arguments whose value is (a thing that duck-types as) a list of 
the arguments passed to the current function’s parameters. (Not a dict, but 
that’s just because JS doesn’t have keyword arguments.)

> function spam(x, y) { console.log(arguments) }
> spam(23, 42)
[23, 42]

Whether it’s called arguments or parameters, and whether it’s a magic variable 
or a magic function, are minor bikeshedding issues (which you already raised), 
not serious objections to considering them parallel. And I think all of the 
other differences are either irrelevant, or obviously compelled by differences 
between the languages (e.g., Python doesn’t need a rule for how it’s different 
between the two different kinds of functions, because lambda doesn’t produce a 
different kind of function).

So, I think this counts as a prior-art/cross-language argument for your 
proposal.
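For what it's worth, you can approximate the proposed parameters() today with 
frame hacking, which at least shows the semantics are implementable (sketch 
mine, using inspect):

    import inspect

    def parameters():
        # grab the caller's frame and pull out its named parameters
        frame = inspect.currentframe().f_back
        info = inspect.getargvalues(frame)
        return {name: info.locals[name] for name in info.args}

    def spam(x, y):
        print(parameters())

    spam(23, 42)   # prints {'x': 23, 'y': 42}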

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/JABSTNZJ2D5GMI23FXJD7UAG7QPXVHJK/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-08 Thread Andrew Barnert via Python-ideas
On May 5, 2020, at 12:50, Christopher Barker  wrote:
> 
> Another key point is that if you want zip_longest() functionality, you simply 
> can not get it with the builtin zip() -- you are forced to look elsewhere. 
> Whereas most code that might want "strict" behavior will still work, albeit 
> less safely, with the builtin.

I think this is a key point, but I think you’ve got it backward.

You _can_ build zip_longest with zip, and before 2.6, people _did_. (Well, they 
built izip_longest with izip.) I’ve still got my version in an old toolbox. You 
chain a repeat(None) onto each iterable, izip, and you get an infinite iterator 
that you have to read until all(is None). You can just takewhile that into 
exactly the same thing as izip_longest, but unfortunately that’s a lot slower 
than filtering when you iterate, so I had both _longest and _infinite variants, 
and I think I used the latter more even though it was usually less convenient. 
That sounds like a silly way to do it, and it’s certainly easier to get subtly 
wrong than just writing a generator function like the “as if” code in the 
(i)zip_longest docs, but a comment in my code assures me that this is almost 4x 
as fast, and half the speed of a custom C implementation, so I’m pretty sure 
that’s why I did it. And I doubt I’m the only person who solved it that way. In 
fact, I’ll bet I copied it from an ActiveState recipe or a colleague or an open 
source project.
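In 3.x terms the trick looks something like this (reconstructed from memory, 
with a private sentinel rather than None so that legitimate None values can't 
end the loop early):

    from itertools import chain, repeat, takewhile

    def zip_longest_(*iterables, fillvalue=None):
        sentinel = object()
        # pad every input with an infinite tail, and zip forever...
        infinite = zip(*(chain(it, repeat(sentinel)) for it in iterables))
        # ...then stop at the first row that is nothing but padding
        rows = takewhile(lambda row: any(x is not sentinel for x in row),
                         infinite)
        for row in rows:
            yield tuple(fillvalue if x is sentinel else x for x in row)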

So, most likely, izip_longest wasn’t added because you can’t build it on top of 
izip, but because building it on top of izip is easy to get subtly wrong 
(especially if you need it to be fast—or don’t need it to be fast but micro 
optimize it anyway, for that matter), and often people punt and do something 
clunkier (use _infinite instead of _longest and make the final for loop more 
complicated).

Which is actually a pretty good parallel for the current proposal. You can 
write your own zip_strict on top of zip, and at least a few people do—but, as 
people have shown in this thread, the obvious solution is too slow, the obvious 
fast solution is very easy to get subtly wrong, and often people punt and do 
something clunkier (listify and compare len).
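For comparison, here's one way to write it on top of zip (a sketch of mine, 
not anyone's actual post from the thread):

    from itertools import chain

    def zip_strict(*iterables):
        sentinel = object()
        for row in zip(*(chain(it, [sentinel]) for it in iterables)):
            # count identity hits so a weird __eq__ can't fool us
            hits = sum(x is sentinel for x in row)
            if hits == len(row):   # every input ended together: done
                return
            if hits:               # only some ended: length mismatch
                raise ValueError('iterables have different lengths')
            yield row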

That’s why I’m +1 on this proposal in some form. Assuming zip_strict would be 
useful at least as often as zip_longest (and I’ve been sold on that part, and I 
think most people on all sides of this discussion agree?), it calls out for a 
good official solution. The fact that the ecosystem is different nowadays (pip 
install more-itertools or copying off StackOverflow is a lot simpler, and more 
common, than finding a recipe on ActiveState) does make it a little less 
compelling, but at most that means the official solution should be a docs link 
to more-itertools, still not that we should do nothing.

But that’s also part of the reason I’m -1 on it being a flag. Just like 
zip_longest, it’s a different function, one you shouldn’t think of as being 
built on zip even if it could be. Maybe strict really is needed so much more 
often than longest that “import itertools” is too onerous, but if that’s really 
true, that different function should be another builtin. I think nobody is 
arguing for that, because it’s just obvious that it isn’t needed enough to 
reach the high bar of adding another function to builtins. But that means it 
belongs in itertools.

Trying to make it a flag (which will always be passed a constant value) is a 
clever way to try to get the best of both worlds—and so is the 
chain.from_iterable style. But if either of those really did get the best of 
both worlds and the problems of neither, it would be used all over the place, 
rather than as sparingly as possible. And of course it doesn’t get the best of 
both worlds. A flag is hiding code as data, and it looks misleadingly like the 
much more common uses of flags where you actually do often set the flag with a 
runtime value. It’s harder to type (and autocomplete makes the difference 
worse, not better). It’s a tiny bit harder to read, because you’re adding as 
much meaningless boilerplate (True) as important information (strict). It’s 
increasing the amount of stuff to learn in builtins just as much as another 
function would. And so on. So it’s only worth doing for really special cases, 
like open.

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/IEMCC3WXEHV2J7DLP7OXWSYATLSC3BBI/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Adding a "once" function to functools

2020-05-01 Thread Andrew Barnert via Python-ideas
On May 1, 2020, at 09:51, Tom Forbes  wrote:
> 
>> You’ve written an exactly equIvalent to the double-checked locking for 
>> singletons examples that broke Java 1.4 and C++03 and led to us having once 
>> functions in the first place.
>>  … but what about on Jython, or PyPy-STM, or a future GIL-less Python?
> 
> While I truly do appreciate your feedback on this idea, I’m really not clear 
> on your line of reasoning here. What specifically do you propose would be the 
> issue with the *Python* implementation? Are you proposing that under some 
> Python implementations `cache = func()` could be… the result of half a 
> function call? I could buy an issue with some implementations meaning that 
> `cache` still appears as `sentinel` in specific situations, but I feel that 
> would constitute a pretty obvious bug in the implementation that would impact 
> a _lot_ of other multithreaded code rather than a glaring issue with this 
> snippet. Both the issues you’ve referenced valid, but also are rather 
> specific to the languages that they affect. I don’t believe they apply to 
> Python.

But the issues really aren’t specific to C++ and Java. The only reason C#, 
Swift, Go, etc. don’t have the same problem is that their memory models were 
designed from the start to provide a way to do this correctly. Python was not. 
There was an attempt to define a memory model in the 00’s (PEP 583), but it was 
withdrawn.

According to the discussion around that PEP about when you can see 
uninitialized variables (not exactly the same issue, but closely related), 
Jython is safe when they’re globals or instance attributes and you haven’t 
replaced the module or object dict, but otherwise probably not; IronPython is 
probably safe in the same cases and more but nobody’s actually sure. Does that 
sound good enough to dismiss the problem?

> I still think the point stands. With your two-separate-decorators approach 
> you’re paying it on every call. As a general purpose `call_once()` 
> implementation I think the snippet works well, but obviously if you have some 
> very specific use-case where it’s not appropriate - well then you are 
> probably able to write a very specific and suitable decorator.

Being willing to trade safety or portability for speed is sometimes a good 
tradeoff, but that’s the special use case, not the other way around. People who 
don’t know exactly what they need should get something safe and portable.
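Concretely, the safe-and-portable default is to just always take the lock, 
with no racy fast path at all (sketch mine):

    import functools
    import threading

    def once(func):
        lock = threading.Lock()
        sentinel = object()
        result = sentinel
        @functools.wraps(func)
        def wrapper():
            nonlocal result
            with lock:   # no double-checked fast path
                if result is sentinel:
                    result = func()
                return result
        return wrapper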

Plus, there’s still the huge issue with single-threaded programs. It’s not like 
multi-threaded programs are ubiquitous in Python but, e.g., asyncio is some 
rare niche thing that the stdlib doesn’t have to worry about. A bunch of 
coroutines using a once function need either nothing or a coro lock; if you 
build a threading lock into the function, they waste time and maybe even 
deadlock now and then at startup, for no benefit whatsoever. Why is that 
acceptable for a general-purpose function?

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/SSBEI5BD3HNONNSH5RGGPYKZ2LY3DEXR/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: is a

2020-05-01 Thread Andrew Barnert via Python-ideas
On May 1, 2020, at 15:35, Steven D'Aprano  wrote:
> 
> but if it is all functions, then I think you have no choice but to 
> either live with it or shift languages, because the syntax for functions 
> is too deeply baked into Python to change now.

Actually, I’m pretty sure Python could add infix calling without complicating 
the grammar much, or breaking backward compatibility at all. I don’t think it 
*should*, but maybe others would disagree.

The most obvious way to do it is borrowing straight out of Haskell, so this:

x `spam` y

… compiles to exactly the same code as this:

spam(x, y)

That should be a very easy change to the grammar and no change at all to the 
later stages of compiling, so it’s about as simple as any new syntax could be. 
It doesn’t get in the way of anything else to the parser—and, more importantly, 
I don’t think it’s confusable as meaning something else to humans. (Of course 
it would be one extra thing to learn, like any syntax change.) Maybe something 
like $ instead of backticks is better for people with gritty monitors, but no 
point bikeshedding that (or the precedence) unless the basic idea is sound.
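You can even fake the infix spelling today with an operator-overloading 
wrapper (which I believe comes from an old ActiveState recipe), and it's a 
decent way to play with how this reads. Toy sketch, mine:

    class Infix:
        def __init__(self, f):
            self.f = f
        def __ror__(self, left):
            # "x |spam" binds the left operand, returning a new Infix
            return Infix(lambda right: self.f(left, right))
        def __or__(self, right):
            # "... | y" applies the bound function to the right operand
            return self.f(right)

    spam = Infix(lambda x, y: (x, y))
    print(1 |spam| 2)   # (1, 2)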

Anyway, it’s up to the user to decide which binary functions to infix and which 
to call normally, which sounds like a consenting-adults issue, but… does it 
_ever_ look Pythonic?

For this particular use case:

isa = isinstance

thing `isa` Fruit and not thing `isa` Apple

… honestly, the lack of any parens here makes it seem harder to read, even if 
it is a bit closer to English.

Here’s the best use cases I can come up with:

xs `cross` ys
array([[0,1], [1,1]]) `matrix_power` n
prices `round` 2

These are all things I have written infix in Haskell, and can’t in 
Python/NumPy, so you’d think I’d like the improvement… but if I can’t have real 
operators, I think I want dot-syntax methods with parens instead in Python:

prices.round(2)

And outside of NumPy, the examples seem to just get worse:

with open(path, 'w') as f:
obj `json.dump` f

Of course maybe I’m just failing to imagine good examples.
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/AQPPHKL4EMFMT5NPB66W4GAFMGE5YYAB/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Introduce 100 more built-in exceptions

2020-05-01 Thread Andrew Barnert via Python-ideas
On May 1, 2020, at 16:32, Steven D'Aprano  wrote:
> 
> On Fri, May 01, 2020 at 12:28:02PM -0700, Andrew Barnert via Python-ideas 
> wrote:
>>> On May 1, 2020, at 09:24, Christopher Barker  wrote:
>>> Maybe it's too late for this, but I would love it if ".errno or similar" 
>>> were more standardized. As it is, every exception may have it's own way to 
>>> find out more about exactly what caused it, and often you are left with 
>>> parsing the message if you really want to know.
>> I don’t think there are many cases where a standardized .errno would 
>> help—and I think most such cases would be better served by separate 
>> exceptions. With OSError, errno was a problem to be fixed, not an ideal 
>> solution to emulate everywhere.
>> You do often need to be able to get more information, and that is a problem, 
>> but I think it usually needs to be specific to each exception, not something 
>> generic.
>> Does code often need to distinguish between an unpacking error and an int 
>> parsing error? If so, you should be able to handle UnpackingError and 
>> IntParsingError, not handle ValueError and check an .errno against some set 
>> of dozens of new builtin int constants. If not, then we shouldn’t change 
>> anything at all.
>> As for parsing the error message, that usually comes up because
>> there’s auxiliary information that you need but that isn’t accessible.
>> For example, in 2.x, to get the filename that failed to open, you had
>> to regex .args[0], and that sucked.
> 
> Why would you parse the error message when you already have the 
> file name?
> 
>    try:
>        f = open(filename)
>    except IOError as err:
>        print(filename)

    try:
        config = parse_config()
    except IOError as err:
        print(filename)

You can’t get the local variable out of some other function that you called, 
even with frame hacking.

At any rate, it’s a bit silly to relitigate this change. All of the new IOError 
subclasses where a filename is relevant have had a filename attribute since 
3.0, so this problem has been solved for over a decade. If you really prefer 
the 2.x situation where sometimes those exception instances had the filename 
and sometimes not, you’ll need a time machine.

>> It seems like every year or two, someone suggests that we should go
>> through the stdlib and fix all the exceptions to be reasonably
>> distinguishable and to make their relevant information more
>> accessible, and I don’t think anyone ever has a problem with that,
> 
> I do!
> 
> Christopher's proposal of a central registry of error numbers and 
> standardised error messages just adds a lot more work to every core 
> developer for negligible or zero actual real world benefit.

You’re replying to a message saying “errno was a problem to be fixed, not an 
ideal solution to emulate” and likewise having to parse errors. And you’re 
insisting that you disagree because adding errno and standardizing messages so 
they could be parsed would be a problem for maintainers as well as for users. 
Sure, you’re right, but that’s not in any way an argument against Ram’s 
proposal, or against the paragraph you quoted; if anything, it’s an argument 
*for* it.

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/EESMDX2YO5KAIQQVVSSNKSTKTOYMSNH2/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Introduce 100 more built-in exceptions

2020-05-01 Thread Andrew Barnert via Python-ideas
> On May 1, 2020, at 14:34, Christopher Barker  wrote:
> 
> But it seems clear that "doing a big revamp of all the Exceptions and adding 
> a lot more subclasses" is not supported. Which doesn't mean that a few more 
> expansions wouldn't be accepted. 
> 
> So folks that like this idea may be best served by finding the lowest-hanging 
> fruit, and suggesting just a few.

I think you’re right. It _might_ be accepted if someone did the work, but it’s 
probably a lot easier to get separate small changes with solid use cases in one 
by one. As long as you’re not being sneaky and pretending like you don’t have a 
master plan, and each of the changes is convincing, I think they’d have a much 
better chance. And there ought to be good use cases for “these builtin parse 
functions should have a .string with the input that failed so you don’t have to 
regex it out of the message” or “I need to distinguish this one kind of 
ValueError from all the other kinds” or whatever; a lot easier to argue for 
those use cases than something abstract and general. 

And almost any way it turns out seems like a win. Even if they all get 
rejected, better to know you were on the wrong track early rather than after a 
hundred hours of work. Or if it turns out to be more work than you expected and 
you get sick of doing it, at least you’ve improved some of the most important 
cases. Or maybe you’d just keep doing it and people just keep saying “fine”. Or 
maybe someone says, “Hold on, another one of these? They’re all good on their 
own, but shouldn’t we have some master plan behind it all?” and then you can 
point back to the master plan you posted in this thread that nobody wanted to 
read at the time, and now they’ll want to read it and start bikeshedding. :)

(By “you” here I don’t mean you, Christopher; I mean Ram, or whoever else wants 
to do all this work.)

By the way:

> Python2 DID have a .message attribute -- I guess I should go look and find 
> documentation for the reasoning behind that, but it does seem like a step 
> backwards to me.

In 2.6 and 2.7, it’s undocumented, and should always be either the same thing 
__str__ returns or the empty string. So, not particularly useful.

I believe it exists as a consequence of the first time someone suggested “let’s 
clean up all the exceptions” but then that cleanup didn’t get started.

It was added in 2.5, along with a planned deprecation of args, and a new rule 
for __str__ (return self.message instead of formatting self.args), and a new 
idiom for how newly-written exception classes should super: don’t pass *args, 
pass a single formatted string; anything worth keeping around for users is 
worth storing in a nicely named attribute the way SyntaxError and IOError 
always have. And Py3000 was going to change all the existing exceptions to use 
that new idiom. But that never happened, and 2.6 and 3.0 basically went back to 
2.4: there’s no mention of message at all, args isn’t going to be deprecated, 
the rule for __str__ is the old one, etc.

There are more custom attributes on more exceptions than there used to be, but 
they seem to mostly have grown on a case by case basis (and mostly on brand new 
exceptions) rather than in one fell swoop. Which implies that you were right 
about the best way to get anything done.

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/ODSM2MPGCQPPOMB4V5Q2BQFROLTP6KR3/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: PEP 618: Add Optional Length-Checking To zip

2020-05-01 Thread Andrew Barnert via Python-ideas
On May 1, 2020, at 11:19, Brandt Bucher  wrote:
> 
> I have pushed a first draft of PEP 618:
> 
> https://www.python.org/dev/peps/pep-0618

The document says “… with nobody challenging the use of the word ‘strict’”, but 
people did challenge it, and even more people just called it “equal” instead of 
“strict” when arguing for it or +’ing it (which implies a preference even if 
there’s no argument there), and the only known prior art on this is 
more-itertools, which has a zip_equal function, not a zip_strict function.

I think it misrepresents the arguments for a separate function and undersells 
the advantages—it basically just addresses the objections that are easiest to 
reject. I don’t want to rehash all of my arguments and those of a dozen other 
people, since they’re already in the thread, but let me just give one: A 
separate function can be used in third-party libraries immediately, as long as 
there’s an available backport (whether that’s more-iterools, or a trivial zip39 
or whatever) that they can require; a flag can’t be used in libraries until 
they’re able to require Python 3.9 (unless they want to use a backport that 
monkey patches or shadows the builtin, but I doubt you’d suggest that, since 
you called it an antipattern elsewhere in the PEP).

It implies that infinite iterators are the only legitimate place where you’d 
ever want the existing shortest behavior.

Also, I don’t think anyone on the thread suggested the alternative of changing 
the behavior of zip _today_. Serhiy only suggested that we should leave the 
door open to doing so in the future, by having an enum-valued flag instead of a 
bool, or zip_shortest alongside zip_equal and zip_longest, or whatever. That 
allows people to explicitly say they want shortest when they want it, now—which 
might be beneficial even on its own terms. And if people end up usually using 
strict, and usually being explicit when they want shortest, then at that point 
it might be worth changing the default (or just not having one). So the 
argument against the alternative doesn’t really cover the actual thing 
suggested, but a different thing nobody wanted.

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/MHP3V2GFFBIDXVCY4T62TL4YRLGYGTGW/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: is a

2020-05-01 Thread Andrew Barnert via Python-ideas
On May 1, 2020, at 10:27, gbs--- via Python-ideas  
wrote:
> 
> In cases where it makes sense to do explicit type checking (with ABCs or 
> whatever), I really detest the look of the isinstance() function.
> 
> if isinstance(thing, Fruit) and not isinstance(thing, Apple):
> 
> Yucky.

I think it’s intentional that it’s a little yucky. It makes you think “could I 
be using duck typing or overridden methods here instead of type switching?” 
Sure, sometimes the answer is, “No, I can’t,” which is why ABCs were added. But 
if you’re using them so often that you get annoyed by the ugliness, then maybe 
you’re using an antipattern—or, if not, there’s a good chance you’re doing 
something that’s perfectly valid but unusual for Python, so the language just 
isn’t going to cater to you.

Maybe Python leans a little too far toward discouraging type checks, because 
there was so much resistance to the very idea of ABCs until people got used to 
them. But if so, I suspect you’ll need a solid example of realistic code that 
should look better, and can’t be reasonably redesigned, to convince people, not 
just showing that isinstance is about as ugly as it was designed to be.

> What I really want to write is:
> 
> if thing is a Fruit and thing is not an Apple:

> and after thinking about it on and off for a while I wonder if it might 
> indeed be possible to teach the parser to handle that in a way that 
> eliminates almost all possible ambiguity with the regular "is", including 
> perhaps 100% of all existing standard library code and almost all user code?

Possible? Yes, at least with the new parser coming from PEP 617. But that 
doesn’t mean it’s a good idea.

You certainly can’t make a and an into keywords, because lots of people have 
variables named a.

You can’t even make them into “conditional keywords”, that only have a special 
meaning after “is” and “is not”—besides all the usual negatives of conditional 
keywords, it won’t work, because “b is a” is already perfectly reasonable code 
today.

So you’d need to add some kind of backtracking: they’re conditional keywords 
only if they follow “is” or “is not” and are followed by a valid expression. 
Which is more complicated (and less efficient) to parse. Some third-party 
parser tools might even have to be completely rewritten, or at least to add 
special case hacks for this.

And, more importantly, the more context it takes to parse things (or the more 
special cases you have to learn and memorize), the harder the language’s syntax 
is to internalize. The fact that Python is (almost) an LL(1) language makes it 
pretty easy to get most of syntax for the subset that you use firmly into your 
head. Every special case makes that less true, which means more cases where you 
get confused by a SyntaxError in your code or about what someone else’s code 
means, and means it’s harder to manually work through the parse when you do get 
stumped like that and you resort to shotgun-debugging antics instead.

For a practical example, look at some languages that are actually designed to 
be executable English rather than executable pseudocode, like AppleScript or 
Inform. The fact that “bring every window of the first app to the foreground” 
reads like a normal English sentence is pretty cool, but the fact that “bring 
the first window of every app to the foreground” gives you an error message 
about not knowing what the every is, and the only way to rewrite it is “tell 
every app to bring the first window of it to the foreground”, severely dampens 
the coolness factor.

> Maybe this has been considered at some point in the past? The "is [not] a|an" 
> proposal would at least be a strong contender for "hardest thing to search 
> for on the internet" lol.

That will also make it hard to search for when you see some code you don’t 
understand and need to search for help, won’t it? A search for “isinstance” 
(even without including Python) brings me the docs page, some tutorials and 
blogs, and some StackOverflow questions; what’s a search for “is a” or even 
“Python is a” going to get me?

Maybe you could get more mileage out of going halfway there, with an operator 
named isa. Other languages use that spelling for related things (in Perl it’s 
exactly the operator you want; in ObjC it’s a property on the instance but it’s 
still about types), and people often use “isa” or “is-a” as a technical term in 
comp sci.

if thing isa Fruit and thing not isa Apple:

That’s still pretty readable, and easy to parse.
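
For comparison, the status-quo spelling of the same check:

    if isinstance(thing, Fruit) and not isinstance(thing, Apple):
        ...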

But it still breaks backward compatibility, because people do have code that 
uses “isa” as a normal identifier. (For one thing, it’s how you access the isa 
attribute of an ObjC object in PyObjC.)

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/

[Python-ideas] Re: Introduce 100 more built-in exceptions

2020-05-01 Thread Andrew Barnert via Python-ideas
On May 1, 2020, at 09:24, Christopher Barker  wrote:
> 
> Maybe it's too late for this, but I would love it if ".errno or similar" were 
> more standardized. As it is, every exception may have its own way to find 
> out more about exactly what caused it, and often you are left with parsing 
> the message if you really want to know.

I don’t think there are many cases where a standardized .errno would help—and I 
think most such cases would be better served by separate exceptions. With 
OSError, errno was a problem to be fixed, not an ideal solution to emulate 
everywhere.

You do often need to be able to get more information, and that is a problem, 
but I think it usually needs to be specific to each exception, not something 
generic.

Does code often need to distinguish between an unpacking error and an int 
parsing error? If so, you should be able to handle UnpackingError and 
IntParsingError, not handle ValueError and check an .errno against some set of 
dozens of new builtin int constants. If not, then we shouldn’t change anything 
at all.

As for parsing the error message, that usually comes up because there’s 
auxiliary information that you need but that isn’t accessible. For example, in 
2.x, to get the filename that failed to open, you had to regex .args[0], and 
that sucked. But the fix was to add a .filename to all of the relevant 
exceptions, and now it’s great. If you need to be able to get the failing 
string for int(s) raising a ValueError today, you have to regex .args[0], and 
that sucks. Do people actually need to do that? If so, there should be a 
.string or something that carries that information; an .errno won’t help.
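
Concretely, a sketch of the status quo (the exact message text is a CPython 
implementation detail, and .string is the hypothetical attribute under 
discussion, not an existing one):

    import re

    try:
        int("spam")
    except ValueError as e:
        # Today: scrape the failing value out of the message text.
        m = re.search(r"base 10: '(.*)'$", e.args[0])
        failing = m.group(1) if m else None
        # With the hypothetical attribute, it would just be:
        # failing = e.string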

It seems like every year or two, someone suggests that we should go through the 
stdlib and fix all the exceptions to be reasonably distinguishable and to make 
their relevant information more accessible, and I don’t think anyone ever has a 
problem with that, it’s just that nobody’s ever willing to volunteer to survey 
every place a builtin or stdlib raises, list them all, and work out exactly 
what should be changed and where.
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/CZP5RDQGWAXS4QQ3BHVNRT4VBXVP2Z3Z/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: zip(x, y, z, strict=True)

2020-05-01 Thread Andrew Barnert via Python-ideas
On May 1, 2020, at 08:08, Christopher Barker  wrote:
> 
> Also please keep in mind that the members of this list, and the python-dev 
> list, are not representative of most Python users. Certainly not beginners 
> but also many (most?) fairly active, but more "casual" users.
> 
> Folks on this list are very invested in the itertools module and iteration in 
> general. But many folks write a LOT of code without ever touching 
> itertools. Honestly, a lot of it is pretty esoteric (zip_longest is not) -- 
> I need to read the docs and think carefully before I know what they even do. 

So what? Most of the os module is pretty esoteric, but that doesn’t stop you—or 
even a novice who just asked “how do I get my files like dir”—from using 
os.listdir. For that matter, zip is in the same place as stuff like setattr and 
memoryview, which are a lot harder to grok than chain.

That novice will never guess to look in os. And if I told them “go look in os”, 
that would be useless and cruel. But I don’t, I tell them “that’s called 
os.listdir”, and they don’t have to learn about effective/real/saved user ids 
or the 11 different spawn functions to “get my files like dir” like they asked.

> Example: Here's the docstring for itertools.chain:
> 
> chain(*iterables) --> chain object
> 
> Return a chain object whose .__next__() method returns elements from the
> first iterable until it is exhausted, then elements from the next
> iterable, until all of the iterables are exhausted.
> 
> I can tell you that I have no idea what that means -- maybe folks wth CS 
> training do, but that is NOT most people that use Python.

And here’s the docstring for zip:

> Return a zip object whose .__next__() method returns a tuple where
> the i-th element comes from the i-th iterable argument.  The .__next__()
> method continues until the shortest iterable in the argument sequence
> is exhausted and then it raises StopIteration

Most people have no idea what that means either.

In fact, chain is simpler to grok than zip (it just doesn’t come up as often, 
so it doesn’t need to be a builtin).

> Anyway, inscrutable docstrings are another issue, and one I keep hoping I'll 
> find the time to try to address one day,

Yes, many of Python’s docstrings tersely explain the details of how the 
function does what it does, rather than telling you why it’s useful or how to 
use it. And yes, that’s less than ideal.

But that isn’t an advantage to adding a flag to zip over adding a new function. 
Making zip more complicated certainly won’t magically fix its docstring, it’ll 
just make the docstring more complicated.

> but the point is :
> 
> "Folks will go look in itertools when zip() doesn't do what they want " just 
> does not apply to most people.

But nobody suggested that they will. That’s exactly why people keep saying it 
should be mentioned in the docstring and the docs page and maybe even the 
tutorial.

And you’re also right that it’s also not true that “folks will read the 
docstring for zip() when zip() doesn’t do what they want and figure it out from 
there”, but that’s equally a problem for both versions of the proposal.

In fact, most people, unless they learned it from a tutorial or class or book 
or blog post or from existing code before they needed it, are going to go to a 
coworker, StackOverflow, the TA for their class, a general web search, etc. to 
find out how to do what they want. There’s only so much Python can do about 
that—the docstring, docs page, and official tutorial (which isn’t the tutorial 
most people learn from) is about it.

We have to trust that if this really is something novices need, the people who 
teach classes and answer on StackOverflow and write tutorials and mentor 
interns and help out C# experts who only use Python twice a year and so on will 
teach it. There’s no way around that. But if those people can and do teach 
os.listdir and math.sin and so on, they can also teach zip_equal.

> Finally, yes, a pointer to itertools in the docstring would help a lot, but 
> yes, it's still a heavier lift than adding a flag, 'cause you have to then go 
> and import a new module, etc.

What’s the “etc.” here? What additional thing do they have to do besides import 
a new module?

People have to import a new module to get a list of their files. And lots of 
other things that are builtins in other languages. In JavaScript, I don’t have 
to import anything to decode JSON, to do basic math functions like sin or mean, 
to create a simple object (where I don’t have to worry about writing __init__ 
and __repr__ and __eq__ and so on), to make a basic web request, etc. In 
Python, I have to import a module to do any of those things (for the last one, 
I even have to install a third-party package first).

Namespaces are a honking great idea, but there is a cost to that idea, and that 
cost includes people having to learn import pretty early on.



[Python-ideas] Re: zip(x, y, z, strict=True)

2020-04-30 Thread Andrew Barnert via Python-ideas
On Apr 29, 2020, at 22:50, Stephen J. Turnbull wrote:
> Andrew Barnert via Python-ideas writes:
> 
>>> Also -1 on the flag.
> 
> Also -1 on the flag, for the same set of reasons.
> 
> I have to dissent somewhat from one of the complaints, though:
> 
>> auto-complete won’t help at all,

Thanks for pointing this out; I didn’t realize how misleadingly I stated this.

What I meant to say is that auto-complete won’t help at all with the problem 
that flags are less discoverable and harder to type than separate functions. 
Not that it won’t help at all with typing flags—it will actually help a little, 
it’ll just help a lot less than with separate functions, making the problem 
even starker rather than eliminating it.

It’s worth trying this out to see for yourself.

> Many (most?) people use IDEs that will catch up more or less quickly,
> though.  

In fact, most IDEs should just automatically work without needing to change 
anything, because they work off the signatures and/or typesheds in the first 
place. That’s not the issue; the issue is what they can actually do for you. 
And it’s not really any different from in your terminal.

In an iPython REPL in my terminal, I enter these definitions:

def spam(*args, equal=False): pass
def eggs(*args): pass
def eggs_equal(*args): pass

I can now type eggs_equal(x, y) with `e TAB TAB x, y` or `eggs_ TAB x, y`. And 
either way, a pop up is showing me exactly the options I want to see when I ask 
for completion, I’m not just typing that blind.

I can type spam(x, y, equal=True) with `s TAB x, y, e TAB T TAB`. That is 
better than typing out the whole thing, but notice that it requires three 
autocompletes rather than one, and they aren’t nearly as helpful. Why? Well, it 
has no idea that the third argument I want to pass is the equal keyword rather 
than anything at all, because *args takes anything all. And, even after it 
knows I’m passing the equal argument, it has no idea what value I want for it, 
so the only way to get suggestions for what to pass as the value is to type T 
and complete all values in scope starting with T (and usually True will be the 
first one). And it’s not giving me much useful information at each step; I had 
to know that I was looking to type equal=True before it could help me type 
that. The popup signature that shows *args, equal=False does clue me in, but 
still not nearly as well as offering eggs_equal did.

Now repeat the same thing in a source file in PyCharm, and it’s basically the 
same. Sure, the popups are nicer, and PyCharm actually infers that equal is of 
type bool even though I didn’t annotate so it can show me True, False, and all 
bool variables in scope instead of showing me everything in scope, but 
otherwise, no difference. I still need to ask for help three times instead of 
once, and get less guidance when I do.

And that’s with a bool (or Enum) flag. Change it to end="shortest", and it’s 
even worse. Strings aren’t code, they’re data, so PyCharm suggests nothing at 
all for the argument value, while iPython suggests generally-interesting 
strings like the files in my cwd. (I suppose they could add a special case for 
this argument of this function, although they don’t do that for anything else, 
not even the mode argument of open—and, even if they did, at best that makes 
things only a little worse than a bool or Enum instead of a lot worse…)
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/ERRWSIQC5XQBMOY3WX2NR5HH426LYX5L/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: zip(x, y, z, strict=True)

2020-04-30 Thread Andrew Barnert via Python-ideas
On Apr 30, 2020, at 07:58, Christopher Barker  wrote:
> 
>> I think that the issue of searchability and signature are pretty
>> compelling reasons for such a simple feature to be part of the
>> function name.
> 
> I would absolutely agree with that if all three function were in the same 
> namespace (like the string methods referred to earlier), but in this case, 
> one is a built in and the others will not be — which makes a huge difference 
> in discoverability.
> 
> Imagine someone that uses zip() in code that works for a while, and then 
> discovers a bug triggered by unequal length inputs.
> 
> If it’s a flag, they look at the zip docstring, and find the flag, and their 
> problem is solved.
> 
> Is it’s in itertools, they have to think to look there. Granted, some 
> googling will probably lead them there, and the zip() docstring can point 
> them there, but it’s still a heavier lift.

I don’t understand. You’re arguing that being discoverable in the docstring is 
sufficient for the flag, but being discoverable in the docstring is a heavier 
lift for the function. Why would this be true, unless you intentionally write 
the docstring badly?

To make this more concrete, let’s say we want to just add on to the existing 
doc string (even though it seems aimed more at reminding experts of the exact 
details than at teaching novices) and stick to the same style. We’re then 
talking about something like this:

> Return a zip object whose .__next__() method returns a tuple where
> the i-th element comes from the i-th iterable argument.  The .__next__()
> method continues until the shortest iterable in the argument sequence
> is exhausted and then it raises StopIteration, or, if equal is true,
> it checks that the remaining iterables are exhausted and otherwise
> raises ValueError. 

… vs. this:

> Return a zip object whose .__next__() method returns a tuple where
> the i-th element comes from the i-th iterable argument.  The .__next__()
> method continues until the shortest iterable in the argument sequence
> is exhausted and then it raises StopIteration. If you need to check
> that all iterables are exhausted, use itertools.zip_equal,
> which raises ValueError if they aren’t.

If they can figure out that equal=True is what they’re looking for from the 
first one, it’ll be just as easy to figure out that zip_equal is what they’re 
looking for from the second.

Of course it might be better to rewrite the whole thing to be more 
novice-friendly and to describe what zip iterates at a higher level instead of 
describing how its __next__ method operates, but that applies to both versions.

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/BGTNMWVD3THOYV2GILT7LNNYHMBGAW77/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Adding a "once" function to functools

2020-04-29 Thread Andrew Barnert via Python-ideas
On Apr 29, 2020, at 11:15, Tom Forbes  wrote:
> 
>> Thread 2 wakes up with the lock, calls the function, fills the cache, and 
>> releases the lock.
> 
> What exactly would the issue be with this:
> 
> ```
> import functools
> from threading import Lock
> 
> def once(func):
>     sentinel = object()
>     cache = sentinel
>     lock = Lock()
> 
>     @functools.wraps(func)
>     def _wrapper():
>         nonlocal cache, lock, sentinel
>         if cache is sentinel:
>             with lock:
>                 if cache is sentinel:
>                     cache = func()
>         return cache
> 
>     return _wrapper
> ```

You’ve written an exact equivalent of the double-checked locking for 
singletons example that broke Java 1.4 and C++03 and led to us having once 
functions in the first place.

In both of those languages, and most others, there is no guarantee that the 
write to cache in thread 1 happens between the two reads from cache in thread 
2. Which gives you the fun kind of bug that every few thousand runs you have 
corrupted data an hour later, or it works fine on your computer but it crashes 
for one of your users because they have two CPUs that don’t share L2 cache 
while you have all your cores on the same die, or it works fine until you 
change some completely unrelated part of the code, etc.

Java solved this by adding volatile variables in Java 5 (existing code was 
still broken, but just mark cache volatile and it’s fixed); C++11 added a 
compiler-assisted call_once function (and added a memory model that allows them 
to specify exactly what happens and when so that the desired behavior was 
actually guaranteeable). Newer languages learned from their experience and got 
it right the first time, rather than repeating the same mistake.

Is there anything about Python’s memory model guarantee that means it can’t 
happen in Python? I don’t think there _is_ a memory model. In CPython, or any 
GIL-based implementation, I _think_ it’s safe (the other thread can’t be 
running at the same time on a different core, so there can’t be a cache 
coherency ordering issue between the cores, right?), but what about on Jython, 
or PyPy-STM, or a future GIL-less Python?

And in both of those languages, double-checked locking is still nowhere near as 
efficient as using a local static.

> Seems generally more correct, even in single threaded cases, to pay the 
> overhead only in the first call if you want `call_once` semantics. Which is 
> why you would be using `call_once` in the first place?

But you won’t be paying the overhead only on the first call; you’ll be paying 
it on all of the calls that arrive before the first one completes. That’s the whole 
point of the lock, after all—they have to wait until it’s ready—and they can’t 
possibly do that without the lock overhead. And for the next few afterward, 
because they’ll have gotten far enough to check even if they haven’t gotten far 
enough to get the lock, and there’s no way they can know they don’t need the 
lock. And for the next few after that, because unless the system only runs one 
thread at a time and synchronizes all of memory every time you switch threads 
they may not see the write yet anyway.
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/G4ZDP6UYOL323VGX4IFRGGA5OVIEDD6P/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: deque: Allow efficient operations

2020-04-29 Thread Andrew Barnert via Python-ideas
On Apr 29, 2020, at 12:03, Christopher Barker  wrote:
> 
> 
> Isn't much demand for a *generic* linked list. It would probably be a good 
> recipe though -- so users could have a starting point for their custom 
> version.

I think what would be really handy would be a HOWTO on linked lists that showed 
the different options and tradeoffs and how to implement and use at least a few 
different ones, and showed why they’re useful with examples. (And also showed 
why the Sequence/Iterable API can be helpful but also why it’s not sufficient.)

Then the collections module (and the tutorial?) could both just have a sentence 
saying “Python doesn’t have a linked list type because there are so many useful 
kinds of linked lists and they’re all easy to build but very different—see the 
Linked Lists HOWTO for details.”

But if I wrote it, it would probably be 4x as long as any novice would want to 
read. (I think I wrote some blog posts on linked lists in Python years ago, and 
ended up building a Haskell-style lazy list out of a trigger function and then 
showing how to do Fibonacci numbers by recursively zipping it, or something 
crazy like that.)

In the old days we could probably just post three different simple recipes on 
ActiveState and link to them from the docs and let people build on the examples 
there, rather than try to write it all up-front and fit it into the Python docs 
style, but that doesn’t work so well anymore.

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/CTSBRBTMALAM6JFW6H4JPT2SAADK44A6/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: zip(x, y, z, strict=True)

2020-04-29 Thread Andrew Barnert via Python-ideas
On Apr 29, 2020, at 07:08, Barry Scott  wrote:
> 
> 
>> On 28 Apr 2020, at 16:12, Rhodri James  wrote:
>> 
>>> On 28/04/2020 15:46, Brandt Bucher wrote:
>>> Thanks for weighing in, everybody.
>>> Over the course of the last week, it has become surprisingly clear that 
>>> this change is controversial enough to require a PEP.
>>> With that in mind, I've started drafting one summarizing the discussion 
>>> that took place here, and arguing for the addition of a boolean flag to the 
>>> `zip` constructor. Antoine Pitrou has agreed to sponsor, and I've chatted 
>>> with another core developer who shares my view that such a flag wouldn't 
>>> violate Python's existing design philosophies.
>>> I'll be watching this thread, and should have a draft posted to the list 
>>> for feedback this week.
>> 
>> -1 on the flag.  I'd be happy to have a separate zip_strict() (however you 
>> spell it), but behaviour switches just smell wrong.
> 
> Also -1 on the flag.
> 
> 1. A new name can be searched for.
> 2. You do not force a if on the flag for every single call to zip.

Agreed on both Rhodri’s and Barry’s reasons, and more below.

I also prefer the name zip_equal to zip_strict, because what we’re being strict 
about isn’t nearly as obvious as what’s different between shortest vs. equal 
vs. longest, but that’s just a mild preference, not a -1 like the flag.
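
(For the record, a minimal sketch of what a zip_equal could look like, built 
on zip_longest and a sentinel; an illustration, not a reference 
implementation:)

    from itertools import zip_longest

    _SENTINEL = object()

    def zip_equal(*iterables):
        # Like zip, but raise if the inputs turn out to have unequal lengths.
        for tup in zip_longest(*iterables, fillvalue=_SENTINEL):
            if any(x is _SENTINEL for x in tup):
                raise ValueError("iterables have different lengths")
            yield tup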

In addition to the three points above:

Having one common zip variant spelled as a different function and the other as 
a flag seems really bad for learning and remembering the language. And 
zip_longest has a solidly established precedent. And I don’t think you want to 
add multiple bool flags to zip?

Also, just look at these:

zip_strict(xs, ys)
zip(xs, ys, strict=True)

The first one is easier to read because it doesn’t have the extra 5 characters 
to skim over that don’t really add anything to the meaning, and it puts the 
important distinction up front.

It’s also shorter, and a lot easier to type with auto-complete—which isn’t 
nearly as big of a deal, but if this is really meant to be used often it does 
add up.

And it’s obviously more extensible, if it really is at all possible that we 
might want to eventually deprecate shortest or add new end behaviors like 
yielding partial tuples or Soni’s thing of stashing the leftovers somehow (none 
of which I find very convincing, but others apparently do, and picking a design 
that rules them out means explicitly rejecting them).

A string or enum flag instead of a bool solves half of those problems (as long 
as “longest” is one of the options), but it makes others even worse. The 
available strings aren’t even discoverable as part of the signature, 
auto-complete won’t help at all, and the result is even longer and even more 
deemphasizes the important thing.

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/3JKAI25VFIGBO4HPWQ6S22PNKZ6ZOCCT/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: deque: Allow efficient operations

2020-04-29 Thread Andrew Barnert via Python-ideas
On Apr 29, 2020, at 08:33, Christopher Barker  wrote:
> 
> I've wondered about Linked Lists for a while, but while there are many 
> versions on PyPi, I can't find one that seems to be mature and maintained. 
> Which seems to indicate that there isn't much demand for them.

I think there’s lots of demand for them, but there are so many different 
variants that can’t substitute for each other (try taking any nontrivial sample 
code using Haskell’s single-linked, no-handle, immutable tail-sharing list and 
rewriting it with C++’s doubly-linked handled mutable list, or vice-versa), and 
most of the key operations fit so poorly with Python’s sequence/iterable API, 
and they’re all so easy to build, that people just build the one they need 
whenever they need it.

I do have a few different linked lists in my toolbox that have come up often 
enough that I stashed them (an immutable cons, a handled double-linked list, a 
cffi wrapper for a common style of C internally-linked lists, probably others), 
but half the time I reach for one I have to modify it anyway, so I haven’t 
bothered to turn them into a package I just import and use.

And, while I did add the whole (Mutable)Sequence API to each one (because it’s 
convenient for debugging and REPL exploration to be able to list(xs), or to get 
a repr that’s written in terms of a from_iter classmethod so I can eval it 
back, etc.), I usually don’t use that API for anything but debugging. When 
you’re dealing with linked lists, you usually need to deal with the nodes 
directly. For example, one big reason to use linked lists is constant-time 
splicing, but you can’t splice in constant time if all you have is the 
head/handle and/or an opaque iterator that only knows how to go forward; you 
need the node before the splice point (or, for doubly-linked, after is fine 
too). Another reason to use (Lisp/Haskell-style) linked lists is that they 
automatically release nodes as you iterate unless you keep a reference to the 
head, but that’s clumsy to do with Python-style APIs. And so on.

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/DGBRHXGXHAMERZRQW2WN5XMU22WBIHUK/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Adding a "once" function to functools

2020-04-29 Thread Andrew Barnert via Python-ideas
On Apr 28, 2020, at 16:25, Steven D'Aprano  wrote:
> 
> On Tue, Apr 28, 2020 at 11:45:49AM -0700, Raymond Hettinger wrote:
> 
>> It seems like you would get just about everything you want with one line:
>> 
>> once = lru_cache(maxsize=None)
> 
> But is it thread-safe?

You can add thread safety the same way as any other function:

@synchronized
@once
def spam():
    # compute 42 in a slow, non-thread-safe, non-idempotent way, and also
    # launch the missiles the second time we’re called
    return 42

Or wrap a with lock: around the code that calls it, or whatever.
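
(synchronized isn’t a stdlib decorator; a minimal sketch of what it might 
look like:)

    import functools
    import threading

    def synchronized(func):
        # One lock per decorated function; every call serializes on it.
        lock = threading.Lock()

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with lock:
                return func(*args, **kwargs)

        return wrapper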

Not all uses of once require thread safety. For the really obvious example, 
imagine you’re sharing a singleton between coroutines instead of threads. And 
if people are really concerned with the overhead of lru_cache(maxsize=None), 
the overhead of locking every time you access the value is probably even less 
acceptable when unnecessary.

So, I think it makes sense to leave it up to the user (but to explain the issue 
in the docs). Or maybe we could add a threading.once (and asyncio.once?) as 
well as functools.once?


___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/RSJLUF4R6TM3HSILZYWGB366WUHQT755/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Adding a "once" function to functools

2020-04-28 Thread Andrew Barnert via Python-ideas
On Apr 28, 2020, at 12:02, Alex Hall  wrote:
> 
> Some libraries implement a 'lazy object' which forwards all operations to a 
> wrapped object, which gets lazily initialised once:
> 
> https://github.com/ionelmc/python-lazy-object-proxy
> https://docs.djangoproject.com/en/3.0/_modules/django/utils/functional/
> 
> There's also a more general concept of proxying everything to some target. 
> wrapt provides ObjectProxy which is the simplest case, the idea being that 
> you override specific operations:
> 
> https://wrapt.readthedocs.io/en/latest/wrappers.html
> 
> Flask and werkzeug provide proxies which forward based on the request being 
> handled, e.g. which thread or greenlet you're in, which allows magic like the 
> global request object:
> 
> https://flask.palletsprojects.com/en/1.1.x/api/#flask.request
> 
> All of these have messy looking implementations and hairy edge cases. I 
> imagine the language could be changed to make this kind of thing easier, more 
> robust, and more performant. But I'm struggling to formulate what exactly 
> "this kind of thing is", i.e. what feature the language could use.

For the case where you’re trying to do the “singleton pattern” for a complex 
object whose behavior is all about calling specific methods, a proxy might 
work, and the only thing Python might need, if anything, is ways to make it 
possible/easier to write a GenericProxy that just delegates everything in some 
clean way, but even that isn’t really needed if you’re willing to make the 
proxy specific to the type you’re singleton-ing.

But often what you want to lazily initialize is a simple object—a str, a small 
integer, a list of str, etc.

Guido’s example lazily initialized by calling getcwd(), and the first example 
given for the Swift feature is usually a fullname string built on demand from 
firstname and lastname. And if you look for examples of @cachedproperty (which 
really is exactly what you want for @lazy except that it only works for 
instance attributes, and you want it for class attributes or globals), the 
singleton pattern seems to be a notable exception, not the usual case; mostly 
you lazily initialize either simple objects like a str, a pair of floats, a 
list of int, etc., or numpy/pandas objects.

And you can’t proxy either of those in Python.

Especially str. Proxies work by duck-typing as the target, but you can’t 
duck-type as a str, because most builtin and extension functions that want a 
str ignore its methods and use the PyUnicode API to get directly at its array 
of characters. Numbers, lists, numpy arrays, etc. aren’t quite as bad as str, 
but they still have problems.

Also, even when it works, the performance cost of a proxy would often be 
prohibitive. If you write this:

@lazy
def fullname():
    return firstname + " " + lastname

… presumably it’s because you need to eliminate the cost of string 
concatenation every time you need the fullname. But if it then requires every 
operation on that fullname to go through a dynamic proxy, you’ve probably added 
more overhead than you saved.

So I don’t think proxies are the answer here.

Really, we either need descriptors that can somehow work for globals and class 
attributes (which is probably not solveable), or some brand new language 
semantics that aren’t built on what’s already there. The latter sounds like 
probably way more work than this feature deserves, but maybe the experience of 
Swift argues otherwise.

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/OGCFXBYXPT7AVJQLSW3HTNBP7SJJ7A5B/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: extended for-else, extended continue, and a rant about zip()

2020-04-28 Thread Andrew Barnert via Python-ideas
On Apr 28, 2020, at 09:18, Chris Angelico  wrote:
> 
> I suggest forking CPython and implementing the feature.

I’d suggest trying MacroPy first. There’s no way to get the desired syntax with 
macros, but at least at first glance it seems like you should be able to get 
the desired semantics with something that’s only kind of ugly and clumsy, 
rather than totally hideous. And if so, that’s usually good enough for playing 
around with fun ideas to see where they can lead, and a lot less work.

Plus, playing with MacroPy is actually fun in itself; playing with the CPython 
parser is kind of the opposite of fun. :)


___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/SPTNT7DXUHVP32ZUZAC6HIYYWNEYDY4K/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Adding a "once" function to functools

2020-04-28 Thread Andrew Barnert via Python-ideas
On Apr 26, 2020, at 10:41, Guido van Rossum  wrote:
> 
> 
> Since the function has no parameters and is pre-computed, why force all users 
> to *call* it? The @once decorator could just return the value of calling the 
> function:
> 
> def once(func):
>     return func()
> 
> @once
> def pwd():
>     return os.getcwd()

If that’s all @once does, you don’t need it. Surely this is even clearer:

pwd = os.getcwd()

The decorator has to add initialization on first demand, or it’s not doing 
anything.

But I think you’re onto something important that everyone else is missing. To 
the user of this module, this really should look like a variable, not a 
function. The fact that we want to initialize it later shouldn’t change that. 
Especially not in Python—other languages bend over backward to make you write 
getters around every public attribute even when you don’t need any computation; 
Python bends over backward to let you expose public attributes even when you do 
need computation.

And this isn’t unprecedented. Swift added a lazy variable initialization 
feature in version 2 even though they already had dispatch_once. And then they 
discovered that it eliminated nearly all good uses of dispatch_once and 
deprecated it. All you need is lazy-initialized variables. Your singletons, 
your possibly-unused expensive tables, your fiddly low-level things with 
complicated initialization order dependencies, they’re all lazy variables. So 
what’s left for @once functions that need to look like functions?

I think once you think about it in these terms, @lazy makes more sense than 
@once. The difference between these special attributes and normal ones is that 
they’re initialized on first demand rather than at definition time. The “on 
demand” is the salient bit, not the “first”.

The only problem is: how could this be implemented? Most of the time you want 
these things on modules. For lazy imports, the __getattr__ solution of PEP 562 
was good enough, but this isn’t nearly as much of an expert feature. Novices 
write lazy variables in Swift, and if we have to tell them they can’t do it in 
Python without learning the deep magic of how variable lookup works, that would 
be a major shame. But I can’t think of an answer that doesn’t run into all the 
same problems that PEP 562’s competing protocols did.
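
For reference, the PEP 562 approach looks something like this (a sketch with 
illustrative names; fine for lazy imports, but a lot to ask of a novice who 
just wants a lazy variable):

    # mymodule.py
    import os

    _pwd = None

    def __getattr__(name):
        global _pwd
        if name == "pwd":
            if _pwd is None:
                _pwd = os.getcwd()
            return _pwd
        raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
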
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/GHDFHZ46QAIMKRGACESWG6XX4FPPLZIN/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: extended for-else, extended continue, and a rant about zip()

2020-04-27 Thread Andrew Barnert via Python-ideas
On Apr 27, 2020, at 20:48, Soni L. wrote:
> 
>> Here are four ways of doing this today:

…

>> So, why do we need another way to do something that’s probably pretty 
>> uncommon and can already be done pretty easily? Especially if that new way 
>> isn’t more readable or more powerful?
> 
> the only one with equivalent semantics is the last one.

I won’t argue about whether two functions that give the exact same results in 
every case but get there in different ways are “equivalent” or not, since one 
is already good enough.

If you agree that there is obvious code that works in Python 3.8 (and even in 
Python 2.7, for that matter) to get the semantics you want, why should we add a 
new language feature that gives you a less readable, more verbose, and more 
complicated way to do the same thing?

> tbh my particular case doesn't make a ton of practical sense.

That’s hardly a good argument for your proposal. Do you actually want the 
things you propose to be added to the language, or even to be seriously 
considered? If not, why are you proposing them?

>> > > see: why are we perfectly happy with ignoring extra lines at the end?
>> 
>> Because there aren’t any. The file was made by catting together 2022 4-line 
>> files, so it’s 8088 lines long. It will always be 8088 lines long. If I 
>> really thought that was important to check, surely I’d want to check 8088 
>> rather than just divisible by 4. But I didn’t think it was worth checking 
>> either of those—or that the text is pure ASCII, or that the newlines are \n, 
>> etc. For a more general purpose script (especially if it had to accept input 
>> from potentially stupid or malicious end users and produce useful error 
>> responses instead of just punting), I would have checked many of those 
>> things and more, but for this script, it wasn’t worth it.
> 
> that's what assert is for - making assumptions that you know are correct now, 
> but might not remain so in the future!

Would you want to read, or maintain, code like this:

s = "spam"
assert isinstance(s, str)
assert isinstance(type(s), type)
assert len(s) == 4
assert len(set(s)) == len(s)
for c in s:
assert type(c) == type(s)
assert c is not None
assert len(c) == 1
assert s.count(c) == 1
assert 0 <= ord(c) < 0x11
assert len(c.encode()) <= 4
assert not sys.stdout.closed()
print(f"{c}...")
if sys.implementation.name == "cpython”:
assert chr(ord(c)) is c
assert c == s[-1]
assert s == "spam"

I’m assuming all of those things are true, and hundreds more (from the fact 
that s was unbound before the assignment to the fact that nobody has modified 
the interned 0 value to mean 1), but that doesn’t mean they’re all worth 
testing. Trying to test absolutely everything just means you’re more likely to 
forget to test one of the important things, and more likely to miss it if you 
do forget. (And that’s even assuming all of your tests are correct, which they 
almost certainly won’t be if you’re trying to test everything you can imagine. 
So you’ll also waste time debugging useless tests that could have been spent 
verifying, debugging or improving the useful tests and/or the actual 
functionality.)

On top of that, if my input file doesn’t have 8088
lines, that’s almost certainly not a bug in my code, but either user error (I 
put the wrong file at that path) or corrupted data (I accidentally truncated 
the file). So testing it with an assert would actually be misleading myself; it 
should be something like a ValueError. Even if you never programmatically 
handle the error, having the right error makes a big difference to ease of 
debugging.
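
Something like this sketch (the names are made up for illustration):

    def read_entries(path, expected=8088):
        with open(path) as f:
            lines = f.read().splitlines()
        if len(lines) != expected:
            # User error or corrupt data, not a bug in this code, so raise a
            # meaningful exception instead of asserting.
            raise ValueError(f"{path}: expected {expected} lines, "
                             f"got {len(lines)}")
        return lines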

>> Even if you think Python should be doing more to encourage such checks, your 
>> proposal doesn’t help that at all—what you want is something like Serhiy’s 
>> proposal in the other thread (to eventually rename zip to zip_shortest and 
>> either get rid of plain zip or make it an alias for zip_equal).
> 
> ... why not? I know assert is discouraged by many, but I wouldn't say 
> enabling ppl to do these checks doesn't help ppl do these checks...? unless I 
> misunderstand what you mean by this?

Because people already are enabled to check, and they’re just choosing not to. 
Giving them a harder and less discoverable way isn’t going to change that. 
Anyone who’s decided it’s not worth using zip_equal instead of zip is not going 
to think it’s worth adding an else and a test to the loop around that zip.

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/DNSURDEN67HDRYBBWV7FONVITNOKEC3J/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: extended for-else, extended continue, and a rant about zip()

2020-04-27 Thread Andrew Barnert via Python-ideas
On Apr 27, 2020, at 17:01, Soni L.  wrote:
> 
>>> On 2020-04-27 8:37 p.m., Andrew Barnert wrote:
>>> On Apr 27, 2020, at 14:38, Soni L.  wrote:
>> [snipping a long unanswered reply]
>>> The explicit case for zip is if you *don't* want it to consume anything 
>>> after the stop.
>> Sure, but *when do you want that*? What’s an example of code you want to 
>> write that would be more readable, or easier to write, or whatever, if you 
>> could work around consuming anything after the stop?
> 
> so here's one example, let's say you want to iterate multiple things (like 
> with zip), get a count out of it, as well as partially consume an external 
> iterator without swallowing any extra values from it.

What do you want to do that for? This still isn’t a concrete use case, so it’s 
still not much more of a rationale than “let’s say you want to intermingle the 
bits of two 16-bit integers into a 32-bit integer”. Sure, that’s something 
that’s easy to do in some other languages (it’s the builtin $ operator in 
INTERCAL) but very hard to do readably or efficiently in Python. If we added a 
$ operator with a __bigmoney__ protocol and made int.__bigmoney__ implement 
this operation in C, that would definitely solve the problem. But it’s only 
worth proposing that solution if anyone actually needs a solution to the 
problem in the first place. When’s the last time anyone ever needed to 
efficiently intermingle bits? (Except in INTERCAL, where the language 
intentionally leaves out useful operators like +, |, and << and even 32-bit 
literals to force you to write things in clever ways around $ and ~ instead).

On top of that, this abstract example you want can already be written today.

> it'd look something like this:
> 
>    def foo(self, other_things):
>        for x in zip(range(sys.maxsize), self.my_things, other_things):
>            do_stuff
>        else as y:
>            return y[0]  # count

> using extended for-else + partial-zip. it stops as soon as self.my_things 
> stops. and then the caller can do whatever else it needs with other_things. 
> (altho maybe it's considered unpythonic to reuse iterators like this? I like 
> it tho.)

Here are four ways of doing this today:

    from itertools import count

    def foo(self, other_things):
        for x in zip(count(1), self.my_things, other_things):
            do_stuff
        return x[0]

    def foo(self, other_things):
        c = count(-1)
        for x in zip(c, self.my_things, other_things):
            do_stuff
        return next(c)

    def foo(self, other_things):
        c = count()
        for x in zip(self.my_things, other_things, c):
            do_stuff
        return next(c)

    def foo(self, other_things):
        c = lastable(count())
        for x in zip(c, self.my_things, other_things):
            do_stuff
        return c.last
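
(lastable in the fourth version isn’t a stdlib helper; a minimal sketch of 
the wrapper it assumes:)

    class lastable:
        # Iterator wrapper that remembers the last value it yielded.
        def __init__(self, iterable):
            self._it = iter(iterable)

        def __iter__(self):
            return self

        def __next__(self):
            self.last = next(self._it)
            return self.last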

So, why do we need another way to do something that’s probably pretty uncommon 
and can already be done pretty easily? Especially if that new way isn’t more 
readable or more powerful?

> if anything my motivating example is because I wanna do some very unpythonic 
> things.

Then you should have given that example in the first place.

Sure, the fact that it’s unpythonic might mean it’s not very convincing, but it 
doesn’t become more convincing after multiple people have to go back and forth 
to drag it out of you. All that means is that everyone else has already tuned 
out and won’t even see your example, so your proposal has basically zero chance 
instead of whatever chance it should have had.

And sometimes unpythonic things really do get into the language—sometimes 
because they’re just so useful, but more often, because they point to a reason 
for changing what everyone’s definition of “pythonic” is. Think of the abc 
module. Or, better, if you can dig up the 3.1-era vs. 3.3-era threads on the 
original coroutine PEP 3152, you can see how the consensus changed from “wtf, 
that doesn’t look like Python at all and nobody will ever understand it” to 
“this is obviously the pythonic way to write reactors (modulo a bunch of 
bikeshedding)”. That wouldn’t have happened if Greg Ewing had refused to tell 
anyone that he wanted coroutines to provide a better, if unfamiliar, way to 
write things like reactors, and instead tried to come up with 
less-unpythonic-looking but completely useless examples.

>> That grouping idiom is useful for all kinds of things that _aren’t_ about 
>> optimization. Maybe the zip docs aren’t the best place for it (but it’s also 
>> in the itertools recipes, which probably is the best place for it), but it’s 
>> definitely useful. In fact, I used it less than a week ago. We’ve got this 
>> tool that writes a bunch of 4-line files, and someone concatenated a bunch 
>> of them together and wrote this horrible code to pull them back apart in 
>> another language I won’t mention here, and rather than debug their code, I 
>> just rewrote it in Python like this:
>>    with open(path) as f:
>>        for entry in chunkify(f, 4):
>>            process(entry)

[Python-ideas] Re: extended for-else, extended continue, and a rant about zip()

2020-04-27 Thread Andrew Barnert via Python-ideas
On Apr 27, 2020, at 16:35, Soni L.  wrote:
> 
> the point of posting here is that someone else may have a similar existing 
> use-case

Similar to *what*? It can’t be similar to your use case if you don’t have a use 
case for it to be similar to.

If you really can’t imagine why something might be useful, and nobody else has 
ever asked for it, it probably isn’t actually needed. Sure, there are rare 
exceptions to that, but that shouldn’t be your default assumption for 
everything that could ever conceivably be done.

> where this would make things better. I can't take a look at proprietary code 
> so I post about stuff in the hopes that the ppl who can will back this stuff 
> up.
> 
> (doesn't proprietary software make things so much harder? :/)

A little bit, but not nearly as much as you seem to be thinking. There are 
zillions of lines of open source Python code easily searchable. There may be a 
few kinds of problems that are likely to only come up in proprietary code, but 
something generic like this is just as likely to be useful to Django or 
MusicBrainz or Jupyter or DNF or even the Python stdlib as to some internal 
Dropbox service or the guts of the Civ V scripting engine. So the fact that you 
can’t search the Dropbox or Firaxis source is not actually a big problem.

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/ZF2KSSWXGEY4C6HFBBS64XV3BA2HUGX7/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: extended for-else, extended continue, and a rant about zip()

2020-04-27 Thread Andrew Barnert via Python-ideas
On Apr 27, 2020, at 14:38, Soni L.  wrote:

[snipping a long unanswered reply]

> The explicit case for zip is if you *don't* want it to consume anything after 
> the stop.

Sure, but *when do you want that*? What’s an example of code you want to write 
that would be more readable, or easier to write, or whatever, if you could work 
around consuming anything after the stop?

> btw: I suggest reading the whole post as one rather than trying to pick it 
> apart.

I did read the whole post, and then went back to reply to each part in-line. 
You can tell by the fact that I refer to things later in the post. For example, 
when I refer to your proposed code being better than “the ugly mess that you 
posted below“ as the current alternative, it should be pretty clear that I’ve 
already read the ugly mess that you posted below.

So why did I format it as replies inline? Because that’s standard netiquette 
that goes back to the earliest days of email lists. Most people find it 
confusing (and sometimes annoying) to read a giant quote and then a giant reply 
and try to figure out what’s being referred to where, so when you have a giant 
message to reply to, it’s helpful to reply inline.

But as a bonus, writing a reply that way makes it clear to yourself if you’ve 
left out anything important. You didn’t reply to multiple issues that I raised, 
and I doubt that it’s because you don’t have any answers and are just trying to 
hide that fact to trick people into accepting your proposal anyway, but rather 
than you just forgot to get to some things because it’s easy to miss important 
stuff when you’re not replying inline.

> the purpose of the proposal, as a whole, is to make it easier to pick things 
> - generators in particular - apart. I tried to make that clear but clearly I 
> failed.

No, you did make that part clear; what you didn’t make clear is (a) what 
exactly you’re trying to pick apart from the generators and why, (b) what 
actual problems look like, (c) how your proposal could make that code better, 
and (d) why existing solutions (like manually nexting iterators in a while 
loop, or using tools like peekable) don’t already solve the problem.

Without any of that, all you’re doing is offering something abstract that might 
conceivably be useful, but it’s not clear where or why or even whether it would 
ever come up, so for all we know it’ll *never* actually be useful. Nobody’s 
likely to get on board with such a change.

> Side note, here's one case where it'd be better than using zip_longest:

Your motivating example should not be a “side note”, it should be the core of 
any proposal.

But it should also be a real example, not a meaningless toy example. Especially 
not one where even you can’t imagine an actual similar use case. “We should add 
this feature because it would let you write code that I can’t imagine ever 
wanting to write” isn’t a rationale that’s going to attract much support.

> # this pattern is suggested by the zip() docs, btw.
> for a, b, c, d, e, f, g in zip(*[iter(x)]*7):
>     use_7x_algorithm(a, b, c, d, e, f, g)
> else as x:  # leftovers that didn't fit the 7-tuple.
>     use_slow_variable_arity_algorithm(*x)

Why do you want to unpack into 7 variables with meaningless names just to pass 
those 7 variables? And if you don’t need that part, why can’t you just write 
this with zip_skip (which, as mentioned in the other thread, is pretty easy to 
write around zip_longest)?
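
(zip_skip isn’t in itertools; a sketch of the zip_longest-based version being 
alluded to:)

    from itertools import zip_longest

    _SKIP = object()

    def zip_skip(*iterables):
        # Like zip_longest, but drop exhausted slots from each tuple
        # instead of padding them with a fillvalue.
        for tup in zip_longest(*iterables, fillvalue=_SKIP):
            yield tuple(x for x in tup if x is not _SKIP)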

The best guess I can come up with is that in a real life example maybe that 
would have some performance cost that’s hard to see in this toy. But then if 
that’s the case, given that x is clearly not an iterator, is it a sequence? You 
could then presumably get much more optimization by looping over slices instead 
of using the grouper idiom in the first place. Or, as you say, by using numpy.

> I haven't found a real use-case for this yet, tho.
> SIMD is handled by numpy, which does a better job than you could ever hope 
> for in plain python, and for SIMD you could use zip_longest with a suitable 
> dummy instead. but... yeah, not really useful.

> (actually: why do the docs for zip() even suggest this stuff anyway? seems 
> like something nobody would actually use.)

That grouping idiom is useful for all kinds of things that _aren’t_ about 
optimization. Maybe the zip docs aren’t the best place for it (but it’s also in 
the itertools recipes, which probably is the best place for it), but it’s 
definitely useful. In fact, I used it less than a week ago. We’ve got this tool 
that writes a bunch of 4-line files, and someone concatenated a bunch of them 
together and wrote this horrible code to pull them back apart in another 
language I won’t mention here, and rather than debug their code, I just rewrote 
it in Python like this:

    with open(path) as f:
        for entry in chunkify(f, 4):
            process(entry)
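
(chunkify isn’t a stdlib function; presumably it’s the grouper idiom with a 
friendlier name. A minimal sketch, assuming the input is an exact multiple of 
n, as it is here:)

    def chunkify(iterable, n):
        # Yield successive n-tuples; a trailing partial chunk is silently
        # dropped, exactly like the zip(*[iter(x)]*n) idiom it wraps.
        return zip(*[iter(iterable)] * n)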

I used a function called chunkify because I think that’s a lot easier to 
understand (especially for 

[Python-ideas] Smarter zip, map, etc. iterables (Re: Re: zip(x, y, z, strict=True))

2020-04-27 Thread Andrew Barnert via Python-ideas
On Apr 27, 2020, at 13:41, Christopher Barker  wrote:
> 
> SIDE NOTE: this is reminding me that there have been calls in the past for an 
> optional __len__ protocol for iterators that are not proper sequences, but DO 
> know their length -- maybe one more place to use that if it existed.

But __len__ doesn’t really make sense on iterators.

And no iterator is a proper sequence, so I think you meant _iterables_ that 
aren’t proper sequences anyway—and that’s already there:

xs = {1, 2, 3}
len(xs) # 3
isinstance(xs, collections.abc.Sized) # True

I think the issue is that people don’t actually want zip to be an Iterator, 
they want it to be a smarter Iterable that preserves (at least) Sized from its 
inputs. The same way, e.g., dict.items or memoryview does. The same way range 
is lazy but not an Iterator.

And it’s not just zip; the same thing is true for map, enumerate, islice, etc.

And it’s also not just Sized. It would be just as cool if zip, enumerate, etc. 
preserved Reversible. In fact, “how do I both enumerate and reverse” comes up 
often enough that I’ve got a reverse_enumerate function in my toolbox to work 
around it. And, for that matter, why do they have to be only one-shot-iterable 
unless their input is? Again, dict.items and range come to mind, and there’s no 
real reason zip,
map, islice, etc. couldn’t preserve as much of their input behavior as possible:

xs = [1, 2, 3]
ys = map(lambda x: x*3, xs)
len(ys) # 3
reversed(enumerate(ys))[-1] # (0, 3)

Of course it’s not always possible to preserve all behavior:

xs = [1, 2, 3]
ys = filter(lambda x: x%2, xs)
len(ys) # still a TypeError even though xs is sized

… but the cases where it is or isn’t possible can all be worked out for each 
function and each ABC: filter can _never_ preserve Sized but can _always_ 
preserve Reversible, etc.
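
To make that concrete, here’s a toy sketch of one such view; nothing in it is 
an existing API, it just shows how a view can preserve Sized and 
reiterability:

    class smart_map:
        # A map-like view that stays Sized and reiterable when its input is.
        def __init__(self, func, iterable):
            self._func = func
            self._iterable = iterable

        def __iter__(self):
            return (self._func(x) for x in self._iterable)

        def __len__(self):
            return len(self._iterable)  # TypeError if the input isn't Sized

    xs = [1, 2, 3]
    ys = smart_map(lambda x: x * 3, xs)
    assert len(ys) == 3
    assert list(ys) == [3, 6, 9]
    assert list(ys) == [3, 6, 9]  # reiterable, unlike map()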

This is clearly feasible—Swift does it, and C++ is trying to do it in their 
next version, and Python already does it in a few special cases (as mentioned 
earlier), just not in all (or even most) of the potentially useful cases.

The only really hard part of this is designing a framework that makes it 
possible to write all those views simply. You don’t want to have to write five 
different map view classes for all the ways a map can act based on its inputs, 
and then repeat 80% of that same work again for filter, and again for islice 
and so on. The boilerplate would be insane. (See the Swift 1.0 stdlib for an 
example of how horrible it could be, and they only implemented a handful of the 
possibilities.)

And, except for a couple of things (notably genexprs), most of this could be 
written as a third-party library today. (And if it existed and people were 
using it widely, it would be pretty easy to argue that it should come with 
Python, so that it _could_ handle those last few things like genexprs, and also 
to serve as an example to encourage third-party libraries like toolz to 
similarly implement smart views instead of dumb iterators, and also as helpers 
to make that easier for them. That argument might or might not win the day, but 
at least it’s obvious what it would look like.)

So I suspect the only reason nobody’s done so is that you don’t actually run 
into a need for it very often.

How often do you actually need the result of zip to be Sized anyway?

At least for me, it’s not very often. Whenever I run into any of these needs, I 
start thinking about the fully general solution, but put it off until I run 
into a second good use for it and meanwhile write a simple 2-minute workaround 
for my immediate use (or add a new special case like reversed_enumerate to my 
toolbox), and then by the time I run into another need for it, it’s been so 
long that I’ve almost forgotten the idea…

But maybe there would be a lot more demand for this if people knew the idea was 
feasible? Maybe there are people who have tons of real-life examples where they 
could use a Sized zip or a Reversible enumerate or a Sequence map, and they 
just never thought they could have it so they never tried or asked?



[Python-ideas] Re: extended for-else, extended continue, and a rant about zip()

2020-04-27 Thread Andrew Barnert via Python-ideas
On Apr 27, 2020, at 12:49, Soni L.  wrote:
> 
> I wanna propose making generators even weirder!

Why? Most people would consider that a negative, not a positive. Even if you 
demonstrate some useful functionality with realistic examples that benefit from 
it, all you’ve done here is set the bar higher for yourself to convince anyone 
that your change is worth it.

> so, extended continue is an oldie: 
> https://www.python.org/dev/peps/pep-0342/#the-extended-continue-statement
> 
> it'd allow one to turn:
> 
> yield from foo
> 
> into:
> 
> for bar in foo:
>     continue (yield bar)

And what’s the advantage of that? It’s a lot more verbose, harder to read, 
probably easier to get wrong, and presumably less efficient. If this is your 
best argument for why we should revisit an old rejected idea, it’s not a very 
good one.

(If you’re accepting that it’s a pointless feature on its own but proposing it 
because, together with your other proposed new feature, it would no longer be 
pointless, then say that, don’t offer an obviously bad argument for it on its 
own.)

> but what's this extended for-else? well, currently you have for-else:
> 
> for x, y, z in zip(a, b, c):
> ...
> else:
> pass
> 
> and this works. you get the stuff from the iterators, and if you break the 
> loop, the else doesn't run. the else basically behaves like "except 
> StopIteration:"...
> 
> so I propose an extended for-else, that behaves like "except StopIteration as 
> foo:". that is, assuming we could get a zip() that returns partial results in 
> the StopIteration (see other threads), we could do:
> 
> for x, y, z in zip(a, b, c):
>     do_stuff_with(x, y, z)
> else as partial_xy:
>     if len(partial_xy) == 0:
>         x = dummy
>         try:
>             y = next(b)
>         except StopIteration: y = dummy
>         try:
>             z = next(c)
>         except StopIteration: z = dummy
>         if (x, y, z) != (dummy, dummy, dummy):
>             do_stuff_with(x, y, z)
>     if len(partial_xy) == 1:
>         x, = partial_xy
>         y = dummy
>         try:
>             z = next(c)
>         except StopIteration: z = dummy
>         do_stuff_with(x, y, z)
>     if len(partial_xy) == 2:
>         x, y = partial_xy
>         z = dummy
>         do_stuff_with(x, y, z)
> 
> (this example is better served by zip_longest. however, it's nevertheless a 
> good way to demonstrate functionality, thanks to zip_longest's (and zip's) 
> trivial/easy to understand behaviour.)

Would it always be this complicated and verbose to use this feature? I mean, 
compare it to the “roughly equivalent” zip_longest in the docs, which is a lot 
shorter, easier to understand, harder to get wrong, and more flexible (e.g., it 
works unchanged with any number of iterables, while yours has to be rewritten 
for each different number of iterables because it requires N! chunks of explicit 
boilerplate).
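
For reference, that docs equivalent looks roughly like this (paraphrased from
the itertools docs; see them for the real thing), and it handles any number of
iterables:

    from itertools import repeat

    def zip_longest(*args, fillvalue=None):
        iterators = [iter(it) for it in args]
        num_active = len(iterators)
        if not num_active:
            return
        while True:
            values = []
            for i, it in enumerate(iterators):
                try:
                    value = next(it)
                except StopIteration:
                    # This input is done; feed fillers from here on.
                    num_active -= 1
                    if not num_active:
                        return
                    iterators[i] = repeat(fillvalue)
                    value = fillvalue
                values.append(value)
            yield tuple(values)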

Are there any examples where it lets you do something useful that can’t be done 
with existing features, so it’s actually worth learning this weird new feature 
and requiring Python 3.10+ and writing 22 lines of extra code?

Even if there is such an example, if the code to deal with the post-for state 
is 11x as long and complicated as the for loop itself and can't be easily 
simplified or abstracted, is using a for loop instead of manually nexting 
iterators still a net benefit? I don't know that manually nexting the iterators 
will always avoid the problem, but it certainly often does (again, look at many 
of the equivalents in the itertools docs that do it), and it definitely does in 
your emulating-zip_longest example, and that's the only example you've offered.

Also notice that many cases like this can be trivially solved by a simple 
peekable or unnextable (I believe more-itertools has both, and the first one is 
a recipe in itertools too, but I can’t remember the names they use; if not, 
they’re really easy to write) or tee. We don’t even need any of that for your 
example, but if you can actually come up with another example, make sure it 
isn’t already doable a lot more simply with peekable/etc.
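
For illustration, a minimal peekable is just this (a sketch; the
more-itertools version is more featureful):

    class peekable:
        _SENTINEL = object()

        def __init__(self, iterable):
            self._it = iter(iterable)
            self._peeked = self._SENTINEL

        def __iter__(self):
            return self

        def __next__(self):
            if self._peeked is not self._SENTINEL:
                value, self._peeked = self._peeked, self._SENTINEL
                return value
            return next(self._it)

        def peek(self):
            # Fetch and cache the next value without consuming it.
            if self._peeked is self._SENTINEL:
                self._peeked = next(self._it)  # may raise StopIteration
            return self._peeked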

> this would enable one to turn:
> 
> return yield from foo
> 
> into:
> 
> for bar in foo:
>     continue (yield bar)
> else as baz:
>     return baz
> 
> allowing one to pick apart and modify the yielded and sent parts, while still 
> getting access to the return values.

Again, this is letting you turn something simple into something more 
complicated, and it’s not at all clear why you want to do that. What exactly 
are you trying to pick apart that makes that necessary, that can’t be written 
better today?

I’ll grant that writing something fully general that supports all the different 
things that could be theoretically done with your desired feature requires the 
ugly mess that you posted 

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-04-27 Thread Andrew Barnert via Python-ideas
On Apr 26, 2020, at 21:23, David Mertz  wrote:
> 
> 
>> On Sun, Apr 26, 2020 at 11:56 PM Christopher Barker  
>> wrote:
>> > If I have two or more "sequences" there are basically two cases of that.
>> 
>> so you need to write different code, depending on which case? that seems not 
>> very "there's only one way to do it" to me.
> 
> This difference is built into the problem itself.  There CANNOT be only one 
> way to do these fundamentally different things.
> 
> With iterators, there is at heart a difference between "sequences that one 
> can (reasonably) concretize" and "sequences that must be lazy."  And that 
> difference means that for some versions of a seemingly similar problem it is 
> possible to ask len() before looping through them while for others that is 
> not possible (and hence we may have done some work that we want to 
> "roll-back" in some sense).

Agreed. But here’s a different way to look at it:

The Python iteration protocol hides the difference between different kinds of 
iterables; every iterator is just a dumb next-only iterator. So any distinction 
between things you can pre-check and things you can post-check has to be made 
at a higher level, up wherever the code knows what’s being iterated (probably 
the application level). That isn’t inherent to the idea of iteration, as 
demonstrated by C++ (and later languages like Swift), where you can have 
reversible or random-accessible iterators and write tools that switch on those 
features, so you wouldn’t be forced to make the decision at the application 
level. You could write a generic C++ zip_equal function that pre-checks 
random-accessible iterators but post-checks other iterators.
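
Translated into Python terms, that generic function would only be a small
sketch (names hypothetical, not a proposal):

    from collections.abc import Sized
    from itertools import zip_longest

    def zip_equal(*iterables):
        # Pre-check when every input knows its length...
        if all(isinstance(it, Sized) for it in iterables):
            if len({len(it) for it in iterables}) > 1:
                raise ValueError('iterables have different lengths')
            return zip(*iterables)
        # ...post-check otherwise.
        return _zip_equal_lazy(*iterables)

    def _zip_equal_lazy(*iterables):
        sentinel = object()
        for combo in zip_longest(*iterables, fillvalue=sentinel):
            if any(x is sentinel for x in combo):
                raise ValueError('iterables have different lengths')
            yield combo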

But when would you want that generic function? When you’re writing that 
application code, you know whether you have sequences, inherently lazy 
iterators, or generic iterables as input, and you know whether you want no 
check, a pre-check, or a post-check on equal lengths, and those aren’t 
independent questions: when you want a pre-check, it’s because you’re thinking 
in sequence terms, not general iteration terms.

Pre-checking sequences is so trivial that you don’t need any helpers. The only 
piece Python is (arguably) missing is a way to do that post-check easily when 
you’ve decided you need it, and that’s what the proposals in this thread are 
trying to solve.

The fact that asking for post-checking on the zip iterator won’t look the same 
as manually pre-checking the input sequences isn’t a violation of TOOWTDI 
because the “it” you’re doing is a different thing, different in a way that’s 
meaningful to your code, and there doesn’t have to be one obvious way to do two 
different things. Just like slicing doesn’t have to look the same as islice, 
and a find method doesn’t have to look the same as a generic iterable find 
function, and so on; they only look the same when the distinction between 
thinking about sequences and thinking about lazy iterables is irrelevant to the 
problem.



[Python-ideas] Re: zip(x, y, z, strict=True)

2020-04-27 Thread Andrew Barnert via Python-ideas
On Apr 26, 2020, at 16:58, Steven D'Aprano  wrote:
> 
> On Sun, Apr 26, 2020 at 04:13:27PM -0700, Andrew Barnert via Python-ideas 
> wrote:
> 
>> But if we add methods on zip objects, and then we add a new skip() 
>> method in 3.10, how does the backport work? It can’t monkeypatch the 
>> zip type (unless we both make the type public and specifically design 
>> it to be monkeypatchable, which C builtins usually aren’t).
> 
> Depends on how you define monkey-patching.
> 
> I'm not saying this because I see the need for a plethora of methods on 
> zip (on the contrary); but I do like the methods-on-function API, like 
> itertools.chain has. Functions are namespaces, and we under-utilise 
> that fact in our APIs.
> 
>Namespaces are one honking great idea -- let's do more of those!
> 
> Here is a sketch of how you might do it:
> 
>     # Untested.
>     class MyZipBackport():
>         real_zip = builtins.zip
>         def __call__(self, *args):
>             return self.real_zip(*args)
>         def __getattr__(self, name):
>             # Delegation is another under-utilised technique.
>             return getattr(self.real_zip, name)
>         def skip(self, *args):
>             # insert implementation here...
> 
>     builtins.zip = MyZipBackport()

But this doesn’t do what the OP suggested; it’s a completely different 
proposal. They wanted to write this:

zipped = zip(xs, ys).skip()

… and you’re offering this:

zipped = zip.skip(xs, ys)

That’s a decent proposal—arguably better than the one being discussed—but it’s 
definitely not the same one.

> I don't know what "zip.skip" is supposed to do,

I quoted it in the email you’re responding to: it’s supposed to yield short 
tuples that skip the iterables that ran out early. But from the wording you 
quoted it should be obvious that isn’t an issue here anyway. As long as you 
understand their point that they want to leave things open for expansion to new 
forms of zipping in the future, you can understand my point that their design 
makes that harder rather than easier.

>> Also, what exactly do these methods return?
> 
> An iterator. What kind of iterator is an implementation detail.
> 
> The type of the zip objects is not part of the public API, only the 
> functional behaviour.

Now go back and do what the OP actually asked for, with the zip iterator type 
having shortest(), equal(), and longest() methods in 3.9 and a skip() method 
added in 3.10. It’s no longer just “some iterator type, doesn’t matter”, it has 
specific methods on it, documented as part of the public API, and you need to 
either subclass it or emulate it. That’s exactly the problem I’m pointing out. 
It's not true in 3.8, it's not required by the problem, and it's not true of 
other designs proposed in this thread (like just having more separate 
functions in itertools); it's specifically a flaw with this design.

So the fact that you can come up with a different design without that flaw 
isn’t an argument against my point, it’s just a probably-unnecessary further 
demonstration of my point.

Your design looks like a pretty good one at least at first glance, and I think 
you should propose it seriously. You should be showing why it’s better than 
adding methods to zip objects—and also better than adding more functions to 
itertools or builtins, or flags to zip, or doing nothing—not pretending it’s 
the same as one of those other proposals and then trying to defend that other 
proposal by confusing the problems with it.



[Python-ideas] Re: zip(x, y, z, strict=True)

2020-04-26 Thread Andrew Barnert via Python-ideas
On Apr 26, 2020, at 14:36, Daniel Moisset  wrote:
> 
> This idea is something I could have used many times. I agree with many people 
> here that the strict=True API is at least "unusual" in Python. I was thinking 
> of 2 different API approaches that could be used for this and I think no one 
> has mentioned:
> we could add a callable filler_factory keyword argument to zip_longest. That 
> would allow passing a function that raises an exception if I want "strict" 
> behaviour, and also has some other uses (for example, if I want to use [] as 
> a filler value, but not the *same* empty list for all fillers)

This could be useful, and doesn't seem too bad.

I still think an itertools.zip_equal would be more discoverable and more easily 
understandable than something like itertools.zip_longest(filler_factory=lambda: 
throw(ValueError)), especially since you have to write that thrower function 
yourself. But if there really are other common uses like 
zip_longest(filler_factory=list), that might make up for it.
> we could add methods to the zip() type that provide different behaviours. 
> That way you could use zip(seq, seq2).shortest(), zip(seq1, seq2).equal(), 
> zip(seq1, seq2).longer(filler="foo") ; zip(...).shortest() would be 
> equivalent to zip(...).  Other names might work better with this API, I can 
> think of zip(...).drop_tails(), zip(...).consume_all() and zip(...).fill(). 
> This also allows adding other possible behaviours (I wouldn't say it's 
> common, but at least once I've wanted to zip lists of different length, but 
> get shorter tuples on the tails instead of fillers).

This second one is a cool idea—but your argument for it seems to be an argument 
against it.

If we stick with separate functions in itertools, and then we add a new one for 
your zip_skip (or whatever you’d call it) in 3.10, the backport is trivial. 
Either more-itertools adds zip_skip, or someone writes an itertools310 library 
with the new functions in 3.10, and then people just do this:

try:
    from itertools import zip_skip
except ImportError:
    from more_itertools import zip_skip

But if we add methods on zip objects, and then we add a new skip() method in 
3.10, how does the backport work? It can’t monkeypatch the zip type (unless we 
both make the type public and specifically design it to be monkeypatchable, 
which C builtins usually aren’t). So more-itertools or zip310 or whatever has 
to provide a full implementation of the zip type, with all of its methods, and 
probably twice (in Python for other implementations plus a C accelerator for 
CPython). Sure, maybe it could delegate to a real zip object for the methods 
that are already there, but that’s still not trivial (and adds a performance 
cost).

Also, what exactly do these methods return? Do they set some flag and return 
self? If so, that goes against the usual Python rule that mutator methods 
return None rather than self. Plus, it opens the question of what zip(xs, 
ys).equal().shortest() should do. I think you’d want that to be an 
AttributeError, but the only sensible way to get that is if equal() actually 
returns a new object of a new zip_equal type rather than self. So, that solves 
both problems, but it means you have to implement four different builtin types. 
(Also, while the C implementation of those types, and constructing them from 
the zip type’s methods, seems trivial, I think the pure Python version would 
have to be pretty clunky.)
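
To make that concrete, here's a rough pure-Python sketch of the
new-object-per-method design (hypothetical names, no C accelerator); even this
toy version shows where the clunkiness comes from:

    import itertools

    class zip2:
        def __init__(self, *iterables):
            self._iterables = iterables
        def __iter__(self):
            return zip(*self._iterables)
        def equal(self):
            # Return a new object of a new type, so that chaining like
            # zip2(...).equal().shortest() fails with AttributeError.
            return _ZipEqual(*self._iterables)
        def longest(self, fillvalue=None):
            return itertools.zip_longest(*self._iterables,
                                         fillvalue=fillvalue)

    class _ZipEqual:
        def __init__(self, *iterables):
            self._its = [iter(it) for it in iterables]
        def __iter__(self):
            return self
        def __next__(self):
            sentinel = object()
            row = tuple(next(it, sentinel) for it in self._its)
            if all(x is sentinel for x in row):
                raise StopIteration
            if any(x is sentinel for x in row):
                raise ValueError('iterables have different lengths')
            return row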


[Python-ideas] Re: Adding a "once" function to functools

2020-04-26 Thread Andrew Barnert via Python-ideas
On Apr 26, 2020, at 10:49, Eric Fahlgren  wrote:
> 
>> On Sun, Apr 26, 2020 at 9:46 AM Alex Hall  wrote:
>> It's not clear to me why people prefer an extra function which would be 
>> exactly equivalent to lru_cache in the expected use case (i.e. decorating a 
>> function without arguments). It seems like a good way to cause confusion, 
>> especially for beginners. Based on the Zen, there should be one obvious way 
>> to do it.
> 
> I don't believe it is.  lru_cache only guarantees that you will get the same 
> result back for identical arguments, not that the function will only be 
> called once.  Seems to me if you call it, then in the middle of caching the 
> value, there's a thread change, you could get to the function wrapped by 
> lru_cache twice (or more times).  In order to implement once, it needs to 
> contain a thread lock to ensure its "once" moniker and support the singleton 
> pattern for which it is currently being used (apparently incorrectly) in 
> django and other places.  Am I understanding threading correctly here?

There are three different use cases for “once” in a threaded program:

1. It's incorrect or dangerous to even call the function twice.
2. The function isn't idempotent, but you need it to be.
3. The function is idempotent, and it's purely a performance optimization.

For the third case, you don't need any synchronization for correctness (as 
long as reading and writing the cache value is atomic), and it may actually be 
a lot faster. Sure, it means occasionally you end up doing the work two or 
even more times at startup, but in exchange you avoid a zillion thread locks, 
which can be a lot more expensive. If that's the case with those Django uses, 
they're not using it incorrectly.

Also, if you know your app’s sequencing well enough and know exactly what the 
GIL guarantees, you might be able to prove (or at least convince yourself well 
enough that if test X passes it’s almost certainly safe) that there’s no chance 
of startup contention. This includes the really trivial case where you know 
what’s needed before you fork any threads that might need it (although for a 
lot of those cases, in Python, it’s probably simpler to just use a module 
global, but using an unsynchronized cache isn’t terrible for readability).

Of course it’s also possible that Django is using it incorrectly and it just 
shows up as a handful of web apps starting up wrong one in a million instances 
and there are live bugs all over the internet that nobody’s handling right. But 
I wouldn’t just assume that it’s incorrect and add a new feature to Python and 
encourage Django to rewrite a whole lot of code to use it without finding an 
actual bug first.

Also, it's pretty easy to turn an unsynchronized implementation into a 
synchronized one: just add a @synchronized decorator around the @lru_cache or 
@cached_property decorator (or write a
simple @synchronized_lru_cache or @synchronized_cached_property decorator and 
use that). So, does Python really need to include anything in the stdlib to 
make it easier? (That’s not a rhetorical question; I’m not sure.)
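
For example, such a wrapper could be as simple as this (a sketch;
synchronized_once is a hypothetical name, and it only handles nullary
functions):

    import functools
    import threading

    def synchronized_once(func):
        lock = threading.Lock()
        cached = functools.lru_cache(maxsize=None)(func)
        @functools.wraps(func)
        def wrapper():
            # The lock guarantees at most one real call of func, even if
            # several threads race here at startup.
            with lock:
                return cached()
        return wrapper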

On the other hand, if the bugs are actually the second case rather than the 
first, you can solve that with something faster than a full read-write mutex, 
but it’s a lot more complicated (and may not even be writeable in Python at 
all): read-acquire the cache, and if it’s empty, call the function and then 
compare-and-swap-release the cache, and if the CAS fails that means someone 
else got there first so discard your value and return theirs. If that comes up 
a lot and the performance benefit is often worth having, that seems like it 
should definitely be in the stdlib because people won’t get it right. But I 
doubt it does.

One last thing: the best way to cache an idempotent nullary function with 
lru_cache is to use maxsize=None. If people are leaving the default 128, maybe 
the docs need to be improved in some way?



[Python-ideas] Re: zip(x, y, z, strict=True)

2020-04-25 Thread Andrew Barnert via Python-ideas
On Apr 25, 2020, at 09:40, Christopher Barker  wrote:
> 
>   - The main exception to this may be when one of them is infinite, but how 
> common is that, really? Remember that when zip was first created (py2) it was 
> a list builder, not an iterator, and Python itself was much less 
> iterable-focused.

Well, yes, and improvements like that are why Python 3.9 is a better language 
than Python 2.0 (when zip was first added). Python wasn’t just much less 
iterable-focused, it didn’t even have the concept of “iterable”. While it did 
have map and filter, the tutorial taught you to loop over range(len(xs)), only 
mentioning map and filter as “good candidates to pass to lambda forms” for 
people who really want to pretend Python is Lisp rather than using it properly. 
Adding the iterator protocol and more powerful for loop; functions like zip, 
enumerate, and iter; generators, comprehensions, and generator expressions; 
itertools; yield from; and changing map and friends to iterators is a big part 
of why you can write all kinds of things naturally in Python 3.9 that were 
clumsy, complicated, or even impossible. Sure, you can use it as if it were 
Python 2.0 but with Unicode, but it’s a lot more than that.

But also, why was zip added with “shortest” behavior in 2.0 in the first place? 
It wasn’t to support infinite or otherwise lazy lists, because those didn’t 
exist. And it wasn’t chosen on a whim. In Python 1.x, if you knew your lists 
were the same length, you used map with None as the function. (Well, usually 
you just looped over range(len(first_list)), but if you wanted to be all Lispy, 
you used map.) But if you didn’t know the lists were the same length, you 
couldn’t (because map had “longest” behavior, with an unchangeable fillvalue of 
None, until 3.0). If that didn’t actually come up for people even in Python 
1.x, nobody would have asked for it in 2.0.



[Python-ideas] Re: zip(x, y, z, strict=True)

2020-04-24 Thread Andrew Barnert via Python-ideas
On Apr 24, 2020, at 11:07, Brandt Bucher  wrote:
> 
> 1. Likely the most common case, for me, is when I have some data and want to 
> iterate over both it and a calculated pairing:
> 
> >>> x = ["a", "b", "c", "d"]
> >>> y = iter_apply_some_transformation(x)
> >>> for a, b in zip(x, y):
> ...     ...  # Do something.
> ...

Your other examples are a lot more compelling. I can easily imagine actually 
being bitten by zip(*ragged_iterables_that_I_thought_were_rectangular) and 
having a hard time debugging that, and the other one is an actual bug in actual 
code, which is even harder to dismiss.

I think this one, on the other hand, is exactly what I think doubters are 
imagining. I can easily imagine cases where you want to zip together two 
obviously-equal iterables, but when they’re obviously equal, adding a check for 
that is hardly the first thing I’d think about defending against. (For example, 
things like using “spam eggs cheese”.strip() instead of .split() as the input 
are more common logic errors and even less fun to debug…)

And that’s why people keep asking for examples—because the proponents of the 
change keep talking as if there are examples like your 2 and 3 where everyone 
would agree that there’s a significant benefit to making it easier to be 
defensive, but the wary conservatives are only imagining examples like your 1.

Anyway, if I’m right, I think you just solved that problem, and now everyone 
can stop talking past each other.

(Although the couple of people who suggested wanting to _handle_ the error as a 
normal case rather than treating it as a logic error to debug like your 
examples still need to give use cases if they want anything different than what 
you want.)



[Python-ideas] Re: Make type(None) the no-op function

2020-04-24 Thread Andrew Barnert via Python-ideas
> On Apr 24, 2020, at 08:46, Soni L.  wrote:
> 
> it's not my own use case for once. the PEP clearly lists a use-case. we 
> should support that use-case.

So your use case is the rationale from a PEP written because Barry “can’t 
resist” and rejected as a joke in record time, for which a better solution was 
already added (use 0 for the environment variable) 3 years ago.

And you ended your email with this code snippet:

> >>> NoneType
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> NameError: name 'NoneType' is not defined

… which demonstrates that even if you had a real need for a 
PYTHON_BREAKPOINT=noop, your proposal wouldn’t have helped anyway. There is 
actually a way you could pass it, but if you don’t know that way, and felt the 
need to show us that you don’t know it, that can’t be your use case. And even 
after you figure it out, it would hardly be any more obvious to other people 
who need a noop function where to go digging for it than it was for you.


[Python-ideas] Re: zip(x, y, z, strict=True)

2020-04-23 Thread Andrew Barnert via Python-ideas
> On Apr 22, 2020, at 14:09, Steven D'Aprano  wrote:
> 
> On Wed, Apr 22, 2020 at 10:33:24AM -0700, Andrew Barnert via Python-ideas 
> wrote:
> 
>> If that is your long-term goal, I think you could do it in three steps.
> 
> I think the first step is a PEP. This is not a small change that can be 
> just done on a whim.

Yes, I agree. Each of the three steps will very likely require a PEP.

And not only that, the PEP for this first step has to make it clear that it’s 
useful on its own—not just to people like Serhiy who eventually want to replace 
zip and see it as a first step, but also to people who do not want zip to ever 
change but do want a convenient way to opt in to checking zips (and don’t find 
more-itertools convenient enough) and see this as the _only_ step.

>> And of course after the first two steps you can proselytize for the
>> next one. If you can convince lots of people that they should care
>> about the choice more often and get them using the explicit functions,
>> it’ll be a lot harder to argue that everyone is happy with today’s
>> behavior.
> 
> If they need to be *convinced* to use the new function, then they don't 
> really need it and didn't want it.

I had to be convinced that I wanted str.format. (The guy who convinced me was 
enthusiastic enough that he went through the effort of writing a __format__ 
method for my Fixed1616 class to show how easily extensible it is.) But really, 
I did want it, and just didn’t know it yet.

Hell, I had to be convinced to use Python instead of sticking with Perl and 
Tcl, but it turned out I did want it.

Let’s assume that the proponents of adding zip_strict are right that using it 
will often give you early failures on some common uses that are today painful 
to debug. If so, most people don’t know that today, and aren’t going to think 
of it just because a new function shows up in itertools, or a new flag on a 
builtin, or whatever. Someone will have to convince them to use it. But then, 
one evening, they’ll get an exception and realize, “Whoa, that would have taken 
me hours to debug otherwise, if I’d even spotted the bug…”, and they’ll realize 
they needed it, just as much as the handful who noticed the need in advance and 
went looking.

The proponents of the bigger, longer-term change of eventually making this the 
default behavior for zip may be right too. If so, many of the people who were 
convinced to use zip_strict will find it helpful so often, and zip_shortest so 
unusual in their code, that they start asking why the hell strict isn’t the 
default instead of shortest. And then it’ll be a lot easier for Serhiy or 
whoever to sell such a big change. Of course if that doesn’t ever happen, it’ll 
be a lot harder to sell the change—but in that case, the change would be a 
mistake, so that’s good too.


[Python-ideas] Re: Add extend_const action to argparse

2020-04-22 Thread Andrew Barnert via Python-ideas
On Apr 22, 2020, at 15:04, pyt...@roganartu.com wrote:
> 
> The natural extension to this filtering idea are convenience args that set 
> two const values (eg: `--filter x --filter y` being equivalent to 
> `--filter-x-y`), but there is no `extend_const` action to enable this.
> 
> While this is possible (and rather straight forward) to add via a custom 
> action, I feel like this should be a built-in action instead. `append` has 
> `append_const`, it seems intuitive and reasonable to expect `extend` to have 
> `extend_const` too (my anecdotal experience the first time I came across this 
> need was that I simply tried using `extend_const` without checking the docs, 
> assuming it already existed).

I’m pretty sure I’ve run into the exact same situation (well, not accumulating 
filters, but accumulating something and wanting to add multiple constants from 
one flag), had the same “Really? It’s not there?” reaction as you, and then 
just muttered and worked around it. It makes sense to me to fix it, exactly the 
way you propose.
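
For anyone who wants the workaround in the meantime, the custom action really
is short (a sketch; ExtendConst is a hypothetical name):

    import argparse

    class ExtendConst(argparse.Action):
        def __call__(self, parser, namespace, values, option_string=None):
            # Copy before extending, like the stdlib append action does.
            items = list(getattr(namespace, self.dest, None) or [])
            items.extend(self.const)
            setattr(namespace, self.dest, items)

    parser = argparse.ArgumentParser()
    parser.add_argument('--filter', action='append', dest='filters',
                        default=[])
    parser.add_argument('--filter-x-y', action=ExtendConst, dest='filters',
                        const=('x', 'y'), nargs=0)

    args = parser.parse_args(['--filter', 'a', '--filter-x-y'])
    print(args.filters)  # ['a', 'x', 'y']

Note the tuple const; more on that below.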

My only comment is that when you write the example(s) for the docs, it might be 
worth using a tuple rather than a list for the const value. It doesn’t really 
make a difference, but people might be momentarily confused by a mutable list 
called “const”.

Also, looking at the _copy_items function you’re calling: it has a comment 
saying it’s only used by append and append_const, but that’s wrong as it’s also 
used by extend. And of course you’re adding extend_const. I don’t know if 
that’s worth fixing separately, but if not it seems to me it’s probably worth 
fixing in your patch.


[Python-ideas] Re: zip(x, y, z, strict=True)

2020-04-22 Thread Andrew Barnert via Python-ideas
> On Apr 21, 2020, at 16:02, Steven D'Aprano  wrote:
> 
> On Tue, Apr 21, 2020 at 12:25:06PM -0700, Andrew Barnert via Python-ideas 
> wrote:
>>> On Apr 21, 2020, at 01:36, Serhiy Storchaka  wrote:
>>>  except ValueError: # assuming that’s the exception you want?
>> For what it’s worth, more_itertools.zip_equal raises an 
>> UnequalIterablesError, which is a subclass of ValueError.
>> I’m not sure whether having a special error class is worth it, but that’s 
>> because nobody’s providing any examples of code where they’d want to handle 
>> this error. Presumably there are cases where something else in the 
>> expression could raise a ValueError for a different reason, and being able 
>> to catch this one instead of that one would be worthwhile. But how often? No 
>> idea.
> 
>> At a guess, I’d say that if this has to be a builtin (whether
>> flag-switchable behavior in zip or a new builtin function) it’s
>> probably not worth adding a new builtin exception, but if it’s going
>> to go into itertools it probably is worth it.
> 
> Why?

Well, you quoted the answer above, but I’ll repeat it:

>> Presumably there are cases where something else in the expression could 
>> raise a ValueError for a different reason, and being able to catch this one 
>> instead of that one would be worthwhile. But how often? No idea.

For a little more detail:

A few people (like Soni) keep trying to come up with general-purpose ways to 
differentiate exceptions better. The strong consensus is always that we don’t 
need any such thing, because in most cases, Python gives you just enough to 
differentiate what you actually need in most code. (That wasn’t quite true in 
Python 2, but it is now.) We have LookupError with subclasses KeyError and 
IndexError, but not additional subclasses IndexTooBigError and 
IndexTooSmallError, and so on. For the IOError subclasses, Python does kind of 
lean on C/POSIX, but that’s still good enough that it’s fine.

The question in every case is: do you often need to distinguish this case? In 
this case: will the zip_strict postcondition violation be used in a lot of 
places where there are other likely sources of ValueError that need to be 
distinguished? If so, it should be a separate subclass. If that will be rare, 
it shouldn’t.
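
For instance, the shape of code that would justify the subclass looks
something like this (a hedged sketch using more-itertools' names):

    from more_itertools import UnequalIterablesError, zip_equal

    def parse_pairs(keys, values):
        # int(v) below can raise a plain ValueError; catching only the
        # subclass means we don't accidentally swallow that one too.
        try:
            return {k: int(v) for k, v in zip_equal(keys, values)}
        except UnequalIterablesError:
            raise SystemExit('keys and values must have the same length')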

As I said, I don’t know the answer to that question, because none of the people 
saying they need an exception here have given any examples where they’d want to 
handle the exception, and it’s hard to guess how people want to handle an 
exception when you don’t even know where and when they want to handle it. So I 
took a guess to start the discussion. If you have a different guess, fine. But 
really, we need the people who have code in mind that would actually use this 
to show us that code or tell us about it.

> I know that the Python community has a love-affair with more-itertools, 
> but I don't think that it is a well-designed library offering good APIs. 
> It's a grab-bag of "everything including the kitchen sink". Just because 
> they use a distinct exception doesn't mean we should follow them.

If I thought we should just do what more-itertools does without thinking, I 
would have said “more-itertools has a separate exception, so we should”, rather 
than saying “For what it’s worth, more-itertools has a separate exception” and 
then concluding that I don’t know if we actually need one and we need to look 
at actual examples to decide.

When all else is equal, I think it’s worth being consistent with more-itertools 
just because that way we get an automatic backport. But that’s not a huge win, 
and quite often, all else isn’t equal, so looking at what more-itertools does 
and why isn’t the answer, it’s just one piece of information to throw into the 
discussion. And I think that’s the case here: their design raises a question 
for us to answer, but it doesn’t answer it for us.




[Python-ideas] Re: zip(x, y, z, strict=True)

2020-04-22 Thread Andrew Barnert via Python-ideas
On Apr 21, 2020, at 19:35, Steven D'Aprano  wrote:
> 
> On Mon, Apr 20, 2020 at 07:47:51PM -0700, Andrew Barnert wrote:
> 
 counter = itertools.count()
 yield from zip(counter, itertools.chain(headers, [''], body, ['']))
 lines = next(counter)
>>> 
>>> That gives you one more than the number of lines yielded.
>> 
>> Yeah, I screwed that up in simplifying the real code without testing 
>> the result. And your version gives one _less_ than the number yielded.
> 
> No, my version repeats the last number yielded, which is precisely what 
> you wanted (as I understand it).

No, I wanted the number of lines yielded. You not only quoted that, but 
directly claimed that you were giving the number of lines yielded. But you’re 
not; you’re giving me the number of the last line, which is 1 less than that.

> py> def test():
> ...     headers = body = ''
> ...     for t in enumerate(itertools.chain(headers, [''], body, [''])):
> ...         yield t
> ...     print(t[0])
> ...
> py> list(test())
> 1
> [(0, ''), (1, '')]

Right. The number of pairs yielded is 2. Your code prints 1.

>> (With either enumerate(xs) or zip(counter, xs) the last element will 
>> be (len(xs)-1, xs[-1]).
> 
> Um, yes? That's because both enumerate and counter start from zero by 
> default. I would have asked you why you were counting your lines 
> starting from zero instead of using `enumerate(xs, 1)` but I thought 
> that was intentional.

You were right, counting from 0 was intentional. Just as it is almost 
everywhere in Python. The caller needs those line numbers; otherwise I wouldn’t 
be yielding them in the first place.

And that’s why your solution is wrong: you correctly left it counting from 0, 
but then incorrectly assumed that the last number equals the count, which is 
only true when counting from 1. If that’s not a classic fencepost error, I 
don’t know what is. And my originally-posted version has a different fencepost 
error, as you pointed out. And my real code doesn’t, but I may well have made 
one and had to spend a minute debugging it. Nontrivial counting code often has 
fencepost errors, and Python only eliminates the sources that come up often, 
not every possible one that might come up rarely, which is fine. And this 
proposal doesn’t change that in any way, nor is it meant to.
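
To make the fencepost concrete, the corrected pattern looks something like
this (a minimal sketch, not the real code):

    import itertools

    def numbered(lines):
        counter = itertools.count()
        yield from zip(counter, lines)
        # zip consumed one extra value from counter when `lines` ran out,
        # so the number of pairs yielded is next(counter) - 1.
        print(next(counter) - 1)

    list(numbered(['a', 'b']))  # yields (0, 'a'), (1, 'b'); prints 2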

>> Your version has the additional problem that 
>> if the iterable is empty, t is not off by one but unbound (or bound to 
>> some stale old value)—but that’s not possible in my example, and 
>> probably not in most similar examples.
> 
> But the iterable is never empty, because you always yield at least 
> two blanks.

Yes; I said “but that’s not possible in my example”, as you quoted directly 
above.

> I don't believe this zip_strict proposal would help you in this 
> situation. I think it will make it worse,

Well, of course. Since it wasn’t an argument for the proposal, but an example 
pointing out a potential hole in the proposal that needed to be thought 
through, why would you expect the proposal to help it?

To recap: Someone had said that it doesn’t matter what state the iterables are 
left in, because nobody ever looks at an iterator after zip. So I gave an 
example of (simplified) real code that looks at an iterator after zip. So 
people thought through what state the iterables should be left in by this new 
zip_strict function, and there is a reasonable answer. Even if your arguments 
about this example were correct, they wouldn’t be relevant to the thread, 
because the entire purpose of giving the example has been fulfilled.


[Python-ideas] Re: zip(x, y, z, strict=True)

2020-04-22 Thread Andrew Barnert via Python-ideas
On Apr 22, 2020, at 01:52, Serhiy Storchaka  wrote:
> 
> 22.04.20 11:20, Antoine Pitrou пише:
>> Ideally, that's what it would do.  Whether it's desirable to transition
>> to that behaviour is an open question.
>> But, as far as I'm concerned, the number of times where I took
>> advantage of zip()'s current acceptance of heteregenously-sized inputs
>> is extremely small.  In most of my uses of zip(), a size difference
>> would have been a logic error that deserves noticing and fixing.
> 
> I concur with Antoine. Ideally we should have several functions: 
> zip_shortest(), zip_equal(), zip_longest(). In most cases (80% or 90% or 
> more) they are equivalent, because input iterators has the same length, but 
> it is safer to use zip_equal() to catch bugs. In other cases you would use 
> zip_shortest() or zip_longest(). And it would be natural to rename the most 
> popular variant to just zip().
> 
> Now it is a breaking change. We had a chance to do it in 3.0, when other 
> breaking change was performed in zip(). I do not know if it is worth to do 
> now. But when we plan any changes in zip() we should take into account 
> possible future changes and make them simpler, not harder.

If that is your long-term goal, I think you could do it in three steps.

First, just add itertools.zip_equal. Ideally the docs should replace the usual 
“Added in 3.9” with something like “Added in 3.9; if you need the same function 
in earlier versions see more-itertools” (linked to the more-itertools blurb at 
the top of the page). It seems like there’s a lot of support for this step even 
from people who don’t want your big goal.

Second, add itertools.zip_shortest. And change zip’s docs to say that it’s the 
same as zip_shortest and mention the other two choices, and maybe even to try 
to nudge people to explicitly decide which of the three they want. And find 
some places in the tutorial that use zip and change them to use zip_equal and 
zip_shortest as appropriate. I think that gets you about as much as you can get 
without backward compatibility issues, and it also gets you closer to being 
able to deprecate zip or change it to alias zip_equal, rather than making it 
harder.

Third, do the deprecation. By that point, everyone maintaining existing code 
will have an easy way to defensively prepare for it: as long as they can 
require 3.10+ or more-itertools, they can just change all uses of zip to 
zip_shortest and they’re done. Still not painless, but about as painless as a 
backward compatibility break could ever be.

And of course after the first two steps you can proselytize for the next one. 
If you can convince lots of people that they should care about the choice more 
often and get them using the explicit functions, it’ll be a lot harder to argue 
that everyone is happy with today’s behavior.



[Python-ideas] Re: zip(x, y, z, strict=True)

2020-04-21 Thread Andrew Barnert via Python-ideas
On Apr 21, 2020, at 01:36, Serhiy Storchaka  wrote:
> 
>except ValueError: # assuming that’s the exception you want?

For what it’s worth, more_itertools.zip_equal raises an UnequalIterablesError, 
which is a subclass of ValueError.

I’m not sure whether having a special error class is worth it, but that’s 
because nobody’s providing any examples of code where they’d want to handle 
this error. Presumably there are cases where something else in the expression 
could raise a ValueError for a different reason, and being able to catch this 
one instead of that one would be worthwhile. But how often? No idea.

At a guess, I’d say that if this has to be a builtin (whether flag-switchable 
behavior in zip or a new builtin function) it’s probably not worth adding a new 
builtin exception, but if it’s going to go into itertools it probably is worth 
it.



[Python-ideas] Re: zip(x, y, z, strict=True)

2020-04-21 Thread Andrew Barnert via Python-ideas
On Apr 21, 2020, at 01:36, Serhiy Storchaka  wrote:
> 
> 20.04.20 23:33, Andrew Barnert via Python-ideas пише:
>> Should this print 1 or 2 or raise StopIteration or be a don’t-care?
>> Should it matter if you zip(y, x, strict=True) instead?
> 
> It should print 2 in both cases. The only way to determine whether the 
> iterator ends is to try to get its next value. And this value (1) will lost, 
> because there is no way to return it or "unput" to the iterator. There is no 
> reason to consume more values, so StopIteration is irrelevant.
> 
> There is more interesting example:
> 
>     x = iter(range(5))
>     y = [0]
>     z = iter(range(5))
>     try:
>         zipped = list(zip(x, y, z, strict=True))
>     except ValueError: # assuming that’s the exception you want?
>         assert zipped == [(0, 0, 0)]
>         assert next(x) == 2
>         print(next(z))
> 
> Should this print 1 or 2?
> 
> The simple implementation using zip_longest() would print 2, but more optimal 
> implementation can print 1.

You’re right; that’s the question I should have asked; thanks.

As I said, I think either answer is probably acceptable as long as it’s 
documented (and, therefore, it’s also clear that the consequences have been 
thought through).



[Python-ideas] Re: Keyword arguments self-assignment

2020-04-21 Thread Andrew Barnert via Python-ideas
On Apr 21, 2020, at 01:27, M.-A. Lemburg  wrote:
> 
> On 21.04.2020 04:25, Andrew Barnert via Python-ideas wrote:
>>> On Apr 20, 2020, at 16:24, M.-A. Lemburg  wrote:
>>> 
>>> On 20.04.2020 19:43, Andrew Barnert via Python-ideas wrote:
>>>>> On Apr 20, 2020, at 01:06, M.-A. Lemburg  wrote:
>>>>> 
>>>>> The current version already strikes me as way too complex.
>>>>> It's by far the most complex piece of grammar we have in Python:
>>>>> 
>>>>> funcdef: 'def' NAME parameters ['->' test] ':' [TYPE_COMMENT]
>>>>> func_body_suite
>>>> 
>>>> But nobody’s proposing changing the function definition syntax here, only 
>>>> the function call syntax. Which is a lot less hairy. It is still somewhat 
>>>> hairy, but nowhere near as bad, so this argument doesn’t really apply.
>>> 
>>> True, I quoted the wrong part of the grammar for the argument,
>>> sorry. I meant this part:
>>> 
>>> https://docs.python.org/3/reference/expressions.html#calls
>>> 
>>> which is simpler, but not really much, since the devil is in
>>> the details.
>> 
>> Let’s just take one of the variant proposals under discussion here, adding 
>> ::identifier to dict displays. This makes no change to the call grammar, or 
>> to any of the call-related bits, or any other horribly complicated piece of 
>> grammar. It just changes key_datum (a nonterminal referenced only in 
>> dict_display) from this:
>> 
>>expression ":" expression | "**" or_expr
>> 
>> … to this:
>> 
>>expression ":" expression | "::" identifier | "**" or_expr
>> 
>> That’s about as simple as any syntax change ever gets.
>> 
>> Which is still not nothing. But you’re absolutely right that a big and messy 
>> change to function definition grammar would have a higher bar to clear than 
>> most syntax proposals—and for the exact same reason, a small and local 
>> change to dict display datum grammar has a lower bar than most syntax 
>> proposals.
> 
> I think the real issue you would like to resolve is how to get
> at the variable names used for calling a function, essentially
> pass-by-reference (in the Python sense, where variable names are
> references to objects, not pointers as in C) rather than
> pass-by-value, as is the default for Python functions.

No, nobody’s asking for that either.

It wouldn’t directly solve most of the examples in this thread, or even 
indirectly make them easier to solve. The problem in most cases is that they 
have to call a function that they can’t change with a big mess of parameters. 
Any change to help the callee side doesn’t do any good, because the callee is 
the thing they can’t change. The fix needs to be on the caller side alone.

This also wouldn’t give you useful pass-by-reference in the usual sense of “I 
want to let the callee rebind the variables I pass in”, because a name isn’t a 
reference in Python without the namespace to look it up in. Even if the callee 
knew the name the caller used for one of its parameters, how would it know 
whether that name was a local or a cell or a global? If it’s a local, how would 
it get at the caller’s local environment without frame hacking? (As people have 
demonstrated on this thread, frame hacking on its own is enough, without any 
new changes.) Even if it could get that local environment, how could it rebind 
the variable when you can’t mutate locals dicts?

Also, most arguments in Python do not have names, because arguments are 
arbitrary expressions. Of course the same thing is true in, say, C++, but 
that’s fine in C++, because lvalue expressions have perfectly good lvalues even 
if they don’t have good names. You can pass p->vec[idx+1].spam to a function 
that wants an int&, and it can modify its parameter and you’ll see the change 
on your side. How could your proposal handle even the simplest case of passing 
lst[0]?

Even if it could work as advertised, it’s hugely overkill for this problem. A 
full-blown macro system would let people solve this problem, and half the other 
things people propose for Python, but that doesn’t mean that half the proposals 
on this list are requests for a full-blown macro system, or that it’s the right 
answer for them.

> The f-string logic addresses a similar need.

Similar, yes, but the f-string logic (a) runs in the caller’s scope and (b) 
evaluates code that’s textually part of the caller.

> With a way to get at the variable names used for calling a
> function from inside a function, you could then write a dict
> constructor which gives you the subset of vars() you are
> looking for.

Most of the use cases involve “I 

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-04-20 Thread Andrew Barnert via Python-ideas
> On Apr 20, 2020, at 17:22, Steven D'Aprano  wrote:
> 
> On Mon, Apr 20, 2020 at 03:28:09PM -0700, Andrew Barnert via Python-ideas 
> wrote:
> 
>> Admittedly, such cases are almost surely not that common, but I
>> actually have some line-numbering code that did something like this
>> (simplified a bit from real code):
>> yield from enumerate(itertools.chain(headers, [''], body, ['']))
>> … but then I needed to know how many lines I yielded, and there’s no
>> way to get that from enumerate, so instead I had to do this:
> 
> Did you actually need to "yield from"? Unless your caller was sending 
> values into the enumerate iterable, which as far as I know enumerate 
> doesn't support, "yield from" isn't necessary.

True. Using yield from is more efficient, more composeable, and usually (but 
not here) more concise and readable, but none of those are relevant to my 
example (or the real code). I suppose it’s just a matter of habit to reach for 
yield from before a loop over yield even in cases where it doesn’t matter much.

>> counter = itertools.count()
>> yield from zip(counter, itertools.chain(headers, [''], body, ['']))
>> lines = next(counter)
> 
> That gives you one more than the number of lines yielded.

Yeah, I screwed that up in simplifying the real code without testing the 
result. And your version gives one _less_ than the number yielded. (With either 
enumerate(xs) or zip(counter, xs) the last element will be (len(xs)-1, xs[-1]). 
Your version has the additional problem that if the iterable is empty, t is not 
off by one but unbound (or bound to some stale old value)—but that’s not 
possible in my example, and probably not in most similar examples.

Both are easy to fix in practice, but both are (as we just demonstrated) even 
easier to get wrong the first time, like all fencepost errors. Maybe it would 
be better to use an undoable/peekable/tee wrapper after all, but without 
writing it out I’m not sure that wouldn’t be just as fencepostable…

Anyway, that’s exactly why I want to make sure the fencepost behavior is 
actually defined for this new proposal. Any reasonable answer is probably fine; 
people probably won’t run into wanting the leftovers, but if they ever do, as 
long as the docs say what should be there, they’ll work it out.

That, and the implementation constraint. If everyone were convinced that the 
only reasonable answer is to fully consume all inputs on error, that would be a 
bit of a problem, so it’s worth making sure nobody is convinced of that.


[Python-ideas] Re: Keyword arguments self-assignment

2020-04-20 Thread Andrew Barnert via Python-ideas
On Apr 20, 2020, at 16:24, M.-A. Lemburg  wrote:
> 
> On 20.04.2020 19:43, Andrew Barnert via Python-ideas wrote:
>>> On Apr 20, 2020, at 01:06, M.-A. Lemburg  wrote:
>>> 
>>> The current version already strikes me as way too complex.
>>> It's by far the most complex piece of grammar we have in Python:
>>> 
>>> funcdef: 'def' NAME parameters ['->' test] ':' [TYPE_COMMENT]
>>> func_body_suite
>> 
>> But nobody’s proposing changing the function definition syntax here, only 
>> the function call syntax. Which is a lot less hairy. It is still somewhat 
>> hairy, but nowhere near as bad, so this argument doesn’t really apply.
> 
> True, I quoted the wrong part of the grammar for the argument,
> sorry. I meant this part:
> 
> https://docs.python.org/3/reference/expressions.html#calls
> 
> which is simpler, but not really much, since the devil is in
> the details.

Let’s just take one of the variant proposals under discussion here, adding 
::identifier to dict displays. This makes no change to the call grammar, or to 
any of the call-related bits, or any other horribly complicated piece of 
grammar. It just changes key_datum (a nonterminal referenced only in 
dict_display) from this:

expression ":" expression | "**" or_expr

… to this:

expression ":" expression | "::" identifier | "**" or_expr

That’s about as simple as any syntax change ever gets.

Which is still not nothing. But you’re absolutely right that a big and messy 
change to function definition grammar would have a higher bar to clear than 
most syntax proposals—and for the exact same reason, a small and local change 
to dict display datum grammar has a lower bar than most syntax proposals.



[Python-ideas] Re: Keyword arguments self-assignment

2020-04-20 Thread Andrew Barnert via Python-ideas
On Apr 20, 2020, at 16:46, Christopher Barker  wrote:
> On Mon, Apr 20, 2020 at 3:13 PM Andrew Barnert  wrote:
> > Sure, it’s a declarative format, it’s just that often it’s intended to be 
> > understood as representing an object graph.
> 
> I"m not sure the point here -- I was not getting onto detail nor expalingnoi 
> myself well, but I think there are (kind of) three ways to "name" just one 
> piece of data that came from a bunch of JSON:
> 
> - a key, as in a dict `data['this']`
> - an attribute of an object: `data.this`
> - a local variable: `this`
> 
> what I was getting at is that there may be a fine line between the data 
> version and the object version, but you can go between those easily without 
> typing all the names.

OK, I thought you were saying that line is a serious problem for this proposal, 
so I was arguing that the same problems actually arise either way, and the same 
proposal helps both.

Since you weren’t saying that and I misinterpreted you, that whole part of the 
message is irrelevant. So I’ll strip all the irrelevant bits down to this quote 
from you that I agree with 100%:

> It's only when you have it in a local variable that this whole idea starts to 
> matter.

And I think we also agree that it would be better to make this a dict display 
feature, and a bunch of other bits.

But here’s the big issue:

> > If I have 38 locals for all 38 selectors in the API—or, worse, a 
> > dynamically-chosen subset of them—then “get rid of those locals” is almost 
> > surely the answer, but with just 1? Probably not. And maybe 3 or 4 is 
> > reasonable too—
> 
> right. but I don't think anyone is suggesting a language change for 1, or 
> even 3-4 names (maybe 4...)

The original post only had 2 arguments. Other people came up with examples like 
the popen one, which has something insane like 19 arguments, but most of them 
were either constants or computed values not worth storing; only 4 of them near 
the end were copied from locals. Steven’s example had 5. The “but JavaScript 
lets me do it” post has 3. I think someone suggested the same setup.py example 
you came up with later in this same thread, and it had 3 or 4.

So I think people really are suggesting this for around 4 names.

And I agree that’s kind of a marginal benefit. That’s why I think the whole 
proposal is marginal. It’s almost never going to be a huge win—but it may be a 
small win in so many places that it adds up to being worth it.



[Python-ideas] Re: zip(x, y, z, strict=True)

2020-04-20 Thread Andrew Barnert via Python-ideas
Alex Hall wrote:
> Surely no exception is raised because zip is lazy?

Ack, you're right. The same problem would come up wherever you actually _use_ 
the zip, of course, but it's harder to demonstrate and reason about.

So change that toy example to `zipped = list(zip(x, y, strict=True))`.

(Fortunately, it looks like Ram got what I intended despite my mistake.)
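
In case it helps, here’s the laziness that bit me, demonstrated with today’s
zip and no new flag at all:

x = iter([1, 2, 3])
y = iter([1])
zipped = zip(x, y)   # nothing is consumed yet, so nothing can raise yet
print(list(zipped))  # [(1, 1)]: truncation only happens here, during iteration

Any strict check would necessarily fire at the same point, during iteration.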

> Doesn't it still have to be even with strict=True?

Well, I suppose technically it doesn't _have_ to be, but it certainly _should_ 
be.

(Although it's a bit weird to say "it should be lazy even with `strict=True`" 
out loud; maybe that's a mild argument for using a different qualifier like 
`equal`, as in more-itertools?)


[Python-ideas] Re: zip(x, y, z, strict=True)

2020-04-20 Thread Andrew Barnert via Python-ideas
On Apr 20, 2020, at 13:49, Ram Rachum  wrote:
> 
> Good point. It would have to be dependent on position. In other words, you 
> would never pass an iterator into zip with any expectation that it would be 
> in a usable condition by the time it's done.
> 
> Actually, I can't think of any current scenario in which someone would want 
> to do this, with the existing zip logic.

Admittedly, such cases are almost surely not that common, but I actually have 
some line-numbering code that did something like this (simplified a bit from 
real code):

yield from enumerate(itertools.chain(headers, [''], body, ['']))

… but then I needed to know how many lines I yielded, and there’s no way to get 
that from enumerate, so instead I had to do this:

counter = itertools.count()
yield from zip(counter, itertools.chain(headers, [''], body, ['']))
lines = next(counter)

(Actually, at the same time I did that, I also needed to add some conditional 
bits to the chain, and it got way too messy for one line, so I ended up 
rewriting it as a sequence of separate `yield from zip(counter, things)` 
statements. But that’s just a more complicated demonstration of the same idea.)

But again, this probably isn’t very common.

And also, while you were asking about the existing zip logic, the more 
important question is the new logic you’re proposing. I can’t imagine a case 
where you’d want to check for non-empty and _then_ use it, which is what’s 
relevant here. There probably are such cases, but if so, they’re even rarer, 
enough so that the fact that you have to wrap something in itertools.tee or 
more_itertools.peekable to pull it off (or just not use the new 
strict=True/zip_strict/zip_equal) is probably not a great tragedy.
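
A sketch of that workaround (more_itertools.peekable is real; the strict flag
is still just the proposal here, assumed to raise ValueError):

import more_itertools

x = more_itertools.peekable(iter(range(5)))
y = [0]
try:
    zipped = list(zip(x, y, strict=True))  # proposed flag, not in any Python yet
except ValueError:
    # whatever zip consumed from x is gone for good, but the wrapper lets
    # us check for leftovers (and reuse them) without a second fencepost bug
    print('leftovers:', list(x))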



[Python-ideas] Re: Keyword arguments self-assignment

2020-04-20 Thread Andrew Barnert via Python-ideas
On Apr 20, 2020, at 13:42, Christopher Barker  wrote:
> 
> On Mon, Apr 20, 2020 at 12:17 PM Andrew Barnert  wrote:
>> 
>> A lot of JSON is designed to be consumed by JavaScript, where there is no 
>> real line (there is no dict type; objects have both dict-style and dot-style 
>> access). So in practice, a lot of JSON maps well to data, a lot maps well to 
>> objects, and some is mushily designed and doesn’t quite fit either way, 
>> because in JS they all look the same anyway.
> 
> Well, sure. Though JSON itself is declarative data. 

Sure, it’s a declarative format, it’s just that often it’s intended to be 
understood as representing an object graph.

> In Python, you need to decide how you want to work with it, either as an 
> object with attributes or a dict. But if you are getting it from JSON, it's a 
> dict to begin with. So you can keep it as a dict, or populate an object with 
> it. B ut populating that object can be automated:
> 
> an_instance = MyObject(**the_dict_from_JSON)

But unless your object graph is flat, this doesn’t work. A MyObject doesn’t 
just have strings and numbers, it also has a list of MySubObjects; if you just 
** the JSON dict, you get subobjs=[{… d1 … }, { … d2 … }], when what you 
actually wanted was subobjs=[MySubObject(**d) for d in …].

It’s not like it’s _hard_ to write code to serialize and deserialize object 
graphs as JSON (although it’s hard enough that people keep proposing a __json__ 
method to go one way and then realizing they don’t have a proposal to go the 
other way…), but it’s not as trivial as just ** the dict into keywords.
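
A minimal sketch of that extra step, with hypothetical MyObject and
MySubObject classes standing in for the real object model:

from dataclasses import dataclass

@dataclass
class MySubObject:
    name: str

@dataclass
class MyObject:
    title: str
    subobjs: list

def myobject_from_json(d):
    # ** unpacking handles the flat fields; the nested list is the part
    # you have to rebuild by hand
    return MyObject(**{**d, 'subobjs': [MySubObject(**s) for s in d['subobjs']]})

obj = myobject_from_json({'title': 'spam', 'subobjs': [{'name': 'eggs'}]})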

> > But, maybe even more importantly: even if you _do_ decide it makes more 
> > sense to stick to data for this API, you have the parallel `{'country': 
> > country, 'year': year}` issue, which is just as repetitive and verbose.
> 
> only if you have those as local variables -- why are they ?

Apologies for my overly-fragmentary toy example.

Let’s say you have a function that makes an API request to some video site to 
get the local-language names of all movies of the user-chosen genre in the 
current year.

If you’ve built an object model, it’ll look something like this:

query = api.Query(genre=genre, year=datetime.date.today().year)
response = api.query_movies(query)
result = [movie.name[language] for movie in response.movies]

If you’re treating the JSON as data instead, it’ll look something like this:

query = {'query': {'genre': genre, 'year': datetime.date.today().year}}
response = requests.post(api.query_movies_url, json=query).json()
result = [movie['name'][language] for movie in response['movies']]

Either way, the problem is in that first line, and it’s the same problem. (And 
the existence of ** unpacking and the dict() constructor from keywords means 
that solving either one very likely solves the other nearly for free.)

Here I’ve got one local, `genre`. (I also included one global representing a 
global setting, just to show that they _can_ be reasonable as well, although I 
think a lot less often than locals, so ignore that.) I think it’s pretty 
reasonable that the local variable has the same name as the selector 
key/keyword. If I ask “why do I have to repeat myself with genre=genre or 
'genre': genre”, what’s the answer?

If I have 38 locals for all 38 selectors in the API—or, worse, a 
dynamically-chosen subset of them—then “get rid of those locals” is almost 
surely the answer, but with just 1? Probably not. And maybe 3 or 4 is 
reasonable too—a function that select by genre, subgenre, and mood seems like a 
reasonable thing. (If it isn’t… well, then I was stupid to pick an application 
area I haven’t done much work in… but you definitely don’t want to just select 
subgenre without genre in many music APIs, because your user rarely wants to 
hear both hardcore punk and hardcore techno.) And it’s clearly not an accident 
that the local and the selector have the same name. So, I think that case is 
real, and not dismissible.

> I'm not saying it never comes up in well designed code -- it sure does, but 
> if there's a LOT of that, then maybe some refactoring is in order.

Yes. And now that you point that out, thinking of how many people go to 
StackOverflow and python-list and so on looking for help with exactly that 
antipattern when they shouldn’t be doing it in the first place, there is 
definitely a risk that making this syntax easier could be an antipattern 
magnet. So, it’s not just whether the cases with 4 locals are important enough 
to overcome the cost of making Python syntax more complicated; the benefit has 
to _also_ overcome the cost of being a potential antipattern magnet. For me, 
this proposal is right on the border of being worth it (and I’m not sure which 
side it falls on), so that could be enough to change the answer, so… good thing 
you brought it up.

But I don’t think it eliminates the rationale for the proposal, or even the 
rationale for using it with JSON-related APIs.

[Python-ideas] Re: zip(x, y, z, strict=True)

2020-04-20 Thread Andrew Barnert via Python-ideas
On Apr 20, 2020, at 10:42, Ram Rachum  wrote:
> 
> Here's something that would have saved me some debugging yesterday:
> 
> >>> zipped = zip(x, y, z, strict=True)
> 
> I suggest that `strict=True` would ensure that all the iterables have been 
> exhausted, raising an exception otherwise.

One quick bikeshedding question (which also gets to the heart of how you’d want 
to implement it); apologies if this came up in the thread from 2 years ago or 
the discussion in the more-iterables PR that I just suggested everyone should 
read before commenting, but I wanted to get this down before I forget it.

x = iter(range(5))
y = [0]
try:
    zipped = zip(x, y, strict=True)
except ValueError:  # assuming that’s the exception you want?
    print(next(x))

Should this print 1 or 2 or raise StopIteration or be a don’t-care?

Should it matter if you zip(y, x, strict=True) instead?



[Python-ideas] Re: zip(x, y, z, strict=True)

2020-04-20 Thread Andrew Barnert via Python-ideas
On Apr 20, 2020, at 13:03, Eric V. Smith  wrote:
> 
> On 4/20/2020 3:39 PM, Andrew Barnert via Python-ideas wrote:
>> 
>> 
>> As I said, wanting to check does come up sometimes—I know I have written 
>> this myself at least once, and I’d be a little surprised if it’s not in 
>> more-itertools.
> 
> Interestingly, it looks like it might be more_itertools.zip_equal, which 
> is listed at https://github.com/more-itertools/more-itertools, but is linked 
> to 
> https://more-itertools.readthedocs.io/en/stable/api.html#more_itertools.zip_equal
>  which is missing. Maybe it's new?

Yeah, it is new. See PR 415 
(https://github.com/more-itertools/more-itertools/pull/415) 21 days ago. There 
must be something in the air that’s made people suddenly want this more. :)

The PR does a great job linking to other discussions about this, including an 
-ideas thread from two years ago. I haven’t read through everything yet, but I 
notice that the first objection last time around was David Mertz pointing out 
that it’s not even in more-itertools, so maybe that more-itertools PR means 
it’s the perfect time to reopen this discussion? Or maybe it means we should 
wait a few months and see if people seem to be using the one in more-itertools? 
(And also maybe to wait for it to stabilize—there are a few bug fix commits to 
it after the initial merge.)



[Python-ideas] Re: zip(x, y, z, strict=True)

2020-04-20 Thread Andrew Barnert via Python-ideas
On Apr 20, 2020, at 11:25, Brandt Bucher  wrote:
> 
> I disagree. In my own personal experience, ~80% of the time when I use `zip` 
> there is an assumption that all of the iterables are the same length.

Sure, but I think cases where you want that assumption _checked_ are a lot less 
common. There are lots of postconditions that you assume just as often as “x, 
y, and z are fully consumed” and just as rarely want to check, so we don’t need 
to make it easy to check every possible one of them.

As I said, wanting to check does come up sometimes—I know I have written this 
myself at least once, and I’d be a little surprised if it’s not in 
more-itertools. But often enough to be a (flag on a) builtin? I’ve also written 
a zip that uses the length of the first rather than the shortest or longest, 
and a zip that skips rather than filling past the end of short inputs, and 
there are probably other variations that come up occasionally. But if they 
don’t come up that often, and are easy to write yourself, is there really a 
problem that needs to be fixed?

And even if checking is the most common option after the default, it seems like 
a weird API to have some options for what to do at the end as keyword parameter 
flags and other options as entirely separate functions. Maybe a flag for 
longest (or a single at_end parameter taking an enum of end-behaviors: 
truncate, check, fill, skip, where the signature can immediately show you that 
the default is truncate; see the sketch below) would be a better design if you 
were doing Python from scratch, but I think the established existence of 
zip_longest pushes us the other way.
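
That sketch, purely hypothetical (all names invented here), just to show how
the signature would advertise the default:

import enum

class AtEnd(enum.Enum):
    TRUNCATE = enum.auto()  # today’s default: stop at the shortest input
    CHECK = enum.auto()     # raise if the inputs have different lengths
    FILL = enum.auto()      # pad shorter inputs, like zip_longest
    SKIP = enum.auto()      # drop exhausted inputs and keep zipping the rest

def zip_at(*iterables, at_end=AtEnd.TRUNCATE, fillvalue=None):
    ...  # the point is the visible default in the signature, not the body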



[Python-ideas] Re: Keyword arguments self-assignment

2020-04-20 Thread Andrew Barnert via Python-ideas
On Apr 20, 2020, at 11:01, Christopher Barker  wrote:
> 
> The JSON - related example is a good one -- JSON maps well to "data" in 
> Python, dicts and lists of numbers and strings. If you find yourself 
> converting a bunch of variable names to/from JSON, you probably should be 
> simply using a dict, and passing that around anyway.

A lot of JSON is designed to be consumed by JavaScript, where there is no real 
line (there is no dict type; objects have both dict-style and dot-style 
access). So in practice, a lot of JSON maps well to data, a lot maps well to 
objects, and some is mushily designed and doesn’t quite fit either way, because 
in JS they all look the same anyway. The example code for an API often shows 
you doing `result.movies[0].title.en`, because in JS you can. And in other 
languages, sometimes it is worth writing (or auto-generating) the code for 
Movie, etc. classes and serializing them to/from JSON so you can do the same. 
This is really the same point as “sometimes ORMs are useful”, which I don’t 
think is that controversial.

But, maybe even more importantly: even if you _do_ decide it makes more sense 
to stick to data for this API, you have the parallel `{'country': country, 
'year': year}` issue, which is just as repetitive and verbose. The `{::country, 
::year}` syntax obviously solves that dict key issue just as easily as it does 
for keywords. But most of the other variant proposals solve it at least 
indirectly via dict constructor calls—`dict(**, country, year)`, 
`dict(country=, year=)`, `dict(**{country, year})`, which isn’t quite as 
beautiful, but is still better than repeating yourself if the list of members 
or query conditions gets long.



[Python-ideas] Re: zip(x, y, z, strict=True)

2020-04-20 Thread Andrew Barnert via Python-ideas
On Apr 20, 2020, at 10:42, Ram Rachum  wrote:
> 
> Here's something that would have saved me some debugging yesterday:
> 
> >>> zipped = zip(x, y, z, strict=True)
> 
> I suggest that `strict=True` would ensure that all the iterables have been 
> exhausted, raising an exception otherwise.

This is definitely sometimes useful, but I think less often than zip_longest, 
which we already decided long ago isn’t important enough to push into zip but 
instead should be a separate function living in itertools.

I’ll bet there’s a zip_strict (or some other name for the same idea) in the 
more-itertools library. (If not, it’s probably worth submitting.) Whether it’s 
important enough to bring into itertools, add as a recipe, or call out as an 
example of what more-itertools can do in the itertools docs, I’m not sure. But 
I don’t think it needs to be added as a flag on the builtin.



[Python-ideas] Re: Keyword arguments self-assignment

2020-04-20 Thread Andrew Barnert via Python-ideas
On Apr 20, 2020, at 01:06, M.-A. Lemburg  wrote:
> 
> The current version already strikes me as way too complex.
> It's by far the most complex piece of grammar we have in Python:
> 
> funcdef: 'def' NAME parameters ['->' test] ':' [TYPE_COMMENT]
> func_body_suite

But nobody’s proposing changing the function definition syntax here, only the 
function call syntax. Which is a lot less hairy. It is still somewhat hairy, 
but nowhere near as bad, so this argument doesn’t really apply.

Also, you’re lumping all the different proposals here, but they don’t all have 
the same effect, which makes the argument even weaker.

Adding a ** mode switch does make calls significantly more complicated, because 
it effectively clones half of the call grammar to switch to a similar but new 
grammar.

But allowing keyword= is a simple and local change to one small subpart of the 
call grammar that I don’t think adds too much burden.

And ::value in dict displays doesn’t touch the call syntax at all; it makes 
only a trivial and local change to a subpart of the much simpler dict display 
grammar.

And **{a,b,c} is by far the most complicated, but the complicated part isn’t in 
calls (which would just gain one simple alternation); it’s in cloning half of 
the expression grammar to create a new nonset_expression node; the change to 
call syntax to use that new node is simple. (I’m assuming this proposal would 
make **{set display} in a call a syntax error when it’s not a magic 
set-of-identifiers unpacking, because otherwise I don’t know how you could 
disambiguate at all.)
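
To put the variants side by side on the same call (all proposed syntax; none
of these is valid Python today, so they’re shown as comments):

# spam(**, country, year)        # the ** mode switch
# spam(country=, year=)          # keyword= shorthand
# spam(**{country, year})        # set-display unpacking
# spam(**{::country, ::year})    # dict display datum, via an ordinary **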

So, even if you hadn’t mixed up definitions and calls, I don’t think this 
argument really holds much water.

I think your point that “hard to parse means hard to reason about” is a good 
one, however. That’s part of my rationale for the ::value syntax in dict 
displays: it’s a simple change to a simple piece of syntax that’s well isolated 
and consistent everywhere it appears. But I don’t think people would actually 
have a problem learning, internalizing, and reading keyword= syntax. And I 
think it may be an argument against the **{a,b,c} syntax, but only in a more 
subtle way than you’re advancing—people wouldn’t even internalize the right 
grammar; they’d just think of it as a special use of set displays (in fact 
Steven, who proposed it, encourages that reading), which is an extra special 
case to learn. Which can still be acceptable (lots of people get away with 
thinking of target lists as a special use of tuple displays…); it’s just a 
really high bar to clear.


[Python-ideas] Re: list.append(x) could return x

2020-04-20 Thread Andrew Barnert via Python-ideas
On Apr 20, 2020, at 08:41, J. Pic  wrote:
> 
> 
> Currently, list.append(x) mutates the list and returns None.

Yes, which is consistent with the vast majority of mutating methods in Python. 
It would be pretty weird to make lst.append(x) return x, while lst.extend(xs) 
still returns None, not to mention similar methods on other types like set.add.

> It would be a little syntactic sugar to return x, for example:
> 
> something = mylist.append(Something())

You can already get the same effect with:

mylist.append(something := Something())

I think usually this would be more readable (and pythonic) as two lines rather 
than combined into one:

something = Something()
mylist.append(something)

But “usually” isn’t “always”, and that’s why we have escape hatches like the 
walrus operator. It’s for exactly this purpose, where a subexpression needs to 
be bound to a name, but for some reason it can’t or shouldn’t be extracted to a 
separate assignment statement.

And I think this is a lot clearer about what’s getting assigned to something, 
too. Someone who’s never seen the walrus operator won’t understand it, but at 
least it’s unlikely they’re going to misunderstand it as something other than 
what it means.

This is especially an issue with methods like list.append. In a lot of other 
languages, they return self, because the language encourages method chaining 
for fluid programming (Perl, Ruby), or because list appending is non-mutating 
(Scala, Haskell), or for bizarre reasons specific to that language (Go), so a 
lot of people are likely to misunderstand your syntax as meaning that something 
gets the new value of the list.



[Python-ideas] Re: Proposal: Keyword Unpacking Shortcut [was Re: Keyword arguments self-assignment]

2020-04-18 Thread Andrew Barnert via Python-ideas
On Apr 18, 2020, at 05:16, Alex Hall  wrote:
> 
> Is there anything else similar in the language? Obviously there are cases 
> where the same text has different meanings in different contexts, but I don't 
> think you can ever refactor an expression (or text that looks like an 
> expression) into a variable and change its meaning while keeping the program 
> runnable.

I suppose it depends on where you draw the “looks like an expression”, line, 
but I think there are cases that fit. It’s just that there are _not many_ of 
them, and most of them are well motivated. Each exception adds a small cost to 
learning the language, but Python doesn’t have to be perfectly regular like 
Lisp or Smalltalk, it just has to be a lot less irregular than C or Perl. Most 
special cases aren’t special enough, but some are.

A subscription looks like a list display, but it’s not. Mixing them up will 
only give you a syntax error if you use slices, ellipses, or *-unpacking in the 
wrong one, and often won’t even give you a runtime error. And the parallel 
isn’t even useful. But this is worth it anyway because subscription is so 
tremendously important.

A target list looks like a tuple display, but it’s not. Mixing them up will 
only give you a syntax error if you try to use a tuple display with a constant 
or a complex expression in it as a target list. Mixing them up in other ways 
will only give you at best an UnboundLocalError or NameError at runtime, and at 
worst silently wrong behavior. But the parallel here is more helpful than 
confusing (it’s why multiple-value return looks so natural in Python, for one 
thing), so it’s worth it.

**{a, b, c} is a special case in two ways: **-unpacking is no longer one thing 
but two different things, although with a very useful and still pretty solid 
parallel between them, and set display syntax now has two meanings, with a 
somewhat useful and weaker parallel. Even added together, that’s not as much of 
a learning burden as subscription looking like list displays. But it also isn’t 
as important a benefit.

The magic ** mode switch only pushes two complicated and 
already-not-quite-parallel forms a little farther apart, which is less of a 
cost. The keyword= is similar but even less so, especially since anywhere it 
could be confused is a syntax error. The dict display ::value doesn’t cause any 
new exceptions or break any existing parallels at all, so it’s even less of a 
cost. But there are plenty of other advantages and disadvantages of each of the 
four (and the minor variations on them in this thread); that’s just one factor 
of many to weigh.



[Python-ideas] Re: Proposal: Keyword Unpacking Shortcut [was Re: Keyword arguments self-assignment]

2020-04-18 Thread Andrew Barnert via Python-ideas
On Apr 17, 2020, at 23:18, Steven D'Aprano  wrote:
> 
> 
> Keyword Unpacking Shortcut
> --------------------------
> 
> Inside function calls, the syntax
> 
>  **{identifier [, ...]}
> 
> expands to a set of `identifier=identifier` argument bindings.
> 
> This will be legal anywhere inside a function call that keyword 
> unpacking would be legal.

Which means that you can’t just learn ** unpacking as a single consistent thing 
that’s usable in multiple contexts with (almost) identical syntax and identical 
meaning, you have to learn that it has an additional syntax with a different 
meaning in just one specific context, calls, that’s not legal in the others. 
Each special case like that makes the language’s syntax a little harder to 
internalize, and it’s a good thing that Python has a lot fewer such special 
cases than, say, C.

Worse, this exact same syntax is a set display anywhere except in a ** in a 
call. Not only is that another special case to learn about the differences 
between set and dict displays, it also means that if you naively copy and paste 
a subexpression from a call into somewhere else (say, to print the value of 
that dict), you don’t get what you wanted, or a syntax error, or even a runtime 
error, you get a perfectly valid but very different value.

> On the other hand, plain keyword unpacking:
> 
>  **textinfo
> 
> is terse, but perhaps too terse. Neither the keys nor the values are 
> immediately visible. Instead, one must search the rest of the function 
> or module for the definition of `textinfo` to learn which parameters are 
> being filled in.

You can easily put the dict right before the call, and when you don’t, it’s 
usually because there was a good reason.

And there are good reasons. Ideally you shouldn’t have any function calls that 
are so hairy that you want to refactor them, but the existence of 
libraries you can’t control that are too huge and unwieldy is the entire 
rationale here. Sometimes it’s worth pulling out a group of related parameters 
to a “launch_params” or “timeout_and_retry_params” dict, or even to a 
“build_launch_params” method, not just for readability but sometimes for 
flexibility (e.g., to use it as a cache or stats key, or to give you somewhere 
to hook easily in the debugger and swap out the launch_params dict).

> Backwards compatibility
> ---
> 
> The syntax is not currently legal so there are no backwards 
> compatibility concerns.

The syntax is perfectly legal today. The syntax for ** unpacking in a call 
expression takes any legal expression, and a set display is a legal expression. 
You can see this by calling compile (or, better, dis.dis) on the string 
'spam(**{a, b, c})'.
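
Concretely, this runs on any current Python 3:

import dis
dis.dis(compile("spam(**{a, b, c})", '<demo>', 'eval'))
# the set display compiles without complaint; the TypeError would only
# happen at run time, when the ** unpacking finds a set instead of a mapping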

The semantics will be a guaranteed TypeError at runtime unless you’ve done 
something pathological, so almost surely nobody’s deployed any code that 
depends on the existing semantics. 

But that’s not the same as the syntax not being legal. And, outside of that 
trivial backward compatibility nit, this raises a bunch of more serious issues. 

Running Python 3.9 code in 3.8 would do the wrong thing, but maybe not wrong 
enough to break your program visibly, which could lead to some fun debugging 
sessions. That’s not a dealbreaker, but it’s definitely better for new syntax 
to raise a syntax error in old versions, if possible.

And of course existing linters, IDEs, etc. will misunderstand the new syntax 
(which is worse than failing to parse it) until they’re taught the new special 
case.

This also raises an implementation issue. The grammar rule to disambiguate this 
will probably either be pretty hairy, or require building a parallel fork of 
half the expression tree so you can have an “expression except for set 
displays” node. Or there won’t be one, and it’ll be done as a special case 
post-parse hack, which Python uses sparingly.

But all of that goes right along with the human confusion. If the same syntax 
can mean two different things in different contexts, it’s harder to internalize 
a usable approximate version of the grammar. For something important enough, 
that may be worth it, but I don’t think the benefits of this proposal reach 
that bar.
